Frequent Patterns Algorithm of Biological Sequences based on Pattern Prefix-tree
AbstractIn the application of bioinformatics, the existing algorithms cannot be directly and efficiently implement sequence pattern mining. Two fast and efficient biological sequence pattern mining algorithms for biological single sequence and multiple sequences are proposed in this paper. The concept of the basic pattern is proposed, and on the basis of mining frequent basic patterns, the frequent pattern is excavated by constructing prefix trees for frequent basic patterns. The proposed algorithms implement rapid mining of frequent patterns of biological sequences based on pattern prefix trees. In experiment the family sequence data in the pfam protein database is used to verify the performance of the proposed algorithm. The prediction results confirm that the proposed algorithms can’t only obtain the mining results with effective biological significance, but also improve the running time efficiency of the biological sequence pattern mining.
 Ben-Hur, A.; Brutlag, D. (2003). Remote homology detection: A motif based approach, Bioinformatics, 19 Suppl 1(1), 26-33, 2003.
 Benkaddour, M. K., Bounoua, A. (2017). Feature extraction and classification using deep convolutional neural networks, PCA and SVC for face recognition, Traitement du Signal, 34(1-2), 77-91, 2017.
 Cardon, L. R.; Stormo, G. D. (1992). Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragments, Journal of Molecular Biology, 223(1), 159-170, 1992.
 Chen, L.; Liu, W. (2013). Frequent patterns mining in multiple biological sequences, Computers in Biology & Medicine, 43(10), 1444-1451, 2013.
 Crochemore, M.; Ilie, L. (2008). Maximal repetitions in strings, Journal of Computer & System Sciences, 74(5), 796-807, 2008.
 Dong, T. (2017). Assessment of data reliability of wireless sensor network for bioinformatics, International Journal Bioautomation, 21(1), 241-250, 2017.
 Jiang, Q.; Li, S.; Guo, S. (2011). A new model for finding approximate tandem repeats in DNA sequences, Journal of Software, 6(3), 386-394, 2011.
 Karmaker, S.; Ruhi, F. Y.; Mallick U. K. (2018). Mathematical analysis of a model on guava for biological pest control, Mathematical Modelling of Engineering Problems, 5(4), 427-440, 2018.
 Kurtz, S.; Choudhuri, J. V.; Ohlebusch, E.; Schleiermacher, C.; Stoye, J.; Giegerich, R. (2001). REPuter: the manifold applications of repeat analysis on a genomic scale, Nucleic Acids Research, 29(22), 4633-4642, 2001.
 Lawrence, C. E.; Reilly, A. A. (1990). An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences, Proteins Structure Function & Bioinformatics, 7(1), 41-51, 1990.
 Liao, C. C.; Chen, M. S. (2014). DFSP: A Depth-First SPelling algorithm for sequential pattern mining of biological sequences, Knowledge & Information Systems, 38(3), 623-639, 2014.
 Makalowski, W. (2003). Not junk after all, Science, 300(5623), 1246-1247, 2003.
 Pasquier, N.; Bastide, Y.; Taouil, R.; Lakhal, L. (1999). Efficient mining of association rules using closed itemset lattices, Information Systems, 24(1), 25-46, 1999.
 Rubin, E. M.; Lucas, S.; Richardson, P. (2004). Finishing the euchromatic sequence of the human genome, Nature, 431(7011), 931-945, 2004.
 Saqhai Maroof, M. A.; Yang, G. P.; Biyashev, R. M.; Maughan, P. J.; Zhang, Q. (1996). Analysis of the barley and rice genomes by comparative RFLP linkage mapping, Theoretical & Applied Genetics, 92(5), 541-551, 1996.
 Shapiro, J. A.; Von, S. R. (2005). Why repetitive DNA is essential to genome function, Biological Reviews, 80(2), 227-250, 2005.
 Stansfield, W.; King, R.; Mulligan, P. (2006). A dictionary of genetics, Yale Journal of Biology & Medicine, 75(4), 236-245, 2006.
 Won, J. I.; Park, S.; Yoon, J. H.; Kim, S. W. (2006). An efficient approach for sequence matching in large DNA databases, Journal of Information Science, 32(1), 88-104, 2006.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCES: Acces to full text of each article and each issue are allowed for free in respect of Attribution-NonCommercial 4.0 International (CC BY-NC 4.0.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.