DNA sequence similarity analysis is needed for many purposes, including genome analysis, extracting biological information, and finding the evolutionary relationships among species. There are two types of sequence analysis: alignment-based (AB) and alignment-free (AF). AB is effective for small homologous sequences but becomes an NP-hard problem for long sequences. AF algorithms can overcome the major limitations of AB, but most existing AF methods show high time complexity and memory consumption, lower precision, and weaker performance on benchmark datasets. To minimize these limitations, we develop an AF algorithm using a 2D \(k\)-mer count matrix inspired by the chaos game representation (CGR) approach. We then shrink the matrix by analyzing the neighbors and measure similarities using the best combinations of pairwise distance (PD) and phylogenetic tree methods. We also choose the value of \(k\) dynamically, and we develop an efficient system for finding the positions of \(k\)-mers in the count matrix. We apply our system to six different datasets. We achieve the top rank for two benchmark datasets from AFproject, 100% accuracy for two datasets (16S Ribosomal, 18 Eutherian), and a milestone for time complexity and memory consumption in comparison to the existing study datasets (HEV, HIV-1). The comparative results on the benchmark datasets and existing studies therefore demonstrate that our method is highly effective, efficient, and accurate. Thus, our method can be used with a high level of reliability for DNA sequence similarity measurement.

Sequence analysis is a trending research area in bioinformatics, bioinformatics engineering, and computational biology. It is essential for analyzing the evolutionary relationships among living organisms from whole genomes, finding gene regulatory regions, identifying virus–host interactions, detecting horizontal gene transfer, analyzing the similarity of sequences, extracting different biological information, etc. Extracting biological information from whole genomes is becoming more important by the day because of the rapid expansion of biological data over the last few decades (the data volume roughly doubles every 18 months). Broadly, there are two types of sequence analysis, AB and AF, and AB algorithms have several limitations. For example, they give better results only for homologous sequences, they work only for comparatively small sequences, and they are time- and space-consuming. For multiple and long sequences, alignment becomes an NP-hard problem. AF algorithms, however, can overcome the major limitations of AB algorithms. Different AF-based studies have been conducted on sequence similarity analysis. Owing to their time and memory efficiency, AF methods are widely used in free, paid, open, and publicly available software, including MEGA (Molecular Evolutionary Genetics Analysis), MEGA7/X, CAFE (aCcelerated Alignment-FrEe sequence analysis), Co-Phylog, etc.
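To make the idea of a CGR-inspired 2D \(k\)-mer count matrix more concrete, here is a minimal Python sketch. It is not the authors' implementation. The corner assignment in `CORNER_BITS`, the helper names (`kmer_position`, `count_matrix`, `euclidean_distance`), and the choice of Euclidean distance as the pairwise distance are all assumptions made for illustration. The matrix-shrinking step and the dynamic choice of \(k\) are not shown, because the text above does not specify how they work.

```python
from math import sqrt

# Assumed corner assignment (a common CGR-style convention, not taken from the paper):
# each base contributes one bit to the row index and one bit to the column index.
CORNER_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_position(kmer):
    """Return the (row, col) cell of a k-mer in a 2^k x 2^k matrix."""
    row = col = 0
    for base in kmer:
        row_bit, col_bit = CORNER_BITS[base]
        row = (row << 1) | row_bit
        col = (col << 1) | col_bit
    return row, col


def count_matrix(sequence, k):
    """Build the 2D k-mer count matrix, skipping k-mers with non-ACGT symbols."""
    size = 1 << k
    matrix = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in CORNER_BITS for base in kmer):
            row, col = kmer_position(kmer)
            matrix[row][col] += 1
    return matrix


def euclidean_distance(m1, m2):
    """Euclidean distance between two count matrices, compared as frequencies."""
    def frequencies(matrix):
        total = sum(map(sum, matrix)) or 1  # avoid division by zero
        return [value / total for row in matrix for value in row]

    f1, f2 = frequencies(m1), frequencies(m2)
    return sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


if __name__ == "__main__":
    # Toy sequences chosen only for demonstration.
    a = count_matrix("ACGTACGTGG", k=2)
    b = count_matrix("ACGTTCGTGA", k=2)
    print(euclidean_distance(a, b))
```

Because each \(k\)-mer maps directly to a cell through bit operations, locating it costs \(O(k)\) time with no lookup table, which is one plausible reading of "finding the positions of \(k\)-mers in the count matrix". A matrix of pairwise distances produced this way could then be passed to a tree-building method such as neighbor joining or UPGMA.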