QUESTION

Aug 05, 2018

Hi, I need help with essay on Efficiency of Clustering algorithms for mining large biological data bases. Paper must be at least 2500 words. Please, no plagiarized work!

They are categorized into partitioning, hierarchical and graph-based techniques. The most widely used of the three are the graph-based and hierarchical techniques. Partitioning techniques are used in other disciplines but are less common in gene sequence clustering, and as such there is no substantial theory on whether partitioning methods are efficient. This study analyzes four clustering algorithms using four large protein sequence data sets. The analysis highlights the weaknesses and shortcomings of the four and proposes a new algorithm informed by those shortcomings.

Introduction

Today, there are more than one million known protein sequences (Sasson et al., 2002), and as such bioinformatics needs ways to identify meaningful patterns in them in order to understand their functions. For a long time, protein and gene sequences have been analyzed, compared and grouped using alignment methods. According to Cai et al. (2000), alignment methods are algorithms constructed to arrange RNA, DNA, and protein sequences so as to detect similarities that may result from evolutionary, functional or structural relationships between the sequences. Mount (2002) asserts that sequences are compared and clustered using pair-wise alignment methods, which are of two types: global and local. The local alignment algorithm proposed by Smith and Waterman (Bolten et al., 2001) is used to identify amino acid patterns that have been conserved in protein sequences. The global alignment algorithm proposed by Needleman and Wunsch (Bolten et al., 2001) attempts to align as many characters as possible across the entire length of the sequences. It is clear from the above that the pair-wise alignment method is expensive when it comes to comparing and clustering a large protein data set.
This is because a very large number of comparisons are performed during computation, since every single protein in a data set is compared to all the other proteins in the data set (Bolten et al., 2001). This brings into question the efficiency of pair-wise alignment methods for comparing and clustering large protein data sets. The pair-wise alignment methods, both local and global, do not take into consideration the size of the data set, especially very large data sets that may overwhelm computer memory.

Han & Kamber (2000) argue that unsupervised learning aims to identify, from a data set, a sensible partition or a natural pattern with the help of a distance function. Biology and the life sciences have extensively exploited clustering techniques in sequence analysis to classify similar sequences into protein or gene families (Galperin & Koonin, 2001). Currently, protein sequences can be grouped by similarity using various readily available sequence clustering methods. As mentioned earlier, these methods can be grouped as graph-based, partitioning and hierarchical methods. These methods, especially graph-based and hierarchical methods, have been used in succession or together to complement each other, as argued by Sasson et al. (2002), Sperisen & Pagni (2005), Essoussi & Fayech (2007) and Enright & Ouzounis (2000). In the field of protein comparison and sequence clustering, there are very few instances in which partitioning techniques have been used.
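
To make the cost of all-against-all pair-wise alignment concrete, here is a minimal Python sketch. It is not the essay's own method: the function names are hypothetical, and the match, mismatch and gap values are illustrative assumptions (real protein comparisons would normally use a substitution matrix). It computes a Needleman-Wunsch style global alignment score and then applies it to every unordered pair of sequences in a data set.

```python
from itertools import combinations


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the essay.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def all_pairs_scores(sequences):
    """Score every unordered pair of sequences (hypothetical helper)."""
    return {
        (i, j): global_alignment_score(sequences[i], sequences[j])
        for i, j in combinations(range(len(sequences)), 2)
    }


def pair_count(n):
    """Number of pair-wise alignments needed for n sequences."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    toy = ["AC", "AGC", "A", "ACG"]  # toy strings, not real protein data
    print(all_pairs_scores(toy))
    print(pair_count(1_000_000))
```

Each alignment fills a table whose size is the product of the two sequence lengths, and n sequences require n(n - 1)/2 such alignments. For one million sequences that is roughly 5 × 10^11 alignments, which is why the all-against-all approach scales poorly.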