Aggregation of Similarity Measures for Ortholog Detection: Validation with Measures Based on Rough Set Theory

Reinier Millo Sánchez, Deborah Galpert Cañizares, Gladys Casa Cardoso, Ricardo Grau Ábalo, Leticia Arco García, María Matilde García Lorenzo, Miguel Ángel Fernández Marin

Abstract


This paper presents a novel algorithm for ortholog detection that aggregates similarity measures characterizing the relationship between gene pairs of two genomes. The measures are based on the alignment score, the sequence lengths, membership in conserved regions, and the protein physicochemical profile. The resulting bipartite similarity graph is clustered with the Markov clustering algorithm (MCL), and a new ortholog assignment policy is applied to the homology groups obtained from the clustering. The classification results are validated on the Saccharomyces cerevisiae and Schizosaccharomyces pombe genomes against the ortholog list of the INPARANOID 7.0 algorithm, using the Adjusted Rand Index (ARI) as an external measure. Additional validation measures based on rough set theory are applied to assess the quality of the classification under class imbalance.
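
As an illustration of the external validation step, the following is a minimal sketch of the standard pair-counting form of the Adjusted Rand Index. It is not taken from the paper: the function name adjusted_rand_index and the encoding of the reference and predicted classifications as flat label lists (one label per gene pair) are assumptions made only for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting Adjusted Rand Index between two labelings.

    Hypothetical helper for illustration; each list holds one label per
    item (for example, one per gene pair), in the same order.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        # Fewer than two items: the two labelings agree trivially.
        return 1.0

    # Contingency counts and their marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Pairs grouped together in both labelings, in the reference only,
    # and in the prediction only.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        # Degenerate case, e.g. every item in a single group in both labelings.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)
```

A value of 1 indicates identical groupings up to a renaming of labels, and values near 0 indicate agreement no better than chance. For instance, adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]) evaluates to 1.0.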


Keywords


Similarity measures; ortholog genes; MCL clustering; ortholog assignment; rough set theory; class imbalance.

Full Text: PDF (Spanish)