Composition bias and the origin of ORFan genes

Inbal Yomtovian, Nuttinee Teerakulkittipong, Byungkook Lee, John Moult, Ron Unger

Research output: Contribution to journal › Article › peer-review

14 Scopus citations


Motivation: Intriguingly, sequence analysis of genomes reveals that a large number of genes are unique to each organism. The origin of these genes, termed ORFans, is not known. Here, we explore the origin of ORFan genes by defining a simple measure called 'composition bias', based on the deviation of the amino acid composition of a given sequence from the average composition of all proteins of a given genome.

Results: For a set of 47 prokaryotic genomes, we show that the amino acid composition bias of real proteins, random 'proteins' (created by using the nucleotide frequencies of each genome) and 'proteins' translated from intergenic regions are distinct. For ORFans, we observed a correlation between their composition bias and their relative evolutionary age. Recent ORFan proteins have compositions more similar to those of random 'proteins', while the compositions of more ancient ORFan proteins are more similar to those of the set of all proteins of the organism. This observation is consistent with an evolutionary scenario wherein ORFan genes emerged and underwent a large number of random mutations and selection, eventually adapting to the composition preference of their organism over time.

Supplementary information: Supplementary data are available at Bioinformatics online.
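The abstract does not give the exact formula for composition bias, so the following is only a minimal sketch of the idea under stated assumptions. The bias is taken here to be the Euclidean distance between a sequence's amino acid frequency vector and the pooled frequency vector of all proteins in the genome. Random 'proteins' are built by drawing each codon position independently from the genome's nucleotide frequencies and skipping stop codons. The distance metric, the pooling, the stop-codon handling and all function names (`composition`, `genome_reference`, `composition_bias`, `random_protein`) are illustrative assumptions, not the authors' published method.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
CODE = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: CODE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def composition(seq):
    """Return the frequency of each of the 20 standard amino acids in seq."""
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def genome_reference(proteins):
    """Pooled composition of all proteins in a genome (assumed reference)."""
    return composition("".join(proteins))


def composition_bias(seq, reference):
    """Euclidean distance between seq's composition and the reference.

    Assumption: the source only says 'deviation'; the metric is a guess.
    """
    comp = composition(seq)
    return math.sqrt(sum((comp[aa] - reference[aa]) ** 2 for aa in AMINO_ACIDS))


def random_protein(nuc_freqs, length, rng=None):
    """Build a random 'protein' from genome-wide nucleotide frequencies.

    nuc_freqs maps each of 'A', 'C', 'G', 'T' to a weight. Stop codons are
    skipped (an assumption) so the result has exactly `length` residues.
    """
    rng = rng or random.Random()
    weights = [nuc_freqs[b] for b in BASES]
    residues = []
    while len(residues) < length:
        codon = "".join(rng.choices(BASES, weights=weights, k=3))
        aa = CODON_TABLE[codon]
        if aa != "*":
            residues.append(aa)
    return "".join(residues)
```

With these helpers, one could compare the bias of an ORFan sequence against the bias of random 'proteins' of the same length, generated from that genome's nucleotide frequencies, to see which population it resembles more closely.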

Original language: English
Article number: btq093
Pages (from-to): 996-999
Number of pages: 4
Issue number: 8
State: Published - 15 Apr 2010

Bibliographical note

Funding Information:
This research was supported in part by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research. Partial support was provided by NIH R01GM081511 to J.M.; Fulbright Fellowship for University of Thai Chamber of Commerce, University Staff Development Program to N.T.; and Israel Science Foundation 1339/08 to R.U.

