The study of non-model organisms stands to benefit greatly from genetic and genomic data. To better understand the molecular mechanisms driving neuronal development, and to characterize the entire central nervous system (CNS) transcriptome of the leech Hirudo medicinalis, we combined RNA-Seq on the Illumina HiSeq2000 with de novo assembly using Trinity. We present a set of 73,493 de novo assembled transcripts for the leech, reconstructed from RNA collected from the CNS at single-ganglion resolution. This set of transcripts greatly enriches the available data for the leech. Here, we share two databases, each of which allows a different type of search for candidate homologues. The first is the raw set of assembled transcripts, which allows a sequence-based search. A comprehensive analysis of this set, aligned against the Swiss-Prot database, revealed 22,604 contigs with significant E-values. This analysis enabled the production of the second database, which links sequences to annotated transcript names, with the confidence of the BLAST best hit.
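As an illustration of the best-hit annotation step described above, the following is a minimal sketch and not the authors' actual pipeline. It assumes the alignments are available in BLAST tabular format (12 tab-separated columns, with the E-value in column 11 and the bit score in column 12). The function name `best_hits`, the `Hit` record, and the E-value cutoff of 1e-5 are all hypothetical choices made for this example; the cutoff used in the study is not stated here.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One retained alignment for a contig (hypothetical record type)."""
    subject: str
    evalue: float
    bitscore: float


def best_hits(rows: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Return the highest-scoring hit per query from BLAST tabular rows.

    The default max_evalue is an assumed cutoff, not the one used in the study.
    """
    best: Dict[str, Hit] = {}
    for row in rows:
        row = row.strip()
        # Skip blank lines and comment lines.
        if not row or row.startswith("#"):
            continue
        fields = row.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        # Discard alignments that are not significant under the assumed cutoff.
        if evalue > max_evalue:
            continue
        current = best.get(query)
        # Keep the hit with the highest bit score for each contig.
        if current is None or bitscore > current.bitscore:
            best[query] = Hit(subject, evalue, bitscore)
    return best
```

Applied to the full alignment table, the resulting mapping from contig identifier to Swiss-Prot entry corresponds to the kind of name-based lookup the second database is described as providing.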
Bibliographical note
The authors wish to thank Shahar Alon for fruitful discussions and invaluable advice, and Helit Cohen for her
help with experimentation. The Helobdella robusta and Capitella teleta sequence data were produced by
the US Department of Energy Joint Genome Institute http://www.jgi.doe.gov/ in collaboration with the
user community. This work was supported (in part) by the EU-FP7 People IRG Marie Curie Grants
(239482) (to O.S.) and by the Israel Science Foundation for Individual Research Grants (1403/11) (to O.S.).