Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

Miri Michaeli, Hila Noga, Hilla Tabibian-Keissar, Iris Barshack, Ramit Mehr

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

High-throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig High-Throughput Sequencing Cleaner (Ig-HTS-Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig Insertion-Deletion Identifier (Ig-Indel-Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

Original language: English
Article number: 386
Journal: Frontiers in Immunology
Volume: 3
Issue number: DEC
State: Published - 2012

Keywords

  • B cell receptor
  • Computer programs
  • High-throughput sequencing
  • Immunoglobulin (Ig) genes
  • Insertions and deletions (indels)
