TY - JOUR
T1 - Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing
AU - Michaeli, Miri
AU - Noga, Hila
AU - Tabibian-Keissar, Hilla
AU - Barshack, Iris
AU - Mehr, Ramit
PY - 2012
Y1 - 2012
N2 - High-throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig High-Throughput Sequencing Cleaner (Ig-HTS-Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig Insertion-Deletion Identifier (Ig-Indel-Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.
KW - B cell receptor
KW - Computer programs
KW - High-throughput sequencing
KW - Immunoglobulin (Ig) genes
KW - Insertions and deletions (indels)
UR - http://www.scopus.com/inward/record.url?scp=84874295516&partnerID=8YFLogxK
U2 - 10.3389/fimmu.2012.00386
DO - 10.3389/fimmu.2012.00386
M3 - Article
C2 - 23293637
AN - SCOPUS:84874295516
SN - 1664-3224
VL - 3
JO - Frontiers in Immunology
JF - Frontiers in Immunology
IS - DEC
M1 - Article 386
ER -