Exploiting sequence similarity to validate the sensitivity of SNP arrays in detecting fine-scaled copy number variations

Gerard Wong, Christopher Leckie, Kylie L. Gorringe, Izhak Haviv, Ian G. Campbell, Adam Kowalczyk

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Motivation: High-density single nucleotide polymorphism (SNP) genotyping arrays are efficient and cost-effective platforms for the detection of copy number variation (CNV). To ensure accuracy in probe synthesis and to minimize production costs, short oligonucleotide probe sequences are used. The use of short probe sequences limits the specificity of binding targets in the human genome. The specificity of these short probeset sequences has yet to be fully analysed against a normal reference human genome. Sequence similarity can artificially elevate or suppress copy number measurements, and hence reduce the reliability of affected probe readings. For the purpose of detecting narrow CNVs reliably down to the width of a single probeset, sequence similarity is an important issue that needs to be addressed.

Results: We surveyed the Affymetrix Human Mapping SNP arrays for probeset sequence similarity against the reference human genome. Utilizing sequence similarity results, we identified a collection of fine-scaled putative CNVs between genders from autosomal probesets whose sequences match various loci on the sex chromosomes. To detect these variations, we utilized our statistical approach, Detecting REcurrent Copy number change using rank-order Statistics (DRECS), and showed that its performance was superior to, and more stable than, that of the t-test in detecting CNVs. Through the application of DRECS on the HapMap population datasets with multi-matching probesets filtered, we identified biologically relevant SNPs in aberrant regions across populations with known association to physical traits, such as height, covered by the span of a single probe. This provided empirical confirmation of the existence of naturally occurring narrow CNVs as well as the sensitivity of the Affymetrix SNP array technology in detecting them.

Availability: The MATLAB implementation of DRECS is available at http://ww2.cs.mu.oz.au/~gwong/DRECS/index.html.

Contact: [email protected].

Supplementary information: Supplementary information is available at Bioinformatics online.

Original language: English
Article number: btq088
Pages (from-to): 1007-1014
Number of pages: 8
Journal: Bioinformatics
Volume: 26
Issue number: 8
State: Published - 15 Apr 2010
Externally published: Yes

Bibliographical note

Funding Information:
Funding: This project is partially supported by NICTA. NICTA is funded by the Australian Government through the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.

Funding

Funders:
National ICT Australia
Australian Research Council
Department of Broadband, Communications and the Digital Economy, Australian Government
