Abstract
The NIH-funded LINCS Consortium is creating an extensive reference library of cell-based perturbation response signatures and sophisticated informatics tools incorporating a large number of perturbagens, model systems, and assays. To date, more than 350 datasets have been generated including transcriptomics, proteomics, epigenomics, cell phenotype and competitive binding profiling assays. The large volume and variety of data necessitate rigorous data standards and effective data management including modular data processing pipelines and end-user interfaces to facilitate accurate and reliable data exchange, curation, validation, standardization, aggregation, integration, and end user access. Deep metadata annotations and the use of qualified data standards enable integration with many external resources. Here we describe the end-to-end data processing and management at the DCIC to generate a high-quality and persistent product. Our data management and stewardship solutions enable a functioning Consortium and make LINCS a valuable scientific resource that aligns with big data initiatives such as the BD2K NIH Program and concords with emerging data science best practices including the findable, accessible, interoperable, and reusable (FAIR) principles.
| Original language | English |
|---|---|
| Article number | 180117 |
| Journal | Scientific data |
| Volume | 5 |
| DOIs | |
| State | Published - 19 Jun 2018 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright:© 2018 The Author(s).
Funding
This work was supported by NIH grant U54HL127624 awarded by the National Heart, Lung, and Blood Institute through funds provided by the trans-NIH Library of Integrated Network-based Cellular Signatures (LINCS) Program (http://www.lincsproject.org/) and the trans-NIH Big Data to Knowledge (BD2K) initiative (https://commonfund.nih.gov/bd2k) and by the NIH Data Commons OT3OD025467, KC1 for the Development and Implementation Plan for Community Supported FAIR Guidelines and Metrics. We thank ChemAxon for providing free academic research licenses for JChem, Marvin, Instant JChem, JChem for Excel, which were used in curating and annotating chemical structures, and a license to use the JChem PostgreSQL Cartridge that enables LDP chemical structure search. We thank the LINCS Consortium, the members of the Data Working Group (DWG) and especially Caroline E. Shamu, Elizabeth H. Williams and Jeremy L. Muhlich from HMS_LINCS, for their input in developing and approving the LINCS Metadata Specifications. We acknowledge resources from the University of Miami Center for Computational Science for maintaining part of the computational infrastructure of the BD2K-LINCS DCIC.
| Funders | Funder number |
|---|---|
| LINCS | OT3OD025467 |
| National Institutes of Health | |
| National Heart, Lung, and Blood Institute | T32HL007824, U54HL127624 |
| trans-NIH Library of Integrated Network-based Cellular Signatures |
Fingerprint
Dive into the research topics of 'Sustainable data and metadata management at the BD2K-LINCS Data coordination and integration center'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver