PXD027467 is an original dataset announced via ProteomeXchange.
Dataset Summary
Title | ComBat HarmonizR enables the integrated analysis of independently generated proteomic datasets through data harmonization with appropriate handling of missing values |
Description | The integration of proteomic datasets generated by non-cooperating laboratories using different LC-MS/MS setups can overcome limitations in statistically underpowered sample cohorts but has not been demonstrated to date. In proteomics, differences in sample preservation and preparation strategies, chromatography and mass spectrometry approaches, and the quantification strategy used distort protein abundance distributions in integrated datasets. The removal of these technical batch effects requires setup-specific normalization and strategies that can deal with missing at random (MAR) and missing not at random (MNAR) type values at the same time. Algorithms for batch effect removal, such as the ComBat algorithm, commonly used for other omics types, disregard proteins with MNAR missing values and significantly reduce the informational yield and the effect size for combined datasets. Here, we present a strategy for data harmonization across different tissue preservation techniques, LC-MS/MS instrumentation setups and quantification approaches. To enable batch effect removal without the need for data reduction or error-prone imputation, we developed an extension to the ComBat algorithm, ComBat HarmonizR, which performs data harmonization with appropriate handling of MAR and MNAR missing values by matrix dissection. The ComBat HarmonizR based strategy enables the combined analysis of independently generated proteomic datasets for the first time. Furthermore, we found ComBat HarmonizR to be superior for removing batch effects between different Tandem Mass Tag (TMT)-plexes, compared to commonly used internal reference scaling (iRS). Due to the matrix dissection approach without the need for data imputation, the HarmonizR algorithm can be applied to any type of -omics data while assuring minimal data loss. |
HostingRepository | PRIDE |
AnnounceDate | 2022-05-23 |
AnnouncementXML | Submission_2022-05-23_14:24:49.892.xml |
DigitalObjectIdentifier | |
ReviewLevel | Peer-reviewed dataset |
DatasetOrigin | Original dataset |
RepositorySupport | Unsupported dataset by repository |
PrimarySubmitter | Hannah Voß |
SpeciesList | scientific name: Escherichia coli; NCBI TaxID: 562; scientific name: Mus musculus (Mouse); NCBI TaxID: 10090; scientific name: Homo sapiens (Human); NCBI TaxID: 9606; scientific name: Saccharomyces cerevisiae (Baker's yeast); NCBI TaxID: 4932; |
ModificationList | monohydroxylated residue; iodoacetamide derivatized residue |
Instrument | Q Exactive; Orbitrap Fusion; TripleTOF 6600 |
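The "matrix dissection" mentioned in the description can be illustrated with a minimal sketch. This is not the HarmonizR implementation; it only shows one plausible reading of the idea, assuming that features (proteins) are grouped by the set of batches in which they have enough observed values, so that a batch correction could later be run on each group separately without imputation. The function name `dissect`, the `min_values` threshold, and the toy data are hypothetical.

```python
from collections import defaultdict


def dissect(matrix, batches, min_values=2):
    """Group feature names by the set of batches with enough observed values.

    matrix:     dict mapping feature name -> list of values (None = missing)
    batches:    list giving the batch label of each column
    min_values: assumed threshold for a batch to count as "present"
    """
    # Collect the column indices that belong to each batch.
    batch_cols = defaultdict(list)
    for col, label in enumerate(batches):
        batch_cols[label].append(col)

    # Assign each feature to the group defined by its set of usable batches.
    groups = defaultdict(list)
    for name, row in matrix.items():
        present = frozenset(
            label
            for label, cols in batch_cols.items()
            if sum(row[c] is not None for c in cols) >= min_values
        )
        groups[present].append(name)
    return dict(groups)


# Toy example: four samples, two batches (A and B).
toy_batches = ["A", "A", "B", "B"]
toy_matrix = {
    "P1": [1.0, 2.0, 3.0, 4.0],    # observed in both batches
    "P2": [1.0, 2.0, None, None],  # observed only in batch A
    "P3": [None, 5.0, 6.0, 7.0],   # too few values in batch A
}
toy_groups = dissect(toy_matrix, toy_batches)
```

In this sketch only the group containing both batches (here `P1`) would be passed to a batch-effect correction; features confined to a single batch have nothing to be harmonized against and would be carried through unchanged.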
Dataset History
Revision | Datetime | Status | ChangeLog Entry |
0 | 2021-07-21 05:41:50 | ID requested | |
⏵ 1 | 2022-05-23 14:24:50 | announced | |
Publication List
Dataset with its publication pending |
Keyword List
submitter keyword: Data integration, metastudy, Tissue, FFPE, Fresh-Frozen, ComBat, SILAC, TMT, DIA, SWATH, DDA, Missing values, Harmonization |
Contact List
Prof. Dr. Hartmut Schlüter |
contact affiliation | Section of Mass Spectrometric Proteomics, University Medical Center Hamburg Eppendorf |
contact email | h.schluet@uke.de |
lab head | |
Hannah Voß |
contact affiliation | University Medical Center Hamburg Eppendorf, Institute of Clinical Chemistry and Laboratory Medicine, Group of Mass Spectrometric Proteomics |
contact email | ha.voss@uke.de |
dataset submitter | |
Full Dataset Link List
Dataset FTP location
NOTE: Most web browsers have discontinued native support for FTP access within the browser window. You can usually install a separate FTP app (we recommend FileZilla) and configure your browser to launch it when you click on this FTP link, or launch an app that supports FTP (like FileZilla) directly and use this address: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2022/05/PXD027467 |
PRIDE project URI |
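As an alternative to a graphical FTP client, the directory at the FTP address given above can be listed with Python's standard `ftplib`. This is a minimal sketch that assumes the server accepts anonymous logins; the helper names `split_ftp_url` and `list_dataset_files` are illustrative and not part of any PRIDE tooling.

```python
from ftplib import FTP
from urllib.parse import urlparse

# FTP address as given in the dataset record.
FTP_URL = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2022/05/PXD027467"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parts.hostname, parts.path


def list_dataset_files(url=FTP_URL, timeout=30):
    """Return the file names in the dataset directory (needs network access)."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()    # no arguments: anonymous login (assumed to be allowed)
        ftp.cwd(path)  # change into the dataset directory
        return ftp.nlst()


# Example, requires network access:
#     for name in list_dataset_files():
#         print(name)
```

Individual files could then be fetched with `FTP.retrbinary`, which is omitted here to keep the sketch short.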
Repository Record List
- PRIDE
- PXD027467
- Label: PRIDE project
- Name: ComBat HarmonizR enables the integrated analysis of independently generated proteomic datasets through data harmonization with appropriate handling of missing values