
PXD027467

PXD027467 is an original dataset announced via ProteomeXchange.

Dataset Summary
Title: ComBat HarmonizR enables the integrated analysis of independently generated proteomic datasets through data harmonization with appropriate handling of missing values
Description: The integration of proteomic datasets generated by non-cooperating laboratories using different LC-MS/MS setups can overcome the limitations of statistically underpowered sample cohorts, but has not been demonstrated to date. In proteomics, differences in sample preservation and preparation strategies, chromatography and mass spectrometry approaches, and the quantification strategy used distort protein abundance distributions in integrated datasets. The removal of these technical batch effects requires setup-specific normalization and strategies that can deal with missing at random (MAR) and missing not at random (MNAR) values at the same time. Algorithms for batch effect removal commonly used for other omics types, such as the ComBat algorithm, disregard proteins with MNAR missing values and significantly reduce the informational yield and the effect size for combined datasets. Here, we present a strategy for data harmonization across different tissue preservation techniques, LC-MS/MS instrumentation setups and quantification approaches. To enable batch effect removal without the need for data reduction or error-prone imputation, we developed an extension to the ComBat algorithm, ComBat HarmonizR, that performs data harmonization with appropriate handling of MAR and MNAR missing values by matrix dissection. The ComBat HarmonizR based strategy enables the combined analysis of independently generated proteomic datasets for the first time. Furthermore, we found ComBat HarmonizR to be superior to the commonly used internal reference scaling (iRS) for removing batch effects between different Tandem Mass Tag (TMT)-plexes. Because the matrix dissection approach requires no data imputation, the HarmonizR algorithm can be applied to any type of -omics data while ensuring minimal data loss.
HostingRepository: PRIDE
AnnounceDate: 2022-05-23
AnnouncementXML: Submission_2022-05-23_14:24:49.892.xml
DigitalObjectIdentifier:
ReviewLevel: Peer-reviewed dataset
DatasetOrigin: Original dataset
RepositorySupport: Unsupported dataset by repository
PrimarySubmitter: Hannah Voß
SpeciesList: scientific name: Escherichia coli; NCBI TaxID: 562; scientific name: Mus musculus (Mouse); NCBI TaxID: 10090; scientific name: Homo sapiens (Human); NCBI TaxID: 9606; scientific name: Saccharomyces cerevisiae (Baker's yeast); NCBI TaxID: 4932
ModificationList: monohydroxylated residue; iodoacetamide derivatized residue
Instrument: Q Exactive; Orbitrap Fusion; TripleTOF 6600
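
The Description above names "matrix dissection" as the way ComBat HarmonizR avoids imputation. The following is a minimal sketch of one way such a dissection could work, not the authors' implementation. It assumes that proteins are grouped by the set of batches in which they have enough observed values, and that each group is adjusted only on the columns of those batches. The function names, the `min_obs` threshold, and the use of simple per-batch mean centering in place of ComBat are all assumptions made for illustration.

```python
import numpy as np


def center_batches(sub, batch_ids):
    """Stand-in for ComBat (an assumption of this sketch): shift each
    batch so its per-protein mean equals the protein's overall mean."""
    out = sub.copy()
    grand = np.nanmean(sub, axis=1, keepdims=True)
    for b in np.unique(batch_ids):
        cols = batch_ids == b
        out[:, cols] += grand - np.nanmean(sub[:, cols], axis=1, keepdims=True)
    return out


def dissect_and_adjust(data, batches, min_obs=2):
    """Split a protein x sample matrix by batch-presence pattern,
    adjust each sub-matrix separately, and reassemble the result."""
    data = np.asarray(data, dtype=float)
    batches = np.asarray(batches)
    labels = np.unique(batches)

    # presence[i, j]: protein i has at least min_obs values in batch j
    presence = np.stack(
        [np.sum(~np.isnan(data[:, batches == b]), axis=1) >= min_obs
         for b in labels],
        axis=1,
    )

    # Group protein rows that share the same presence pattern
    groups = {}
    for i, row in enumerate(presence):
        groups.setdefault(tuple(row), []).append(i)

    # Anything not covered by a sub-matrix stays NaN (a choice of this sketch)
    result = np.full_like(data, np.nan)
    for pattern, rows in groups.items():
        pattern = np.array(pattern)
        if pattern.sum() < 2:
            continue  # fewer than two usable batches: nothing to harmonize
        cols = np.where(np.isin(batches, labels[pattern]))[0]
        sub = data[np.ix_(rows, cols)]
        result[np.ix_(rows, cols)] = center_batches(sub, batches[cols])
    return result
```

In this sketch no value is imputed: missing entries inside an adjusted sub-matrix stay missing, and only proteins without at least two usable batches are left out of the result.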
Dataset History
Revision | Datetime | Status | ChangeLog Entry
0 | 2021-07-21 05:41:50 | ID requested |
1 | 2022-05-23 14:24:50 | announced |
Publication List
Dataset with its publication pending
Keyword List
submitter keyword: Data integration, metastudy, Tissue, FFPE, Fresh-Frozen, ComBat, SILAC, TMT, DIA, SWATH, DDA, Missing values, Harmonization
Contact List
Prof. Dr. Hartmut Schlüter
contact affiliation: Section of Mass Spectrometric Proteomics, University Medical Center Hamburg-Eppendorf
contact email: h.schluet@uke.de
lab head
Hannah Voß
contact affiliation: University Medical Center Hamburg-Eppendorf, Institute of Clinical Chemistry and Laboratory Medicine, Group of Mass Spectrometric Proteomics
contact email: ha.voss@uke.de
dataset submitter
Full Dataset Link List
Dataset FTP location
NOTE: Most web browsers have discontinued native FTP support. You can install a separate FTP application (we recommend FileZilla) and configure your browser to launch it when you click the FTP link, or open such an application directly and use this address: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2022/05/PXD027467
PRIDE project URI
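
As an alternative to a graphical FTP client, the dataset directory given in the note above could also be listed from a script. The sketch below uses only Python's standard library. It assumes the server accepts anonymous logins, which this page does not state; the helper names `split_ftp_url` and `list_dataset_files` are made up for illustration.

```python
from ftplib import FTP
from urllib.parse import urlparse

# FTP address quoted in the note on this page
DATASET_URL = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2022/05/PXD027467"


def split_ftp_url(url):
    """Return (host, path) for an ftp:// URL."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parts.hostname, parts.path


def list_dataset_files(url=DATASET_URL, timeout=30):
    """List the entries in the dataset directory.

    Assumes anonymous access is allowed (not confirmed by this page).
    """
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()    # no arguments: anonymous login
        ftp.cwd(path)
        return ftp.nlst()
```

Calling `list_dataset_files()` needs network access; `split_ftp_url` can be checked offline.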