PXD018043 is an original dataset announced via ProteomeXchange.
Dataset Summary
Title | Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy |
Description | Top-down mass spectrometry (MS) is a powerful tool for identification and comprehensive characterization of proteoforms arising from alternative splicing, sequence variation, and post-translational modifications. While the technique is powerful, it suffers from the complexity of the datasets generated by top-down MS experiments, which require sequential data processing steps for interpretation. Deconvolution of the complex isotopic distributions that arise from naturally occurring isotopes is a critical step in data processing. Multiple algorithms are currently available to deconvolute top-down mass spectra; however, each algorithm generates a different deconvoluted peak list, with accuracy that varies when compared to true positive annotations. In this study, we have designed a machine learning strategy that can process and combine the peak lists from different deconvolution results. By optimizing clustering results, deconvolution results from the THRASH, TopFD, MS-Deconv, and SNAP algorithms were combined into consensus peak lists at various thresholds using either a simple voting ensemble method or a random forest machine learning algorithm. The random forest model outperformed the single best algorithm. This machine learning strategy could enhance the accuracy and confidence of protein identification during database search by accelerating detection of true positive peaks while filtering out false positive peaks. Thus, this method shows promise in enhancing proteoform identification and characterization for high-throughput data analysis in top-down proteomics. |
HostingRepository | PRIDE |
AnnounceDate | 2020-05-06 |
AnnouncementXML | Submission_2020-05-05_22:29:05.xml |
DigitalObjectIdentifier | |
ReviewLevel | Peer-reviewed dataset |
DatasetOrigin | Original dataset |
RepositorySupport | Unsupported dataset by repository |
PrimarySubmitter | Zhijie Wu |
SpeciesList | scientific name: Macaca mulatta (Rhesus macaque); NCBI TaxID: 9544; |
ModificationList | phosphorylated residue; acetylated residue; deamidated residue |
Instrument | Bruker Daltonics solarix series |
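The simple voting ensemble mentioned in the Description can be illustrated with a minimal sketch. It is not the published implementation: the function name `vote_consensus`, the ppm-based greedy clustering, the default tolerance, and the example masses are all assumptions made for illustration. Each algorithm contributes a list of deconvoluted masses, nearby masses are grouped, and a group is kept when enough distinct algorithms report it.

```python
from typing import Dict, List, Tuple


def vote_consensus(
    peak_lists: Dict[str, List[float]],
    tol_ppm: float = 10.0,   # assumed clustering tolerance, not from the dataset record
    min_votes: int = 2,      # voting threshold: distinct algorithms required per peak
) -> List[Tuple[float, int]]:
    """Return (mean mass, vote count) for clusters supported by >= min_votes algorithms."""
    # Tag every mass with the algorithm that reported it, then sort by mass.
    tagged = sorted(
        (mass, name) for name, masses in peak_lists.items() for mass in masses
    )

    # Greedy 1-D clustering: a peak joins the current cluster while it lies
    # within tol_ppm of the cluster's first (lowest) mass.
    clusters: List[List[Tuple[float, str]]] = []
    current: List[Tuple[float, str]] = []
    for mass, name in tagged:
        if current:
            anchor = current[0][0]
            if (mass - anchor) / anchor * 1e6 > tol_ppm:
                clusters.append(current)
                current = []
        current.append((mass, name))
    if current:
        clusters.append(current)

    # One vote per distinct algorithm in a cluster.
    consensus: List[Tuple[float, int]] = []
    for cluster in clusters:
        votes = len({name for _, name in cluster})
        if votes >= min_votes:
            mean_mass = sum(mass for mass, _ in cluster) / len(cluster)
            consensus.append((mean_mass, votes))
    return consensus


# Hypothetical monoisotopic masses (Da), invented for illustration only.
example_peak_lists = {
    "THRASH": [10000.00, 12000.00],
    "TopFD": [10000.05, 15000.00],
    "MS-Deconv": [10000.02, 12000.06],
    "SNAP": [10000.01],
}

for mean_mass, votes in vote_consensus(example_peak_lists, min_votes=2):
    print(f"{mean_mass:.3f} Da supported by {votes} algorithm(s)")
```

Raising `min_votes` trades sensitivity for specificity, which corresponds to building consensus peak lists "at various thresholds". The random forest variant described above would instead learn from per-cluster features and is not sketched here.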
Dataset History
Revision | Datetime | Status | ChangeLog Entry |
0 | 2020-03-13 02:48:56 | ID requested | |
⏵ 1 | 2020-05-05 22:29:07 | announced | |
Publication List
McIlwain SJ, Wu Z, Wetzel M, Belongia D, Jin Y, Wenger K, Ong IM, Ge Y, Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy. J Am Soc Mass Spectrom, 31(5):1104-1113(2020) [pubmed] |
Keyword List
submitter keyword: Top-down spectra deconvolution |
Contact List
Sean J McIlwain |
contact affiliation | Department of Biostatistics and Medical Informatics and University of Wisconsin Carbone Comprehensive Cancer Center, University of Wisconsin - Madison, Madison, Wisconsin 53705, United States |
contact email | sean.mcilwain@wisc.edu |
lab head | |
Zhijie Wu |
contact affiliation | University of Wisconsin - Madison |
contact email | zwu227@wisc.edu |
dataset submitter | |
Full Dataset Link List
Dataset FTP location
NOTE: Most web browsers have discontinued native support for FTP access within the browser window. You can usually install a separate FTP application (we recommend FileZilla) and configure your browser to launch it when you click on this FTP link. Alternatively, open an application that supports FTP (such as FileZilla) and use this address: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2020/05/PXD018043 |
PRIDE project URI |
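As an alternative to a graphical FTP client, the dataset directory can be listed from a script. The sketch below uses only the Python standard library (`ftplib`, `urllib.parse`). The helper names `split_ftp_url` and `list_dataset_files` are hypothetical, and it assumes the server accepts anonymous logins and that the directory still exists at the address given in the note above.

```python
from ftplib import FTP
from urllib.parse import urlparse

# FTP address as given in the dataset record.
DATASET_URL = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2020/05/PXD018043"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, remote directory path)."""
    parts = urlparse(url)
    return parts.hostname, parts.path


def list_dataset_files(url=DATASET_URL, timeout=30):
    """Return the file names in the dataset directory (requires network access)."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # anonymous login; assumed to be permitted by the server
        ftp.cwd(path)    # move into the dataset directory
        return ftp.nlst()


if __name__ == "__main__":
    for name in list_dataset_files():
        print(name)
```

Individual files could then be fetched with `FTP.retrbinary`; that step is omitted to keep the sketch short.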
Repository Record List
- PRIDE
- PXD018043
- Label: PRIDE project
- Name: Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy