PXD018043

PXD018043 is an original dataset announced via ProteomeXchange.

Dataset Summary

Title	Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy
Description	Top-down mass spectrometry (MS) is a powerful tool for identification and comprehensive characterization of proteoforms arising from alternative splicing, sequence variation, and post-translational modifications. However, top-down MS experiments generate complex datasets that require sequential data processing steps for interpretation. Deconvolution of the complex isotopic distributions that arise from naturally occurring isotopes is a critical step in this processing. Multiple algorithms are currently available to deconvolute top-down mass spectra; however, each algorithm generates a different deconvoluted peak list, with accuracy that varies when compared to true positive annotations. In this study, we have designed a machine learning strategy that can process and combine the peak lists from different deconvolution results. By optimizing clustering results, deconvolution results from the THRASH, TopFD, MS-Deconv, and SNAP algorithms were combined into consensus peak lists at various thresholds using either a simple voting ensemble method or a random forest machine learning algorithm. The random forest model outperformed the single best algorithm. This machine learning strategy could enhance the accuracy and confidence of protein identification during database search by accelerating detection of true positive peaks while filtering out false positive peaks. Thus, this method shows promise for enhancing proteoform identification and characterization in high-throughput top-down proteomics data analysis.
HostingRepository	PRIDE
AnnounceDate	2020-05-06
AnnouncementXML	Submission_2020-05-05_22:29:05.xml
DigitalObjectIdentifier
ReviewLevel	Peer-reviewed dataset
DatasetOrigin	Original dataset
RepositorySupport	Unsupported dataset by repository
PrimarySubmitter	Zhijie Wu
SpeciesList	scientific name: Macaca mulatta (Rhesus macaque); NCBI TaxID: 9544;
ModificationList	phosphorylated residue; acetylated residue; deamidated residue
Instrument	Bruker Daltonics solarix series
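
The simple voting ensemble named in the Description can be illustrated with a small sketch. This is not the authors' implementation: the record does not specify how peaks are clustered, so the single-linkage grouping by a ppm tolerance, the function name consensus_peaks, the parameters ppm_tol and min_votes, and the example masses are all assumptions made for illustration. The random forest variant is not shown.

```python
from typing import Dict, List, Tuple


def consensus_peaks(
    peak_lists: Dict[str, List[float]],
    ppm_tol: float = 10.0,   # assumed clustering tolerance, not from the record
    min_votes: int = 2,      # assumed voting threshold, not from the record
) -> List[Tuple[float, int]]:
    """Return (mean mass, vote count) for clusters backed by >= min_votes algorithms."""
    # Tag every monoisotopic mass with the algorithm that reported it, sorted by mass.
    tagged = sorted(
        (mass, name) for name, masses in peak_lists.items() for mass in masses
    )

    clusters: List[List[Tuple[float, str]]] = []
    current: List[Tuple[float, str]] = []
    for mass, name in tagged:
        # Open a new cluster when the gap to the previous peak exceeds the tolerance.
        if current and (mass - current[-1][0]) / current[-1][0] * 1e6 > ppm_tol:
            clusters.append(current)
            current = []
        current.append((mass, name))
    if current:
        clusters.append(current)

    consensus = []
    for cluster in clusters:
        # One vote per algorithm, however many peaks it contributed to the cluster.
        votes = len({name for _, name in cluster})
        if votes >= min_votes:
            mean_mass = sum(m for m, _ in cluster) / len(cluster)
            consensus.append((mean_mass, votes))
    return consensus


# Hypothetical peak lists keyed by the four algorithms named in the Description.
example_lists = {
    "THRASH": [1000.000, 2000.000],
    "TopFD": [1000.005, 3000.000],
    "MS-Deconv": [1000.002, 2000.010],
    "SNAP": [4000.000],
}
example_consensus = consensus_peaks(example_lists, ppm_tol=10.0, min_votes=2)
```

Raising min_votes tightens the consensus list, which corresponds to the "various thresholds" mentioned in the Description: a higher threshold keeps fewer peaks but each is supported by more algorithms.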

Dataset History

Revision	Datetime	Status	ChangeLog Entry
0	2020-03-13 02:48:56	ID requested
1	2020-05-05 22:29:07	announced

Publication List

McIlwain SJ, Wu Z, Wetzel M, Belongia D, Jin Y, Wenger K, Ong IM, Ge Y, Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy. J Am Soc Mass Spectrom, 31(5):1104-1113(2020) [pubmed]

Keyword List

submitter keyword: Top-down spectra deconvolution

Contact List

Sean J McIlwain
contact affiliation	Department of Biostatistics and Medical Informatics and University of Wisconsin Carbone Comprehensive Cancer Center, University of Wisconsin - Madison, Madison, Wisconsin 53705, United States
contact email	sean.mcilwain@wisc.edu
lab head
Zhijie Wu
contact affiliation	University of Wisconsin - Madison
contact email	zwu227@wisc.edu
dataset submitter

Full Dataset Link List

Dataset FTP location
NOTE: Most web browsers have discontinued native support for FTP access within the browser window. You can usually install a separate FTP app (we recommend FileZilla) and configure your browser to launch it when you click on this FTP link. Otherwise, launch an app that supports FTP (such as FileZilla) and use this address: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2020/05/PXD018043
PRIDE project URI
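
As an alternative to a graphical FTP client, the dataset directory can also be listed from a script. The following is a minimal sketch using Python's standard ftplib. It assumes the server accepts anonymous logins, which this record does not state. The helper names split_ftp_url and list_dataset_files are hypothetical.

```python
from ftplib import FTP
from typing import List, Tuple
from urllib.parse import urlparse

# FTP address quoted in the note above.
DATASET_URL = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2020/05/PXD018043"


def split_ftp_url(url: str) -> Tuple[str, str]:
    """Split an ftp:// URL into (host, directory path)."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp" or not parsed.hostname:
        raise ValueError(f"not an FTP URL: {url!r}")
    return parsed.hostname, parsed.path


def list_dataset_files(url: str = DATASET_URL, timeout: float = 30.0) -> List[str]:
    """Return the file names in the dataset directory (requires network access)."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # assumption: anonymous login is permitted
        ftp.cwd(path)
        return ftp.nlst()
```

Calling list_dataset_files() opens a connection and returns the directory listing. Individual files could then be fetched with FTP.retrbinary in the same session.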
