
PXD018043

PXD018043 is an original dataset announced via ProteomeXchange.

Dataset Summary
Title: Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy
Description: Top-down mass spectrometry (MS) is a powerful tool for identification and comprehensive characterization of proteoforms arising from alternative splicing, sequence variation, and post-translational modifications. While the technique is powerful, it suffers from the complexity of the datasets generated by top-down MS experiments, which require sequential data processing steps for interpretation. Deconvolution of the complex isotopic distributions that arise from naturally occurring isotopes is a critical step in the data processing workflow. Multiple algorithms are currently available to deconvolute top-down mass spectra; however, each algorithm generates different deconvoluted peak lists with varying accuracy compared to true positive annotations. In this study, we designed a machine learning strategy that can process and combine the peak lists from different deconvolution results. By optimizing clustering results, deconvolution results from the THRASH, TopFD, MS-Deconv, and SNAP algorithms were combined into consensus peak lists at various thresholds using either a simple voting ensemble method or a random forest machine learning algorithm. The random forest model outperformed the single best algorithm. This machine learning strategy could enhance the accuracy and confidence of protein identification during database search by accelerating detection of true positive peaks while filtering out false positive peaks. Thus, this method shows promise for enhancing proteoform identification and characterization in high-throughput data analysis for top-down proteomics.
HostingRepository: PRIDE
AnnounceDate: 2020-05-06
AnnouncementXML: Submission_2020-05-05_22:29:05.xml
DigitalObjectIdentifier:
ReviewLevel: Peer-reviewed dataset
DatasetOrigin: Original dataset
RepositorySupport: Unsupported dataset by repository
PrimarySubmitter: Zhijie Wu
SpeciesList: scientific name: Macaca mulatta (Rhesus macaque); NCBI TaxID: 9544
ModificationList: phosphorylated residue; acetylated residue; deamidated residue
Instrument: Bruker Daltonics solarix series
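
The description above mentions combining peak lists from several deconvolution algorithms with a simple voting ensemble. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes each algorithm reports a list of monoisotopic masses, groups masses greedily within a ppm tolerance, and keeps a cluster when enough distinct algorithms contributed to it. The function name `vote_consensus`, the parameters `min_votes` and `tol_ppm`, the greedy clustering rule, and the example masses are all illustrative assumptions.

```python
from typing import Dict, List, Tuple


def vote_consensus(peak_lists: Dict[str, List[float]],
                   min_votes: int = 2,
                   tol_ppm: float = 10.0) -> List[float]:
    """Return mean masses of clusters reported by at least min_votes algorithms."""
    # Pool every peak as a (mass, algorithm) pair and sort by mass.
    pooled: List[Tuple[float, str]] = sorted(
        (mass, name) for name, masses in peak_lists.items() for mass in masses
    )

    clusters: List[List[Tuple[float, str]]] = []
    for mass, name in pooled:
        if clusters:
            anchor = clusters[-1][0][0]  # first (lowest) mass of the open cluster
            if (mass - anchor) / anchor * 1e6 <= tol_ppm:
                clusters[-1].append((mass, name))
                continue
        clusters.append([(mass, name)])

    consensus: List[float] = []
    for cluster in clusters:
        # One vote per algorithm, even if it reported several nearby peaks.
        voters = {name for _, name in cluster}
        if len(voters) >= min_votes:
            consensus.append(sum(m for m, _ in cluster) / len(cluster))
    return consensus


# Hypothetical masses, made up for illustration only.
peaks = {
    "THRASH": [1000.000, 2000.000],
    "TopFD": [1000.005, 3000.000],
    "MS-Deconv": [1000.002, 2000.010],
    "SNAP": [3000.100],
}
print(vote_consensus(peaks, min_votes=2))
```

Raising `min_votes` trades sensitivity for specificity, which corresponds to the "various thresholds" the description refers to. A random forest variant would replace the vote count with a classifier trained on per-cluster features, which this sketch does not attempt.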
Dataset History
Revision | Datetime | Status | ChangeLog Entry
0 | 2020-03-13 02:48:56 | ID requested |
1 | 2020-05-05 22:29:07 | announced |
Publication List
McIlwain SJ, Wu Z, Wetzel M, Belongia D, Jin Y, Wenger K, Ong IM, Ge Y. Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy. J Am Soc Mass Spectrom, 31(5):1104-1113 (2020)
Keyword List
submitter keyword: Top-down spectra deconvolution
Contact List
Sean J McIlwain
contact affiliation: Department of Biostatistics and Medical Informatics and University of Wisconsin Carbone Comprehensive Cancer Center, University of Wisconsin - Madison, Madison, Wisconsin 53705, United States
contact email: sean.mcilwain@wisc.edu
lab head
Zhijie Wu
contact affiliation: University of Wisconsin - Madison
contact email: zwu227@wisc.edu
dataset submitter
Full Dataset Link List
Dataset FTP location
NOTE: Most web browsers have discontinued native support for FTP access within the browser window. You can install a separate FTP client (we recommend FileZilla) and either configure your browser to launch it when you click an FTP link, or open the client directly and use this address: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2020/05/PXD018043
PRIDE project URI
Repository Record List