PXD037803-1

PXD037803 is an original dataset announced via ProteomeXchange.

Dataset Summary

Title	Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics
Description	Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein sequencing by bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models have represented a performance breakthrough. Incorporating that technology into de novo protein sequencing workflows requires machine-learning models capable of handling highly diverse MS data. In this study, we analyzed the requirements for assembling such generalizable deep learning models by systematically varying the composition and size of the training set. We assessed the generated models' performance using two test sets composed of peptides originating from the multienzyme digestion of samples from various species. The peptide recall values on the test sets showed that the deep learning models generated from a collection of peptides with highly diverse N- and C-termini generalized 76% better than the termini-restricted ones. Moreover, expanding the training set by adding peptides from the digestion of samples from several species with five proteases led to a 2-3 fold gain in generalizability. Furthermore, we tested the applicability of these multienzyme deep learning (MEM) models by fully de novo sequencing the heavy and light monomeric chains of five commercial monoclonal antibodies (mAbs). The MEMs extracted over 10,000 matching and overlapping peptides from mAb samples digested with six different proteases, achieving 100% sequence coverage for eight of the ten polypeptide chains. We anticipate that the MEMs' demonstrated improvements to de novo analysis will benefit several applications, such as peptidomics and the analysis of samples of high complexity or unknown nature.
HostingRepository	PRIDE
AnnounceDate	2023-01-16
AnnouncementXML	Submission_2023-01-16_02:06:42.242.xml
DigitalObjectIdentifier
ReviewLevel	Peer-reviewed dataset
DatasetOrigin	Original dataset
RepositorySupport	Unsupported dataset by repository
PrimarySubmitter	Carlos Gueto-Tettay
SpeciesList	scientific name: Homo sapiens (Human); NCBI TaxID: 9606;
ModificationList	monohydroxylated residue; iodoacetamide derivatized residue
Instrument	Q Exactive HF-X

Dataset History

Revision	Datetime	Status	ChangeLog Entry
0	2022-10-28 07:25:50	ID requested
⏵ 1	2023-01-16 02:06:42	announced

Publication List

Dataset with its publication pending

Keyword List

submitter keyword: monoclonal antibody, mass spectrometry, de novo sequencing, DeepNovo

Contact List

Lars Malmström
contact affiliation	Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Klinikgatan 32, SE-22184 Lund, Sweden
contact email	lars.malmstrom@med.lu.se
lab head
Carlos Gueto-Tettay
contact affiliation	Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University
contact email	carlos_alberto.gueto_tettay@med.lu.se
dataset submitter

Full Dataset Link List

Dataset FTP location	ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2023/01/PXD037803
NOTE: Most web browsers have discontinued native FTP support. To open this address, use an external FTP client such as FileZilla, either launched directly or configured as the browser's handler for FTP links.
PRIDE project URI
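
As an alternative to a graphical FTP client, the dataset directory can also be reached from a script. The following is a minimal sketch using only the Python standard library; it assumes the server accepts anonymous FTP logins, which this record does not state explicitly. The function names `split_ftp_url` and `list_dataset_files` are hypothetical helpers introduced here for illustration.

```python
from ftplib import FTP
from urllib.parse import urlparse

# FTP address taken from the dataset record above.
DATASET_URL = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2023/01/PXD037803"


def split_ftp_url(url):
    """Return (host, directory) for an ftp:// URL."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parts.hostname, parts.path


def list_dataset_files(url=DATASET_URL, timeout=30):
    """List the entries in the dataset directory (requires network access)."""
    host, directory = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()          # anonymous login; assumed to be permitted
        ftp.cwd(directory)   # move to the dataset directory
        return ftp.nlst()    # names of the files and folders found there


if __name__ == "__main__":
    for name in list_dataset_files():
        print(name)
```

Individual files could then be fetched with `FTP.retrbinary`, writing each one to a local file opened in binary mode.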
