PXD057120 is an original dataset announced via ProteomeXchange.
Dataset Summary
| Title | Predictive Stool-Based Protein Biomarkers for the Classification of Crohn's Disease and Ulcerative Colitis Using a Machine Learning Approach |
| Description | Background and Aim: Crohn's disease (CD) and ulcerative colitis (UC) are the two major chronic inflammatory bowel diseases (IBD). Although their symptoms are similar, their pathological features and clinical treatments differ. Currently, distinguishing between these diseases involves invasive procedures such as colonoscopy and histopathology, causing discomfort and inconvenience to patients. The use of fecal proteins as non-invasive biomarkers offers a promising alternative due to their stability and proximity to inflamed tissues. This study focuses on using high-throughput data-independent acquisition (DIA) mass spectrometry to develop accurate biomarker signatures from complex stool samples. Methods: Stool samples obtained from 46 active CD patients and 23 active UC patients were analyzed. Using DIA-based SWATH mass spectrometry, we explored the stool proteome, identifying and quantifying approximately 1,250 proteins. The samples were divided into training and testing groups. After data processing, various feature selection algorithms were applied to the training group to identify proteins that differed significantly between the CD and UC groups. Additionally, six machine learning algorithms (k-Nearest Neighbors, Naïve Bayes, eXtreme Gradient Boosting, Random Forest, Support Vector Machine, and glmnet) were evaluated to identify the best-performing classifier. Results: Sixteen proteins were selected based on several feature selection algorithms, and the six ML models were trained on them. Based on the performance metrics of each algorithm on the training dataset, the Naïve Bayes model was selected. For performance validation, the final predictive model was applied to 16 prospective samples as the test dataset. The model achieved an AUC of 0.95 on the training dataset and an AUC of 0.96 on the test dataset, demonstrating its robustness and lack of overfitting.
Conclusion: This study demonstrates the effectiveness of SWATH-based proteomics and machine learning in developing predictive models to classify CD and UC. Further validation in a larger cohort using targeted MRM mass spectrometry would help establish the clinical utility and reliability of this approach. |
| HostingRepository | PRIDE |
| AnnounceDate | 2025-12-04 |
| AnnouncementXML | Submission_2025-12-03_19:48:44.911.xml |
| DigitalObjectIdentifier | |
| ReviewLevel | Peer-reviewed dataset |
| DatasetOrigin | Original dataset |
| RepositorySupport | Unsupported dataset by repository |
| PrimarySubmitter | Elmira Shajari |
| SpeciesList | scientific name: Homo sapiens (Human); NCBI TaxID: NEWT:9606; |
| ModificationList | carbamoylated residue |
| Instrument | TripleTOF 5600 |
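The description above reports classifier performance as AUC (area under the ROC curve). As a reading aid, the following is a minimal sketch of how an AUC can be computed from class labels and predicted scores, using the pairwise-ranking definition. It is not the authors' code: the function name `roc_auc` is hypothetical, and the labels and scores are synthetic toy values, not data from this dataset.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive sample has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic toy example (assumption: 1 = one disease class, 0 = the other).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 1.0 means every positive sample is ranked above every negative one, while 0.5 corresponds to random ranking. The quadratic pairwise loop is chosen for clarity and is adequate for cohorts of this size.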
Dataset History
| Revision | Datetime | Status | ChangeLog Entry |
| 0 | 2024-10-23 16:50:28 | ID requested | |
| ⏵ 1 | 2025-12-03 19:48:45 | announced | |
Publication List
| 10.14309/ctg.0000000000000925; |
| Shajari E, Gagné D, Bourassa F, Malick M, Roy P, Noël JF, Gagnon H, Delisle M, Boisvert FM, Brunet M, Beaulieu JF, Stool-Based Proteomic Signature for the Noninvasive Classification of Crohn's Disease and Ulcerative Colitis Using Machine Learning. Clin Transl Gastroenterol, 16(11):e00925(2025) [pubmed] |
Keyword List
| submitter keyword: Crohn’s disease, inflammatory bowel disease (IBD) subtyping, protein biomarkers, DIA mass spectrometry, quantitative proteomics, machine learning, ulcerative colitis |
Contact List
| Jean-Francois Beaulieu |
| contact affiliation | Laboratory of Intestinal Physiopathology, Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, J1H 5N4, Canada |
| contact email | jean-francois.beaulieu@usherbrooke.ca |
| lab head | |
| Elmira Shajari |
| contact affiliation | PhD candidate |
| contact email | elmira.shajari@usherbrooke.ca |
| dataset submitter | |
Full Dataset Link List
Dataset FTP location
NOTE: Most web browsers have discontinued native FTP support. You can install a separate FTP application (we recommend FileZilla) and configure your browser to launch it when you click the FTP link, or open an FTP-capable application directly and use this address: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2025/12/PXD057120 |
| PRIDE project URI |
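For scripted access, the FTP address given in the note above can also be used from Python's standard `ftplib` module instead of a graphical client. The sketch below is illustrative only: the helper names (`split_ftp_url`, `list_files`, `download_file`) are hypothetical, and it assumes the server accepts anonymous logins. No file names are assumed; list the directory first to see what is available.

```python
from ftplib import FTP
from pathlib import Path
from typing import List, Tuple
from urllib.parse import urlparse

# FTP address taken from the dataset record.
FTP_URL = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2025/12/PXD057120"


def split_ftp_url(url: str) -> Tuple[str, str]:
    """Split an ftp:// URL into (host, remote directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an FTP URL: {url!r}")
    return parts.hostname, parts.path


def list_files(url: str = FTP_URL) -> List[str]:
    """Return the names in the remote dataset directory."""
    host, path = split_ftp_url(url)
    with FTP(host) as ftp:
        ftp.login()  # assumption: anonymous login is accepted
        ftp.cwd(path)
        return ftp.nlst()


def download_file(name: str, dest_dir: str = ".", url: str = FTP_URL) -> Path:
    """Download one file from the dataset directory in binary mode."""
    host, path = split_ftp_url(url)
    target = Path(dest_dir) / name
    with FTP(host) as ftp, open(target, "wb") as fh:
        ftp.login()  # assumption: anonymous login is accepted
        ftp.cwd(path)
        ftp.retrbinary(f"RETR {name}", fh.write)
    return target


# Example usage (requires network access, so it is left commented out):
# for name in list_files():
#     print(name)
```

Raw mass spectrometry files can be large, so it is worth listing the directory and downloading files selectively rather than fetching everything at once.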
Repository Record List
- PRIDE
- PXD057120
- Label: PRIDE project
- Name: Predictive Stool-Based Protein Biomarkers for the Classification of Crohn's Disease and Ulcerative Colitis Using a Machine Learning Approach