To evaluate the performance of different quantitative methods without bias, and to compare popular proteomics data processing workflows, we prepared a benchmark dataset in which an E. coli proteome was spiked in at several levels, so that both the true fold changes (1-, 1.5-, 2-, 2.5- and 3-fold) and the true identities of positives and negatives (E. coli proteins are true positives, human proteins are true negatives) are known. To mimic typical proteomics experiments that compare multiple replicates, each fold-change group contains 4 replicates, giving 20 LC-MS/MS analyses in total. To our knowledge, this is the largest spike-in benchmark dataset to date: it encompasses 5 spike-in levels, >500 true-positive proteins and >3000 true-negative proteins (two-peptide criterion, 1% protein FDR) over a wide concentration dynamic range. The dataset is well suited to testing quantitative accuracy, precision, false-positive biomarker discovery and the level of missing data.
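The design described above can be sketched in a few lines of Python to show how the known ground truth is used for scoring. This is a minimal illustration, not the processing code used in this work: the species labels `"ecoli"` and `"human"`, the function names, and the choice of a log2 scale for ratio error are all assumptions made for the example.

```python
import math

# Spike-in levels stated in the text, relative to the 1-fold group.
FOLD_LEVELS = [1.0, 1.5, 2.0, 2.5, 3.0]
REPLICATES_PER_LEVEL = 4


def build_design():
    """Return one (fold_level, replicate_index) tuple per LC-MS/MS run."""
    return [
        (fold, rep)
        for fold in FOLD_LEVELS
        for rep in range(1, REPLICATES_PER_LEVEL + 1)
    ]


def is_true_positive(species):
    """E. coli proteins are spiked in (true positives); human proteins are not.

    The label strings are hypothetical and chosen only for this sketch.
    """
    return species == "ecoli"


def false_discovery_proportion(significant_species):
    """Fraction of proteins called significant that are true negatives (human)."""
    if not significant_species:
        return 0.0
    false_hits = sum(1 for s in significant_species if not is_true_positive(s))
    return false_hits / len(significant_species)


def log2_ratio_error(observed_ratio, expected_fold):
    """Signed deviation of an observed ratio from the known fold change (log2 units)."""
    return math.log2(observed_ratio) - math.log2(expected_fold)


if __name__ == "__main__":
    runs = build_design()
    print(len(runs))  # 5 levels x 4 replicates
    # Hypothetical list of species for proteins a workflow called significant.
    print(false_discovery_proportion(["ecoli", "ecoli", "human", "ecoli"]))
    # Hypothetical observed ratio for an E. coli protein in the 2-fold group.
    print(log2_ratio_error(1.8, 2.0))
```

Because every protein's species and every group's spike-in level are known, a workflow's significant-protein list and its reported ratios can be scored directly against the truth, which is what makes false-positive discovery and quantitative accuracy measurable on this dataset.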