For this manuscript, the Prochlorococcus MED4 shotgun proteome dataset was used to benchmark a de novo-directed sequencing approach. In de novo peptide sequencing, the amino acid sequence is determined directly from mass spectra rather than by comparison (peptide-spectrum matching) to a selected database. We perform a benchmarking experiment using Prochlorococcus culture data, demonstrating that de novo peptides are sufficiently accurate and taxonomically specific to be useful in environmental studies. The MED4 dataset herein represents the output from peptide-spectrum matching using Comet within the Trans-Proteomic Pipeline (TPP). Additional MED4 data beyond those used in this manuscript are included for both trypsin and Glu-C protease digestions, as well as TPP output for post-translational modification searches. De novo output data derived from PEAKS Studio can be found by referencing the manuscript publication.