Thank you for using curatedMetagenomicData and thank you for installing curatedMetagenomicData version 3.0.0; this release is a long-awaited major milestone for all of us who work on the software. We are very proud of the package and hope you will find that the updates make your own work easier and more enjoyable. The curatedMetagenomicData package now contains 20,283 samples from 86 studies, all processed in a standardized way, with manually curated metadata for every sample. The sample metadata is curated by humans, and we owe a huge debt of gratitude to our curators for their time and effort. To our knowledge, curatedMetagenomicData represents the largest collection of its kind, and it will continue to grow to support metagenomics research.
If you have been using curatedMetagenomicData for a few years, you might notice that we skipped version 2.0.0 and went directly from version 1.0.0 to version 3.0.0. While this is not traditional, we had good reason to do so. The data in curatedMetagenomicData version 1.0.0 (released in 2017) came from MetaPhlAn2 and HUMAnN2; the data in curatedMetagenomicData version 3.0.0 (released in 2021) come from MetaPhlAn3 and HUMAnN3. We skipped version 2.0.0 and released 3.0.0 instead to align our version numbers with MetaPhlAn3 and HUMAnN3; from now on, our major releases will reflect data processed using the corresponding major versions of these tools.
As for what has changed in curatedMetagenomicData version 3.0.0,
allow us to explain. First, all raw data has been reprocessed with
MetaPhlAn3 and HUMAnN3 – this includes both new and existing studies.
And speaking of new studies, 29 new studies with a total of 10,084
samples were added in this release, roughly doubling the number of
samples available in curatedMetagenomicData. Next, all data is now
returned as TreeSummarizedExperiment objects by the refactored
curatedMetagenomicData() method. The
curatedMetagenomicData() method is now the primary (and
only) means to access data – this change allows us to keep sample
metadata up to date and pull in changes from the curation repository on
a weekly basis. With that change it was also necessary to refactor the
mergeData() method to work with the new data structures.
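
As a minimal sketch of the access pattern described above, the following R code lists matching resources, downloads them, and merges them. The study pattern is only an illustration, and the `dryrun` argument name is assumed from the package documentation rather than stated in these notes; check `?curatedMetagenomicData` for the installed version.

```r
library(curatedMetagenomicData)

# Illustrative pattern (an assumption): any study/data-type regular
# expression can be used here instead.
pattern <- "AsnicarF_20.+.relative_abundance"

# With dryrun = TRUE, matching resource names are listed and no data
# is downloaded.
curatedMetagenomicData(pattern, dryrun = TRUE)

# With dryrun = FALSE, a named list with one object per matching
# resource is returned.
asnicar <- curatedMetagenomicData(pattern, dryrun = FALSE)

# mergeData() combines the elements of that list into a single object.
merged <- mergeData(asnicar)
```

Running the dry run first is a cheap way to confirm that a pattern matches the intended resources before anything is downloaded.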
Perhaps most exciting, though, is the ability to return samples
across studies using the returnSamples() method. You simply subset the
sampleMetadata data.frame to include only the desired samples and
metadata, and a TreeSummarizedExperiment object is returned with the
correct samples. At some point it became obvious to us that this method
was likely the one most needed and was probably being implemented
outside the package frequently; sorry we did not include it sooner.
Again, if you have been using curatedMetagenomicData for a few years,
you might be asking what the sampleMetadata object is. The
sampleMetadata object replaces the combined_metadata object, and the
combined_metadata object will be removed in the next release; please
start using sampleMetadata instead.
Finally, a few things have become defunct in curatedMetagenomicData
version 3.0.0, principally methods that worked on ExpressionSet objects
(e.g. ExpressionSet2phyloseq()) and named accessors (e.g.
HMP_2012.pathcoverage.stool()). The older methods had to become defunct
with the move to TreeSummarizedExperiment objects, and the named
accessors were very hard to maintain and document. The package is now
simpler and easier to use with the TreeSummarizedExperiment data
structures and with the curatedMetagenomicData() method replacing all
named accessors.
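
The cross-study workflow with returnSamples() can be sketched roughly as follows. The metadata column names (`body_site`, `age`) and the `dataType` argument name are assumptions based on the package documentation, not on these notes, so please verify them against `?returnSamples` and `colnames(sampleMetadata)` before relying on them.

```r
library(curatedMetagenomicData)

# Subset the sampleMetadata data.frame to the samples of interest.
# subset() drops rows where the condition evaluates to NA.
# Column names body_site and age are assumed to exist.
adult_stool <- subset(sampleMetadata, body_site == "stool" & age >= 18)

# Return the selected samples across all studies they come from.
# This downloads data for every study represented in the subset.
tse <- returnSamples(adult_stool, dataType = "relative_abundance")
```

Only the rows and columns kept in the subset are carried into the returned object, so trimming the metadata beforehand keeps the result focused on the variables needed for analysis.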
Again, thank you sincerely for using curatedMetagenomicData, and please don’t hesitate to reach out to us when you find issues or need help.
The curatedMetagenomicData team