Data from: Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation
We describe a method that adds long-read sequencing to a mix of technologies used to assemble a highly complex cattle rumen microbial community, and provide a comparison to short read-based methods. Long-read alignments and Hi-C linkage between contigs support the identification of 188 novel virus-host associations and the determination of phage life cycle states in the rumen microbial community. The long-read assembly also identifies 94 antimicrobial resistance genes, compared to only seven alleles in the short-read assembly. We demonstrate novel techniques that work synergistically to improve characterization of biological features in a highly complex rumen microbial community.
We demonstrate the benefits of using multiple sequencing technologies and proximity ligation in identifying unique biological facets of the cattle rumen metagenome, and we present data suggesting that each has a unique niche in downstream analysis. Our comparison identified biases in the sampling of different portions of the community by each sequencing technology, suggesting that a single DNA sequencing technology is insufficient to characterize complex metagenomic samples. Using a combination of long-read alignments and proximity ligation, we identified putative hosts for assembled bacteriophage at a resolution not previously reported in rumen surveys. These host-phage assignments support previous work that revealed increased viral predation of sulfur-metabolizing bacterial species; however, we were able to provide a higher resolution of this association, identify potential auxiliary metabolic genes related to sulfur metabolism, and identify phage that may target a diverse range of bacterial species. Furthermore, we found evidence that these viruses have a lytic life cycle, based on a higher proportion of Hi-C intercontig link association data in our analysis. Finally, the rumen appears to contain a high degree of previously uncharacterized mobile DNA, which may be shuttling antimicrobial resistance gene alleles among distantly related species. These unique characteristics of the rumen microbial community would be difficult to detect without the several methods and techniques that we have refined in this study, and we recommend that future surveys incorporate these techniques to further characterize complex metagenomic communities.
Datasets generated and/or analyzed during the current study are available in the NCBI SRA repository under Bioproject: PRJNA507739. Assemblies, bins, and ORF predictions are available on Figshare. A description of commands, scripts, and other materials used to analyze the data in this project is available in the GitHub repository: https://github.com/njdbickhart/RumenLongReadASM and also on Zenodo.
Resources in this dataset:
Resource Title: Availability of data and materials.
File Name: Web Page, url: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1760-x#availability-of-data-and-materials
The datasets generated and/or analyzed during the current study are available in the NCBI SRA repository under Bioproject: PRJNA507739. The assemblies, bins, and ORF predictions are available on Figshare. A description of commands, scripts, and other materials used to analyze the data in this project can be found in the following GitHub repository: https://github.com/njdbickhart/RumenLongReadASM and also on Zenodo.
Funding
USDA: 5090-31000-026-00-D
USDA-NIFA: 5090-31000-026-06-I
USDA: 5090-21000-064-00-D
USDA: 3040-31000-100-00-D
USDA: 8042-32000-110-00-D
National Institute of Allergy and Infectious Diseases: R44AI122654-02A1
National Human Genome Research Institute: Intramural Research Program
USDA-NIFA: 2015-67015-23246
Data contact name
Smith, Timothy P. L.
Data contact email
tim.smith2@usda.gov
Publisher
Genome Biology
Intended use
In support of the suggestion that metagenomic surveys should include a combination of different sequencing and conformational capture technologies in order to fully assess the diversity and biological functionality of a sample.
Theme
- Not specified
ISO Topic Category
- biota
- farming
National Agricultural Library Thesaurus terms
antibiotic resistance genes; hosts; microbial communities; cattle; rumen microorganisms; alleles; rumen; metagenomics; sequence analysis; bacteriophages; surveys; predation; sulfur; metabolism; DNA; genome assembly; virus assembly; computer software; microbial genetics; prediction; statistical analysis
Primary article PubAg Handle
Pending citation
- No
Public Access Level
- Public