Ag Data Commons

File(s) stored somewhere else

Please note: Linked content is NOT stored on Ag Data Commons, and we cannot guarantee its availability, quality, or security, nor accept any liability for it.

Data from: Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation

posted on 2024-02-13, 14:02 authored by Derek M. Bickhart, Mick Watson, Sergey Koren, Kevin Panke-Buisse, Laura M. Cersosimo, Maximilian O. Press, Curtis P. Van Tassell, Jo Ann S. Van Kessel, Bradd J. Haley, Seon Woo Kim, Cheryl Heiner, Garret Suen, Kiranmayee Bakshy, Ivan Liachko, Shawn T. Sullivan, Phillip R. Myer, Jay Ghurye, Mihai Pop, Paul J. Weimer, Adam M. Phillippy, Timothy P. L. Smith

We describe a method that adds long-read sequencing to a mix of technologies used to assemble a highly complex cattle rumen microbial community, and provide a comparison to short read-based methods. Long-read alignments and Hi-C linkage between contigs support the identification of 188 novel virus-host associations and the determination of phage life cycle states in the rumen microbial community. The long-read assembly also identifies 94 antimicrobial resistance genes, compared to only seven alleles in the short-read assembly. We demonstrate novel techniques that work synergistically to improve characterization of biological features in a highly complex rumen microbial community.

We demonstrate the benefits of using multiple sequencing technologies and proximity ligation in identifying unique biological facets of the cattle rumen metagenome, and we present data suggesting that each has a unique niche in downstream analysis. Our comparison identified biases in the sampling of different portions of the community by each sequencing technology, suggesting that a single DNA sequencing technology is insufficient to characterize complex metagenomic samples. Using a combination of long-read alignments and proximity ligation, we identified putative hosts for assembled bacteriophage at a resolution previously unreported in other rumen surveys. These host-phage assignments support previous work that revealed increased viral predation of sulfur-metabolizing bacterial species; however, we were able to provide a higher resolution of this association, identify potential auxiliary metabolic genes related to sulfur metabolism, and identify phage that may target a diverse range of bacterial species. Furthermore, a higher proportion of Hi-C intercontig link association data in our analysis provides evidence that these viruses have a lytic life cycle. Finally, the rumen appears to contain a high degree of previously uncharacterized mobile DNA, which may be shuttling antimicrobial resistance gene alleles among distantly related species. These unique characteristics of the rumen microbial community would be difficult to detect without the use of several different methods and techniques that we have refined in this study, and we recommend that future surveys incorporate these techniques to further characterize complex metagenomic communities.

Datasets generated and/or analyzed during the current study are available in the NCBI SRA repository under BioProject PRJNA507739. Assemblies, bins, and ORF predictions are available on Figshare. A description of the commands, scripts, and other materials used to analyze the data in this project is available in the GitHub repository and also on Zenodo.

Resources in this dataset:


Funding

USDA: 5090-31000-026-00-D

USDA-NIFA: 5090-31000-026-06-I

USDA: 5090-21000-064-00-D

USDA: 3040-31000-100-00-D

USDA: 8042-32000-110-00-D

National Institute of Allergy and Infectious Diseases: R44AI122654-02A1

National Human Genome Research Institute: Intramural Research Program

USDA-NIFA: 2015-67015-23246


Data contact name

Smith, Timothy P. L.

Data contact email


Publisher

Genome Biology

Intended use

In support of the suggestion that metagenomic surveys should include a combination of different sequencing and conformational capture technologies in order to fully assess the diversity and biological functionality of a sample.


  • Not specified

ISO Topic Category

  • biota
  • farming

National Agricultural Library Thesaurus terms

antibiotic resistance genes; hosts; microbial communities; cattle; rumen microorganisms; alleles; rumen; metagenomics; sequence analysis; bacteriophages; surveys; predation; sulfur; metabolism; DNA; genome assembly; virus assembly; computer software; microbial genetics; prediction; statistical analysis

Primary article PubAg Handle

Pending citation

  • No

Public Access Level

  • Public

Preferred dataset citation

Bickhart, Derek M.; Watson, Mick; Koren, Sergey; Panke-Buisse, Kevin; Cersosimo, Laura M.; Press, Maximilian O.; Van Tassell, Curtis P.; Van Kessel, Jo Ann S.; Haley, Bradd J.; Kim, Seon Woo; Heiner, Cheryl; Suen, Garret; Bakshy, Kiranmayee; Liachko, Ivan; Sullivan, Shawn T.; Myer, Phillip R.; Ghurye, Jay; Pop, Mihai; Weimer, Paul J.; Phillippy, Adam M.; Smith, Timothy P. L. (2020). Data from: Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation. Genome Biology.
