A comparison of short- and long-read whole genome sequencing for microbial pathogen epidemiology
Dataset posted on 2025-10-22, 00:17, authored by Oregon State University
Whole genome sequencing provides the highest resolution for characterizing pathogen evolution, epidemiology, and diagnostics. Genome assemblies contain information on the identity and potential phenotypes of a pathogen. Likewise, variant calling can reveal transmission patterns and evolutionary relationships. Recent improvements in Oxford Nanopore long-read sequencing have made it attractive for genomic epidemiology. However, the accuracy of Nanopore reads and the optimal strategy for analyzing them remain to be determined. We compared Illumina short reads and Oxford Nanopore long reads for genome assembly and variant calling in phytopathogenic bacteria. We generated short- and long-read datasets for diverse phytopathogenic Agrobacterium strains, analyzed these data using multiple pipelines designed for either short or long reads, and compared the results. Assemblies made from long reads were more complete than those made from short reads and contained few sequence errors. Variant calling pipelines differed in their ability to accurately call variants and infer genotypes from long reads. The results suggest that computationally fragmenting long reads can improve the accuracy of variant calling in population-level studies. Using fragmented long reads, pipelines designed for short reads recovered genotypes more accurately than pipelines designed for long reads. Further, short- and long-read datasets can be analyzed together with the same pipelines. These findings show that Oxford Nanopore sequencing is accurate and can be sufficient for microbial pathogen genomics and epidemiology. Ultimately, this enhances the ability of researchers and clinicians to understand and mitigate the spread of pathogens.
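To illustrate what "computationally fragmenting long reads" can mean in practice, the sketch below splits each FASTQ long read into consecutive, non-overlapping pieces so that short-read pipelines can process them. It is a minimal sketch under stated assumptions, not the study's method: the fragment length (150 bp), the minimum kept length (50 bp), the absence of overlap, the `_fragN` naming scheme, and all function names are hypothetical choices made for illustration, since the exact parameters and tooling are not given here.

```python
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record: (read_id, sequence, quality string)
Record = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (id, seq, qual) from 4-line FASTQ records (assumes no line wrapping)."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record at line {i + 1}")
        yield header[1:].split()[0], seq, qual


def fragment_read(record: Record, frag_len: int = 150, min_len: int = 50) -> List[Record]:
    """Split one long read into non-overlapping pseudo-short reads.

    frag_len and min_len are hypothetical, illustrative defaults.
    """
    read_id, seq, qual = record
    fragments = []
    for start in range(0, len(seq), frag_len):
        sub_seq = seq[start:start + frag_len]
        if len(sub_seq) < min_len:
            continue  # drop a trailing piece that is too short to map reliably
        sub_qual = qual[start:start + frag_len]  # keep base qualities aligned
        fragments.append((f"{read_id}_frag{start // frag_len}", sub_seq, sub_qual))
    return fragments


def to_fastq(records: Iterable[Record]) -> str:
    """Serialize records back to 4-line FASTQ text."""
    return "".join(f"@{rid}\n{s}\n+\n{q}\n" for rid, s, q in records)


# Usage: one 400 bp read becomes fragments of 150, 150 and 100 bp.
example = ["@read1 runid=x\n", "ACGT" * 100 + "\n", "+\n", "I" * 400 + "\n"]
frags = [f for rec in parse_fastq(example) for f in fragment_read(rec)]
print([len(s) for _, s, _ in frags])  # [150, 150, 100]
```

The resulting fragments would typically be treated as single-end reads by a short-read aligner and variant caller; whether overlapping windows or a different fragment length work better is an empirical question this sketch does not settle.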
It is recommended to cite the accession numbers that are assigned to data submissions, e.g. the GenBank, WGS or SRA accession numbers. If individual BioProjects need to be referenced, state that "The data have been deposited with links to BioProject accession number PRJNA1255661 in the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/)."