Ag Data Commons

Data from: Whole genomes reveal evolutionary relationships and mechanisms underlying gene-tree discordance in Neodiprion sawflies

dataset
posted on 2025-08-19, 02:28 authored by Catherine Linnen, Danielle Herrig, Ryan Ridenbaugh, Kim Vertacnik, Kathryn Everson, Sheina Sim, Scott Geib, David Weisrock
Rapidly evolving taxa are excellent models for understanding the mechanisms that give rise to biodiversity. However, developing an accurate historical framework for comparative analysis of such lineages remains a challenge due to ubiquitous incomplete lineage sorting and introgression. Here, we use a whole-genome alignment, multiple locus-sampling strategies, and summary-tree and SNP-based species-tree methods to infer a species tree for eastern North American Neodiprion species, a clade of pine-feeding sawflies (Order: Hymenoptera; Family: Diprionidae). We recovered a well-supported species tree that, except for three uncertain relationships, was robust to different strategies for analyzing whole-genome data. Nevertheless, underlying gene-tree discordance was high. To understand this genealogical variation, we used multiple linear regression to model site concordance factors estimated in 50-kb windows as a function of several genomic predictor variables. We found that site concordance factors tended to be higher in regions of the genome with more parsimony-informative sites, fewer singletons, less missing data, lower GC content, more genes, lower recombination rates, and lower D-statistics (less introgression). Together, these results suggest that incomplete lineage sorting, introgression, and genotyping error all shape the genomic landscape of gene-tree discordance in Neodiprion. More generally, our findings demonstrate how combining phylogenomic analysis with knowledge of local genomic features can reveal mechanisms that produce topological heterogeneity across genomes.

All files are in either FASTA or NEXUS format. FASTA is a standard text format for nucleotide sequences, and a FASTA genome file is provided for each Neodiprion species. Using freely available scripts (https://github.com/LinnenLab/Herrig_etal_NeodiprionPhylogeny), these genome files can be used to produce window-based and gene-based datasets in NEXUS format. NEXUS is a standard format for character data in phylogenetic analysis, and the resulting files can be used as input for many phylogenetic programs.
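
To make the FASTA-to-NEXUS step concrete, here is a minimal Python sketch of slicing aligned sequences into fixed-size windows and writing each window as a NEXUS data block. It is not the authors' pipeline (their scripts are in the GitHub repository linked above). It assumes the input sequences are already aligned to a common coordinate system and have equal length. The function names (parse_fasta, windows, to_nexus), the taxon labels, and the toy sequences are all hypothetical and are used only for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {record_name: sequence}."""
    records: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def windows(aligned: Dict[str, str], size: int) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (start, sub-alignment) pairs for consecutive non-overlapping windows."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    total = lengths.pop()
    for start in range(0, total, size):
        yield start, {n: seq[start:start + size] for n, seq in aligned.items()}


def to_nexus(aligned: Dict[str, str]) -> str:
    """Render an alignment as a minimal NEXUS DATA block."""
    ntax = len(aligned)
    nchar = len(next(iter(aligned.values())))
    pad = max(len(n) for n in aligned)
    rows = "\n".join(f"    {n.ljust(pad)}  {seq}" for n, seq in aligned.items())
    return (
        "#NEXUS\n"
        "BEGIN DATA;\n"
        f"  DIMENSIONS NTAX={ntax} NCHAR={nchar};\n"
        "  FORMAT DATATYPE=DNA MISSING=? GAP=-;\n"
        "  MATRIX\n"
        f"{rows}\n"
        "  ;\n"
        "END;\n"
    )


if __name__ == "__main__":
    # Hypothetical toy alignment; real windows in the study were 50 kb.
    toy_fasta = ">taxon_A\nACGTAC\nGTAC\n>taxon_B\nACGTTCGTAA\n"
    alignment = parse_fasta(toy_fasta)
    for start, block in windows(alignment, size=4):
        print(f"window starting at position {start}")
        print(to_nexus(block))
```

For real data, size would be 50000 to match the 50-kb windows described above, and each window would typically be written to its own file before being passed to a phylogenetic program.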

Funding

NSF: DEB-CAREER-1750946

USDA: 2016-67014-2475

USDA-ARS: 0201-88888-003-000D

USDA-ARS: 0201-88888-002-000D

NSF: DEB-1355000

Data contact name

Linnen, Catherine

Data contact email

catherine.linnen@uky.edu

Publisher

Dryad

Theme

  • Not specified

ISO Topic Category

  • biota

National Agricultural Library Thesaurus terms

genome; introgression; family; topology; genotyping errors; regression analysis; genomics; Hymenoptera; Diprionidae; biodiversity; Neodiprion

Pending citation

  • Yes

Public Access Level

  • Public
