Ag Data Commons

A Pangenome Reveals LTR Repeat Dynamics as a Major Driver of Genome Evolution in Chenopodium

dataset
posted on 2025-01-22, 05:15 authored by Brigham Young University
The genus Chenopodium is characterized by its wide geographic distribution and ecological adaptability. Species such as Chenopodium quinoa have served as domesticated staple crops for centuries and continue to be valued for their robust nutritional profile. Wild Chenopodium species exhibit diverse niche adaptations and function as important genetic reservoirs for beneficial traits, including disease resistance and climate hardiness. To harness the potential of the wild taxa for crop improvement, we developed a Chenopodium pangenome through the assembly and comparative analyses of 12 Chenopodium species that encompass the eight known genome types (A-H). Six of the species are new chromosome-scale assemblies, and because many are polyploids, a total of 20 genomes were included in the pangenome analyses. We show that the genomes vary dramatically in size, with the D genome being the smallest (~370 Mb) and the B genome being the largest (~700 Mb), and that genome size was correlated with independent expansions of the Copia and Gypsy LTR retrotransposon families, suggesting that transposable elements have played a critical role in the evolution of the Chenopodium genomes. We annotated a total of 33,457 pan-Chenopodium gene families, of which 65% were dispensable and only 2% were private. Phylogenetic analysis clarified the evolutionary relationships among the genome lineages, notably resolving the taxonomic placement of the F genome while highlighting the uniqueness of the A genome in the Western Hemisphere. These genomic resources are particularly important for understanding the secondary and tertiary gene pools available for the improvement of the domesticated chenopods while furthering our understanding of the evolution and complexity within the genus.

Funding

USDA: 2020-67014-30867

History

Data contact name

BioProject Curation Staff

Publisher

National Center for Biotechnology Information

Temporal Extent Start Date

2024-07-05

Theme

  • Non-geospatial

ISO Topic Category

  • biota

National Agricultural Library Thesaurus terms

genomics; sequence analysis; genome

Pending citation

  • No

Public Access Level

  • Public

Accession Number

PRJNA1132190

Preferred dataset citation

It is recommended to cite the accession numbers that are assigned to data submissions, e.g. the GenBank, WGS or SRA accession numbers. If individual BioProjects need to be referenced, state that "The data have been deposited with links to BioProject accession number PRJNA1132190 in the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/)."
