A Pangenome Reveals LTR Repeat Dynamics as a Major Driver of Genome Evolution in Chenopodium
dataset
posted on 2025-01-22, 05:15, authored by Brigham Young University
The genus Chenopodium is characterized by its wide geographic distribution and ecological adaptability. Species such as Chenopodium quinoa have served as domesticated staple crops for centuries and continue to be valued for their robust nutritional profile. Wild Chenopodium species exhibit diverse niche adaptations and function as important genetic reservoirs for beneficial traits, including disease resistance and climate hardiness. To harness the potential of the wild taxa for crop improvement, we developed a Chenopodium pangenome through the assembly and comparative analysis of 12 Chenopodium species that encompass the eight known genome types (A-H). Six of the species are represented by new chromosome-scale assemblies, and because many of the species are polyploids, a total of 20 genomes were included in the pangenome analyses. We show that the genomes vary dramatically in size, with the D genome being the smallest (~370 Mb) and the B genome being the largest (~700 Mb), and that genome size was correlated with independent expansions of the Copia and Gypsy LTR retrotransposon families, suggesting that transposable elements have played a critical role in the evolution of the Chenopodium genomes. We annotated a total of 33,457 pan-Chenopodium gene families, of which 65% were dispensable and only 2% were private. Phylogenetic analysis clarified the evolutionary relationships among the genome lineages, notably resolving the taxonomic placement of the F genome while highlighting the uniqueness of the A genome in the Western Hemisphere. These genomic resources are particularly important for understanding the secondary and tertiary gene pools available for the improvement of the domesticated chenopods while furthering our understanding of the evolution and complexity within the genus.
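For readers unfamiliar with the gene family categories mentioned above, the following is a minimal Python sketch of how families are commonly binned from a presence/absence table. It is an illustration only, not the authors' pipeline. The category definitions are assumptions based on common pangenome usage (core: present in every genome; private: present in exactly one; dispensable: present in more than one but not all), and the function names, family identifiers, and toy genome labels are hypothetical.

```python
from collections import Counter


def classify_family(present_in, n_genomes):
    """Bin one gene family by the number of genomes that carry it.

    Assumed definitions (not taken from the dataset description):
      core        - present in every genome
      private     - present in exactly one genome
      dispensable - present in more than one genome, but not all
    """
    k = len(present_in)
    if k == 0:
        raise ValueError("a gene family must occur in at least one genome")
    if k == n_genomes:
        return "core"
    if k == 1:
        return "private"
    return "dispensable"


def summarize(families, genomes):
    """Count families per category.

    families: dict mapping a family ID to the set of genome labels carrying it
    genomes:  collection of all genome labels in the comparison
    """
    genome_set = set(genomes)
    counts = Counter()
    for members in families.values():
        # Ignore labels that are not part of the comparison.
        counts[classify_family(set(members) & genome_set, len(genome_set))] += 1
    return counts


# Toy example with three hypothetical genomes and four hypothetical families.
toy_genomes = ["A", "B", "D"]
toy_families = {
    "fam1": {"A", "B", "D"},  # in all three genomes
    "fam2": {"A", "B"},       # in two of three
    "fam3": {"D"},            # in one genome only
    "fam4": {"A", "D"},       # in two of three
}

toy_counts = summarize(toy_families, toy_genomes)
total = sum(toy_counts.values())
for category in ("core", "dispensable", "private"):
    print(f"{category}: {toy_counts[category]} ({toy_counts[category] / total:.0%})")
```

Whether private families are counted separately from, or as a subset of, the dispensable set varies between studies, so the percentages reported above should be interpreted using the definitions in the associated publication.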
It is recommended to cite the accession numbers that are assigned to data submissions, e.g. the GenBank, WGS or SRA accession numbers. If individual BioProjects need to be referenced, state that "The data have been deposited with links to BioProject accession number PRJNA1132190 in the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/)."