Gossypium barbadense cultivar:Pima-S6 Genome sequencing

dataset

posted on 2024-11-23, 22:16 authored by USDA-ARS

Gossypium barbadense, also known as Pima cotton, is widely recognized for producing superior fibers of high premium value and for Fusarium wilt race 4 (FOV4) resistance, yet it remains largely unexplored at the genomic and molecular level. A novel high-quality genome assembly of Pima-S6 was produced. High-molecular-weight (HMW) DNA was extracted from isolated nuclei and used to generate three libraries (160X, 52X, and 82X). The genome was sequenced using short-read paired-end and mate-pair libraries, with additional linked-read libraries (10x Genomics Chromium). The genome was assembled using the DeNovoMAGIC™ software platform (NRGene, Nes Ziona, Israel). A 2.3 Gbp assembly was obtained, with 26 pseudo-chromosomes totaling 2.2 Gbp. The Pima-S6 assembly showed a higher BUSCO completeness score (> 97 percent) for the At and Dt subgenomes than previously published Pima genomes. A comparative analysis at the chromosome and protein level with 10 other published Gossypium genomes detected important structural variations (synteny, inversions, translocations, and duplications) and differences in annotated proteins relative to other published G. barbadense assemblies and across Gossypium assemblies. Synteny analyses validated chromosomal rearrangements in several chromosomes between Pima-S6 and Upland (G. hirsutum L.) TM-1. Annotation of the final 2.3 Gbp Pima-S6 assembly with MAKER-P predicted 88,343 genes, of which more than 75,000 showed evidence of homology to known proteins and 1,965 showed evidence of expression in an RNA-seq dataset obtained from roots and leaves of Pima-S6 plants. Comparisons of the Pima-S6 assembly with other Gossypium species at the chromosome and gene-sequence level suggest that the results are highly influenced by the methodologies and strategies used to sequence and assemble the genome and to annotate proteins.
The Pima-S6 genome provides a valuable new genomic resource that will help identify and dissect genes related to important traits such as FOV4 resistance and fiber quality, supporting future breeding programs.

Funding

USDA: 3096-21000-022-000-D

History

Data contact name

BioProject Curation Staff

Data contact email

bioprojecthelp@ncbi.nlm.nih.gov

Publisher

National Center for Biotechnology Information

Temporal Extent Start Date

2022-01-18

Theme

Non-geospatial

ISO Topic Category

biota

National Agricultural Library Thesaurus terms

genomics; sequence analysis; genome

Pending citation

Public Access Level

Public

Accession Number

PRJNA798371

Preferred dataset citation

It is recommended to cite the accession numbers that are assigned to data submissions, e.g. the GenBank, WGS or SRA accession numbers. If individual BioProjects need to be referenced, state that "The data have been deposited with links to BioProject accession number PRJNA798371 in the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/)."
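For scripted use of this record, the following is a minimal sketch that builds the BioProject landing-page URL and the suggested citation sentence from the accession number. The function names `bioproject_url` and `citation_sentence` are hypothetical helpers introduced here for illustration, and the accession-format check is a simplifying assumption rather than an NCBI rule.

```python
BIOPROJECT_BASE = "https://www.ncbi.nlm.nih.gov/bioproject/"


def bioproject_url(accession: str) -> str:
    """Return the BioProject page URL for an accession such as PRJNA798371."""
    accession = accession.strip().upper()
    # Assumption of this sketch: only NCBI-registered accessions of the form
    # "PRJNA" followed by digits are accepted; other prefixes are rejected.
    if not (accession.startswith("PRJNA") and accession[5:].isdigit()):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return BIOPROJECT_BASE + accession


def citation_sentence(accession: str) -> str:
    """Fill the accession into the citation wording given in this record."""
    return (
        "The data have been deposited with links to BioProject accession "
        f"number {accession} in the NCBI BioProject database "
        f"({BIOPROJECT_BASE})."
    )


if __name__ == "__main__":
    print(bioproject_url("PRJNA798371"))
    print(citation_sentence("PRJNA798371"))
```

The sketch only formats strings and makes no network request; retrieving the GenBank, WGS, or SRA accessions linked to the BioProject would require a separate query against NCBI.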