Ag Data Commons
File(s) stored somewhere else

Please note: Linked content is NOT stored on Ag Data Commons, and we cannot guarantee its availability, quality, or security, or accept any liability.

Oncorhynchus mykiss isolate:Arlee Genome sequencing and assembly

dataset
posted on 2024-06-11, 06:21 authored by USDA/ARS
Although the most recent version of the rainbow trout genome assembly from the Swanson line has greatly improved the genome reference and is reliable for gene prediction, it contains 420,055 spanned gaps and 7,839 unspanned gaps (GCA_002163495.1). Hence, there is still a need to improve the contiguity and completeness of the reference assembly, which is now possible with long-read DNA sequencing technologies. We are also working towards generating a rainbow trout pan-genome reference that will better represent the genetic diversity in this species. The Arlee doubled haploid YY male line has a different genetic background from the Swanson line. It originated from a domesticated strain that was originally collected from the northern California coast. For the Arlee genome assembly, we generated 111x genome coverage in long-read sequence data using the PacBio Sequel system. The read length distribution has an N50 of about 33 kb and an average read length greater than 20 kb. Contigs were assembled using the Canu pipeline, and the consensus sequence was error-corrected using two iterations of Arrow with the PacBio reads followed by one iteration of Freebayes using Illumina paired-end reads. The Canu assembly contained 1,591 contigs with an N50 contig length of 9.8 Mbp, which is a major improvement in contiguity compared to the current Swanson assembly. The assembly was further improved with a Bionano optical map and Hi-C proximity ligation sequence data to produce super-scaffolds. The total length of the final assembly is ~2.33 Gbp, of which ~95% was anchored into 29 chromosome sequences using the same rainbow trout high-density genetic map that we previously used for the Swanson reference genome assembly. The new assembly is composed primarily of 32 major scaffolds corresponding perfectly to the karyotype of the Arlee line (2N=64). Six of the Arlee acrocentric chromosomes can be perfectly aligned with three of the Swanson line metacentric chromosomes. The three Swanson chromosomes that are each divided into two acrocentric chromosomes are Omy04, Omy14 and Omy25, as we have previously described in Pearse et al. (2019).
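
The description reports N50 values for both the read lengths and the contig lengths. As a reminder of what that statistic measures, the following is a minimal Python sketch of the usual N50 definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total bases. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the Arlee assembly or from any tool named above.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and accumulate them until the
    running sum reaches at least half of the total; the length at which
    that happens is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths for illustration only (not Arlee data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Applied to real data, the lengths would be read from the contig or read records of an assembly or sequencing run; a higher contig N50 indicates that more of the assembly sits in long, uninterrupted sequences.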

Funding

Agricultural Research Service, 8082-31000-013-00-D

History

Data contact name

BioProject Curation Staff

Publisher

National Center for Biotechnology Information

Temporal Extent Start Date

2020-06-02

Theme

  • Non-geospatial

ISO Topic Category

  • biota

National Agricultural Library Thesaurus terms

genomics; sequence analysis; genome

Pending citation

  • No

Public Access Level

  • Public

Accession Number

PRJNA623027

Preferred dataset citation

It is recommended to cite the accession numbers that are assigned to data submissions, e.g. the GenBank, WGS or SRA accession numbers. If individual BioProjects need to be referenced, state that "The data have been deposited with links to BioProject accession number PRJNA623027 in the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/)."
