Oncorhynchus mykiss isolate:Arlee Genome sequencing and assembly
dataset
Posted on 2024-11-23, 22:04. Authored by USDA/ARS.
Although the most recent version of the rainbow trout genome assembly from the Swanson line (GCA_002163495.1) has greatly improved the genome reference and is reliable for gene prediction, it contains 420,055 spanned gaps and 7,839 unspanned gaps. Hence, there is still a need to improve the contiguity and completeness of the reference assembly, which is now possible with long-read DNA sequencing technologies. We are also working towards generating a rainbow trout pan-genome reference that will better represent the genetic diversity in this species.
The Arlee doubled haploid YY male line has a different genetic background from the Swanson line. It originated from a domesticated strain that was originally collected from the northern California coast.
For the Arlee genome assembly, we generated 111x genome coverage in long-read sequence data using the PacBio Sequel system. The read length distribution has an N50 of about 33 kb and an average read length greater than 20 kb. Contigs were assembled using the Canu pipeline, and the consensus sequence was error-corrected using two iterations of Arrow with the PacBio reads, followed by one iteration of Freebayes using Illumina paired-end reads. The Canu assembly contained 1,591 contigs with an N50 contig length of 9.8 Mbp, which is a major improvement in contiguity compared to the current Swanson assembly. The assembly was further improved with a Bionano optical map and Hi-C proximity ligation sequence data to produce super-scaffolds.
The total length of the final assembly is ~2.33 Gbp, of which ~95% was anchored into 29 chromosome sequences using the same rainbow trout high-density genetic map that we previously used for the Swanson reference genome assembly. The new assembly is composed primarily of 32 major scaffolds corresponding perfectly to the karyotype of the Arlee line (2N=64). Six of the Arlee acrocentric chromosomes can be perfectly aligned with three of the Swanson line metacentric chromosomes. The three Swanson chromosomes that are each split into two acrocentric chromosomes are Omy04, Omy14 and Omy25, as we have previously described in Pearse et al. (2019).
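The N50 values quoted above (read N50 and contig N50) follow the usual definition: the length L such that sequences of length L or longer contain at least half of the total bases. The Python sketch below is illustrative only and is not part of the assembly pipeline. The function name `n50` and the toy contig lengths are hypothetical; only the ~2.33 Gbp total and the ~95% anchored fraction are taken from the description.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of all bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths (arbitrary units), not real data.
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))  # prints 6

# Approximate anchored length, using the rounded figures quoted in the
# description (~2.33 Gbp total, ~95% anchored to chromosomes).
anchored_gbp = 2.33 * 0.95
print(round(anchored_gbp, 2))  # prints 2.21
```

To apply this to a real assembly, the lengths would come from the FASTA records, for example by summing sequence-line lengths per header. With the rounded figures above, roughly 2.21 Gbp of the assembly is placed on chromosomes.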
It is recommended to cite the accession numbers that are assigned to data submissions, e.g. the GenBank, WGS or SRA accession numbers. If individual BioProjects need to be referenced, state that "The data have been deposited with links to BioProject accession number PRJNA623027 in the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/)."