Ag Data Commons

Single nucleotide polymorphism (SNP) discovery in rainbow trout (Oncorhynchus mykiss) using restriction site associated DNA (RAD) sequencing of doubled haploids and assessment of polymorphism in a populations’ survey

dataset
posted on 2024-09-29, 05:11 authored by USDA-ARS-NCCCWA
Background: Our goal is to produce a high-throughput SNP genotyping platform for genomic analyses in rainbow trout. The platform should enable fine mapping of QTL, genome-wide association studies, genomic selection for improved aquaculture production traits, and genetic analyses of wild populations that support better fisheries management.

Salmonid genomes are considered to be in a semi-tetraploid state because of an evolutionarily recent genome duplication event, which complicates the application of traditional molecular genetic approaches. As a result, DNA or RNA sequence assemblies often produce contigs that contain paralogous sequences. This complicates single nucleotide polymorphism (SNP) discovery in rainbow trout, because many putative SNPs are actually paralogous sequence variants (PSVs) rather than simple allelic variants.

To reduce the number of PSVs, we sampled nearly 100% homozygous doubled haploid (DH) lines that represent a wide geographic range of rainbow trout populations. In a nearly fully homozygous line, an apparently heterozygous site most likely reflects paralogous loci rather than alleles. In addition, we surveyed 18 populations to characterize genetic diversity within and between multiple commercial populations, wild populations, and resource populations used primarily for research in the U.S.A. and in France.

Results: We used restriction-site associated DNA (RAD) technology to generate a large SNP dataset from deep sequencing of a panel of 11 DH lines. The dataset comprises 145,168 high-quality putative SNPs that were genotyped in at least 9 of the 11 lines. Of these, 71,446 (49%) had a minor allele frequency (MAF) of at least 18%, i.e., the minor allele was present in at least 2 of the 11 lines. Approximately 16% of the RAD SNPs in this dataset come from expressed or coding rainbow trout sequences.

Our population survey showed that the proportion of RAD SNPs shared between any pair of populations varied widely, from 4% to 55%. The proportion of SNPs from a given population that were shared with at least one other population ranged from 42% to 89%. On average, 21% of the RAD SNPs found among the 11 DH lines, and 30% of those with MAF ≥ 0.18, were also found in the surveyed populations. However, certain attributes of the RAD genotyping approach, combined with the large number of duplicated loci in the rainbow trout genome, likely lowered this estimate. The actual percentage of SNPs shared with the DH resource panel is probably higher.

Conclusions: A large dataset of high-quality RAD SNPs that passed our rigorous analysis pipeline was generated from a panel of 11 DH rainbow trout lines as a resource for a SNP chip. The RAD marker system proved very useful for assessing genetic diversity within a single rainbow trout population or between a pair of populations, but not for comparing more than two populations. The generally low percentage of RAD SNPs shared between populations has an implication for chip design. For a SNP chip to be useful for a wide range of genetic analyses in rainbow trout, it will be crucial to select the most common SNPs with relatively high MAF.
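The call-rate and MAF filters and the population-sharing percentages in the Results can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline, which also applied quality filters that are not modeled here.

The following are hypothetical and are not specified in the abstract:
- the genotype layout (one allele per DH line, `None` for a missing call);
- the function names `minor_allele_frequency`, `filter_panel`, `shared_fraction`, and `shared_with_any_other`;
- the choice of the first population's SNP set as the denominator for the pairwise percentage.

Because a DH line is essentially homozygous, each line contributes a single allele. A MAF of 2/11 is therefore about 18%.

```python
from collections import Counter

# Hypothetical layout (not the dataset's file format): SNP id -> one allele per
# DH line. A doubled haploid is (nearly) fully homozygous, so each line
# contributes a single allele; None marks a line not genotyped at that SNP.


def minor_allele_frequency(alleles):
    """MAF of a biallelic SNP across the lines that were genotyped."""
    called = [a for a in alleles if a is not None]
    counts = Counter(called)
    if len(counts) != 2:  # monomorphic or multi-allelic: treated as MAF 0 here
        return 0.0
    return min(counts.values()) / len(called)


def filter_panel(genotypes, min_called=9, min_maf=0.18):
    """Return (SNPs genotyped in >= min_called lines, the subset with MAF >= min_maf)."""
    kept = [snp for snp, alleles in genotypes.items()
            if sum(a is not None for a in alleles) >= min_called]
    common = [snp for snp in kept
              if minor_allele_frequency(genotypes[snp]) >= min_maf]
    return kept, common


def shared_fraction(snps_a, snps_b):
    """Fraction of population A's SNPs also found in population B.

    Assumption: A's SNP set is the denominator (the abstract does not say).
    """
    return len(snps_a & snps_b) / len(snps_a) if snps_a else 0.0


def shared_with_any_other(populations):
    """Per population: fraction of its SNPs found in at least one other population."""
    result = {}
    for name, snps in populations.items():
        others = set().union(*(s for n, s in populations.items() if n != name))
        result[name] = shared_fraction(snps, others)
    return result


# Toy data for illustration only (not taken from the dataset).
panel = {
    "snp1": ["A", "A", "G", "G", "A", "A", "A", "A", "A", "A", "A"],  # minor allele in 2/11 lines
    "snp2": ["C", "T", "C", "C", "C", "C", "C", "C", "C", "C", "C"],  # minor allele in 1/11 lines
    "snp3": ["T", "C", None, None, None, "T", "T", "T", "T", "T", "T"],  # only 8 lines called
}
kept, common = filter_panel(panel)
print(kept, common)  # ['snp1', 'snp2'] ['snp1']

pops = {"pop1": {"s1", "s2", "s3", "s4"}, "pop2": {"s3", "s4", "s5"}, "pop3": {"s6"}}
print(shared_fraction(pops["pop1"], pops["pop2"]))  # 0.5
print(shared_with_any_other(pops))

# The percentages quoted in the abstract, as arithmetic:
print(f"{2 / 11:.0%}", f"{71446 / 145168:.0%}")  # 18% 49%
```

Under the 9-of-11 call-rate rule, at most two lines are missing at any SNP. MAF ≥ 0.18 is then equivalent to at least two lines carrying the minor allele: 1/9 ≈ 0.11 falls below the threshold, while 2/11 ≈ 0.18 meets it.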

Funding

USDA-NIFA-AFRI: 2011-67015-30091

History

Data contact name

BioProject Curation Staff

Publisher

National Center for Biotechnology Information

Temporal Extent Start Date

2013-02-14

Theme

  • Non-geospatial

ISO Topic Category

  • biota

National Agricultural Library Thesaurus terms

genetic variation

Pending citation

  • No

Public Access Level

  • Public

Accession Number

PRJNA189472

Preferred dataset citation

It is recommended to cite the accession numbers that are assigned to data submissions, e.g. the GenBank, WGS or SRA accession numbers. If individual BioProjects need to be referenced, state that "The data have been deposited with links to BioProject accession number PRJNA189472 in the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/)."
