Ag Data Commons
Computational detection and experimental validation of segmental duplications and associated copy number variants in river buffalo (Bubalus bubalis)

dataset
Posted on 2024-06-11, 05:54; authored by Animal Genomics and Improvement Lab., USDA-ARS
Duplicated sequences are an important source of gene innovation and structural variation within mammalian genomes. Using a read depth approach based on next-generation sequencing, we performed a genome-wide analysis of segmental duplications (SDs) and associated copy number variants (CNVs) in water buffalo (Bubalus bubalis). Aligning the sequencing reads of Olimpia (the sequenced water buffalo) to the UMD3.1 cattle genome, we estimated 44.6 Mb (~1.73% of the cattle genome) of segmental duplications in the autosomes and X chromosome. Of these duplications, 70.3% (70/101) were experimentally validated using fluorescent in situ hybridization. We also detected a total of 1344 CNV regions across 14 additional water buffalo as well as Olimpia, amounting to 59.8 Mb of variable sequence or 2.2% of the cattle genome. The CNV regions overlap 1245 genes and are significantly enriched for specific biological functions such as immune response, oxygen transport, sensory system and signal transduction. Additionally, we performed array Comparative Genomic Hybridization (aCGH) experiments using the 14 water buffalo as test samples and Olimpia as the reference. In a linear regression model, the digital aCGH values and the aCGH probe log2 ratios showed a high and significant Pearson correlation (r = 0.781). We further designed quantitative PCR assays to confirm CNV regions within or near annotated genes and found 74.2% agreement with our CNV predictions.

Overall design: Whole-genome high-density CGH arrays manufactured by Agilent, containing ~974,016 oligonucleotide probes, were designed and fabricated on a single slide to provide evenly distributed coverage of cattle UMD3.1 with an average interval of ~3.1 kb between probes. The reference animal chosen was Olimpia, an Italian Mediterranean river buffalo.
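The comparison between digital aCGH values and aCGH probe log2 ratios described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that a "digital aCGH" value is the log2 ratio of a test animal's read depth to the reference animal's read depth in the same window, and the function names, the `pseudocount` parameter, and the input layout are all hypothetical.

```python
import math


def digital_acgh(test_depth, ref_depth, pseudocount=1.0):
    """Per-window log2 ratio of test to reference read depth.

    Assumption: this is one plausible definition of a digital aCGH value.
    The pseudocount (hypothetical) avoids division by zero in empty windows.
    """
    if len(test_depth) != len(ref_depth):
        raise ValueError("depth lists must have the same length")
    return [
        math.log2((t + pseudocount) / (r + pseudocount))
        for t, r in zip(test_depth, ref_depth)
    ]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Given read depths for a test animal and the reference in matching windows, plus the array log2 ratios averaged over the same windows, `pearson_r(digital_acgh(test, ref), array_log2)` would yield a coefficient comparable in kind to the reported r = 0.781.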

Data contact name

BioProject Curation Staff

Publisher

National Center for Biotechnology Information

Temporal Extent Start Date

2018-08-03

Theme

  • Non-geospatial

ISO Topic Category

  • biota

National Agricultural Library Thesaurus terms

genetic variation

Pending citation

  • No

Public Access Level

  • Public

Accession Number

PRJNA484504

Preferred dataset citation

It is recommended to cite the accession numbers that are assigned to data submissions, e.g. the GenBank, WGS or SRA accession numbers. If individual BioProjects need to be referenced, state that "The data have been deposited with links to BioProject accession number PRJNA484504 in the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/)."
