Data from: Genotypic characterization of the U.S. peanut core collection
This collection contains supplementary data for the manuscript "Genotypic characterization of the U.S. Peanut Core Collection", which describes genotyping results for the USDA peanut core collection. Each accession was genotyped with the Arachis_Axiom2 SNP array, yielding 14,430 high-quality, informative SNPs across the collection. Additionally, a subset of the core collection was genotyped in replicate, using between two and five seeds per accession to assess heterogeneity within an accession. Supplementary files include: descriptive information about the genotyped accessions, SNP genotype calls in several formats, a phylogenetic tree calculated from the genotype data, Structure analysis, PCA analysis, and comparisons with the diploid progenitors.
This research was co-funded by the National Institute of Food and Agriculture and the National Peanut Board.
Resources in this dataset:
Resource Title: Structure membership breakdown.
File Name: SF10_K5_membership.pdf
Resource Description: The proportion of accessions assigned to clusters 1-5 in a Structure analysis (manuscript Figure 3), for K=5 clusters.
Resource Title: Structure membership assignments for accessions.
File Name: SF11_K5_cluster_assignment.xlsx
Resource Description: The proportional assignments of each cluster to all accessions (relative to the Structure diagram shown in manuscript Figure 3).
Resource Title: Principal components analysis.
File Name: SF12_pca_34.pdf
Resource Description: Principal Component Analysis of 1120 samples based on 2063 unlinked SNP markers. The X-axis represents PC 3 and the Y-axis represents PC 4. Samples are colored and grouped according to: A. clade membership as defined in the phylogenetic and network analyses, B. botanical varieties, C. market type, D. growth habit, E. pod shape, and F. collection type.
Resource Title: Pod images for PI 497426.
File Name: SF14_PI497426_pods.jpg
Resource Description: Pods from accession PI 497426 (clade 4), illustrating the distinctive reticulation pattern seen in some accessions in this clade.
Resource Title: Data dictionary.
File Name: data_dictionary_KNWV.txt
Resource Description: Description of all files in this Dataset. Changes were made to this file on 4/15/2020, to update some file names to indicate new versions.
Resource Title: Main descriptive information about genotyped accessions.
File Name: SF01_peanut_core_v14.xlsx
Resource Description: The main descriptive information about the genotyped accessions, including: information about replicate similarity; phylogenetic clades, geographic origin, and phenotype; and summaries of phenotypic and country information relative to clade assignments. Changes were made to this file on 4/15/2020: Added INDEX worksheet and corrected three peanut variety identifiers: ROL11 --> TamrunOL-11; NCL06 --> TamnutOL-06; NM309N2 --> NM309-2
Resource Title: SNPs as called by the Axiom suite.
File Name: SF02_SNPs_whole_Axiom_Arachis2_txt.gz
Resource Description: The original genotype calls for the Axiom array (for poly-high resolution SNPs). Changes were made to this file on 4/15/2020: Corrected three peanut variety identifiers: ROL11 --> TamrunOL-11; NCL06 --> TamnutOL-06; NM309N2 --> NM309-2
Resource Title: Genotyping calls in VCF format.
File Name: SF03_SNPs_whole_Axiom_Arachis2_vcf.gz
Resource Description: The Axiom array genotype calls, in VCF format. Changes were made to this file on 4/15/2020: Corrected three peanut variety identifiers: ROL11 --> TamrunOL-11; NCL06 --> TamnutOL-06; NM309N2 --> NM309-2
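As a first check on the VCF resource, the sample names and the number of variant records can be read with the Python standard library alone. The following is a minimal sketch, assuming the file is a gzip-compressed, tab-delimited VCF (suggested by the `_vcf.gz` suffix but not confirmed here). The function names and the tiny in-memory demo data are hypothetical and are not taken from the dataset.

```python
import gzip
import io


def vcf_sample_names(handle):
    """Return sample names from the #CHROM header line of a VCF text stream."""
    for line in handle:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The first nine columns are fixed by the VCF specification;
            # every column after FORMAT is a sample name.
            return fields[9:]
    raise ValueError("no #CHROM header line found")


def count_variant_records(handle):
    """Count non-empty, non-header lines in a VCF text stream."""
    return sum(1 for line in handle if line.strip() and not line.startswith("#"))


# Made-up two-record VCF used only to demonstrate the functions (not real data).
_demo = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\n"
    "chr1\t100\tsnp1\tA\tG\t.\t.\t.\tGT\t0/0\t1/1\n"
    "chr1\t200\tsnp2\tC\tT\t.\t.\t.\tGT\t0/1\t0/0\n"
)
demo_gz = gzip.compress(_demo.encode("utf-8"))

with gzip.open(io.BytesIO(demo_gz), "rt") as fh:
    demo_samples = vcf_sample_names(fh)
with gzip.open(io.BytesIO(demo_gz), "rt") as fh:
    demo_n = count_variant_records(fh)
```

To try this on the downloaded resource, pass the local path of the file to `gzip.open` in place of the in-memory demo buffer.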
Resource Title: DNA variants for all accessions, including from genome assemblies, in TSV format.
File Name: SF04_SNPs_w_4_genomes_tsv.gz
Resource Description: The predominant DNA variants at each SNP location, for all accessions, including variants inferred from four available genome assemblies: A. duranensis and A. ipaensis together, and A. hypogaea accessions Tifrunner, Shitouqi, and Fuhuasheng. The format is a simple tab-separated table, with 14431 columns (SNP positions). Changes were made to this file on 4/15/2020: Corrected three peanut variety identifiers: ROL11 --> TamrunOL-11; NCL06 --> TamnutOL-06; NM309N2 --> NM309-2
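The stated column count of the tab-separated table can be checked before loading it into an analysis tool. The sketch below reports the number of rows and the distinct field counts per row. It assumes the file is gzip-compressed text; whether the table has a header row or a leading identifier column is not stated in this record, so the code makes no assumption about either. The function name and demo table are hypothetical.

```python
import csv
import gzip
import io


def table_shape(handle):
    """Return (row_count, sorted list of distinct field counts per row)."""
    reader = csv.reader(handle, delimiter="\t")
    n_rows = 0
    widths = set()
    for row in reader:
        n_rows += 1
        widths.add(len(row))
    return n_rows, sorted(widths)


# Made-up three-line table standing in for the real file (not real data).
_demo = "id\tsnp1\tsnp2\nacc1\tA\tG\nacc2\tA\tT\n"
demo_gz = gzip.compress(_demo.encode("utf-8"))

with gzip.open(io.BytesIO(demo_gz), "rt", newline="") as fh:
    demo_shape = table_shape(fh)
```

A single value in the returned list of widths indicates a rectangular table; more than one value would point to ragged rows that need inspection.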
Resource Title: DNA variants for all accessions, including from genome assemblies, in fasta format.
File Name: SF05_SNPs_w_4_gnm_mrgd_fas.gz
Resource Description: The predominant DNA variants at each SNP location, for all accessions, including variants inferred from four available genome assemblies: A. duranensis and A. ipaensis together, and A. hypogaea accessions Tifrunner, Shitouqi, and Fuhuasheng. In fasta format. Changes were made to this file on 4/15/2020: Corrected three peanut variety identifiers: ROL11 --> TamrunOL-11; NCL06 --> TamnutOL-06; NM309N2 --> NM309-2
Resource Title: Base-calls for selected accessions, relative to A- and B-genome progenitors.
File Name: SF06_chip_and_genome_samples_v05.xlsx
Resource Description: DNA base-calls for 16 selected, diverse accessions, with comparisons to the variants observed in the A. duranensis and A. ipaensis genomes, and inferences regarding the likely progenitor for the DNA, i.e. A-genome (A. duranensis) or B-genome (A. ipaensis). Changes were made to this file on 4/15/2020: Added INDEX worksheet and corrected three peanut variety identifiers: ROL11 --> TamrunOL-11; NCL06 --> TamnutOL-06; NM309N2 --> NM309-2
Resource Title: Reduced fasta alignments, at 98% identity.
File Name: SF07_SNPs_w_4_gnm_mrgd_cen98_fas.gz
Resource Description: Reduced fasta alignments (relative to the complete alignment file, S5). File S7 has the centroid representatives at 98% identity. This file has 518 sequences. Changes were made to this file on 4/15/2020: Corrected three peanut variety identifiers: ROL11 --> TamrunOL-11; NCL06 --> TamnutOL-06; NM309N2 --> NM309-2
Resource Title: Reduced fasta alignments, at 99% identity.
File Name: SF08_SNPs_w_4_gnm_mrgd_cen99_fas.gz
Resource Description: Reduced fasta alignments (relative to the complete alignment file, S5). File S8 has the centroid representatives at 99% identity. This file has 680 sequences. Changes were made to this file on 4/15/2020: Corrected three peanut variety identifiers: ROL11 --> TamrunOL-11; NCL06 --> TamnutOL-06; NM309N2 --> NM309-2
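The sequence counts quoted for the reduced alignments (518 and 680) can be checked by counting fasta header lines. This is a minimal sketch, assuming the files are gzip-compressed fasta text in which each record begins with a `>` line. The function name and the in-memory demo records are hypothetical.

```python
import gzip
import io


def count_fasta_records(handle):
    """Count fasta records by counting lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


# Made-up three-record alignment used only for demonstration (not real data).
_demo = ">acc1\nACGT\n>acc2\nACGA\n>acc3\nACTT\n"
demo_gz = gzip.compress(_demo.encode("utf-8"))

with gzip.open(io.BytesIO(demo_gz), "rt") as fh:
    demo_count = count_fasta_records(fh)
```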
Resource Title: Phylogenetic tree of genotype data.
File Name: SF09_SNPs_w_4_gnm_mrgd_rt3_nh_txt.gz
Resource Description: Phylogenetic tree (Newick format) calculated from the alignment in S5, and corresponding with the phylogenetic tree shown in manuscript Figure 1. Changes were made to this file on 4/15/2020: Corrected three peanut variety identifiers: ROL11 --> TamrunOL-11; NCL06 --> TamnutOL-06; NM309N2 --> NM309-2
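Leaf labels in a Newick tree can be listed without a phylogenetics library, which is useful for confirming that accession identifiers in the tree match those in the other files. The sketch below uses a regular expression rather than a full Newick parser. It assumes unquoted labels that contain no parentheses, commas, colons, or semicolons, which may not hold for every label in the real tree. The function name and demo tree are hypothetical.

```python
import re

# A leaf label directly follows "(" or ","; internal node labels follow ")"
# and are therefore not matched by this pattern.
_LEAF = re.compile(r"[(,]\s*([^(),:;\s][^(),:;]*)")


def newick_leaf_labels(text):
    """Return leaf labels from a Newick string (unquoted labels only)."""
    return [m.group(1).strip() for m in _LEAF.finditer(text)]


# Made-up tree used only for demonstration (not real data).
demo_tree = "((A:0.1,B:0.2)inner:0.3,C:0.4);"
demo_leaves = newick_leaf_labels(demo_tree)
```

For anything beyond a label check, such as branch lengths or rerooting, a dedicated tree library would be the safer choice.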
Resource Title: Subgenome origins of SNPs relative to the A-genome and B-genome progenitors.
File Name: SF13_chip_and_genome_GFFs.xlsx
Resource Description: Inferred subgenome origins of SNPs relative to the A-genome and B-genome progenitors (A. duranensis and A. ipaensis). This data is in GFF format, derived from S6, and used as the basis for the plots in Figure 7 (showing regions of possible subgenome invasions). Changes were made to this file on 4/15/2020: Added INDEX worksheet and corrected three peanut variety identifiers: ROL11 --> TamrunOL-11; NCL06 --> TamnutOL-06; NM309N2 --> NM309-2
Resource Title: Peruvian Moche-era peanut necklace.
File Name: SF15_Sipan_neclkace_Donnan_Einstein.jpg
Resource Description: Picture of necklace of peanuts, sculpted in gold and silver, from the Moche-era tomb at Sipán (c.AD 250) in coastal Peru. Photograph by Susan Einstein, courtesy of Christopher Donnan. Changes were made to this file on 4/15/2020: Replaced black-and-white derived image with original color image
Funding
USDA-NIFA: 2018-67013-28138
History
Data contact name
Cannon, Ethy
Data contact email
ethy.cannon@usda.gov
Publisher
Ag Data Commons
Intended use
This genotype data will be useful for identifying and distinguishing peanut accessions, and for conducting genome-wide association analyses with the U.S. peanut core collection.
Theme
- Not specified
Geographic location - description
Dataset derives from peanut germplasm collected worldwide, with many accessions from South America, Africa, and Asia.
ISO Topic Category
- biota
- farming
- location
Ag Data Commons Group
- AgBioData
National Agricultural Library Thesaurus terms
United States; genotyping; USDA; peanuts; single nucleotide polymorphism arrays; single nucleotide polymorphism; seeds; genotype; phylogeny; diploidy; Arachis hypogaea
OMB Bureau Code
- 005:18 - Agricultural Research Service
OMB Program Code
- 005:040 - National Research
ARS National Program Number
- 301
Pending citation
- No
Public Access Level
- Public