Ag Data Commons

Data for: Variant filters using segregation information improve mapping of nectar-production genes in sunflower (Helianthus annuus L.)

dataset
posted on 2025-05-27, 15:11 authored by Ashley Barstow, James McNellie, Brian Smart, Kyle G. Keepers, Jarrad Prasifka, Nolan C. Kane, Brent S. Hulke
  • Genotypic Data (VCFs):
    • All VCF files contain imputed, biallelic SNPs derived from the same population but differ in the filtering strategies applied.
    • Approach1_...vcf.gz: Filtered using hard thresholds (minQ ≥ 100, max Missing ≤ 0.75, MAF ≥ 0.05, inferred single copy).
    • Approach2_...vcf.gz: Applies the same hard filters as Approach 1, with an additional Chi-Square filter (p-value ≤ 0.1).
    • Approach3_...vcf.gz: Filtered using only a Chi-Square filter (p-value ≤ 0.1) on imputed, biallelic SNPs.
  • Phenotypic Data (XLSX):
    • nectar_phenotype.xlsx: Contains phenotypic measurements for the population, including individual identifiers (ID) and nectar volume data (nectar_mm_T, nectar_mm).
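
The Chi-Square filter named in Approaches 2 and 3 can be illustrated with a minimal sketch of a goodness-of-fit test on genotype counts at a single SNP. This is not the authors' pipeline: the function name `segregation_p_value` and the 1:2:1 expected ratio are assumptions chosen for illustration, and the expected ratio should be set to match the actual population design. The sketch only computes the p-value; how it is compared against the 0.1 threshold should follow the filter definitions listed above.

```python
import math


def segregation_p_value(observed, expected_ratio=(1, 2, 1)):
    """Chi-square goodness-of-fit p-value for three genotype classes.

    observed: counts of (hom-ref, het, hom-alt) calls at one SNP.
    expected_ratio: hypothetical Mendelian ratio; 1:2:1 is an assumption,
    not a value taken from the dataset description.
    """
    if len(observed) != 3 or len(expected_ratio) != 3:
        raise ValueError("this sketch handles exactly three genotype classes")
    total = sum(observed)
    if total == 0:
        raise ValueError("no called genotypes at this site")

    # Scale the expected ratio to the number of called genotypes.
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]

    # Pearson chi-square statistic: sum of (O - E)^2 / E over the classes.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Three classes give 2 degrees of freedom, and the chi-square survival
    # function for df = 2 has the closed form exp(-chi2 / 2).
    return math.exp(-chi2 / 2)


# Counts that match 1:2:1 exactly give a statistic of 0 and a p-value of 1.
balanced = segregation_p_value((25, 50, 25))

# Counts skewed toward one homozygote give a much smaller p-value.
skewed = segregation_p_value((10, 50, 40))
```

Using the closed form for two degrees of freedom keeps the sketch free of third-party dependencies; a general implementation would use a statistics library to support other numbers of genotype classes.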

Funding

National Sunflower Association Grant 22-P01

USDA-ARS: 3060-21000-047

Data contact name

Smart, Brian C.

Data contact email

brian.smart@ndsu.edu

Publisher

Ag Data Commons

Intended use

This paper offers insights into optimizing variant calling strategies for genetic analysis, specifically comparing standard depth filters with a biologically informed approach based on Mendelian segregation. It is primarily useful for researchers involved in plant genetics, quantitative genetics, genomics, bioinformatics, and crop breeding, particularly those conducting QTL mapping or seeking to identify candidate genes for complex traits. The methodology and findings can guide the improvement of analytical pipelines to enhance the accuracy and power of detecting genetic variants associated with traits like nectar production in sunflower, and potentially other complex traits in various organisms.

Use limitations

This paper has no specific usage restrictions and can be freely cited and applied within relevant research and analytical contexts. It provides valuable methodological comparisons and insights for studies involving variant filtering, QTL mapping, candidate gene discovery, and understanding the genetic architecture of complex traits. It can serve as a resource for developing and validating bioinformatic pipelines in genomics and crop improvement programs.

Temporal Extent Start Date

2019-04-14

Temporal Extent End Date

2025-04-14

Theme

  • Non-geospatial

ISO Topic Category

  • farming
  • biota

National Agricultural Library Thesaurus terms

filters; nectar secretion; genes; Helianthus annuus; single nucleotide polymorphism; phenotype; nectar; genomics; sequence analysis; information management; genetic analysis; plant genetics; bioinformatics; plant breeding; quantitative trait loci; pipelines; genetic variation

OMB Bureau Code

  • 005:18 - Agricultural Research Service

OMB Program Code

  • 005:040 - National Research

ARS National Program Number

  • 301

ARIS Log Number

421372

Pending citation

  • No

Public Access Level

  • Public