Ag Data Commons
4 files

Divergence in host specificity and genetics among populations of Aphelinus certus

posted on 2024-02-15, 17:59 authored by Keith Hopper, Sara J. Oppenheim

These are data on variation in host specificity and genetics among 16 populations of an aphid parasitoid, Aphelinus certus, 15 from Asia and one from North America. Host range was the same for all the parasitoid populations, but levels of parasitism varied among aphid species, suggesting adaptation to locally abundant aphids. Differences in host specificity did not correlate with geographical distances among parasitoid populations, suggesting that local adaptation is mosaic rather than clinal, with a spatial scale of less than 50 kilometers. Analysis of reduced representation libraries for each population showed genetic differentiation among them. Differences in host specificity correlated with genetic distances among the parasitoid populations.

Resources in this dataset:

  • Resource Title: data dictionary for Aphelinus certus population variation.

    File Name: data_dictionary_Aphelinus_certus.csv

    Resource Description: This is the data dictionary for the other files (Aphelinus_certus_host_use.csv, Aphelinus_certus_culture_data.csv) in this project.

  • Resource Title: Host specificity of Aphelinus certus populations.

    File Name: Aphelinus_certus_host_use.csv

    Resource Description: Results of no-choice experiments in the laboratory on parasitism, adult emergence rate, and progeny sex ratio for 15 populations of Aphelinus certus from China, Japan, and South Korea and one population from the US.

  • Resource Title: Culture data for Aphelinus certus populations.

    File Name: Aphelinus_certus_culture_data.csv

    Resource Description: This file gives data on the locations, dates, founding numbers, and collectors for the populations of Aphelinus certus studied in this project.

  • Resource Title: Fst and host use distances among populations of Aphelinus certus.

    File Name: A_certus_Fst_host_dist.csv

    Resource Description: We used next-generation sequencing of reduced-representation genomic libraries to genotype single nucleotide polymorphisms (SNPs) among the 16 A. certus populations. Libraries were prepared as described in Manching et al. (2017). Briefly, genomic DNA was extracted from pools of wasps from each population using Qiagen DNeasy Blood and Tissue Kits (Qiagen, Valencia, CA), following the standard protocol. The resulting DNA was digested with restriction endonucleases using one rare cutter (NgoMIV with a 6 bp recognition site) and one frequent cutter (CviQI with a 4 bp recognition site) (New England Biolabs, Inc., Ipswich, MA), which together determined the number of unique locations of fragments across the genome and the lengths of these fragments. Custom adaptors, with barcodes for each population that also served to register clusters on the Illumina HiSeq during sequencing, were ligated onto the fragments using T4 ligase (New England Biolabs, Inc., Ipswich, MA). The ligates were pooled and purified using Agencourt AMPure XP beads (Beckman Coulter, Indianapolis, IN). The purified ligate was separated into 10 aliquots that were amplified in separate PCR reactions to both increase copy number at each locus and add more adaptor sequence for sequencing. The adaptors were designed so that the only fragments that amplify would have the rare-common combination of cut sites. After PCR, the products were pooled and then size-selected (300-350 bp) using the BluePippin system (Sage Science, Beverly, MA). After quantification with qPCR, the resulting fragments were sequenced for ~100 nucleotides in single-end reads on an Illumina HiSeq 2500 (Illumina, San Diego, CA) at the Delaware Biotechnology Institute.

    Sequence data were processed with a reduced-representation computational pipeline called RedRep (described in Manching et al. 2017); the scripts and documentation for the pipeline are available under an open source MIT license. Briefly, sequences were deconvoluted by barcode using custom scripts and the FASTX-Toolkit (version 0.0.14). Custom scripts and CutAdapt (version 1.14; Martin 2011) were then used to remove adapters, trim low-quality read ends, and filter out sequences that did not meet minimum length/quality standards or did not meet expectations for the restriction-site sequences. High-quality reads were mapped to the draft genome of A. certus using the BWA-MEM program (version 0.7.16a; Li 2013). SNP loci were identified using the GATK HaplotypeCaller (version 3.5-0; McKenna et al. 2010). We filtered the SNP loci for read depth ≥ 50 and then for presence in all populations using BEDtools (version 2.26) and custom scripts written in R (version 3.3.3; R Core Team 2017). We tested the relationship between host use distance and genetic distance, as measured by FST. Because A. certus individuals were pooled within populations to make the libraries for sequencing, we used read depths to estimate allele frequencies for SNP loci. We used the numbers of individuals in each pool in calculating FST between populations with the calcPopDiff function in the polysat R package (version 1.7-2; Clark 2017). Using Mantel's permutation test, we compared the genetic and parasitism distance matrices (10,000 permutations with the mantel.randtest function in the ade4 R package).

    Clark, L. V. (2017) polysat version 1.7-2. Tools for polyploid microsatellite analysis.

    Li, H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv: 1303.3997v1 [q-bio.GN].

    Manching, H., Sengupta, S., Hopper, K. R., Polson, S. W., Ji, Y. and Wisser, R. J. (2017) Phased genotyping-by-sequencing enhances analysis of genetic diversity and reveals divergent copy number variants in maize. Genes Genomes Genetics, 7(7), pp. 2161-2170.

    Martin, M. (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17, pp. 10-12.

    McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M. and DePristo, M. A. (2010) The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20(9), pp. 1297-1303.

    R Core Team (2017) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
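For readers who want to see the logic of two analysis steps described for A_certus_Fst_host_dist.csv, here is a minimal Python sketch. It is not the authors' code: the original analysis used R (custom scripts, polysat, and ade4), and the function names, array layouts, and the use of Pearson correlation with a one-sided p-value are assumptions made for illustration. The first function estimates allele frequencies from pooled read counts after keeping only loci with read depth ≥ 50 in all populations; the second runs a Mantel-style permutation test between two square distance matrices.

```python
import numpy as np

MIN_DEPTH = 50  # read-depth threshold stated in the resource description


def pooled_allele_freqs(ref_counts, alt_counts, min_depth=MIN_DEPTH):
    """Estimate alternate-allele frequencies from pooled read counts.

    Inputs are (n_loci, n_populations) arrays of read counts (hypothetical
    layout). Only loci with depth >= min_depth in every population are kept.
    Returns the frequency array for kept loci and the boolean keep mask.
    """
    ref = np.asarray(ref_counts, dtype=float)
    alt = np.asarray(alt_counts, dtype=float)
    depth = ref + alt
    keep = (depth >= min_depth).all(axis=1)
    return alt[keep] / depth[keep], keep


def mantel_test(dist_a, dist_b, permutations=10_000, seed=0):
    """One-sided Mantel permutation test between two distance matrices.

    Rows and columns of dist_a are permuted together; the statistic is the
    Pearson correlation of the upper-triangle entries (an assumption here).
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    n = a.shape[0]
    upper = np.triu_indices(n, k=1)  # each population pair counted once
    observed = np.corrcoef(a[upper], b[upper])[0, 1]

    rng = np.random.default_rng(seed)
    at_least_as_large = 0
    for _ in range(permutations):
        p = rng.permutation(n)
        permuted = a[np.ix_(p, p)]  # relabel populations in matrix a
        r = np.corrcoef(permuted[upper], b[upper])[0, 1]
        if r >= observed:
            at_least_as_large += 1

    # Count the observed arrangement itself among the permutations.
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return observed, p_value
```

With the 16 populations in this dataset, `dist_a` and `dist_b` would be 16 x 16 matrices of pairwise FST and host-use distances; how those matrices are assembled from the CSV depends on its columns, which are documented in the data dictionary rather than assumed here.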


USDA-ARS: 8010-22000-029-00D

USDA-NIFA: 2002-35302-12710

National Science Foundation: DEB-1257601


Data contact name

Hopper, Keith R.

Data contact email


Ag Data Commons

Intended use

These data were collected to measure variation in host specificity and genetics of populations of Aphelinus certus Yasnosh (Hymenoptera: Aphelinidae), a parasitoid of aphids.

Use limitations

The data on host specificity are from no-choice laboratory experiments and thus do not capture all aspects of host specificity in the field.

Temporal Extent Start Date


Temporal Extent End Date



  • irregular


  • Not specified

Geographic location - description

Allentown, Pennsylvania, United States; Inner Mongolia, China; Hebei, China; Liaoning, China; Heilongjiang, China; Gyeonggi-do, South Korea; Gyeongsangbuk-do, South Korea; Honshu, Japan

ISO Topic Category

  • biota

National Agricultural Library Thesaurus terms

host specificity; genetics; Aphelinus; Aphidoidea; parasitoids; host range; parasitism; genetic variation; genetic distance; adults

OMB Bureau Code

  • 005:18 - Agricultural Research Service

OMB Program Code

  • 005:040 - National Research

ARS National Program Number

  • 304

Pending citation

  • No

Public Access Level

  • Public

Preferred dataset citation

Hopper, Keith R.; Oppenheim, Sara J. (2018). Divergence in host specificity and genetics among populations of Aphelinus certus. Ag Data Commons.
