Agrilus planipennis community manual annotations

dataset
posted on 2024-11-19, 20:14 authored by Joshua B. Benoit, Alexander Martynov, David R. Nelson, Kristen A. Panfilio, Yves Pauchet
Manual annotation at the i5k Workspace@NAL (https://i5k.nal.usda.gov) is the review and improvement of gene models derived from computational gene prediction. Community curators compare an existing gene model to evidence such as RNA-Seq or protein alignments from the same or closely related species and modify the structure or function of the gene accordingly, typically following the i5k Workspace@NAL manual annotation guidelines (https://i5k.nal.usda.gov/content/rules-web-apollo-annotation-i5k-pilot-project). If a gene model is missing, the annotator can also use this evidence to create a new gene model. Because manual annotation, by definition, improves or creates gene models where computational methods have failed, it can be a powerful tool to improve computational gene sets, which often serve as foundational datasets to facilitate research on a species.

Here, community curators used manual annotation at the i5k Workspace@NAL to improve computational gene predictions from the dataset Agrilus planipennis genome annotations v0.5.3. The i5k Workspace@NAL set up the Apollo v1 manual annotation software and multiple evidence tracks to facilitate manual annotation. From 2014-10-20 to 2018-07-12, five community curators updated 263 genes, including developmental genes; cytochrome P450s; cathepsin peptidases; cuticle proteins; glycoside hydrolases; and polysaccharide lyases. For this dataset, we used the program LiftOff v1.6.3 to map the manual annotations to the genome assembly GCF_000699045.2. We computed overlaps with annotations from the RefSeq database using gff3_merge from the GFF3toolkit software v2.1.0. FASTA sequences were generated using gff3_to_fasta from the same toolkit. These improvements should facilitate continued research on Agrilus planipennis, or emerald ash borer (EAB), which is an invasive insect pest.

While these manual annotations will not be integrated with other computational gene sets, they are available to view at the i5k Workspace@NAL (https://i5k.nal.usda.gov) to enhance future research on Agrilus planipennis.
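
For readers working with the downloaded annotations, the following is a minimal sketch of how feature types in a GFF3 file might be tallied, for example to check how many gene records a file contains. It is not part of the dataset's pipeline; the function name, scaffold name, and feature IDs in the sample are hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count GFF3 features by type (column 3), skipping comments and directives."""
    counts = Counter()
    for line in gff3_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, comment, or ## directive
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed nine-column GFF3 feature line
        counts[columns[2]] += 1
    return counts


# Hypothetical records (made-up scaffold and IDs), not taken from the dataset.
sample = [
    "##gff-version 3",
    "scaffold_1\tmanual\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "scaffold_1\tmanual\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
]

if __name__ == "__main__":
    for feature_type, n in sorted(count_feature_types(sample).items()):
        print(feature_type, n)
```

To apply the same function to a real file, the lines can be passed from an open file handle, for example `count_feature_types(open(path))`, where `path` is the location of the downloaded GFF3 file.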

Funding

USDA-ARS

Related Materials

  1. URL - Is compiled by Apollo
  2. URL - Is compiled by LiftOff
  3. URL - Is compiled by GFF3toolkit
  4.
  5. URL - Is supplement to Genome assembly Apla_2.0
  6.

Data contact name

Poelchau, Monica F.

Data contact email

Monica.poelchau@usda.gov

Publisher

Ag Data Commons

Temporal Extent Start Date

2014-10-20

Temporal Extent End Date

2024-01-26

Theme

  • Non-geospatial

ISO Topic Category

  • biota

Ag Data Commons Group

  • Insects - i5K

National Agricultural Library Thesaurus terms

Agrilus planipennis; genome annotation

OMB Bureau Code

  • 005:18 - Agricultural Research Service

OMB Program Code

  • 005:040 - National Research

Pending citation

  • No

Public Access Level

  • Public
