Posted on 2024-11-19, 20:14. Authored by Joshua B. Benoit, Alexander Martynov, David R. Nelson, Kristen A. Panfilio, Yves Pauchet
<p dir="ltr">Manual annotation at the i5k Workspace@NAL (<a href="https://i5k.nal.usda.gov/" target="_blank">https://i5k.nal.usda.gov</a>) is the review and improvement of gene models derived from computational gene prediction. Community curators compare an existing gene model to evidence such as RNA-Seq or protein alignments from the same or closely related species and modify the structure or function of the gene accordingly, typically following the i5k Workspace@NAL manual annotation guidelines (<a href="https://i5k.nal.usda.gov/content/rules-web-apollo-annotation-i5k-pilot-project" target="_blank">https://i5k.nal.usda.gov/content/rules-web-apollo-annotation-i5k-pilot-project</a>). If a gene model is missing, the annotator can also use this evidence to create a new gene model. Because manual annotation, by definition, improves or creates gene models where computational methods have failed, it can be a powerful tool to improve computational gene sets, which often serve as foundational datasets to facilitate research on a species.</p><p dir="ltr">Here, community curators used manual annotation at the i5k Workspace@NAL to improve computational gene predictions from the dataset <i>Agrilus planipennis</i> genome annotations v0.5.3. The i5k Workspace@NAL set up the Apollo v1 manual annotation software and multiple evidence tracks to facilitate manual annotation. From 2014-10-20 to 2018-07-12, five community curators updated 263 genes, including developmental genes, cytochrome P450s, cathepsin peptidases, cuticle proteins, glycoside hydrolases, and polysaccharide lyases. For this dataset, we used the program Liftoff v1.6.3 to map the manual annotations to the genome assembly GCF_000699045.2. We computed overlaps with annotations from the RefSeq database using gff3_merge from the GFF3toolkit software v2.1.0. FASTA sequences were generated using gff3_to_fasta from the same toolkit.
These improvements should facilitate continued research on <i>Agrilus planipennis</i>, the emerald ash borer (EAB), an invasive insect pest.</p><p dir="ltr">While these manual annotations will not be integrated with other computational gene sets, they are available to view at the i5k Workspace@NAL (<a href="https://i5k.nal.usda.gov/" target="_blank">https://i5k.nal.usda.gov</a>) to enhance future research on <i>Agrilus planipennis</i>.</p>
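<p dir="ltr">For readers who want to explore how the manual annotations relate to RefSeq gene models, the idea of an overlap between two annotation sets can be sketched as follows. This is only an illustrative sketch and not the algorithm implemented by gff3_merge: it treats two gene features as overlapping when they share a sequence ID and strand and their coordinate ranges intersect. The function names (<code>parse_genes</code>, <code>overlapping_pairs</code>), the scaffold name, the gene IDs, and all coordinates are hypothetical and were invented for this example.</p>

```python
from typing import NamedTuple


class Gene(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int    # 1-based, inclusive
    strand: str
    gene_id: str


def parse_genes(gff3_text: str) -> list[Gene]:
    """Collect 'gene' features from GFF3 text, skipping comments and other feature types."""
    genes = []
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        genes.append(Gene(cols[0], int(cols[3]), int(cols[4]), cols[6], attrs.get("ID", "")))
    return genes


def overlapping_pairs(manual: list[Gene], reference: list[Gene]) -> list[tuple[str, str]]:
    """Return (manual ID, reference ID) pairs on the same sequence and strand whose ranges intersect."""
    pairs = []
    for m in manual:
        for r in reference:
            same_locus = m.seqid == r.seqid and m.strand == r.strand
            if same_locus and m.start <= r.end and r.start <= m.end:
                pairs.append((m.gene_id, r.gene_id))
    return pairs


# Toy records invented for illustration; they are not taken from the dataset.
MANUAL_GFF3 = "\n".join([
    "##gff-version 3",
    "\t".join(["scaffold_1", "manual", "gene", "100", "500", ".", "+", ".", "ID=manual_gene_1"]),
    "\t".join(["scaffold_1", "manual", "gene", "2000", "2600", ".", "-", ".", "ID=manual_gene_2"]),
])
REFSEQ_GFF3 = "\n".join([
    "##gff-version 3",
    "\t".join(["scaffold_1", "RefSeq", "gene", "450", "900", ".", "+", ".", "ID=ref_gene_a"]),
    "\t".join(["scaffold_1", "RefSeq", "gene", "2100", "2500", ".", "+", ".", "ID=ref_gene_b"]),
])

if __name__ == "__main__":
    for manual_id, ref_id in overlapping_pairs(parse_genes(MANUAL_GFF3), parse_genes(REFSEQ_GFF3)):
        print(manual_id, "overlaps", ref_id)
```

<p dir="ltr">The pairwise comparison is quadratic in the number of genes, which is acceptable for a few hundred manual annotations. A sorted or interval-tree approach would be a reasonable substitute for genome-scale comparisons.</p>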