Ag Data Commons
ARCHIVE
Dcitr_OGSv1.0.tar.gz (22.48 MB)
IMAGE
Workflow_Fig3.png (23.04 kB)
2 files

Diaphorina citri Official Gene Set v1.0

dataset
posted on 2024-02-08, 20:48 authored by Surya Saha, Prashant Hosmani, Krystal Villalobos-Ayala, Sherry Miller, Teresa D. Shippy, Mirella Flores, Andrew J. Rosendale, Chris Cordola, Tracey J. Bell, Hannah Mann, Gabe DeAvila, Daniel DeAvila, Zachary Moore, Kyle Buller, Kathryn Ciolkevich, Samantha Nandyal, Robert Mahoney, Joshua Voorhis, Megan E. Dunlevy, David W. Farrow, David Hunter, Taylar Morgan, Kayla Shore, Victoria Guzman, Allison Izsak, Danielle Dixon, Andrew Cridge, Liliana Cano, Xiaolong Cao, Haobo Jiang, Nan Leng, Shannon Johnson, Brandi Cantarel, Stephen Richards, Adam English, Robert Shatters, Christopher Childers, Mei-Ju Chen, Wayne B. Hunter, Michelle Cilia, Lukas A. Mueller, Monica Munoz-Torres, David R. Nelson, Monica Poelchau, Joshua B. Benoit, Helen Wiersma-Koch, Tom D'Elia, Susan Brown

The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the pathogen associated with citrus Huanglongbing (HLB, citrus greening). HLB threatens citrus production worldwide. Suppression or reduction of the insect vector using chemical insecticides has been the primary method to inhibit the spread of citrus greening disease. Accurate structural and functional annotation of the Asian citrus psyllid genome, as well as a clear understanding of the interactions between the insect and CLas, are required for development of new molecular-based HLB control methods. A draft assembly of the D. citri genome has been generated and annotated with automated pipelines. However, knowledge transfer from well-curated reference genomes such as that of Drosophila melanogaster to newly sequenced ones is challenging due to the complexity and diversity of insect genomes. To identify and improve gene models as potential targets for pest control, we manually curated several gene families with a focus on genes that have key functional roles in D. citri biology and CLas interactions. This community effort produced 530 manually curated gene models across developmental, physiological, RNAi regulatory, and immunity-related pathways. As previously shown in the pea aphid, RNAi machinery genes putatively involved in the microRNA pathway have been specifically duplicated. A comprehensive transcriptome enabled us to identify a number of gene families that are either missing or misassembled in the draft genome. In order to develop biocuration as a training experience, we included undergraduate and graduate students from multiple institutions, as well as experienced annotators from the insect genomics research community. The resulting gene set (OGS v1.0) combines both automatically predicted and manually curated gene models.

This project was funded by the U.S. Department of Agriculture under the DEVELOPING AN INFRASTRUCTURE AND PRODUCT TEST PIPELINE TO DELIVER NOVEL THERAPIES FOR CITRUS GREENING DISEASE grant.

This Official Gene Set was generated as a merge of NCBI's Diaphorina citri Annotation Release 100 and a gff3 file resulting from manual curation efforts of the Diaphorina citri annotation community in the Apollo software (Apollo URL: https://apollo.nal.usda.gov/diacit/jbrowse/). First, QC of the manually curated genes was performed using the NAL's QC prototype software (description is available here: https://github.com/NAL-i5K/I5KNAL_OGS/wiki/QC-phase; software is available on request). Then, the cleaned manual annotations were merged with the protein-coding genes from the NCBI Diaphorina citri Annotation Release 100 using the NAL's Merge prototype software (description is available here: https://github.com/NAL-i5K/I5KNAL_OGS/wiki/Merge-phase; software is available on request). Non-coding RNAs from the NCBI Diaphorina citri Annotation Release 100 were added to the OGS after this merge. New consortium IDs for the OGS were generated, but Dbxref attributes referring to the original NCBI accessions were maintained when the model was not altered manually. CDS sequences for all protein-coding models, and protein and RNA sequences from manually curated models, were generated from the OGS gff3 file and the underlying genome sequence using the NAL's gff3_to_fasta.py program (available here: https://github.com/NAL-i5K/GFF3toolkit). All other sequences were derived from NCBI's Diaphorina citri Annotation Release 100, primarily because some protein and RNA sequences predicted by NCBI contain additional sequence not present in the genome sequence. Note and exception attributes from NCBI were ported to the OGS gff3 file when sequence not derived from the genome sequence was used for the final model.
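Because the Dbxref attributes are what link unaltered OGS models back to their NCBI accessions, it may help to see how they can be read out of a gff3 file. The following is a minimal sketch that assumes only the standard GFF3 layout (nine tab-separated columns, with `tag=value` pairs separated by semicolons in the ninth). The function names and the example records are hypothetical and are not taken from the OGS file or from the NAL software.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Parse the ninth GFF3 column into a dict of tag -> list of values."""
    attributes = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        tag, _, value = field.partition("=")
        # Multiple values for one tag are comma-separated; each value is URL-escaped.
        attributes[tag] = [unquote(v) for v in value.split(",")]
    return attributes


def dbxrefs_by_id(gff3_lines):
    """Map each feature ID to its Dbxref values, skipping comments and directives."""
    result = {}
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a feature line
        attrs = parse_gff3_attributes(columns[8])
        feature_id = attrs.get("ID", [None])[0]
        if feature_id is not None and "Dbxref" in attrs:
            result[feature_id] = attrs["Dbxref"]
    return result


# Hypothetical records, made up for illustration; not copied from the OGS file.
example = [
    "##gff-version 3\n",
    "scaffold1\tExample\tmRNA\t100\t900\t.\t+\t.\t"
    "ID=mrna0001;Parent=gene0001;Dbxref=GeneID:0000000,Genbank:XM_000000000.1\n",
    "scaffold1\tExample\tmRNA\t2000\t2900\t.\t-\t.\tID=mrna0002;Parent=gene0002\n",
]
mapping = dbxrefs_by_id(example)
print(mapping)
```

In this sketch, a model without a Dbxref attribute (the second example record) is simply absent from the result, which is one way to spot models that no longer carry an NCBI cross-reference.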

Files included in this Official Gene Set:

  1. Gff3 file: Dcitr_OGSv1.0.gff3

  2. Protein fasta: Dcitr_OGSv1.0_pep.fa

  3. RNA fasta: Dcitr_OGSv1.0_rna.fa

  4. CDS fasta: Dcitr_OGSv1.0_cds.fa

  5. Mapping file describing the changes between the original NCBI annotations and the OGS: Dcitr_NCBI_to_OGSv1.0_id_mapFile.txt
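After downloading the archive, one quick sanity check is to confirm that the five files listed above are present. The sketch below uses Python's standard `tarfile` module. It assumes the member names inside `Dcitr_OGSv1.0.tar.gz` match the list above and compares base names only, since the archive may or may not contain a top-level directory. The helper names `missing_files` and `check_archive` are hypothetical.

```python
import tarfile
from pathlib import PurePosixPath

# File names as listed in this dataset description.
EXPECTED_FILES = frozenset({
    "Dcitr_OGSv1.0.gff3",
    "Dcitr_OGSv1.0_pep.fa",
    "Dcitr_OGSv1.0_rna.fa",
    "Dcitr_OGSv1.0_cds.fa",
    "Dcitr_NCBI_to_OGSv1.0_id_mapFile.txt",
})


def missing_files(member_names, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from the archive members."""
    # Compare base names only, so a leading directory in the archive is ignored.
    present = {PurePosixPath(name).name for name in member_names}
    return sorted(set(expected) - present)


def check_archive(archive_path):
    """Open a gzip-compressed tar archive and report any missing expected files."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return missing_files(archive.getnames())
```

Calling `check_archive("Dcitr_OGSv1.0.tar.gz")` on a local copy of the download would return an empty list if all five files are found under the assumed names.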


    Resources in this dataset:

    • Resource Title: Diaphorina citri Official Gene Set v1.0.

      File Name: Dcitr_OGSv1.0.tar.gz

      Resource Description: Files included in this Official Gene Set:

      • Gff3 file: Dcitr_OGSv1.0.gff3

      • Protein fasta: Dcitr_OGSv1.0_pep.fa

      • RNA fasta: Dcitr_OGSv1.0_rna.fa

      • CDS fasta: Dcitr_OGSv1.0_cds.fa

      • Mapping file describing the changes between the original NCBI annotations and the OGS: Dcitr_NCBI_to_OGSv1.0_id_mapFile.txt


    • Resource Title: Curation workflow.

      File Name: Workflow_Fig3.png

    Funding

    USDA: 2015-70016-23028

    Data contact name

    Saha, Surya

    Data contact email

    ss2489@cornell.edu

    Publisher

    Ag Data Commons

    Theme

    • Not specified

    ISO Topic Category

    • biota

    Ag Data Commons Group

    • Insects - i5K

    National Agricultural Library Thesaurus terms

    genome assembly; Diaphorina citri; Asian citrus psyllid; genomics; pathogens; bacteria; genes; insect vectors; nucleotide sequences; greening disease; RNA interference; transcriptome

    OMB Bureau Code

    • 005:00 - Department of Agriculture

    OMB Program Code

    • 005:037 - Research and Education

    Pending citation

    • No

    Public Access Level

    • Public

    Preferred dataset citation

    Saha, Surya; Hosmani, Prashant; Villalobos-Ayala, Krystal; Miller, Sherry; Shippy, Teresa D.; Flores, Mirella; Rosendale, Andrew J.; Cordola, Chris; Bell, Tracey J.; Mann, Hannah; DeAvila, Gabe; DeAvila, Daniel; Moore, Zachary; Buller, Kyle; Ciolkevich, Kathryn; Nandyal, Samantha; Mahoney, Robert; Voorhis, Joshua; Dunlevy, Megan E.; Farrow, David W.; Hunter, David; Morgan, Taylar; Shore, Kayla; Guzman, Victoria; Izsak, Allison; Dixon, Danielle; Cridge, Andrew; Cano, Liliana; Cao, Xiaolong; Jiang, Haobo; Leng, Nan; Johnson, Shannon; Cantarel, Brandi; Richards, Stephen; English, Adam; Shatters, Robert; Childers, Christopher; Chen, Mei-Ju; Hunter, Wayne B.; Cilia, Michelle; Mueller, Lukas A.; Munoz-Torres, Monica; Nelson, David R.; Poelchau, Monica; Benoit, Joshua B.; Wiersma-Koch, Helen; D'Elia, Tom; Brown, Susan (2017). Diaphorina citri Official Gene Set v1.0. Ag Data Commons. https://doi.org/10.15482/USDA.ADC/1345524