Data from: Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production
Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. To advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ∼58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (∼6%), and a high fraction of protein-coding sequence (∼39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (∼73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. The high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.
For genome assembly of C. zofingiensis strain SAG 211–14, we used a hybrid approach blending short reads (Illumina), long reads (Pacific Biosciences of California), and whole-genome optical mapping (OpGen) (SI Appendix, SI Text and Datasets S1–S19, and refer to SI Appendix, Datasets Key). The combined power of these approaches yielded a high-quality haploid nuclear genome of C. zofingiensis of ∼58 Mbp distributed over 19 chromosomes (Fig. 2) in the tradition of model organism projects, as opposed to the fragmentary “gene-space” assemblies typical of modern projects using high-throughput methods and associated software. Approximately 99% of reads from the Illumina genomic libraries were accounted for, and nonplaceholder chromosomal sequence covers ∼94% of the optical map. Because no automated pipeline was found able to achieve the desired quality, methods are described in SI Appendix, SI Text.
Resources in this dataset:
Resource Title: Supporting information and datasets.
File Name: Web Page, url: http://www.pnas.org/content/suppl/2017/05/05/1619928114.DCSupplemental
Data are provided as TXT, MOV, MPG, and PDF files.
Funding
USDA-NIFA: 2013-67012-21272
U.S. Department of Energy: DE-AC02-05CH11231
U.S. Department of Energy: DE-FC02-02ER63421
National Institutes of Health: 5T32HG002536–13
National Science Foundation
Data contact name
Niyogi, Krishna K.
Data contact email
niyogi@berkeley.edu
Publisher
Proceedings of the National Academy of Sciences of the United States of America
Theme
- Not specified
ISO Topic Category
- biota
National Agricultural Library Thesaurus terms
microalgae; energy; Chlorophyta; lipids; biofuels; functional foods; astaxanthin; nuclear genome; transcriptome; chromosomes; genes; exons; introns; models; gene expression regulation; mutants; biosynthesis; ABC transporters; cytochrome P-450; enzymes; genome assembly; mixing; California; data collection; haploidy; computer software; genomic libraries
Pending citation
- No
Public Access Level
- Public