Brassica rapa cultivar Chiifu Whole Genome v2.5 Assembly & Annotation

Overview
Analysis Name: Brassica rapa cultivar Chiifu Whole Genome v2.5 Assembly & Annotation
Method: SOAPDenovo2, SSPACE, GapCloser and PBjelly
Source: A combination of Illumina and PacBio sequencing reads
Date Performed: Thursday, May 12, 2016
About the assembly

The B. rapa genome was assembled de novo and substantially improved with additional Illumina and PacBio sequencing. This improved assembly, B. rapa genome V2.0, includes a total of ∼76 Gb of Illumina paired-end reads (∼156×), 21 Gb of mate-paired reads, and 6.5 Gb of PacBio single-molecule data (Supplemental Table 1). The data for the new genome comprised read-through paired-end reads, 20 kb and 40 kb mate-paired reads, single-molecule data, and all of the reads used in the previous assembly, B. rapa genome V1.5 (Wang et al., 2011). The new assembly has a total size of 389.2 Mb (Supplemental Methods). It covers approximately 80.25% of the B. rapa genome and contains 106 Mb more sequence than V1.5. The new assembly contains 86 986 scaffolds, with a scaffold N50 of 3.38 Mb. Ninety percent of the assembled sequence falls into 349 scaffolds larger than 26 kb (Supplemental Table 2). Assembly statistics for V2.0 were compared before and after the inclusion of the PacBio data (Supplemental Table 3).
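For readers unfamiliar with the N50 and "ninety percent of the sequence" statistics quoted above, the following is a minimal sketch of how such values are commonly computed from a list of scaffold lengths. The function name `nx` and the toy lengths are illustrative assumptions, not part of the published pipeline. The last lines check that the quoted figures are mutually consistent: 389.2 Mb at 80.25% coverage implies a genome of roughly 485 Mb, and 76 Gb of reads over that size gives a depth close to the quoted ∼156×.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length: the largest L such that scaffolds of
    length >= L together hold at least `fraction` of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


# Hypothetical toy scaffold lengths, for illustration only.
toy_scaffolds = [100, 80, 60, 40, 20]
toy_n50 = nx(toy_scaffolds, 0.5)
toy_n90 = nx(toy_scaffolds, 0.9)

# Consistency check using only figures quoted in the text.
assembly_mb = 389.2          # total assembly size
covered_fraction = 0.8025    # share of the genome covered by the assembly
illumina_gb = 76.0           # Illumina paired-end data

implied_genome_mb = assembly_mb / covered_fraction
implied_depth = illumina_gb * 1000 / implied_genome_mb

print(f"toy N50 = {toy_n50}, toy N90 = {toy_n90}")
print(f"implied genome size = {implied_genome_mb:.1f} Mb")
print(f"implied Illumina depth = {implied_depth:.1f}x")
```

Applied to the real scaffold lengths, `nx(lengths, 0.5)` would be expected to reproduce the reported scaffold N50, provided the same scaffold set is used.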

Two B. rapa segregating populations were used to construct high-density genetic maps, which served to correct assembly errors and to assign scaffolds to the chromosomes of B. rapa. One genetic map was generated from an RIL (recombinant inbred line) population (Yu et al., 2013), and the other was built on an F1DH (doubled haploid) population. A total of 96 589 and 6944 polymorphic SNPs, respectively, were identified between the two parents of each population (Supplemental Methods). With these SNPs, 2063 and 1622 bin markers were identified and used to construct two genetic maps with total genetic distances of 1316.731 cM and 1391.516 cM, respectively (Supplemental Table 4). These newly generated genetic maps have a much higher marker density than the previous map, which was built with InDel and SSR markers for V1.5 (Wang et al., 2011). A total of 28 scaffolds were corrected (Supplemental Table 5). Thereafter, scaffolds were ordered along the 10 chromosomes of B. rapa based on the two maps and syntenic evidence (Supplemental Methods and Supplemental Figure 1). Approximately 85% of the assembly (>138 scaffolds), or ∼330 Mb, was assigned to chromosomes (Supplemental Tables 4 and 6). Interestingly, three chromosomes (A05, A06, and A09) showed genetic map inconsistencies in the F1DH population (Supplemental Figure 2), indicating possible structural variation among the B. rapa varieties used as population parents.
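The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below derives the anchored sequence length from the 85% figure and a crude average spacing between bin markers for each map. Dividing total map length by the number of bins is an assumption made here for illustration; the paper may define marker density differently.

```python
# All numbers are taken from the text above.
assembly_mb = 389.2        # total assembly size
anchored_fraction = 0.85   # share of the assembly assigned to chromosomes
anchored_mb = assembly_mb * anchored_fraction

maps = {
    "RIL": {"bins": 2063, "length_cm": 1316.731},
    "F1DH": {"bins": 1622, "length_cm": 1391.516},
}

# Crude average spacing: total genetic distance divided by the bin count.
spacing_cm = {name: m["length_cm"] / m["bins"] for name, m in maps.items()}

print(f"anchored sequence = {anchored_mb:.1f} Mb")
for name, value in spacing_cm.items():
    print(f"{name}: about {value:.3f} cM per bin marker")
```

The anchored length comes out close to the ∼330 Mb reported, and both maps average less than 1 cM between adjacent bin markers.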

Genome annotation

Three methods were used to predict gene models in the updated genome: ab initio modeling, homologous gene detection, and transcript fragment mapping. For transcript-based gene prediction, large volumes of mRNA-sequencing (RNA-seq) data were used to improve prediction quality (Supplemental Methods). Summary statistics for the updated gene set, and a comparison with previous versions, are shown in Supplemental Table 7. A total of 48 826 protein-coding genes were predicted in V2.0, which is 7652 more than in V1.5. The number of multi-exon genes increased markedly in V2.0, by about 4610 relative to V1.5. Furthermore, the gene models were improved by adding UTRs and alternative splicing isoforms, which were not available in V1.5.

Reference

Cai, C., Wang, X., Liu, B., Wu, J., Liang, J., Cui, Y., ... & Wang, X. (2017). Brassica rapa genome 2.0: a reference upgrade through sequence re-assembly and gene re-annotation. Molecular Plant, 10(4), 649-651.

Download

All assembly and annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and links to download them. Alternatively, you can browse all available files in the HTTP download repository.

Assembly

The Brassica rapa cultivar Chiifu genome v2.5 assembly file:

Downloads

Brassica rapa cultivar Chiifu genome v2.5 assembly (FASTA file) BrapaV2.5_Chr.fa.gz
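
Once downloaded, the assembly file can be inspected without decompressing it to disk. The following is a minimal sketch, assuming the file is a standard gzip-compressed FASTA saved in the working directory; the helper name `fasta_lengths` is hypothetical, and the sequence IDs used in the file are not documented on this page.

```python
import gzip
import os


def fasta_lengths(path):
    """Return {sequence_id: length} for a gzip-compressed FASTA file."""
    lengths = {}
    current = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the ID.
                parts = line[1:].split()
                current = parts[0] if parts else ""
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


if __name__ == "__main__":
    path = "BrapaV2.5_Chr.fa.gz"  # assumed to be in the current directory
    if os.path.exists(path):
        lengths = fasta_lengths(path)
        total_mb = sum(lengths.values()) / 1e6
        print(f"{len(lengths)} sequences, {total_mb:.1f} Mb in total")
```

The resulting lengths can be compared with the assembly size described above, bearing in mind that the description refers to V2.0 while the file is labelled v2.5, so the totals may differ.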
Gene Predictions

The Brassica rapa cultivar Chiifu genome v2.5 gene prediction files:

Downloads

Predicted Genes (GFF3 file) BrapaV2.5_Chr.gene.gff.gz
CDS sequences (FASTA file) BrapaV2.5_Chr.cds.fa.gz
Protein sequences (FASTA file) BrapaV2.5_Chr.pep.fa.gz
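
As a quick sanity check on the gene prediction download, the feature types in the GFF3 file can be tallied. This is a minimal sketch that relies only on the GFF3 convention that the third tab-separated column holds the feature type. Which type names appear in this particular file (for example `gene` or `mRNA`) is an assumption, and the helper name `count_feature_types` is hypothetical.

```python
import gzip
import os
from collections import Counter


def count_feature_types(path):
    """Count values of column 3 (feature type) in a gzip-compressed GFF3 file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded sequence section follows; stop here
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments, and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    path = "BrapaV2.5_Chr.gene.gff.gz"  # assumed to be in the current directory
    if os.path.exists(path):
        for feature_type, n in count_feature_types(path).most_common():
            print(f"{feature_type}\t{n}")
```

If the file contains `gene` features, their count can be compared with the 48 826 protein-coding genes reported for V2.0; a difference would not be surprising, since the file is labelled v2.5.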