Brassica napus Eight Genomes v201810 and pan-genome v1 Assembly & Annotation

Overview
Analysis Name: Brassica napus Eight Genomes v201810 and pan-genome v1 Assembly & Annotation
Method: Falcon (falcon-2017.11.02-16.04-py2.7) and Canu (v1.6)
Source: A combination of PacBio sequencing, Illumina paired-end short-read sequencing and Hi-C technologies
Date Performed: Tuesday, June 4, 2019

Genome assembly

Eight rapeseed accessions representing three ecotypes (ZS11, Gangan, Zheyou7, Shengli, Tapidor, Quinta, Westar and No2127) were used in this study. De novo genome assembly was based mainly on PacBio SMRT long reads. Subread correction (polishing) and contig assembly were primarily carried out with Falcon (falcon-2017.11.02-16.04-py2.7) using length_cutoff_pr = 6000. The following parameters were additionally optimized for assembling the eight B. napus genomes: pa_HPCdaligner_option = -v -B128 -t32 -e.75 -h480 -l3200 -w8 -T8; ovlp_HPCdaligner_option = -v -B128 -t32 -e.96 -l2500 -T8; falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 3 --max_n_read 300; overlap_filtering_setting = --max_diff 110 --max_cov 165 --min_cov 3 --bestn 10. After Falcon correction, the subreads were also assembled with Canu v1.6 using correctedErrorRate = 0.05. PacBio reads were mapped to the draft Canu and Falcon contigs with pbalign, and the resulting contigs were polished with Quiver using arrow as the algorithm. The contigs were then further polished with Illumina paired-end reads (insert size 350 bp) using Pilon v1.18. Finally, sequences unique to the polished Canu assembly (that is, not contained in the Falcon assembly) were merged with the Falcon contigs to obtain the final contigs.
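For readers reproducing the Falcon step, the reported options can be collected into a configuration file. The Python sketch below assumes FALCON's INI-style fc_run.cfg with a [General] section; only the parameters listed above are included, and every other setting a real run needs (input file list, genome size, job submission options) is deliberately left out.

```python
import configparser
import io

# Falcon parameters as reported for the eight B. napus assemblies.
# Assumption: they belong in the [General] section of an INI-style
# fc_run.cfg; all other required keys are intentionally omitted.
FALCON_PARAMS = {
    "length_cutoff_pr": "6000",
    "pa_HPCdaligner_option": "-v -B128 -t32 -e.75 -h480 -l3200 -w8 -T8",
    "ovlp_HPCdaligner_option": "-v -B128 -t32 -e.96 -l2500 -T8",
    "falcon_sense_option": "--output_multi --min_idt 0.70 --min_cov 3 --max_n_read 300",
    "overlap_filtering_setting": "--max_diff 110 --max_cov 165 --min_cov 3 --bestn 10",
}


def render_fc_run_cfg(params: dict[str, str]) -> str:
    """Render the parameters as the text of a partial fc_run.cfg."""
    # interpolation=None writes values verbatim.
    config = configparser.ConfigParser(interpolation=None)
    # Keep key case exactly (configparser lower-cases keys by default).
    config.optionxform = str
    config["General"] = params
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_fc_run_cfg(FALCON_PARAMS))
```

Preserving key case matters here because option names such as pa_HPCdaligner_option contain upper-case letters.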

Pseudo-chromosome construction

Pseudo-chromosomes were constructed from Hi-C data with the 3D-DNA pipeline. Hi-C reads were first aligned to the polished contigs with the Juicer pipeline, and 3D-DNA was then run with the parameters -i 1 -r 5. The resulting assembly was reviewed and corrected with Juicebox Assembly Tools. Hi-C scaffolding produced 19 chromosome-length scaffolds, which were named and numbered according to their collinearity with the 19 chromosomes of the Darmor-bzh genome.
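The collinearity-based naming step can be illustrated with a small Python sketch. It assumes a hypothetical input of (scaffold, Darmor-bzh chromosome, aligned bp) tuples summarised from some whole-genome alignment; the text does not say which aligner or scoring rule was used, so "largest total aligned length wins" is an illustrative choice, not the published procedure.

```python
from collections import defaultdict


def name_by_collinearity(blocks):
    """Assign each scaffold the reference chromosome it is most collinear with.

    blocks: iterable of (scaffold, reference_chromosome, aligned_bp) tuples,
    e.g. summarised from a whole-genome alignment to Darmor-bzh
    (hypothetical input format).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for scaffold, ref_chrom, aligned_bp in blocks:
        totals[scaffold][ref_chrom] += aligned_bp

    names = {}
    for scaffold, per_chrom in totals.items():
        # Pick the reference chromosome with the largest summed alignment.
        names[scaffold] = max(per_chrom, key=per_chrom.get)

    # A one-to-one naming requires each reference chromosome to be used once.
    used = list(names.values())
    clashes = sorted({c for c in used if used.count(c) > 1})
    if clashes:
        raise ValueError(f"reference chromosomes claimed twice: {clashes}")
    return names
```

With 19 chromosome-length scaffolds and the 19 Darmor-bzh chromosomes, a clean result is a one-to-one mapping; a ValueError flags scaffolds that need manual inspection.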

Pan-reference genome construction

Potential presence/absence variation (PAV) sequences in each of the seven other genomes relative to the reference genome ZS11 were identified with show-diff from MUMmer (v3.23). First, sequences that intersected gap regions in the respective genome were excluded. Second, sequences with the feature type 'BRK' were filtered out, because they were considered to be non-reference sequence aligned to a gap-start or gap-end boundary. To identify truly genome-specific sequences, each candidate PAV sequence was then mapped to the ZS11 genome with minimap2 ('-x asm10'), and sequences with >80% of their length covered by alignments were removed to obtain the final PAV regions. Genes with >80% overlap with a PAV region were considered PAV-related genes. To rule out false positives, Illumina reads of ZS11 were aligned to the seven genomes with BWA-MEM, and genes with >50% of their length covered by these reads were removed to obtain the final PAV genes. Finally, PAV sequences and PAV genes were added stepwise to the current genome in the order ZS11, Gangan, Zheyou7, Shengli, Tapidor, Westar, No2127 and Darmor to construct the pan-reference genome.
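The filtering steps for PAV regions and PAV genes come down to interval arithmetic, sketched here in Python. Assumptions: coordinates are 0-based half-open; the minimap2 hits for each candidate are intervals on the candidate sequence itself (for example, PAF columns 3 and 4); ">80%" and ">50%" are read as fractions of the candidate's or gene's length; and per-gene ZS11 read coverage has already been computed from the BWA-MEM alignments. The record and function names are hypothetical.

```python
from dataclasses import dataclass, field

Interval = tuple[int, int]  # 0-based, half-open [start, end)


def covered_bp(intervals: list[Interval]) -> int:
    """Number of bases covered by the union of (possibly overlapping) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def overlap_fraction(target: Interval, others: list[Interval]) -> float:
    """Fraction of `target` (length > 0) covered by the union of `others`."""
    t_start, t_end = target
    clipped = [(max(s, t_start), min(e, t_end))
               for s, e in others if s < t_end and e > t_start]
    return covered_bp(clipped) / (t_end - t_start)


@dataclass
class Candidate:
    """Hypothetical record for one candidate PAV region from show-diff."""
    seqid: str
    start: int
    end: int
    feature: str  # show-diff feature type, e.g. "GAP" or "BRK"
    # minimap2 (-x asm10) hits against ZS11, as intervals on the candidate
    # sequence itself (0 .. end - start).
    hits: list[Interval] = field(default_factory=list)


def final_pav_regions(cands, gaps_by_seqid, max_cov=0.8):
    """Apply the gap, BRK and minimap2-coverage filters to candidates."""
    kept = []
    for c in cands:
        if overlap_fraction((c.start, c.end), gaps_by_seqid.get(c.seqid, [])) > 0:
            continue  # intersects an assembly gap
        if c.feature == "BRK":
            continue  # aligned to a gap-start/gap-end boundary
        if overlap_fraction((0, c.end - c.start), c.hits) > max_cov:
            continue  # mostly present in ZS11 after all
        kept.append((c.seqid, c.start, c.end))
    return kept


def pav_genes(genes, pav_by_seqid, zs11_read_cov, min_overlap=0.8, max_read_cov=0.5):
    """genes: {gene_id: (seqid, start, end)};
    zs11_read_cov: {gene_id: fraction of the gene covered by ZS11 Illumina reads}."""
    result = []
    for gene_id, (seqid, start, end) in genes.items():
        if overlap_fraction((start, end), pav_by_seqid.get(seqid, [])) <= min_overlap:
            continue  # not a PAV-related gene
        if zs11_read_cov.get(gene_id, 0.0) > max_read_cov:
            continue  # ZS11 reads cover it: likely a false positive
        result.append(gene_id)
    return result
```

Merging overlapping intervals before summing keeps overlapping minimap2 hits from inflating the coverage fraction.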

Reference

Song, J., Guan, Z., Hu, J. et al. Eight high-quality genomes reveal pan-genome architecture and ecotype differentiation of Brassica napus. Nat. Plants 6, 34–45 (2020). https://doi.org/10.1038/s41477-019-0577-7

The eight B. napus genomes and the pan-genome can also be accessed through BnIR.

Download

All assembly and annotation files are available for download; select the desired data type in the left-hand sidebar. Each data type page describes the available files and provides download links. Alternatively, all available files can be browsed in the HTTP download repository.

Assembly

The Brassica napus eight genomes v201810 and pan-genome assembly files:

Downloads

B. napus cv. ZS11 assembly zs11.genome.fa
B. napus cv. Zheyou7 assembly zheyou73.genome.fa
B. napus cv. Gangan assembly ganganF73.genome.fa
B. napus cv. Shengli assembly shengli3.genome.fa
B. napus cv. No2127 assembly no2127.genome.fa
B. napus cv. Westar assembly westar.genome.fa
B. napus cv. Quinta assembly quintaA.genome.fa
B. napus cv. Tapidor assembly tapidor3.genome.fa
B. napus pan-genome assembly panrefgenome.fa
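After downloading an assembly, a quick sanity check is to list sequence lengths, for instance to confirm the chromosome-length scaffolds. The minimal Python sketch below reads an uncompressed FASTA file such as zs11.genome.fa using only the standard library and computes N50; it assumes nothing about how the sequences are named.

```python
def sequence_lengths(fasta_path: str) -> dict[str, int]:
    """Return {sequence ID: length} for an uncompressed FASTA file."""
    lengths: dict[str, int] = {}
    seq_id = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The ID is the first whitespace-delimited word of the header.
                seq_id = line[1:].split()[0]
                lengths[seq_id] = 0
            elif seq_id is not None:
                lengths[seq_id] += len(line)
    return lengths


def n50(lengths) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

For example, sequence_lengths("zs11.genome.fa") followed by n50(lengths.values()) summarises a downloaded assembly; sorting the dictionary items by length shows the longest scaffolds first.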
Gene Predictions

The Brassica napus eight genomes v201810 and pan-genome v0 gene prediction files:

Downloads

B. napus cv. ZS11 predicted genes (GFF3 file) zs11.v0.gff3
B. napus cv. ZS11 CDS sequences (FASTA file) zs11.all.v0.cds
B. napus cv. ZS11 protein sequences (FASTA file) zs11.all.v0.pep
B. napus cv. Zheyou7 predicted genes (GFF3 file) zheyou73.v0.gff3
B. napus cv. Zheyou7 CDS sequences (FASTA file) zheyou73.all.v0.cds
B. napus cv. Zheyou7 protein sequences (FASTA file) zheyou73.all.v0.pep
B. napus cv. Gangan predicted genes (GFF3 file) ganganF73.v0.gff3
B. napus cv. Gangan CDS sequences (FASTA file) ganganF73.all.v0.cds
B. napus cv. Gangan protein sequences (FASTA file) ganganF73.all.v0.pep
B. napus cv. Shengli predicted genes (GFF3 file) shengli3.v0.gff3
B. napus cv. Shengli CDS sequences (FASTA file) shengli3.all.v0.cds
B. napus cv. Shengli protein sequences (FASTA file) shengli3.all.v0.pep
B. napus cv. No2127 predicted genes (GFF3 file) no2127.v0.gff3
B. napus cv. No2127 CDS sequences (FASTA file) no2127.all.v0.cds
B. napus cv. No2127 protein sequences (FASTA file) no2127.all.v0.pep
B. napus cv. Westar predicted genes (GFF3 file) westar.v0.gff3
B. napus cv. Westar CDS sequences (FASTA file) westar.all.v0.cds
B. napus cv. Westar protein sequences (FASTA file) westar.all.v0.pep
B. napus cv. Quinta predicted genes (GFF3 file) quintaA.v0.gff3
B. napus cv. Quinta CDS sequences (FASTA file) quintaA.all.v0.cds
B. napus cv. Quinta protein sequences (FASTA file) quintaA.all.v0.pep
B. napus cv. Tapidor predicted genes (GFF3 file) tapidor3.v0.gff3
B. napus cv. Tapidor CDS sequences (FASTA file) tapidor3.all.v0.cds
B. napus cv. Tapidor protein sequences (FASTA file) tapidor3.all.v0.pep
B. napus pan-genome v0 predicted genes (GFF3 file) panrefgenome.gff3
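The GFF3 gene prediction files can be summarised in a similar way. This minimal Python sketch counts features per sequence and parses the attribute column; it assumes genes carry the feature type "gene", which is the usual GFF3 convention but has not been checked against these v0 files.

```python
from collections import Counter
from urllib.parse import unquote


def gene_counts(gff3_path: str, feature_type: str = "gene") -> Counter:
    """Count features of `feature_type` per sequence (column 1) in a GFF3 file."""
    counts: Counter = Counter()
    with open(gff3_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequences follow; no more feature lines
            if line.startswith("#") or not line.strip():
                continue  # directives, comments and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                continue  # not a well-formed feature line
            if fields[2] == feature_type:
                counts[fields[0]] += 1
    return counts


def parse_attributes(column9: str) -> dict[str, str]:
    """Parse GFF3 column 9 (e.g. 'ID=g1;Name=abc') into a dict, unescaping values."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = unquote(value)
    return attrs
```

For example, gene_counts("zs11.v0.gff3") gives a per-chromosome gene count, and passing "mRNA" as the feature type counts transcripts instead, if the files use that type.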
Functional Analysis

The Brassica napus eight genomes v201810 functional annotation files:

Downloads

B. napus cv. ZS11 functional annotation zs11.geneinformation.txt
B. napus cv. Zheyou7 functional annotation zheyou73.geneinformation.txt
B. napus cv. Gangan functional annotation ganganF73.geneinformation.txt
B. napus cv. Shengli functional annotation shengli3.geneinformation.txt
B. napus cv. No2127 functional annotation no2127.geneinformation.txt
B. napus cv. Westar functional annotation westar.geneinformation.txt
B. napus cv. Quinta functional annotation quintaA.geneinformation.txt
B. napus cv. Tapidor functional annotation tapidor3.geneinformation.txt