Brassica napus Eight Genomes v201810 and pan-genome v1 Assembly & Annotation
Analysis Name | Brassica napus Eight Genomes v201810 and pan-genome v1 Assembly & Annotation |
---|---|
Method | Falcon (falcon-2017.11.02-16.04-py2.7) and Canu (v1.6) |
Source | A combination of PacBio sequencing, Illumina paired-end short read sequencing and Hi-C technologies |
Date Performed | Tuesday, June 4, 2019 |
Genome assembly
Eight rapeseed accessions representing three ecotypes (ZS11, Gangan, Zheyou7, Shengli, Tapidor, Quinta, Westar and No2127) were used in this study. De novo genome assembly was performed mainly using PacBio SMRT long reads. Subread polishing and contig assembly were carried out primarily with Falcon (falcon-2017.11.02-16.04-py2.7) using length_cutoff_pr = 6,000. The following options, optimized for assembling the eight B. napus genomes, were also set: pa_HPCdaligner_option = -v -B128 -t32 -e.75 -h480 -l3200 -w8 -T8; ovlp_HPCdaligner_option = -v -B128 -t32 -e.96 -l2500 -T8; falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 3 --max_n_read 300; overlap_filtering_setting = --max_diff 110 --max_cov 165 --min_cov 3 --bestn 10. The Falcon-polished subreads were also assembled using Canu v1.6 with correctedErrorRate = 0.05. PacBio reads were mapped to the draft contigs from Canu and Falcon using pbalign, and the resulting contigs were polished using Quiver with Arrow as the algorithm. The contigs were then polished further with Illumina paired-end reads (insert size = 350 bp) using Pilon 1.18. Finally, sequences unique to the polished Canu assembly and not contained in the Falcon assembly were merged to obtain the final contigs.
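The Falcon settings listed above can be collected into a configuration fragment. The following is a minimal sketch, assuming that the options belong in a `[General]` section as in typical Falcon configuration files and that the long options use double hyphens; the helper name `build_falcon_general_section` is hypothetical. A real run needs further settings (input file list, job options) that are not given here.

```python
import configparser
import io


def build_falcon_general_section():
    """Return the Falcon options quoted in the text as INI-style text.

    Assumption: the options live in a [General] section. Only values stated
    in the text are included; this is not a complete Falcon configuration.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    # Keep option names case-sensitive (configparser lowercases them by default).
    cfg.optionxform = str
    cfg["General"] = {
        "length_cutoff_pr": "6000",
        "pa_HPCdaligner_option": "-v -B128 -t32 -e.75 -h480 -l3200 -w8 -T8",
        "ovlp_HPCdaligner_option": "-v -B128 -t32 -e.96 -l2500 -T8",
        "falcon_sense_option": "--output_multi --min_idt 0.70 --min_cov 3 --max_n_read 300",
        "overlap_filtering_setting": "--max_diff 110 --max_cov 165 --min_cov 3 --bestn 10",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_falcon_general_section())
```

Setting `optionxform = str` matters here because names such as `pa_HPCdaligner_option` contain uppercase letters that would otherwise be lowercased on write.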
Pseudo-chromosome construction
Pseudo-chromosomes were constructed from Hi-C data using the 3D-DNA pipeline. The Hi-C reads were aligned to the polished contigs using the Juicer pipeline, and 3D-DNA was run with the parameters -i 1 -r 5. The results were polished using Juicebox Assembly Tools. Hi-C scaffolding yielded 19 chromosome-length scaffolds, which were named and numbered according to their collinearity with the 19 chromosomes of the Darmor-bzh genome.
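For orientation, the 3D-DNA invocation with the stated parameters might be assembled as in the sketch below. It assumes the pipeline's `run-asm-pipeline.sh` entry script, which takes a draft FASTA and a Juicer `merged_nodups.txt` file as positional arguments; both file names and the helper `build_3d_dna_command` are hypothetical placeholders, and the command is only built, not executed.

```python
import shlex


def build_3d_dna_command(draft_fasta, merged_nodups, input_size=1, rounds=5):
    """Build the argument list for a 3D-DNA run with -i and -r as given in the text.

    File paths are placeholders supplied by the caller.
    """
    return [
        "run-asm-pipeline.sh",
        "-i", str(input_size),  # -i 1, as stated in the text
        "-r", str(rounds),      # -r 5, as stated in the text
        draft_fasta,
        merged_nodups,
    ]


cmd = build_3d_dna_command("polished_contigs.fasta", "merged_nodups.txt")
print(shlex.join(cmd))
```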
Pan-reference genome construction
Potential presence/absence variation (PAV) sequences in the seven other genomes relative to the ZS11 reference genome were identified using show-diff in MUMmer (v3.23). First, sequences that intersected gap regions in the respective genome were excluded. Second, sequences with the feature type ‘BRK’ were filtered out; these were considered non-reference sequences aligned to a gap-start or gap-end boundary. To identify sequences truly unique to each genome, the candidate PAV sequences were mapped to the ZS11 genome using minimap2 with the setting ‘-x asm10’, and sequences with >80% coverage were filtered out to obtain the final PAV regions. A gene with >80% overlap with a PAV region was considered a PAV-related gene. To rule out false positives, Illumina reads of ZS11 were aligned to the seven genomes using BWA-MEM, and genes with >50% of their length covered by reads were filtered out to obtain the final PAV genes. The PAV sequences and PAV genes were then added stepwise to the current genome in the order ZS11, Gangan, Zheyou7, Shengli, Tapidor, Westar, No2127 and Darmor to construct a pan-reference genome.
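The two 80% thresholds described above can be illustrated with a short sketch. It is not the authors' code: it assumes minimap2 output in PAF format (query name, length, start and end in the first four tab-separated columns, 0-based with an exclusive end) and gene and PAV coordinates given as half-open `(start, end)` pairs on the same sequence. All function names and the example records are hypothetical.

```python
from collections import defaultdict


def covered_length(intervals):
    """Total length covered by half-open (start, end) intervals, merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def query_coverage_from_paf(paf_lines):
    """Fraction of each query sequence covered by its alignments."""
    lengths = {}
    spans = defaultdict(list)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        name, qlen = fields[0], int(fields[1])
        lengths[name] = qlen
        spans[name].append((int(fields[2]), int(fields[3])))
    return {name: covered_length(spans[name]) / lengths[name] for name in lengths}


def filter_candidates(candidate_names, paf_lines, max_coverage=0.80):
    """Keep candidates whose coverage on the reference is not above the threshold.

    Candidates with no alignment at all are absent from the PAF and count as 0.
    """
    coverage = query_coverage_from_paf(paf_lines)
    return [n for n in candidate_names if coverage.get(n, 0.0) <= max_coverage]


def gene_overlap_fraction(gene, pav_regions):
    """Fraction of a gene (start, end) that is overlapped by PAV regions."""
    g_start, g_end = gene
    clipped = [
        (max(s, g_start), min(e, g_end))
        for s, e in pav_regions
        if s < g_end and e > g_start
    ]
    return covered_length(clipped) / (g_end - g_start)


# Made-up PAF records: cand1 aligns over 90% of its length, cand2 over 50%.
paf = [
    "cand1\t1000\t0\t900\t+\tA01\t50000\t100\t1000\t880\t900\t60",
    "cand2\t1000\t0\t300\t+\tA01\t50000\t2000\t2300\t290\t300\t60",
    "cand2\t1000\t200\t500\t+\tA01\t50000\t2200\t2500\t290\t300\t60",
]
kept = filter_candidates(["cand1", "cand2", "cand3"], paf)
is_pav_gene = gene_overlap_fraction((100, 200), [(50, 190)]) > 0.80
```

Merging overlapping intervals before summing avoids counting a base twice when several alignments or PAV regions cover the same position.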
Reference
Song, J., Guan, Z., Hu, J. et al. Eight high-quality genomes reveal pan-genome architecture and ecotype differentiation of Brassica napus. Nat. Plants 6, 34–45 (2020). https://doi.org/10.1038/s41477-019-0577-7
Additionally, access to the eight B. napus genomes and the pan-genome is provided by BnIR.