De novo Genome Sequencing and Gene Prediction in Lolium perenne, Perennial Ryegrass
Ewan Mollison
The James Hutton Institute
31st International EUCARPIA Symposium Section Fodder Crops and
Amenity Grasses: BREEDING IN A WORLD OF SCARCITY
13 – 17 September 2015. Ghent, Belgium
Methods
Source plant material
- Inbred and partially inbred Lolium perenne lines
Genome sequencing strategy
- 207x coverage from Illumina sequencing of paired-end (PE), mate-pair (MP) and LJD libraries; subsequently reduced to 105x
- Assembled using CLC Bio with a k-mer length of 41; scaffolded with SSPACE
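The poster gives only the assembler (CLC Bio) and its k-mer length (41), not how the assembler works internally. As a rough illustration of the k-mer idea behind de Bruijn graph assembly, the toy sketch below assumes a de Bruijn approach and is not a description of CLC's implementation. It splits reads into k-mers, links each k-mer's (k-1)-base prefix to its suffix, and extends a contig while the path is unambiguous. It uses a tiny k of 5 instead of 41 and ignores reverse complements, sequencing errors and repeats. The function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Toy de Bruijn graph: nodes are (k-1)-mers; every k-mer seen in a
    read adds an edge from its prefix (k-1)-mer to its suffix (k-1)-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop rather than loop forever around a cycle
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ACGTACGGA", "TACGGATCC"]  # two overlapping toy reads
graph = build_de_bruijn(reads, k=5)
print(walk_unambiguous(graph, "ACGT"))  # ACGTACGGATCC
```

The two reads share the k-mers `TACGG` and `ACGGA`, so the walk merges them into one contig. Real assemblers must also resolve branches caused by repeats and errors. In this sketch those branches simply stop the walk.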
Estimate of gene-space coverage
- CEGMA pipeline used to estimate coverage of a set of highly conserved core eukaryotic genes
Gene prediction
- Ab initio gene prediction using Augustus with a wheat-based model
- 22 RNA-Seq experiments aligned to the Lolium assembly using the Tuxedo pipeline
Results
Assembly (genomic scaffolds)
Total length (Gbp)        1.11
% GC                      44.16
No. scaffolds             424,745
N50 (bp)                  25,193
Max. scaffold (bp)        274,411
Scaffolds >= N50          10,875
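The N50 and "Scaffolds >= N50" figures are both derived from sorted scaffold lengths. The sketch below assumes the lengths have already been read from the assembly FASTA, and its function names are hypothetical. N50 is the length L such that scaffolds of length L or longer together hold at least half of the assembled bases. GC here is counted over A/C/G/T only, excluding N gap characters. The poster does not say whether its 44.16% excludes gaps, so that part is an assumption.

```python
def n50_stats(lengths):
    """Return (N50, number of scaffolds with length >= N50)."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    n50 = ordered[-1]
    for length in ordered:  # accumulate from the longest scaffold down
        running += length
        if running >= half:
            n50 = length
            break
    n_at_least = sum(1 for length in ordered if length >= n50)
    return n50, n_at_least


def gc_percent(sequences):
    """GC content as a percentage of A/C/G/T bases (N and other symbols ignored)."""
    gc = acgt = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(base) for base in "ACGT")
    return 100 * gc / acgt if acgt else 0.0


print(n50_stats([100, 80, 50, 30, 20, 10]))  # (80, 2)
print(gc_percent(["ACGTNN", "GGCC"]))        # 75.0
```

When several scaffolds share the N50 length, the count of scaffolds >= N50 can exceed the number needed to reach half the assembly (often called L50). For example, `n50_stats([10, 10, 10])` gives `(10, 3)`, while L50 would be 2.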
CEGMA coverage estimate
239/248 (96.37%) complete coverage
246/248 (99.19%) complete or partial coverage
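The CEGMA percentages are simply the recovered core genes divided by CEGMA's set of 248 core eukaryotic genes. The small check below reproduces the two reported figures. The helper name is hypothetical.

```python
def completeness(found, total=248):
    """Percentage of CEGMA core genes recovered, rounded to two decimals."""
    return round(100 * found / total, 2)


print(completeness(239))  # 96.37  (complete)
print(completeness(246))  # 99.19  (complete or partial)
```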
Gene prediction
                              RNA-Seq                      Augustus
                              Genomic   Mt.     Ch.        Genomic   Mt.     Ch.
Predicted genes               67,706    109     12         188,822   20      0
Predicted transcripts         111,464   109     18         n/a       n/a     n/a
Scaffolds with predictions    33,212    3       2          59,900    3       0
Genes / kb *                  0.051     0.209   0.095      0.23      0.038   0
Mt. = mitochondrial; Ch. = chloroplast
* Genes per kb of gene-containing scaffolds
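The footnote defines gene density over gene-containing scaffolds only, but it does not give the exact formula. The sketch below assumes the pooled reading: total predicted genes divided by the total length, in kb, of scaffolds carrying at least one prediction. Averaging per-scaffold densities would be an equally plausible reading and can give different numbers. The function name and inputs are hypothetical.

```python
def genes_per_kb(gene_scaffolds, scaffold_lengths):
    """Pooled gene density over gene-containing scaffolds.

    gene_scaffolds: scaffold ID of each predicted gene (one entry per gene).
    scaffold_lengths: dict mapping scaffold ID -> length in bp.
    Scaffolds with no predicted genes do not contribute to the denominator."""
    containing = set(gene_scaffolds)
    total_kb = sum(scaffold_lengths[s] for s in containing) / 1000
    return len(gene_scaffolds) / total_kb


# s3 carries no genes, so its 50 kb is excluded: 3 genes / 30 kb
print(genes_per_kb(["s1", "s1", "s2"], {"s1": 20000, "s2": 10000, "s3": 50000}))
```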
Discussion & conclusion
Assembly and coverage of gene-space
- Around 40% of the expected genome size has been captured by the assembly
- CEGMA analysis indicates good coverage of the gene space (96.37% of core genes complete; 99.19% complete or partial)
Overlapping predictions with transcripts
- 44,252 predictions from genomic scaffolds and 3 from mitochondrial scaffolds have supporting evidence from RNA-Seq, based on a reciprocal overlap of at least 20% using BEDTools intersect
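A 20% reciprocal overlap means the shared span must cover at least 20% of the prediction and also at least 20% of the transcript. With BEDTools this is typically expressed as `bedtools intersect -a predictions.bed -b transcripts.bed -f 0.2 -r -u`. Those file names are hypothetical, and the poster does not list the exact options used. The pure-Python sketch below shows the same test on BED-style half-open intervals, ignoring strand as `bedtools intersect` does by default. Its function names are hypothetical, and it uses a simple quadratic scan rather than BEDTools' sorted/indexed approach.

```python
def reciprocal_overlap(a, b, min_frac=0.2):
    """True if intervals a and b, each (chrom, start, end) in half-open BED
    coordinates, overlap by at least min_frac of EACH interval's length."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a[2] - a[1])
            and overlap >= min_frac * (b[2] - b[1]))


def supported(predictions, transcripts, min_frac=0.2):
    """Predictions that reciprocally overlap at least one transcript
    (each prediction reported once, like `-u`)."""
    return [p for p in predictions
            if any(reciprocal_overlap(p, t, min_frac) for t in transcripts)]


preds = [("chr1", 100, 200), ("chr1", 500, 600)]
txs = [("chr1", 150, 250)]
print(supported(preds, txs))  # [('chr1', 100, 200)]
```

The reciprocal requirement matters when lengths differ: a 20 bp overlap covers 20% of a 100 bp prediction but only about 17% of a 120 bp transcript, so that pair would be rejected.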
Acknowledgements
This work is funded as part of a Teagasc Walsh Fellowship PhD studentship