Chan et al., Barley sequencing
TRPGR: Sequencing the barley gene-space
PI: Agnes P. Chan (The Institute for Genomic Research, Rockville, MD).
Co-PIs: Timothy J. Close (UC, Riverside, CA); Stefano Lonardi (UC, Riverside, CA);
Gary J. Muehlbauer (UMN, St. Paul, MN); Roger Wise (USDA-ARS/ISU, Ames, IA).
Senior Personnel: Pablo D. Rabinowicz (The Institute for Genomic Research, Rockville, MD).
Service providers: Jeffrey Bennetzen (UGA, Athens, GA); Ming Cheng Luo (UC, Davis, CA).
PROJECT DESCRIPTION
RELEVANCE AND JUSTIFICATION
The grain crops in the Triticeae tribe, barley (Hordeum vulgare L.), common wheat (Triticum aestivum),
and rye (Secale cereale), are cultivated on 66 million acres in the United States with an average annual
value of $8 billion (USDA-National Agricultural Statistics Service,
http://www.usda.gov/nass/pubs/agr05/05_ch1.PDF). Barley is one of the major grains used in the food,
feed and beer industries throughout the world. Because barley is a true diploid, it is a natural model for
genetics and genomics for the Triticeae tribe. Highly collaborative national and international efforts have
produced a substantial body of genetic and genomic resources in the past several years, including
extensive structured populations and genetic maps, >460,000 expressed sequence tags (ESTs), the
community-designed Affymetrix 22K Barley1 GeneChip along with >130 Gb of contributed expression
data from >100 treatments, a bacterial artificial chromosome (BAC) library covering 6 genome
equivalents from the US, a second BAC library from a European/Australian effort, and a physical map
representing gene-containing contigs.
Use this:
Mission statement: The objective of the IBSC is to physically map and sequence the barley
gene space, with the near-term need being the identification of the remainder of the ~50,000
genes, including the 5' and 3' regulatory regions, and the longer-term goal being an ordered physical
map linked to the genetic map to accelerate crop improvement.
---------------News:
The International Barley Sequencing Consortium (IBSC), including the US, Germany, UK, Finland,
Australia, and Japan (http://barleygenome.org), was formalized at the 18th annual International Triticeae
Mapping Initiative (ITMI) workshop (Victor Harbor, Australia; August 27 – 31, 2006). Recent initiatives
of the consortium include:
1. 5,000 full length (FL) barley cDNAs from cv. Haruna Nijo (K. Sato, Okayama University, Japan,
nearly complete). Another 30,000 FL cDNAs from the same cDNA pool will be produced at
Tsukuba in the next three years.
2. A new integrated whole genome barley physical map funded by the Leibniz society has been
initiated at the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) in Gatersleben,
Germany and the Australian Centre for Plant Functional Genomics (ACPFG). 350,000 HICF
fingerprints of cv. Morex will be integrated with a minimal tiling path from the NSF project (PI
Close, Award #0321756). In addition, a pilot physical map of barley chromosome 3 (syntenic with
rice chromosome 1) has been initiated for cv. Haruna Nijo by K. Sato from Okayama University.
This also will link with French, UK, US, and Australian efforts to physically map and sequence
chromosomes 3A, 3B, and 3D from wheat.
3. A proposal to generate 800,000 BAC-end sequences from the Morex physical map and anchor them to
the genetic map has been submitted for funding in the EU by the Scottish Crop Research Institute
(SCRI), the University of Udine, and IPK. BAC-end sequencing will be done at the Arizona Genomics
Institute (AGI) in the US.
4. 3,000 single nucleotide polymorphisms (SNPs) on two Illumina 1,536-SNP OPAs for high-throughput
mapping and genotyping from the USDA-CSREES Barley CAP project (PI Muehlbauer,
Award #2005-05128). These will be combined with the SCRI efforts to reach a total of 5,000 SNPs to
quickly leverage the genomic efforts into plant breeding.
Sequencing of grass genomes:
In addition to the existing rice genome resources and the ongoing maize genome sequencing effort,
genomic resources for other grass species are also being developed, thus allowing comparative analysis
within the grass family. The Joint Genome Institute (JGI) from the US Department of Energy (DOE) will
generate ESTs and an 8X whole genome shotgun (WGS) draft of the 500 Mbp genome of Brachypodium
distachyon, a member of the Brachypodieae tribe, which, like the Triticeae, belongs to the Pooideae
subfamily (http://www.jgi.doe.gov/sequencing/why/CSP2007/brachypodium.html). Wheat and
Brachypodium conserved orthologous sequences (COS) using rice as the core genome are being
developed at the John Innes Centre (UK). JGI will also carry out a large-scale EST project studying
switchgrass (Panicum virgatum), a member of the Panicoideae subfamily which also includes maize
(http://www.jgi.doe.gov/sequencing/why/CSP2007/switchgrass.html). Thus, sequencing the barley gene
space will not only provide an excellent genomic resource for the Triticeae tribe but also for the grass
family (Poaceae) in general.
------------------------------
In the NSF-funded project led by T. Close, clustered EST sequences were used to design overgos to
hybridize against the barley Morex BAC library to select clones that contain expressed sequences.
Typically, EST sequencing only identifies up to 60% of the genes for a given genome because the
transcription of many genes is highly regulated {Barbazuk, 2005 #25}. Genomic-based gene-targeted
approaches, on the other hand, result in a more comprehensive gene representation than cDNA-based
approaches, reaching 90 to 95% of the total gene content {Barbazuk, 2005 #25; Bedell, 2005 #24;
Martienssen, 2004 #19; Rabinowicz, 2003 #14}. Thus, we propose to apply two gene-enrichment (GE)
methods, methylation filtration (MF) and high C0t (HC) selection, to capture the remaining genes
that are currently not represented by EST contigs. Experience from maize GE sequencing at TIGR has already
demonstrated that the clustering and assembly of GE reads resulted in many complete gene sequences,
including upstream and downstream regulatory sequences and introns. Preliminary results show that MF
is highly effective when applied to barley and can enrich for gene sequences up to 18-fold when
compared to a random sequence sample from WGS sequencing [5], making it an extremely efficient gene
discovery tool for the barley genome. Application of HC to other cereal genomes including maize and
wheat has shown that HC and MF result in comparable levels of gene-enrichment. In addition, analysis
from the maize GE sequencing project demonstrated that the MF and HC methods target
distinct but overlapping genic regions and are therefore complementary approaches for capturing the
gene-space from large genomes with a high repeat content, resulting in one of the most successful gene
discovery efforts in grass genomics. Thus, this barley GE sequencing initiative is the logical next step
in the US commitment to the international effort to physically map and sequence the barley “gene
space”.
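The fold-enrichment figure quoted above can be read as the ratio of the genic-read fraction in a gene-enriched library to the genic-read fraction in a random WGS sample. The Python sketch below shows only that arithmetic. The function name and the read counts are hypothetical, chosen to illustrate an 18-fold ratio; they are not data from the preliminary results.

```python
def fold_enrichment(enriched_genic, enriched_total, random_genic, random_total):
    """Ratio of genic-read fractions: gene-enriched library vs. random WGS sample.

    Equivalent to (enriched_genic / enriched_total) / (random_genic / random_total),
    written as a single division to limit floating-point rounding.
    """
    if enriched_total <= 0 or random_total <= 0:
        raise ValueError("read totals must be positive")
    if random_genic <= 0:
        raise ValueError("the random sample needs at least one genic read")
    return (enriched_genic * random_total) / (random_genic * enriched_total)


# Hypothetical counts, for illustration only (not measured values):
# 360 of 1,000 MF reads hit genes, versus 20 of 1,000 random WGS reads.
ratio = fold_enrichment(360, 1000, 20, 1000)
print(f"fold enrichment: {ratio:.1f}")
```

In practice the genic-read counts would come from aligning each read set against a common gene reference, so that both fractions are measured the same way.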
Another important preliminary step towards sequencing the large genome of barley is to obtain a glimpse
of the genome structure and how it compares to other related sequenced genomes. As barley is expected
to have a low gene density (approximately 1 gene every 100 kbp), contiguous sequences in the megabase
size range are necessary to be able to perform colinearity analyses that involve several genes per region.
Sequencing of BAC contigs in maize, for example, has provided a further understanding of the structure
and evolution of this large genome and how the genome and genes expanded and contracted relative to
the rice genome (Bruggmann et al 2006 GR, in press).
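To illustrate why megabase-size contigs are needed, the sketch below works through the arithmetic for the stated density of roughly 1 gene every 100 kbp. The Poisson model of gene placement is a simplifying assumption introduced here for illustration only (genes in a real genome are not uniformly distributed), and all function names are hypothetical.

```python
import math

BP_PER_GENE = 100_000  # approximate density stated in the text: 1 gene per 100 kbp


def expected_genes(contig_length_bp, bp_per_gene=BP_PER_GENE):
    """Expected number of genes in a contig of the given length."""
    return contig_length_bp / bp_per_gene


def prob_at_least(k, contig_length_bp, bp_per_gene=BP_PER_GENE):
    """P(contig holds >= k genes), ASSUMING random (Poisson) gene placement."""
    lam = expected_genes(contig_length_bp, bp_per_gene)
    # Sum the Poisson probabilities of 0 .. k-1 genes, then take the complement.
    p_fewer = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - p_fewer


# Compare a single BAC-sized window with a megabase-size contig.
for length in (100_000, 1_000_000):
    print(f"{length:>9} bp: expected genes = {expected_genes(length):.1f}, "
          f"P(>=3 genes) = {prob_at_least(3, length):.3f}")
```

Under these assumptions, a 100 kbp window rarely contains three or more genes, whereas a 1 Mbp contig almost always does, which is the scale a multi-gene colinearity comparison requires.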
The proposed project will integrate with and complement the existing barley ESTs to enable researchers
to have access to the entire barley gene set. Relating GE sequences to gene expression data
will be possible via in silico alignment with the existing Barley1 gene sequences. Furthermore, GE
sequences not represented by ESTs will constitute a source for new overgo probes to identify additional
gene-harboring BACs for integration into the barley physical map. In the long term, these gene-rich BAC
contigs will be the foundation for eventually sequencing the barley genome, and novel genic GE
sequences will be an invaluable resource to design Barley2, the next generation 61,000-probe set
GeneChip. The GE sequences will be linked to genetically mapped markers by alignment to all SNP loci
generated from the USDA-NRI CAP project (Muehlbauer, Close, and Wise) as well as current and future
collaborating international efforts, thus translating genomics to plant breeding. The barley GE and BAC
contig sequences generated in this project will provide the research community with a substantial amount
of complete genes, which will be annotated with state-of-the-art tools that have been successfully applied
to other plant genome sequencing projects (e.g. Arabidopsis, rice and maize). The sequences will
constitute a comprehensive catalogue of barley genes, which will be used for genome-wide and cross-species
comparative studies. It will provide the broader grass and plant research communities with the
first extensive gene-space sequencing of a member of the Triticeae tribe. The BAC contig sequences will
represent the first megabase-size contiguous draft sequences from barley that will provide the first
glimpse into the architectural landscape of the barley genome, and possible insight into other members of
the Triticeae tribe, synergizing with the other cereal genomes such as rice, maize and sorghum for
comparative genomics studies.
Our multi-disciplinary team will actively participate in undergraduate and graduate education, providing
opportunities for advanced training in genomics and bioinformatics, particularly for under-represented
groups. Thus, these activities will promote research, education, and the dissemination of our results to a
broad audience, while developing a new generation of agricultural scientists.
Deliverables:
1. Gene-enriched (GE) assemblies. The MF/HC sequencing approaches will capture gene sequences
not represented by ESTs, including 5' and 3' flanking regulatory regions, and will provide the
community with novel gene sequences in a timely manner. GE assemblies not represented in EST
contigs will be used for overgo design.
2. Improved physical map by inclusion of both EST- and GE-derived gene-containing BACs. This
will enable integration of the US physical map with the new European/Australian physical map.
3. BAC contigs with assembled draft sequences. This will provide a case study for genome
architecture and evolution.