Around the triangle
Involve information about chromosome locations of traits in expression analyses
Chris Evelo
BiGCaT Bioinformatics
Maastricht, May 19 2004
[Diagram: triangle with corners "arrays", "paths" and "QTLs"]
Around the triangle
How to combine expression data (arrays) with known pathways (paths) and known quantitative trait loci from congenic animals (QTLs)
From arrays to pathways
Gene expression mapping, like what was shown in the previous talk.
• Annotate the genes
• Filter array data: normalize, filter and set a change criterion
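A minimal sketch of what "set a change criterion" could look like in Python. The 1.4-fold threshold is the one reported for the array results in this talk; treating down-regulation symmetrically (ratio below 1/1.4) is an assumption, and the reporter names and ratios are made up for illustration.

```python
def select_regulated(ratios, threshold=1.4):
    """Keep reporters whose expression ratio changes more than `threshold` fold.

    `ratios` maps reporter ID -> normalized expression ratio (treated / control).
    Down-regulation is handled symmetrically (assumption): ratio < 1 / threshold.
    """
    return {
        reporter: ratio
        for reporter, ratio in ratios.items()
        if ratio > threshold or ratio < 1.0 / threshold
    }


# Hypothetical normalized ratios, for illustration only.
example_ratios = {"rep1": 1.5, "rep2": 1.1, "rep3": 0.5, "rep4": 0.9}
regulated = select_regulated(example_ratios)
print(sorted(regulated))
```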
From arrays to QTLs
We need to get all the genes from the QTLs:
• To create a QTL map
• To annotate the map backpage
• And to map real expression changes
Get all QTL genes
Example: blood pressure QTLs
From Ensembl (http://www.ensembl.org)
• Using EnsMart to retrieve:
• QTL range
• gene (all exon) sequence
• or all available gene IDs
• Or use direct SQL queries to the Ensembl database
From RGD (http://rgd.mcw.edu/)
• Retrieve QTL annotation
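Once QTL ranges and gene coordinates have been retrieved, collecting the genes of one QTL comes down to an interval test. The sketch below assumes both have already been exported as (name, chromosome, start, end) records; the `Region` type, the function name and all coordinates are hypothetical and do not reflect the EnsMart or RGD export formats.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """A named interval on a chromosome (used for both QTLs and genes)."""
    name: str
    chrom: str
    start: int  # basepairs
    end: int    # basepairs


def genes_in_qtl(qtl: Region, genes: List[Region]) -> List[str]:
    """Return names of genes that overlap the QTL on the same chromosome."""
    return [
        g.name
        for g in genes
        if g.chrom == qtl.chrom and g.start <= qtl.end and g.end >= qtl.start
    ]


# Made-up coordinates, for illustration only.
qtl = Region("QTL_example", "10", 1_000_000, 5_000_000)
genes = [
    Region("geneA", "10", 900_000, 1_100_000),    # overlaps the QTL start
    Region("geneB", "10", 6_000_000, 6_050_000),  # same chromosome, outside
    Region("geneC", "2", 2_000_000, 2_010_000),   # other chromosome
    Region("geneD", "10", 3_000_000, 3_020_000),  # fully inside
]
qtl_genes = genes_in_qtl(qtl, genes)
print(qtl_genes)
```

Counting a gene that only partly overlaps the QTL boundary as a QTL gene is a design choice here; a stricter variant would require the gene to lie fully inside the range.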
The high blood pressure QTLs
[Figure: numbered high blood pressure QTLs drawn per chromosome; axes: chromosomes and position, with ticks from 5×10^7 to 2.5×10^8]
Those QTLs span almost half the genome!
Filter QTLs
For overlapping QTLs: take the smaller one
Use a Mathematica procedure to process QTL locations and overlaps
[Diagram: selected QTLs along a basepair axis]
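The rule "for overlapping QTLs, take the smaller one" can be sketched as a greedy filter. This is a Python illustration of one possible reading of the rule, not the Mathematica procedure used for the talk: QTLs are visited from shortest to longest, and a QTL is kept only if it does not overlap one that was already kept on the same chromosome. The `QTL` type and the coordinates are hypothetical.

```python
from typing import List, NamedTuple


class QTL(NamedTuple):
    name: str
    chrom: str
    start: int  # basepairs
    end: int    # basepairs

    @property
    def length(self) -> int:
        return self.end - self.start


def overlaps(a: QTL, b: QTL) -> bool:
    """True if both QTLs are on the same chromosome and their ranges intersect."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def filter_qtls(qtls: List[QTL]) -> List[QTL]:
    """Greedy filter: shortest QTLs first, drop any QTL overlapping a kept one."""
    kept: List[QTL] = []
    for q in sorted(qtls, key=lambda q: q.length):
        if not any(overlaps(q, k) for k in kept):
            kept.append(q)
    return sorted(kept, key=lambda q: (q.chrom, q.start))


# Made-up QTLs, for illustration only.
example = [
    QTL("q1", "1", 0, 100),    # overlaps the shorter q2, so it is dropped
    QTL("q2", "1", 50, 80),
    QTL("q3", "1", 90, 200),   # overlaps only q1, which was dropped
    QTL("q4", "2", 0, 50),
]
selected = [q.name for q in filter_qtls(example)]
print(selected)
```

Note that under this reading a long QTL can be removed by a short one even when it also covers regions no other QTL does; whether that matches the original procedure is not stated in the slides.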
Filtered high blood pressure QTLs
[Figure: the remaining numbered QTLs drawn per chromosome after filtering; axes: chromosomes and position, with ticks from 5×10^7 to 2×10^8]
These might be the really interesting regions
Create QTL Mapps and map expression results
Example QTL1a, with a number of (slightly) upregulated genes
Initial array results
Losing too many genes
• 15908 reporters on two arrays
• 784 with interesting regulation (>1.4 fold)
• only 127 with known UniGene IDs
• only 63 linked to chromosomes
• 9 located within the QTLs
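To make the attrition explicit, the counts above can be expressed as fractions of the 784 regulated reporters. The snippet only restates the numbers from the slide; the labels are shorthand.

```python
# Counts taken from the slide; labels are shorthand.
steps = [
    ("reporters on two arrays", 15908),
    ("regulated (>1.4 fold)", 784),
    ("known UniGene ID", 127),
    ("linked to chromosomes", 63),
    ("located within QTLs", 9),
]

regulated_count = steps[1][1]

# Fraction of the regulated reporters that survives each later step.
fractions = {label: n / regulated_count for label, n in steps[1:]}

for label, n in steps[1:]:
    print(f"{label:24s} {n:6d}  {100 * fractions[label]:5.1f}% of regulated")
```

Roughly 16% of the regulated reporters have a UniGene ID, about 8% can be placed on a chromosome, and about 1% end up inside a QTL.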
How to improve the mapping?
Work in progress
• Create a BLAST database from Ensembl QTL genes (use full gene and exon only)
• BLAST (or BLAT) reporter clone sequences
• Select good hits
• Combine the two sets
• Modify the QTL Mapp backpages to contain reporter IDs
• We expect to find > 60% in the genome (that is a 400% increase)
• And thus about 40 in the QTLs
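The "select good hits" step could be sketched as below, assuming BLAST was run with tabular output (the 12 standard columns: query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). The thresholds and the function name are hypothetical placeholders, not the criteria used in the project, and the example rows are made up.

```python
import csv
import io

# The 12 standard columns of BLAST tabular output.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def good_hits(handle, min_identity=95.0, min_length=100, max_evalue=1e-20):
    """Map each reporter (query) to its best-scoring gene (subject).

    Thresholds are illustrative assumptions; tune them for real data.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(COLUMNS, row))
        if (float(hit["pident"]) < min_identity
                or int(hit["length"]) < min_length
                or float(hit["evalue"]) > max_evalue):
            continue
        bits = float(hit["bitscore"])
        previous = best.get(hit["qseqid"])
        if previous is None or bits > previous[1]:
            best[hit["qseqid"]] = (hit["sseqid"], bits)
    return {query: subject for query, (subject, _) in best.items()}


# Made-up tabular output, for illustration only.
example = io.StringIO(
    "rep1\tgeneA\t99.0\t400\t4\t0\t1\t400\t10\t409\t1e-150\t700\n"
    "rep1\tgeneB\t96.0\t150\t6\t0\t1\t150\t5\t154\t1e-40\t200\n"
    "rep2\tgeneC\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-30\t150\n"
)
reporter_to_gene = good_hits(example)
print(reporter_to_gene)
```

Keeping only the best hit per reporter is one option; reporters with several near-equal hits may need manual review instead.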
Around the triangle
Can we understand the QTLs?
• Get all QTL genes
• Annotate them (with SwissProt or trEMBL IDs)
• Assume in silico expression of all genes
• Perform standard mapping
Bad annotation again!
• Only a small fraction of Ensembl genes has SwissProt/trEMBL annotation (or other annotation that can be cross-linked).
• So we need to reannotate the genes.
• A separate annotation project uses a double Swall Xlinked trEMBL subdatabase.
• That still needs to be combined.
Current QTL genes spread out
• Lots of genes in Mapps
• But… most Mapps contain just a few QTL genes
• Impossible to find the most important Mapps (except by expert knowledge)
Temporary Solution: double selection
• Get pathways with many regulated genes
• Select those that also contain QTL genes
Yields: 22 GO, 4 local Mapps
Among those: TGFβ signaling & Wnt signaling
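The double selection can be sketched with set intersections: keep a pathway only if it has enough regulated genes and also contains at least one QTL gene. The cut-off `min_regulated`, the function name, and all pathway and gene names below are hypothetical; the slides do not state how "many" was defined.

```python
def double_selection(pathways, regulated, qtl_genes, min_regulated=3):
    """Select pathways with many regulated genes that also contain QTL genes.

    `pathways` maps pathway name -> set of gene IDs.
    `min_regulated` is an assumed cut-off for "many regulated genes".
    """
    selected = {}
    for name, members in pathways.items():
        regulated_here = members & regulated
        qtl_here = members & qtl_genes
        if len(regulated_here) >= min_regulated and qtl_here:
            selected[name] = (sorted(regulated_here), sorted(qtl_here))
    return selected


# Made-up pathways and gene sets, for illustration only.
pathways = {
    "pathway_A": {"g1", "g2", "g3", "g4"},
    "pathway_B": {"g5", "g6", "g7"},
    "pathway_C": {"g1", "g8"},
}
regulated = {"g1", "g2", "g3", "g5", "g6", "g7"}
qtl_genes = {"g4", "g8"}

result = double_selection(pathways, regulated, qtl_genes)
print(result)
```

Here pathway_B has three regulated genes but no QTL gene, and pathway_C has a QTL gene but too few regulated genes, so only pathway_A passes both filters.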
Acknowledgements
• Yigal Pinto and Umesh Sharma for high blood pressure rat array data
• Incyte Genomics for (what still is) the best microarray platform ever
• BMT TUe MDP project students: Greetje Groenendaal, Gijs Huisman, Sanne Reulen, Gijs Snieders, Marloes Damen, Freek van Dooren, Thijs Hendrix and Thomas Kelder
• Stan Gaj for data mining
• Willem Ligtenberg and Joris Korbeeck for generating BLAST databases and BLAST parser scripts
• Andra Waagmeester for SQL queries
• Rachel van Haaften for advice on mapping
• Edwin ter Voert for allowing us to think about problems instead of computers