Download lecture _07_15_new

Gene Expression
DNA → RNA → protein
Gene Expression
[Figure: polyadenylated mRNA transcripts of gene1, gene2, and gene3]
Studying Gene Expression
1987-2013
cDNA microarrays (the first high-throughput gene expression experiments)
DNA chips (high-density oligonucleotide microarrays)
RNA-seq (high-throughput sequencing)
Classical versus modern technologies for studying gene expression
Classical methods (microarrays)
- Require prior knowledge of the RNA transcript
- Good for studying the expression of known genes
High-throughput RNA sequencing
- Does not require prior knowledge
- Good for discovering new transcripts
RNA-seq
What can we learn from RNA-seq?
- Comparing the expression of two genes in the same sample
- Comparing the expression of the same gene in different samples
What can we learn from RNA-seq?
Comparing the expression of two genes in the same sample
PROBLEM:
* Genes of different lengths are expected to yield different numbers of reads
* The coverage depends strongly on the sequencing depth
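What can we learn from RNA-seq?
Possible solution: normalize by the transcript length and by the total number of reads mapped in the experiment
RPKM (Reads Per Kilobase of transcript per Million mapped reads):
RPKM = \frac{10^9 \cdot \text{reads mapped to the gene}}{\text{total mapped reads} \cdot \text{transcript length (bp)}}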
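The normalization can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard RPKM definition (reads per kilobase of transcript per million mapped reads). The function name `rpkm`, the gene names, and all read counts and lengths are hypothetical values chosen for the example.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)

# Hypothetical experiment: 20 million mapped reads in total.
total_mapped_reads = 20_000_000

# Two hypothetical genes with the same raw read count but different lengths.
genes = {
    "gene1": {"reads": 1000, "length_bp": 2000},
    "gene2": {"reads": 1000, "length_bp": 500},
}

rpkm_values = {
    name: rpkm(g["reads"], g["length_bp"], total_mapped_reads)
    for name, g in genes.items()
}

for name, value in rpkm_values.items():
    print(name, value)
```

Both genes have the same raw read count, but the shorter gene receives the higher RPKM because the same number of reads falls on fewer bases.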
Problems with Normalization
[Figure: the same three genes ranked in two ways, Gene B > Gene A > Gene C versus Gene A > Gene B > Gene C]
Warning: normalization by the total number of reads can lead to false detection of differentially expressed genes
What can we learn from RNA-seq?
Comparing the expression of the same gene in different samples
Example: Finding new markers for pluripotency
[Figure: expression in embryonic stem cells versus differentiated cells; legend: highly expressed, lowly expressed]
What can we learn from RNA-seq?
Comparing the expression of the same gene in different samples
Sample X (stem cells)
Sample Y (fibroblasts)
Fold change (FC) = the ratio of the gene's expression in sample X to its expression in sample Y
Is fold change enough to evaluate the difference?
Remember: we always need to evaluate the statistical significance of the results
Standard measure = q-value (the p-value corrected for multiple testing)
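A small sketch of both quantities follows. The lecture does not say which multiple-testing correction produces the q-value, so the Benjamini-Hochberg procedure is used here as an assumption. The function names and all expression values and p-values are hypothetical.

```python
import math

def fold_change(expr_x, expr_y):
    """Ratio of a gene's expression in sample X to its expression in sample Y."""
    return expr_x / expr_y

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical expression of one gene in stem cells (X) and fibroblasts (Y).
fc = fold_change(800.0, 100.0)
log2_fc = math.log2(fc)

# Hypothetical p-values from testing four genes.
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)

print(fc, log2_fc)
print([round(q, 4) for q in q_values])
```

A gene would typically be reported as a candidate only if it has both a large fold change and a small q-value; the thresholds are a choice of the analyst.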
Finding new markers for pluripotency
[Figure: expression in stem cells versus fibroblasts, with possible candidate pluripotency markers indicated]
NEXT…
Clustering the data according to expression profiles
[Figure: genes versus expression in different conditions; legend: highly expressed, lowly expressed]
WHY?
What can we learn from the clusters?
• Diagnostics and therapy
– A set of genes whose expression differs between samples can indicate a disease state
• Identifying gene function
– Genes with similar expression patterns may share a similar function
A molecular signature of metastasis in primary solid tumors
Samples were taken from patients with adenocarcinoma.
Hundreds of genes were found that differentiate between cancer tissues at different stages of the tumor.
The arrow marks an example of tumor cells that were not classified correctly by histological or other clinical parameters.
Ramaswamy et al., 2003, Nat Genet 33:49-54
HOW?
Different clustering approaches
• Unsupervised
- Hierarchical clustering
- K-means
• Supervised methods (supervised learning)
- Support Vector Machine (SVM)
Clustering
Clustering organizes things that are close into groups.
- What does it mean for two genes to be close?
- Once we know this, how do we define groups?
What does it mean for two genes to be close?
We need a mathematical definition of the distance between the expression patterns of two genes
[Figure: expression profiles of Gene 1 and Gene 2 across samples 01-22]
Gene1 = (E11, E12, …, E1N)'
Gene2 = (E21, E22, …, E2N)'
Calculating the distance between two expression patterns
We can use many different distance measures
Gene1 = (E11, E12, …, E1N)'
Gene2 = (E21, E22, …, E2N)'
Euclidean distance:
ED = \sqrt{\sum_{i=1}^{N} (E_{1i} - E_{2i})^2}
[Figure: the distance between two points (X1, Y1) and (X2, Y2)]
When N is 100 we have to think abstractly
Low Euclidean distance → high similarity
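The Euclidean distance translates directly into code. The sketch below works for any number of conditions N. The function name and the two short example profiles are hypothetical.

```python
import math

def euclidean_distance(gene1, gene2):
    """Square root of the summed squared differences between two profiles."""
    if len(gene1) != len(gene2):
        raise ValueError("both profiles must cover the same N conditions")
    return math.sqrt(sum((e1 - e2) ** 2 for e1, e2 in zip(gene1, gene2)))

# Hypothetical expression profiles measured in N = 3 conditions.
gene1 = [1.0, 2.0, 3.0]
gene2 = [4.0, 6.0, 3.0]

ed = euclidean_distance(gene1, gene2)
print(ed)
```

The same function handles N = 100 unchanged, which is the practical meaning of thinking abstractly about the distance.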
Calculating the distance between two expression patterns
Pearson correlation coefficient
High correlation coefficient → high similarity
Distance and correlation can produce very different results
[Figure: two expression profiles; y-axis: counts, 0-1400]
Euclidean distance = 1740 → low similarity
Pearson correlation = 0.9 → high similarity
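The contrast between the two measures can be reproduced with a toy example. The sketch computes the Pearson correlation coefficient from its standard definition (covariance divided by the product of the standard deviations) and uses `math.dist` from the Python standard library for the Euclidean distance. The two profiles are hypothetical and are not the ones plotted on the slide: the second profile has the same shape as the first but sits at a much higher level.

```python
import math

def pearson(p, q):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(p)
    mean_p = sum(p) / n
    mean_q = sum(q) / n
    covariance = sum((x - mean_p) * (y - mean_q) for x, y in zip(p, q))
    var_p = sum((x - mean_p) ** 2 for x in p)
    var_q = sum((y - mean_q) ** 2 for y in q)
    return covariance / math.sqrt(var_p * var_q)

# Hypothetical read counts in five conditions.
gene1 = [100, 300, 200, 500, 400]
gene2 = [count + 800 for count in gene1]  # same shape, shifted upward

r = pearson(gene1, gene2)
ed = math.dist(gene1, gene2)

print(round(r, 3), round(ed, 1))
```

Here the correlation is perfect while the Euclidean distance is large, so a correlation-based measure would group the two genes together and a distance-based measure would not.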
Clustering the genes according to expression
Hierarchical Clustering
Generate a tree based on the distances between genes (similar to a phylogenetic tree)
Each gene is a leaf on the tree
Distances reflect the similarity of the genes' expression patterns
[Figure: tree of genes with a gene cluster marked, next to expression in different conditions]
Clustering the genes according to gene expression
Genes
GENE a:  1, -1,  1,  1, 1, -1, -1, -1
GENE b:  1,  1, -1,  1, 1,  1, -1,  1
GENE c:  1, -1,  1, -1, 1, -1, -1, -1
GENE d: -1,  1, -1,  1, 1,  1, -1, -1
Distances (Euclidean distance)*
Dab = 4
Dac = 2
Dad = 4
Dbc = 4.47
Dbd = 2.83
Dcd = 4.47
* Can be calculated using different distance metrics
Distance Table
     a     b     c     d
a    0     4     2     4
b    4     0     4.47  2.83
c    2     4.47  0     4.47
d    4     2.83  4.47  0
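The distance table and the resulting tree can be reproduced from the four profiles. The sketch below recomputes the pairwise Euclidean distances and then builds the tree bottom-up by repeatedly merging the two closest clusters. The lecture does not say how the distance between two clusters is defined, so average linkage is an assumption here, and the function names are hypothetical.

```python
import math
from itertools import combinations

# Expression profiles of the four genes from the slide.
genes = {
    "a": [1, -1, 1, 1, 1, -1, -1, -1],
    "b": [1, 1, -1, 1, 1, 1, -1, 1],
    "c": [1, -1, 1, -1, 1, -1, -1, -1],
    "d": [-1, 1, -1, 1, 1, 1, -1, -1],
}

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# Pairwise distance table, keyed by an unordered pair of gene names.
dist = {
    frozenset(pair): euclidean(genes[pair[0]], genes[pair[1]])
    for pair in combinations(genes, 2)
}

def average_linkage(c1, c2):
    """Mean distance over all gene pairs drawn from two clusters (assumed linkage)."""
    total = sum(dist[frozenset((g, h))] for g in c1 for h in c2)
    return total / (len(c1) * len(c2))

def agglomerate(names):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_linkage(*pair))
        height = round(average_linkage(c1, c2), 2)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, height))
    return merges

merges = agglomerate(list(genes))
for left, right, height in merges:
    print(left, right, height)
```

Genes a and c are joined first (distance 2), then b and d (distance 2.83), and the two pairs are joined last, which gives a tree with two gene clusters.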
Analyzing the clusters of genes
[Figure: gene clusters labeled Cluster 2, Cluster 3, and Cluster 4]
What can we learn from clusters with similar gene expression?
Similar expression between genes may indicate that:
- The genes have a similar function
- The genes work together in the same pathway or complex
- All the genes are controlled by a common regulatory gene
Example: Identifying genes that have similar function
[Figure: expression of HNRPA1 and SRp40 (y-axis 0-4000) across tissues: pancreas, bone marrow, whole blood, adrenal gland, ovary, uterus, prostate, testis, heart, lung, liver, skeletal muscle, smooth muscle, salivary gland, skin, thyroid, tonsil, trachea, kidney, whole brain]
hnRNP A1 and SRp40 are not clear homologs based on BLAST e-value, but they have very similar gene expression patterns in different tissues
Are hnRNP A1 and SRp40 functional homologs?
[Figure: hnRNP A1 and SRp40 shown among other factors labeled SF]
YES!
Example: Genes work together in the same complex
[Figure: expression (counts, 0-1400) of a transcription factor (TF) and a long non-coding RNA]
How can gene expression help in diagnostics?
[Figure: genes versus different patients (BRCA1 or BRCA2)]
RESEARCH QUESTION
Can we distinguish BRCA1 from BRCA2 cancers based solely on their gene expression profiles?
Here we want to cluster the patients, not the genes!
How can gene expression be applied for diagnostics?
5 breast cancer patients
[Table: expression (+ or -) of Gen1 to Gen5 in patients 1 to 5]
How can gene expression be applied for diagnostics?
[Table: expression (+ or -) with patients ordered 1, 2, 4, 3, 5 and genes ordered Gen1, Gen3, Gen4, Gen2, Gen5; BRCA1 and BRCA2 groups marked]
Two-way clustering = clustering both the patients and the genes
How can gene expression be applied for diagnostics?
[Table: the same reordered patients-by-genes table, with the informative genes marked]
Supervised approaches for diagnostics based on expression data
Support Vector Machine (SVM)
• An SVM begins with a set of samples from patients who have been diagnosed as either BRCA1 (red dots) or BRCA2 (blue dots).
Each dot represents a vector of the expression pattern taken from a patient's microarray experiment.
How do SVMs work with expression data?
The SVM is trained on data that were classified based on histology.
After training the SVM to separate the BRCA1 tumors from the BRCA2 tumors given the expression data, we can apply it to diagnose an unknown tumor for which we have the equivalent expression data.
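The train-then-diagnose workflow can be sketched with a very small linear SVM. The lecture does not specify a kernel or a training algorithm, so a linear decision boundary fitted by batch subgradient descent on the regularized hinge loss is an assumption made for illustration. All expression values, labels, parameter settings, and function names are hypothetical.

```python
# Hypothetical training set: each tumor is a vector of two expression values,
# labeled +1 (diagnosed BRCA1) or -1 (diagnosed BRCA2).
training = [
    ((2.0, 0.5), +1), ((2.5, 1.0), +1), ((3.0, 0.8), +1),
    ((0.5, 2.0), -1), ((1.0, 2.5), -1), ((0.8, 3.0), -1),
]

def train_linear_svm(samples, learning_rate=0.1, reg=0.01, epochs=200):
    """Fit w and b by batch subgradient descent on the regularized hinge loss."""
    dim = len(samples[0][0])
    n = len(samples)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]  # gradient of the regularization term
        grad_b = 0.0
        for x, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # sample lies inside the margin or is misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

def predict(w, b, x):
    """Classify a tumor by the side of the separating line it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "BRCA1" if score >= 0 else "BRCA2"

w, b = train_linear_svm(training)

# Diagnose an unknown tumor from its (hypothetical) expression vector.
unknown_tumor = (2.8, 0.7)
print(predict(w, b, unknown_tumor))
```

Real expression data have thousands of genes per patient, so in practice an established SVM library and a held-out test set would be used rather than this hand-written sketch.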
Projects 2015-16
Instructions for the final project
Introduction to Bioinformatics 2013-14
Key dates
7.12 List of suggested projects published*
*You are highly encouraged to choose a project yourself or to find a relevant project that can help in your research
3.1 Final date to choose a project
10.1 Submission of the project overview (one page)
- Title
- Main question
- Major tools you are planning to use to answer the question
11.1 / 18.1 Meetings on projects
9.3 Poster submission
16.3 Poster presentation
2. Planning your research
After you have described the main question or questions of your project, you should carefully plan your next steps.
A. Make sure you understand the problem and read the necessary background to proceed.
B. Formulate your working plan, step by step.
C. After you have a plan, start by extracting the necessary data and decide on the relevant tools to use in the first step.
When running a tool, make sure to summarize the results and extract the relevant information you need to answer your question. It is recommended to save the raw data for your records, but do not present raw data in your final project.
Your initial results should guide you towards your next steps.
D. When you feel you have explored all the tools you can apply to answer your question, you should summarize and draw conclusions. Remember that NO is also an answer, as long as you are sure it is NO. Also remember that this is a course project, not just a homework exercise.
3. Summarizing the final project in a poster (in pairs)
Prepare the poster in PPT, poster size 90-120 cm
Title of the project
Names and affiliations of the students presenting
The poster should include 5 sections:
Background: a description of your question (you can add a figure)
Goal and Research Plan: describe the main objective and the research plan
Results (main section): present your results in 3-4 figures, describe each figure (figure legends), and give a title to each result
Conclusions: summarize the conclusions of your project in points
References: list the references of the papers/databases/tools used for your project
Examples of posters will be presented in class