Gene Expression
DNA → RNA → protein
[Figure: polyadenylated mRNAs (poly-A tails, AAAAAAA) transcribed from gene1, gene2 and gene3]

Studying Gene Expression 1987-2013
- cDNA microarrays (the first high-throughput gene expression experiments)
- DNA chips (high-density oligonucleotide microarrays)
- RNA-seq (high-throughput sequencing)

Classical versus modern technologies to study gene expression
- Classical methods (microarrays) require prior knowledge of the RNA transcript. They are good for studying the expression of known genes.
- High-throughput RNA sequencing does not require prior knowledge. It is good for discovering new transcripts.

RNA-seq

What can we learn from RNA-seq?
- Comparing the expression of two genes in the same sample
- Comparing the expression of the same gene in different samples

Comparing the expression of two genes in the same sample
Problem:
- Genes of different lengths are expected to yield different numbers of reads.
- The coverage depends strongly on the sequencing depth.
Possible solution: normalize by transcript length and by the total number of reads mapped in the experiment (RPKM, reads per kilobase of transcript per million mapped reads):
$RPKM = \frac{10^9 \cdot C}{N \cdot L}$
where C is the number of reads mapped to the gene, N is the total number of mapped reads in the experiment, and L is the transcript length in base pairs.

Problems with normalization
[Figure: the same three genes ranked differently in two samples: Gene B > Gene A > Gene C in one, Gene A > Gene B > Gene C in the other]
Warning: normalization by the total number of reads can lead to false detection of differentially expressed genes.

Comparing the expression of the same gene in different samples
Example: finding new markers for pluripotency by comparing embryonic stem cells with differentiated cells.
[Figure: expression heatmap, scale from highly expressed to lowly expressed]
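A minimal Python sketch of the RPKM formula, using hypothetical read counts (genes A, B, C, X, Y and all numbers are toy values, not data from the lecture). The second part illustrates the normalization warning: assuming gene C takes up a much larger share of the same sequencing depth in sample 2, the RPKM of genes A and B drops even if their true abundance did not change.

```python
def rpkm(read_counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = 10**9 * C / (N * L), where C is the read count of a gene,
    N the total number of mapped reads and L the gene length in bp.
    """
    total_mapped = sum(read_counts.values())  # N
    return {
        gene: 1e9 * count / (total_mapped * lengths_bp[gene])
        for gene, count in read_counts.items()
    }

# Length normalization: a 2 kb gene with twice the reads of a 1 kb gene
# gets the same RPKM (hypothetical values).
by_length = rpkm({"X": 200, "Y": 100}, {"X": 2000, "Y": 1000})

# Normalization pitfall (hypothetical values, equal lengths, equal depth).
# Assume gene C is strongly induced in sample 2 and absorbs more of the reads.
lengths = {"A": 1000, "B": 1000, "C": 1000}
sample1 = {"A": 250, "B": 250, "C": 500}
sample2 = {"A": 100, "B": 100, "C": 800}
r1 = rpkm(sample1, lengths)
r2 = rpkm(sample2, lengths)
for gene in ("A", "B"):
    # Apparent 2.5-fold drop: a false call if A and B are in fact unchanged.
    print(gene, r1[gene] / r2[gene])
```

Normalizing by the total read count assumes that most genes do not change between samples; when a few genes dominate the library, that assumption breaks.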
Comparing the expression of the same gene in different samples
Sample X (stem cells) versus sample Y (fibroblasts).
Fold change: $FC = \frac{E_X}{E_Y}$, the ratio between the expression of the gene in sample X and its expression in sample Y.
Is fold change enough to evaluate the difference? Remember: we always need to evaluate the statistical significance of the results. The standard measure is the q-value, which is the p-value corrected for multiple testing.

Finding new markers for pluripotency
[Figure: expression in stem cells versus fibroblasts, with possible candidates for pluripotency markers highlighted]

Next: clustering the data according to expression profiles
[Figure: heatmap of genes (rows) by conditions (columns), scale from highly to lowly expressed]

Why? What can we learn from the clusters?
• Diagnostics and therapy: a set of genes that differ in their expression can indicate a disease state.
• Identifying gene function: genes with similar expression patterns may share a similar function.

A molecular signature of metastasis in primary solid tumors
Samples were taken from patients with adenocarcinoma. Hundreds of genes that distinguish between cancer tissues at different stages of the tumor were found. The arrow shows an example of tumor cells that were not classified correctly by histological or other clinical parameters.
Ramaswamy et al., 2003, Nat Genet 33:49-54

How? Different clustering approaches
• Unsupervised
  - Hierarchical clustering
  - K-means
• Supervised learning methods
  - Support Vector Machine (SVM)

Clustering
Clustering organizes things that are close into groups.
- What does it mean for two genes to be close?
- Once we know this, how do we define groups?
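Returning to the stem cell versus fibroblast comparison: a minimal Python sketch of the fold change and of one common way to turn per-gene p-values into q-values, the Benjamini-Hochberg procedure. This is an assumption, since the lecture does not name a correction method. The expression values, p-values and cut-offs (at least 2-fold up, q < 0.05) are hypothetical, and the per-gene test that produces the p-values is not shown.

```python
import math

def fold_change(expr_x, expr_y, pseudocount=0.0):
    """Ratio of expression in sample X to expression in sample Y.

    An optional pseudocount (an assumption, not part of the slide) avoids
    division by zero for genes that are not detected in sample Y.
    """
    return (expr_x + pseudocount) / (expr_y + pseudocount)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (often reported as q-values)."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Hypothetical expression in stem cells (X) and fibroblasts (Y).
gene_ids = ["g1", "g2", "g3", "g4"]
stem = [64.0, 10.0, 30.0, 120.0]
fibro = [8.0, 9.0, 3.0, 15.0]
p_values = [0.01, 0.04, 0.03, 0.005]  # from some per-gene test (not shown)

q_values = benjamini_hochberg(p_values)
for g, x, y, q in zip(gene_ids, stem, fibro, q_values):
    lfc = math.log2(fold_change(x, y))
    # Hypothetical marker criteria: at least 2-fold higher and q < 0.05.
    if lfc >= 1 and q < 0.05:
        print(g, round(lfc, 2), round(q, 3))
```

Fold change alone would also flag genes with large ratios but noisy measurements; requiring a small q-value as well controls the expected fraction of false discoveries among all genes tested.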
We need a mathematical definition of the distance between the expression patterns of two genes.
[Figure: expression of Gene 1 and Gene 2 across 22 conditions]
Gene1 = $(E_{11}, E_{12}, \dots, E_{1N})'$
Gene2 = $(E_{21}, E_{22}, \dots, E_{2N})'$

Calculating the distance between two expression patterns
Many different distance measures can be used. The Euclidean distance is
$ED = \sqrt{\sum_{i=1}^{N} (E_{1i} - E_{2i})^2}$
[Figure: distance between two points (X1, Y1) and (X2, Y2) in the plane]
In two dimensions this is the familiar distance between two points; when N is 100 we have to think abstractly. A low Euclidean distance means high similarity.

Calculating the distance between two expression patterns: Pearson correlation coefficient
$r = \frac{\sum_{i=1}^{N} (E_{1i} - \bar{E}_1)(E_{2i} - \bar{E}_2)}{\sqrt{\sum_{i=1}^{N} (E_{1i} - \bar{E}_1)^2} \sqrt{\sum_{i=1}^{N} (E_{2i} - \bar{E}_2)^2}}$
where $\bar{E}_1$ and $\bar{E}_2$ are the mean expression levels of Gene1 and Gene2. A high correlation coefficient means high similarity.

Distance and correlation can produce very different results
[Figure: expression profiles (counts, 0-1400) of two genes]
Euclidean distance = 1740: low similarity
Pearson correlation = 0.9: high similarity

Clustering the genes according to expression: hierarchical clustering
Generate a tree based on the distances between genes (similar to a phylogenetic tree). Each gene is a leaf on the tree, and the distances reflect the similarity of the genes' expression patterns.
[Figure: heatmap of genes by conditions, with the tree of gene clusters]

Clustering the genes according to gene expression
Expression profiles of genes a-d:
a: 1, -1, 1, 1, 1, -1, -1, -1
b: 1, 1, -1, 1, 1, 1, -1, 1
c: 1, -1, 1, -1, 1, -1, -1, -1
d: -1, 1, -1, 1, 1, 1, -1, -1
Euclidean distances*: Dab = 4, Dac = 2, Dad = 4, Dbc = 4.47, Dbd = 2.83, Dcd = 4.47

Distance table
       a      b      c      d
a      0      4      2      4
b      4      0      4.47   2.83
c      2      4.47   0      4.47
d      4      2.83   4.47   0

* Distances can be calculated using different distance metrics.

Analyzing the clusters of genes
[Figure: expression profiles of Cluster 2, Cluster 3 and Cluster 4]

What can we learn from clusters with similar gene expression?
Similar expression between genes may indicate that:
- the genes have similar functions
- the genes work together in the same pathway/complex
- all the genes are controlled by a common regulatory gene
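A minimal Python sketch of the Euclidean distance, the Pearson correlation and hierarchical clustering, applied to the gene a-d profiles from the distance-table example. The merge rule (average linkage: the mean of all pairwise gene distances between two clusters) is one common choice and an assumption here; single or complete linkage would also work. The last two profiles are hypothetical and show how the two measures can disagree.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def pearson(u, v):
    """Pearson correlation coefficient (undefined for constant vectors)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((x - mean_u) * (y - mean_v) for x, y in zip(u, v))
    var_u = sum((x - mean_u) ** 2 for x in u)
    var_v = sum((y - mean_v) ** 2 for y in v)
    return cov / math.sqrt(var_u * var_v)

# Expression profiles of genes a-d from the distance-table example.
genes = {
    "a": [1, -1, 1, 1, 1, -1, -1, -1],
    "b": [1, 1, -1, 1, 1, 1, -1, 1],
    "c": [1, -1, 1, -1, 1, -1, -1, -1],
    "d": [-1, 1, -1, 1, 1, 1, -1, -1],
}
names = sorted(genes)
table = {(p, q): euclidean(genes[p], genes[q]) for p in names for q in names}

def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster1, cluster2, distance)."""
    clusters = [(name,) for name in profiles]

    def cluster_distance(c1, c2):
        # Mean of all pairwise gene-to-gene distances between the clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(euclidean(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters and join them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], cluster_distance(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

for left, right, dist in average_linkage(genes):
    print(left, "+", right, "at", round(dist, 2))

# Hypothetical scaled profiles: far apart in Euclidean terms, perfectly correlated.
low = [100, 300, 200, 400]
high = [3 * x for x in low]
print(round(euclidean(low, high)), round(pearson(low, high), 3))
```

The merge order (a with c, then b with d, then the two pairs) is the tree that hierarchical clustering draws over these four leaves.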
Example: identifying genes that have similar functions
[Figure: expression (0-4000) of HNRPA1 and SRp40 across human tissues: pancreas, bone marrow, whole blood, adrenal gland, ovary, uterus, prostate, testis, heart, lung, liver, skeletal muscle, smooth muscle, salivary gland, skin, thyroid, tonsil, trachea, kidney and whole brain]
hnRNP A1 and SRp40 are not clear homologs based on BLAST e-values, but they have very similar expression patterns across tissues.
Are hnRNP A1 and SRp40 functional homologs?
[Figure: hnRNP A1 and SRp40 shown among factors labeled SF]
Yes!

Example: genes that work together in the same complex
[Figure: expression (counts, 0-1400) of a transcription factor (TF) and a long non-coding RNA]

How can gene expression help in diagnostics?
[Figure: expression matrix of genes by patients (BRCA1 or BRCA2)]
Research question: can we distinguish BRCA1 from BRCA2 cancers based solely on their gene expression profiles?
Here we want to cluster the patients, not the genes!

How can gene expression be applied to diagnostics?
[Table: expression (+/-) of Gen1-Gen5 in five breast cancer patients (patients 1-5)]
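Clustering patients instead of genes amounts to treating each column of the expression matrix (one patient's values over all genes) as the vector to compare. A minimal Python sketch, assuming the +/- calls are encoded as +1/-1 and grouped by average-linkage hierarchical clustering with Euclidean distance (one choice among several). The matrix is a hypothetical toy, not the slide's data. Running the same routine on the rows groups the genes, so doing both orders the matrix along both axes.

```python
import math

# Hypothetical +/- expression calls (rows = genes, columns = patients),
# encoded as +1 (expressed) and -1 (not expressed). Toy values only.
genes = ["Gen1", "Gen2", "Gen3", "Gen4", "Gen5"]
patients = ["patient1", "patient2", "patient3", "patient4", "patient5"]
matrix = [
    [+1, +1, -1, +1, -1],  # Gen1
    [-1, -1, +1, -1, +1],  # Gen2
    [+1, +1, -1, +1, -1],  # Gen3
    [+1, +1, +1, +1, -1],  # Gen4
    [-1, -1, +1, -1, +1],  # Gen5
]

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def cluster_until(names, profiles, k):
    """Average-linkage agglomerative clustering, stopped at k clusters."""
    vectors = dict(zip(names, profiles))
    clusters = [(name,) for name in names]

    def cluster_distance(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(euclidean(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > k:
        # Merge the two closest clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters

# Clustering patients: each patient is a column, i.e. a vector over the genes.
patient_profiles = [list(column) for column in zip(*matrix)]
patient_groups = cluster_until(patients, patient_profiles, k=2)

# The same routine applied to the rows groups the genes instead.
gene_groups = cluster_until(genes, matrix, k=2)

print(patient_groups)
print(gene_groups)
```

Note that this unsupervised grouping never sees the BRCA1/BRCA2 labels; whether the resulting groups match the diagnoses has to be checked afterwards.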
[Table: the same matrix after two-way clustering: genes reordered as Gen1, Gen3, Gen4, Gen2, Gen5 and patients reordered as 1, 2, 4, 3, 5, grouped into BRCA1 and BRCA2]
Two-way clustering = clustering both the patients and the genes. The informative genes are those whose expression separates the two patient groups.

Supervised approaches for diagnostics based on expression data: Support Vector Machine (SVM)
• An SVM begins with a set of samples from patients who have been diagnosed as either BRCA1 (red dots) or BRCA2 (blue dots). Each dot represents the expression vector of one patient, taken from a microarray experiment.
How do SVMs work with expression data?
The SVM is trained on data that were classified based on histology. After training the SVM to separate BRCA1 from BRCA2 tumors given the expression data, we can apply it to diagnose an unknown tumor (the "?" in the figure) for which we have equivalent expression data.

Projects 2015-16: Instructions for the final project
Introduction to Bioinformatics 2013-14
Key dates
7.12: list of suggested projects published*
*You are highly encouraged to choose a project yourself or to find a relevant project that can help your research.
3.1: final date to choose a project
10.1: submission of a project overview (one page) with:
  - Title
  - Main question
  - Major tools you plan to use to answer the question
11.1/18.1: meetings on projects
9.3: poster submission
16.3: poster presentation

2. Planning your research
After you have described the main question or questions of your project, carefully plan your next steps:
A. Make sure you understand the problem and read the necessary background before proceeding.
B. Formulate your working plan, step by step.
C. Once you have a plan, start by extracting the necessary data and deciding which tools to use in the first step.
When running a tool, summarize the results and extract the information you need to answer your question. It is recommended to save the raw data for your records, but don't present raw data in your final project. Your initial results should guide your next steps.
D. When you feel you have explored all the tools you can apply to your question, summarize and draw conclusions. Remember that NO is also an answer, as long as you are sure it is NO. Also remember that this is a course project, not just a homework exercise.

3. Summarizing the final project in a poster (in pairs)
Prepare the poster in PowerPoint (PPT), poster size 90-120 cm, with:
- Title of the project
- Names and affiliations of the presenting students
The poster should include 5 sections:
- Background: a description of your question (you can add a figure)
- Goal and research plan: describe the main objective and the research plan
- Results (main section): present your results in 3-4 figures, describe each figure (figure legends), and give a title to each result
- Conclusions: summarize the conclusions of your project in points
- References: list the papers, databases and tools used in your project
Examples of posters will be presented in class.