BIOINFORMATICS I, EXERCISE 2 http://icbi.at/bioinf

mRNA processing: splicing

Spliceosome assembly
[Figure: stepwise assembly of the spliceosome on the pre-mRNA (5' splice site GU, branch point A, 3' splice site YAG), involving the snRNPs U1, U2, U4, U5 and U6, U2AF, hnRNP and SR proteins, kinases and phosphatases, RNA helicases, cyclophilins and ~200 non-snRNP proteins]

Different levels of regulation

Regulation of transcription

ChIP procedure (Farnham, Nature Rev Genetics, 2009)
[Figure: PPAR/RXR heterodimer (receptor domains A/B, C, E/F) bound to a PPAR response element (PPRE) in the DNA, e.g. AACTAGGTCAAAGGTCA]

microRNAs: http://www.mirbase.org/

Ensembl BioMart

UCSC Table Browser

Notepad++ and regular expressions
Example: ^>.*\r\n matches a complete FASTA header line
  ^     beginning of line
  >     the character ">"
  .     any symbol
  *     0 or more times
  \r    carriage return (CR)
  \n    line feed (LF)

Notepad++ and regular expressions
character   meaning
  \         escape; used to make special characters literal
  ()        group; its contents can be reused in the replacement, e.g. \1 for the first group
  []        any one of the characters inside is considered a match
  .         matches any character
  *         matches the previous character 0 or more times
  +         matches the previous character 1 or more times
  {n}       matches the previous character exactly n times
  ^         as the first character of the regex: "beginning of line"; as the first character inside []: "not"
  $         as the last character of the regex: "end of line"
  \s        any whitespace character (space, tab)
  \t        tab (→)
  \r        carriage return (CR)
  \n        line feed (LF)

Notepad++ and regular expressions: find and replace
  ^>.*\r\n             replace with (nothing)   deletes FASTA header lines
  ^[ACGT].*\r\n        replace with (nothing)   deletes lines beginning with A, C, G or T
  ^(.{20}).*\r\n       replace with \1\r\n      keeps only the first 20 characters of each line

  \r\n                 replace with (nothing)   joins all lines into one
  >                    replace with \r\n>       starts each FASTA record on a new line
  repeatMasking=none   replace with \r\n        replaces this header text with a line break, so the sequence gets its own line
  ^>.*\r\n             replace with (nothing)   deletes the header lines
  .*(.{20})$           replace with \1          keeps only the last 20 characters of each line

Sequence Logo: http://icbi.at/logo

KEGG

Protein domains: UniProt, PROSITE, InterPro, Pfam, CD (NCBI Conserved Domain Database), SMART

Gene Ontology
The Gene Ontology (GO) project provides a controlled vocabulary to describe gene and gene product attributes in any organism. It has 3 organizing principles:
• cellular component (e.g. mitochondrion)
• biological process (e.g. lipid metabolism)
• molecular function (e.g. hydrolase activity)
Each GO entry has a unique numerical identifier of the form GO:nnnnnnn and a term name.

Evidence codes
  ISS   Inferred from Sequence Similarity
  IEP   Inferred from Expression Pattern
  IMP   Inferred from Mutant Phenotype
  IGI   Inferred from Genetic Interaction
  IPI   Inferred from Physical Interaction
  IDA   Inferred from Direct Assay
  RCA   Inferred from Reviewed Computational Analysis
  TAS   Traceable Author Statement
  NAS   Non-traceable Author Statement
  IC    Inferred by Curator
  ND    No biological Data available

GO is organized as a directed acyclic graph (DAG) with different levels and 2 relations (is_a, part_of).

Orthologs
Example relationships (from a gene tree figure):
  Homologs: A – B – C
  Orthologs: B1 – C1
  Paralogs: C1 – C2 – C3
  Inparalogs: C2 – C3
  Outparalogs: B2 – C1
  Xenologs: A1 – AB1

Ortholog prediction
Ortholog databases
• YOGY (eukarYotic OrtholoGY): a web-based resource that integrates 5 independent resources (Sanger)
• COG (Clusters of Orthologous Groups of proteins) and KOG, the corresponding resource for 7 eukaryotic genomes (NCBI)
• Inparanoid (Stockholm Bioinformatics Center)
• HomoloGene (NCBI)
• OrthoMCL: uses the Markov Clustering algorithm (University of Pennsylvania)

Multiple sequence alignment (ClustalW)
Progressive alignment along a guide tree; visualization with Jalview

Exercise 2-1: REGULATORY GENOMICS
Pyruvate carboxylase as an example

Ensembl BioMart
1.1 For the human transcript NM_000920 (pyruvate carboxylase), find the official gene symbol, the number of exons, the Ensembl transcript ID, the Ensembl gene ID, the 3'UTR sequence (as a FASTA file) and the length of the 3'UTR.

microRNA target prediction
1.2 Is there a sequence within the 3'UTR of PC that is complementary to positions 2-8 (the seed region) of the microRNA hsa-mir-182?

UCSC Genome Browser
1.3 Find the positions of the transcription start site and the transcription end site of pyruvate carboxylase (NM_000920) in the hg19 assembly.

Find splicing signals
1.4 Using the UCSC Table Browser and Notepad++, get the sequences (+10 bp/-10 bp) around the intron-exon borders and the exon-intron borders of pyruvate carboxylase.
1.5 For both sets, construct a sequence logo and a frequency plot. Can you identify (regulatory) sequence motifs?

Regulatory motifs (transcription factor binding sites)
1.6 We know from chromatin immunoprecipitation (ChIP-seq) experiments in a mouse cell line that the transcription factor Pparg binds near the pyruvate carboxylase gene and may therefore regulate its transcription (ppar.wig). Display the binding region as a custom track in the UCSC Genome Browser and extract its sequence.

Exercise 2-2: PROTEIN FUNCTION
Identify the functions, processes and pathways of a protein
2.1 What is the function of pyruvate carboxylase, and in which pathways and processes is this enzyme involved? Show the pathway maps and find the enzyme number (EC) using KEGG. Identify the functional domains and the Gene Ontology annotation of the protein sequence using UniProt, PROSITE and Pfam.

Find orthologs and perform a multiple sequence alignment
2.2 Find the ortholog protein sequences in Mus musculus, Rattus norvegicus and Saccharomyces cerevisiae, perform a multiple sequence alignment using ClustalW, and visualize it with Jalview.
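The seed check of exercise 1.2 can also be scripted. This is a minimal sketch, assuming the usual convention that a seed match is a perfect 7-nt site in the 3'UTR complementary to positions 2-8 of the mature microRNA; because the two strands pair antiparallel, the site read 5'->3' on the mRNA is the reverse complement of the seed. The function names and the toy sequences are hypothetical; substitute the mature hsa-mir-182 sequence from miRBase and the PC 3'UTR exported from BioMart. G:U wobble pairs and site context are ignored.

```python
import re

# Base pairing used to build the target site: U (RNA) pairs with A, A pairs with T (DNA 3'UTR)
PAIR = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def seed_target(mirna: str) -> str:
    """DNA site (5'->3') complementary to positions 2-8 of the microRNA."""
    seed = mirna.upper()[1:8]  # 1-based positions 2..8 = Python slice [1:8]
    return "".join(PAIR[base] for base in reversed(seed))


def find_seed_sites(utr: str, mirna: str) -> list[int]:
    """1-based start positions of perfect seed-complementary sites in the 3'UTR."""
    site = seed_target(mirna)
    utr = utr.upper().replace("U", "T")  # accept RNA or DNA, upper or lower case
    # zero-width lookahead so that overlapping sites are reported as well
    return [m.start() + 1 for m in re.finditer(f"(?={site})", utr)]


# Hypothetical toy input, NOT the real hsa-mir-182 or PC 3'UTR sequences
print(seed_target("ACGUUAGCAAAA"))                            # GCTAACG
print(find_seed_sites("TTGCTAACGTTGCTAACG", "ACGUUAGCAAAA"))  # [3, 12]
```

An empty list means no perfect seed match; dedicated target prediction tools additionally score partial and context-dependent sites.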
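The Notepad++ steps for exercises 1.4 and 1.5 can be sketched in Python as well. This sketch assumes the UCSC Table Browser export holds one FASTA record per exon, padded with 10 bp on each side, so that the first 20 bases span the intron-exon border and the last 20 bases span the exon-intron border; it applies the same patterns, ^(.{20}) and (.{20})$, with Python's re module. The helper names are hypothetical. The per-position frequencies are the data behind a frequency plot, and the information content (2 bits minus the Shannon entropy, without small-sample correction) is the total stack height at that position in a sequence logo.

```python
import math
import re
from collections import Counter


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; wrapped sequence lines are joined."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():  # copes with both \n and \r\n line endings
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None and line:
            records[header] += line.upper()
    return records


def border_windows(seq: str, width: int = 20) -> tuple[str, str]:
    """First and last `width` bases, i.e. what ^(.{20}) and (.{20})$ capture."""
    first = re.match(rf"(.{{{width}}})", seq)  # re.match is anchored at the start, like ^
    last = re.search(rf"(.{{{width}}})$", seq)
    if first is None or last is None:
        raise ValueError(f"sequence shorter than {width} bases")
    return first.group(1), last.group(1)


def position_frequencies(windows: list[str]) -> list[dict[str, float]]:
    """Relative frequency of A, C, G and T at each position of equal-length windows."""
    n = len(windows)
    table = []
    for i in range(len(windows[0])):
        counts = Counter(w[i] for w in windows)
        table.append({base: counts[base] / n for base in "ACGT"})
    return table


def information_content(freqs: dict[str, float]) -> float:
    """Bits at one position: log2(4) = 2 minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy
```

In use, each exported record is passed to border_windows, the first windows are collected into one list and the last windows into another, and each list goes to position_frequencies. The first and last exons have transcript ends rather than introns on one side, so those windows are not splice sites and may be worth dropping.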
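For exercise 1.6, the extracted Pparg binding region can be scanned for PPRE-like elements. The PPRE shown on the ChIP slide, AACTAGGTCAAAGGTCA, contains two AGGTCA half-sites separated by a single nucleotide, a direct repeat with spacing 1 (DR1). The sketch below searches both strands for that arrangement; the degenerate half-site RGKTCA (R = A/G, K = G/T) is an assumption chosen for illustration, and the function names are hypothetical. A plain pattern match reports candidate sites only; it says nothing about whether Pparg actually binds them.

```python
import re

# ASSUMED PPRE model: DR1 = two half-sites separated by one nucleotide.
# Half-site consensus RGKTCA (R = A/G, K = G/T) is an illustrative assumption.
HALF_SITE = "[AG]G[GT]TCA"
DR1 = re.compile(f"(?=({HALF_SITE}[ACGT]{HALF_SITE}))")  # lookahead: overlapping hits allowed


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_dr1(seq: str) -> list[tuple[int, str, str]]:
    """Return (1-based start on the + strand, strand, matched site) for each DR1 hit."""
    seq = seq.upper()
    hits = [(m.start() + 1, "+", m.group(1)) for m in DR1.finditer(seq)]
    for m in DR1.finditer(revcomp(seq)):
        # convert a start on the reverse complement back to + strand coordinates
        start = len(seq) - (m.start() + len(m.group(1))) + 1
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


print(find_dr1("AACTAGGTCAAAGGTCA"))  # [(5, '+', 'AGGTCAAAGGTCA')]
```

Hits on the minus strand are reported with the site as read on that strand, but with the start coordinate on the plus strand, so they can be located directly in the extracted sequence.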