BIOINFORMATICS I, EXERCISE 2
http://icbi.at/bioinf
mRNA processing
splicing
Spliceosome assembly
[Figure: spliceosome assembly on a pre-mRNA. Labels: snRNPs U1, U2, U4, U5, U6; U2AF; hnRNP; SR proteins; kinases and phosphatases; RNA helicases; cyclophilins; + ~200 non-snRNP proteins. Sequence elements: GU, A, YAG.]
Different levels of regulation
Regulation of transcription
ChIP procedure
[Figure: PPAR and RXR (domains A/B, C, E/F) bound to a PPRE on DNA. PPRE sequence: AACTAGGTCAAAGGTCA]
Farnham, Nature Rev Genetics, 2009
microRNAs
http://www.mirbase.org/
Ensembl BioMart
UCSC Table Browser
Notepad++ and regular expressions
^>.*\r\n
^     begin of line
>     literal ">"
.     any symbol
*     0 or more times
\r    carriage return (CR)
\n    line feed (LF)
Notepad++ and regular expressions
character   meaning
\           escape; used to make special characters non-special
()          group; you can retrieve its contents, e.g. with \1 for the first group
[]          any character inside is considered a match
.           matches any character
*           match the previous character 0 or more times
+           match the previous character 1 or more times
{n}         match the previous character n times
^           if the first character in the regex, means “beginning of line”; inside [] means “not”
$           if the last character in the regex, means “end of line”
\s          any whitespace character (space, tab)
\t          tab (-->)
\r          carriage return (CR)
\n          line feed (LF)
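The same metacharacters behave similarly in most regex engines. The following is a minimal sketch using Python's `re` module rather than Notepad++. The FASTA text is made up for illustration, and `re.MULTILINE` is needed so that `^` matches at the start of every line.

```python
import re

# Made-up FASTA text with Windows (CR LF) line endings, for illustration only.
fasta = ">seq1 example header\r\nACGTACGTAC\r\n>seq2 another header\r\nTTGACCTTTG\r\n"

# ^>.*\r\n : a line that begins with ">", then any characters, up to CR LF.
header_line = re.compile(r"^>.*\r\n", re.MULTILINE)

headers = header_line.findall(fasta)        # the matched header lines
sequences_only = header_line.sub("", fasta)  # replace headers with nothing

# [] and {n}: exactly four characters from the set A, C, G, T at a line start.
first_four = re.findall(r"^[ACGT]{4}", sequences_only, re.MULTILINE)

# () and \1: keep only the first 5 characters of every line.
truncated = re.sub(r"^(.{5}).*\r\n", r"\1\r\n", sequences_only, flags=re.MULTILINE)

print(headers)
print(first_four)
print(repr(truncated))
```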
Notepad++ and regular expressions
^>.*\r\n              replace with   (empty)
^[ACGT].*\r\n         replace with   (empty)
^(.{20}).*\r\n        replace with   \1\r\n

\r\n                  replace with   (empty)
>                     replace with   \r\n>
repeatMasking=none    replace with   \r\n
^>.*\r\n              replace with   (empty)
.*(.{20})$            replace with   \1
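The goal of these replacements, keeping the first or last 20 characters of each sequence, can also be scripted. The sketch below is one possible Python equivalent, not the procedure from the slides. It assumes FASTA input in which one sequence may span several lines; `read_fasta`, `flanks`, and the example record are hypothetical names and data.

```python
def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():  # handles both \r\n and \n line endings
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def flanks(text, n=20):
    """Return (first n characters, last n characters) for every record."""
    return [(seq[:n], seq[-n:]) for _, seq in read_fasta(text)]


# Made-up record; the header only imitates the style of a downloaded FASTA file.
example = (
    ">hypothetical_region_1 repeatMasking=none\r\n"
    + "A" * 10 + "C" * 10 + "G" * 5 + "\r\n"
    + "G" * 5 + "T" * 10 + "ACGTACGTAC" + "\r\n"
)
print(flanks(example))
```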
Sequence Logo
http://icbi.at/logo
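A sequence logo is drawn from the base frequencies at each position of a set of aligned, equal-length sequences. As a rough sketch of what happens behind such a tool, the code below computes per-position frequencies and the information content in bits (log2 of the alphabet size minus the Shannon entropy). The small-sample correction used by some logo tools is omitted, and the function names and example sequences are made up.

```python
import math
from collections import Counter


def position_frequencies(seqs, alphabet="ACGT"):
    """Relative frequency of each base at every position of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        total = sum(counts[b] for b in alphabet)
        if total == 0:
            raise ValueError(f"no valid bases at position {i + 1}")
        freqs.append({b: counts[b] / total for b in alphabet})
    return freqs


def information_content(column):
    """Information content of one column in bits (no small-sample correction)."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy


# Made-up aligned sequences, for illustration only.
seqs = ["AGGT", "AGGA", "AGCT", "AGTC"]
freqs = position_frequencies(seqs)
bits = [information_content(col) for col in freqs]
print(freqs[2])
print(bits)
```

A fully conserved position reaches the maximum of 2 bits for DNA; a position with all four bases equally frequent carries 0 bits.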
KEGG
Protein domains
Uniprot, Prosite, Interpro, Pfam, CD, SMART
Gene Ontology
The Gene Ontology project provides a controlled vocabulary to describe gene and gene
product attributes in any organism.
3 organizing principles
• cellular component (e.g. mitochondrion)
• biological process (e.g. lipid metabolism)
• molecular function (e.g. hydrolase activity)
Each entry in GO has a unique numerical identifier of the form GO:nnnnnnn and a term name
Evidence code
ISS   Inferred from Sequence Similarity
IEP   Inferred from Expression Pattern
IMP   Inferred from Mutant Phenotype
IGI   Inferred from Genetic Interaction
IPI   Inferred from Physical Interaction
IDA   Inferred from Direct Assay
RCA   Inferred from Reviewed Computational Analysis
TAS   Traceable Author Statement
NAS   Non-traceable Author Statement
IC    Inferred by Curator
ND    No biological Data available
Directed acyclic graph (DAG) with different levels and 2 relations (part_of, is_a)
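Because GO is a DAG and not a tree, a term can have more than one parent, and an annotation to a term implies annotation to all of its ancestors. The sketch below illustrates this with a small toy graph. The terms and edges are simplified for illustration and are not copied from the actual ontology; `toy_go` and `ancestors` are hypothetical names.

```python
# Toy graph: term -> list of (relation, parent term). Simplified, not real GO content.
toy_go = {
    "mitochondrial inner membrane": [("is_a", "membrane"), ("part_of", "mitochondrion")],
    "mitochondrion": [("is_a", "organelle")],
    "membrane": [("is_a", "cellular component")],
    "organelle": [("is_a", "cellular component")],
    "cellular component": [],
}


def ancestors(term, graph):
    """Return all terms reachable by following is_a / part_of edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("mitochondrial inner membrane", toy_go)))
```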
Orthologs
Protein A
Homologs: A – B – C
Orthologs: B1 – C1
Paralogs: C1 – C2 – C3
Inparalogs: C2 – C3
Outparalogs: B2 – C1
Xenologs: A1 – AB1
Ortholog prediction
Ortholog databases
• YOGY (eukarYotic OrtholoGY), a web-based resource that integrates 5 independent resources (Sanger)
• COG (Clusters of Orthologous Groups of proteins) and KOG for 7 eukaryotic genomes (NCBI)
• InParanoid (Stockholm Bioinformatics Centre)
• HomoloGene (NCBI)
• OrthoMCL, which uses the Markov Clustering algorithm (University of Pennsylvania)
Multiple sequence alignment (CLUSTALW)
Progressive tree alignment
Jalview
Exercise 2-1: REGULATORY GENOMICS
Pyruvate Carboxylase as example
Ensembl BioMart
1.1 For the human transcript NM_000920 (pyruvate carboxylase), find the official gene
symbol, number of exons, Ensembl transcript ID, Ensembl gene ID, 3'UTR sequence as a
FASTA file, and length of the 3'UTR
microRNA target prediction
1.2 Is there a sequence within the 3'UTR of PC that is complementary to positions 2-8
of the microRNA hsa-mir-182?
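One way to check this by script is to take positions 2-8 of the microRNA (the seed), build its reverse complement, and search for that string in the 3'UTR. The sketch below assumes the 3'UTR is given as DNA (T instead of U) and looks only for perfect 7-base matches; target prediction tools use additional criteria. Both sequences are made up and are not the real hsa-mir-182 or PC sequences, and the function names are hypothetical.

```python
# Complement of each base, written as DNA so it can be searched in a DNA FASTA sequence.
COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def seed_site(mirna, start=2, end=8):
    """Reverse complement (as DNA) of miRNA positions start..end (1-based, inclusive)."""
    seed = mirna.upper()[start - 1:end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(utr, mirna):
    """Return 1-based start positions of perfect seed-complementary sites in the UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("U", "T")
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos + 1)
        pos = utr.find(site, pos + 1)  # continue after this hit, allowing overlaps
    return positions


# Made-up sequences, for illustration only.
toy_mirna = "UACGGAUCCAGUUGACCAUGCA"
toy_utr = "TTTT" + "GATCCGT" + "AAAACCC" + "GATCCGT" + "GG"

print(seed_site(toy_mirna))
print(find_seed_matches(toy_utr, toy_mirna))
```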
UCSC genome browser
1.3 Find the positions of the transcription start site and transcription end of pyruvate
carboxylase (NM_000920) in the hg19 assembly
Exercise 2-1: REGULATORY GENOMICS
Find splicing signals
1.4 Get the sequences (+10bp/-10bp) around the intron-exon borders and exon-intron borders
of pyruvate carboxylase using the UCSC Table Browser and Notepad++
1.5 In both cases, construct a sequence logo and a frequency plot. Can you identify
(regulatory) sequence motifs?
Regulatory motifs (transcription factor binding sites)
1.6 We know from chromatin immunoprecipitation (ChIP-seq) experiments in a mouse
cell line that the transcription factor Pparg binds near the pyruvate carboxylase
gene and hence potentially regulates its transcription (ppar.wig). Show the binding region as
a custom track in the UCSC genome browser and extract the sequence.
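Once the sequence of the binding region is extracted, a simple check is whether it contains a PPRE-like motif. The sketch below assumes, based on the example PPRE shown earlier (AACTAGGTCAAAGGTCA), that the motif consists of two AGGTCA half-sites separated by one base, and it searches both strands for exact matches only. Real binding sites tolerate mismatches, so this is a first look, not a binding-site predictor; the function names are hypothetical.

```python
import re

# Two AGGTCA half-sites separated by one base; the lookahead allows overlapping hits.
PPRE_LIKE = re.compile(r"(?=(AGGTCA[ACGT]AGGTCA))")
MOTIF_LENGTH = 13


def reverse_complement(dna):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_ppre_like(seq):
    """Return (1-based start on the given sequence, strand) for every exact hit."""
    seq = seq.upper()
    hits = [(m.start() + 1, "+") for m in PPRE_LIKE.finditer(seq)]
    rc = reverse_complement(seq)
    for m in PPRE_LIKE.finditer(rc):
        # Convert the position on the reverse strand back to forward coordinates.
        hits.append((len(seq) - (m.start() + MOTIF_LENGTH) + 1, "-"))
    return hits


ppre_example = "AACTAGGTCAAAGGTCA"  # sequence from the PPRE slide
print(find_ppre_like(ppre_example))
print(find_ppre_like(reverse_complement(ppre_example)))
```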
Exercise 2-2: PROTEIN FUNCTION
Identify function/processes/pathways for a protein
2.1 What is the function of pyruvate carboxylase, and in which pathways and
processes is this enzyme involved?
Show pathway maps and find the enzyme ID (EC number) using KEGG
Identify functional domains and the Gene Ontology annotation of the protein sequence
using Uniprot, Prosite, Pfam
Find orthologs and perform multiple sequence alignment
2.2 Find orthologous protein sequences in Mus musculus, Rattus norvegicus, and
Saccharomyces cerevisiae, perform a multiple sequence alignment using ClustalW, and
visualize it with Jalview.