BIOINFORMATICS I, EXERCISE 2
http://icbi.at/bioinf
mRNA processing
splicing
Spliceosome assembly
[Figure: spliceosome assembly on a pre-mRNA. Labels: snRNPs U1, U2, U4, U5, U6; sequence signals GU, A, YAG; associated factors U2AF, hnRNP, SR proteins, kinases and phosphatases, RNA helicases, cyclophilins, + ~200 non-snRNP proteins]
Different levels of regulation
Regulation of transcription
ChIP procedure
[Figure: PPAR and RXR, each with domains A/B, C and E/F, bound to a PPRE; PPRE sequence shown: AACTAGGTCAAAGGTCA]
Farnham, Nature Rev Genetics, 2009
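The PPRE shown on the slide, AACTAGGTCAAAGGTCA, contains two AGGTCA half-sites separated by one base. A minimal sketch of how such a site could be located in a DNA sequence on both strands is given below. The exact-match pattern and the function names are assumptions made for illustration; real binding sites are degenerate and are normally searched with position weight matrices.

```python
import re

# Two AGGTCA half-sites separated by exactly one base (read off the slide's PPRE).
# Exact matching is an illustrative simplification, not a curated motif model.
HALF_SITE = "AGGTCA"
# The lookahead lets overlapping sites be reported as well.
SITE = re.compile(f"(?=({HALF_SITE}[ACGT]{HALF_SITE}))")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq):
    """Return (strand, 0-based start on the given sequence, matched site)."""
    seq = seq.upper()
    hits = [("+", m.start(), m.group(1)) for m in SITE.finditer(seq)]
    for m in SITE.finditer(reverse_complement(seq)):
        # Convert the position on the reverse strand back to forward coordinates.
        start = len(seq) - m.start() - len(m.group(1))
        hits.append(("-", start, m.group(1)))
    return hits


ppre = "AACTAGGTCAAAGGTCA"          # sequence from the slide
padded = "GGGG" + ppre + "CCCC"     # toy flanks, invented for the example
forward_hits = find_sites(padded)
reverse_hits = find_sites(reverse_complement(padded))
```

Scanning both strands matters because a factor can bind the site in either orientation relative to the gene.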
DNA
microRNAs
http://www.mirbase.org/
Ensembl BioMart
UCSC Table Browser
Notepad++ and regular expressions
Example pattern: ^>.*\r\n
^     beginning of line
>     the character ">"
.     any symbol
*     0 or more times
\r    carriage return (CR)
\n    line feed (LF)
character   meaning
\           escape; used to make special characters non-special
()          group; you can retrieve its contents, e.g. with \1 for the first group
[]          any one character listed inside is considered a match
.           matches any character
*           match the previous character 0 or more times
+           match the previous character 1 or more times
{n}         match the previous character n times
^           as the first character in the regex, means "beginning of line"; inside [] means "not"
$           as the last character in the regex, means "end of line"
\s          any space character (space, tab)
\t          tab (-->)
\r          carriage return (CR)
\n          line feed (LF)
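The same metacharacters can be tried outside Notepad++. The sketch below uses Python's re module, whose syntax agrees with the table for all the constructs listed; the example lines are invented for illustration.

```python
import re

# Toy lines, invented for illustration: a FASTA header, clean DNA,
# DNA containing N, and a line containing a tab.
lines = [">seq1 range=chrN:1-8", "ACGTACGT", "ACGTNNAC", "ACGT\tACGT"]

# ^ anchors to the beginning of the line; [^ ]* reads up to the first space.
header_ids = [re.match(r"^>([^ ]*)", l).group(1) for l in lines if l.startswith(">")]

# [ACGT] matches one listed character, + repeats it, $ anchors to the end.
pure_dna = [l for l in lines if re.search(r"^[ACGT]+$", l)]

# ^ inside [] negates the class: everything that is not A, C, G or T.
non_acgt = re.findall(r"[^ACGT]", "ACGTNNAC")

# {n} repeats the previous element n times; () captures; \1 reuses the capture.
first4 = re.sub(r"^(.{4}).*$", r"\1", "ACGTNNAC")

# \t matches a tab character.
has_tab = bool(re.search(r"\t", lines[3]))
```

One difference to keep in mind: Notepad++ works on the whole file, so line endings (\r\n) are part of the text being matched, whereas here each line is a separate string without its line ending.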
Find                  Replace with
^>.*\r\n              (empty)
^[ACGT].*\r\n         (empty)
^(.{20}).*\r\n        \1\r\n

\r\n                  (empty)
>                     \r\n>
repeatMasking=none    \r\n
^>.*\r\n              (empty)
.*(.{20})$            \1
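One reading of the find/replace steps above is: join each FASTA record into a single line, drop the header lines, and then keep only the first or the last 20 characters of every sequence. The sketch below replays that idea in Python on invented toy records. It is an interpretation of the slide, not a transcript of it, and it splits the text into lines before trimming, because Python's $ does not treat \r\n the way Notepad++ does.

```python
import re

# Toy FASTA text with Windows line endings; headers and sequences are invented.
fasta = (
    ">toy_1 range=chrN:1-30 repeatMasking=none\r\n"
    "ACGTACGTACGTACG\r\n"
    "TTTTTGGGGGCCCCC\r\n"
    ">toy_2 range=chrN:31-60 repeatMasking=none\r\n"
    "GGGGGCCCCCAAAAA\r\n"
    "ACGTACGTACGTTTT\r\n"
)

# 1) remove every line break, so each record becomes one long run of text
text = fasta.replace("\r\n", "")
# 2) start a new line in front of every header
text = text.replace(">", "\r\n>")
# 3) the header ends with "repeatMasking=none": break the line there
text = text.replace("repeatMasking=none", "\r\n")
# 4) delete the header lines
text = re.sub(r"^>.*\r\n", "", text, flags=re.MULTILINE)

# One sequence per line remains; drop the empty first line.
sequences = [s for s in text.split("\r\n") if s]

# 5) keep only the first or the last 20 characters of each sequence
first20 = [re.sub(r"^(.{20}).*$", r"\1", s) for s in sequences]
last20 = [re.sub(r"^.*(.{20})$", r"\1", s) for s in sequences]
```

In step 5 the greedy .* in front of the group is what pushes the captured 20 characters to the end of the line.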
Sequence Logo
http://icbi.at/logo
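A sequence logo is drawn from per-position letter frequencies of aligned, equal-length sequences; the height of each stack is the information content of that position. The sketch below computes both for a handful of invented sites. It assumes a DNA alphabet and omits the small-sample correction that logo tools usually apply.

```python
import math
from collections import Counter


def position_frequencies(seqs, alphabet="ACGT"):
    """Relative frequency of each base at each position of equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    columns = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        total = sum(counts[b] for b in alphabet)
        if total == 0:
            raise ValueError(f"no valid base at position {i}")
        columns.append({b: counts[b] / total for b in alphabet})
    return columns


def information_content(column):
    """Bits at one position: log2(alphabet size) minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy


# Toy sites, invented for illustration (not taken from pyruvate carboxylase).
sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTATGT"]
columns = position_frequencies(sites)
bits = [information_content(c) for c in columns]
```

A fully conserved position reaches 2 bits, and a position where all four bases are equally frequent gives 0 bits.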
KEGG
Protein domains
Uniprot, Prosite, Interpro, Pfam, CD, SMART
Gene Ontology
The Gene Ontology project provides a controlled vocabulary to describe gene and gene
product attributes in any organism.
Three organizing principles:
• cellular component (e.g. mitochondrion)
• biological process (e.g. lipid metabolism)
• molecular function (e.g. hydrolase activity)
Each entry in GO has a unique numerical identifier of the form GO:nnnnnnn and a term name.
Evidence code   Meaning
ISS             Inferred from Sequence Similarity
IEP             Inferred from Expression Pattern
IMP             Inferred from Mutant Phenotype
IGI             Inferred from Genetic Interaction
IPI             Inferred from Physical Interaction
IDA             Inferred from Direct Assay
RCA             Inferred from Reviewed Computational Analysis
TAS             Traceable Author Statement
NAS             Non-traceable Author Statement
IC              Inferred by Curator
ND              No biological Data available
Directed acyclic graph (DAG) with different levels and 2 relations (part_of, is_a)
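In a directed acyclic graph a term may have more than one parent, and an annotation to a term implies annotation to all of its ancestors. The sketch below illustrates this with a tiny hand-made graph. The term names and relations are toy data chosen for illustration, not real GO entries or identifiers.

```python
# child -> list of (relation, parent); toy terms, not real GO records
TOY_GO = {
    "mitochondrial matrix": [("part_of", "mitochondrion")],
    "mitochondrion": [("is_a", "organelle"), ("part_of", "cytoplasm")],
    "cytoplasm": [("part_of", "cell")],
    "organelle": [("is_a", "cellular component")],
    "cell": [("is_a", "cellular component")],
    "cellular component": [],
}


def ancestors(term, graph):
    """All terms reachable from `term` by following is_a and part_of edges."""
    seen = set()
    stack = [term]
    while stack:
        for _relation, parent in graph.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


matrix_ancestors = ancestors("mitochondrial matrix", TOY_GO)
```

Here "mitochondrion" has two parents, which is what makes the structure a graph rather than a tree.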
Orthologs
Protein A
Homologs: A – B – C
Orthologs: B1 – C1
Paralogs: C1 – C2 – C3
Inparalogs: C2 – C3
Outparalogs: B2 – C1
Xenologs: A1 – AB1
Ortholog prediction
Ortholog databases
• YOGY (eukarYotic OrtholoGY), a web-based resource that integrates 5 independent resources (Sanger)
• COG (Clusters of Orthologous Groups of proteins) and KOG for 7 eukaryotic genomes (NCBI)
• Inparanoid (Stockholm Bioinformatics Center)
• HomoloGene (NCBI)
• OrthoMCL, which uses the Markov Clustering algorithm (University of Pennsylvania)
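A simple heuristic often used as a starting point for ortholog prediction is the reciprocal best hit: protein a in species A and protein b in species B are paired if each is the other's highest-scoring match. The sketch below shows only this heuristic on invented similarity scores. It is not the method of any of the databases listed above, which apply more elaborate clustering.

```python
def best_hits(scores):
    """scores: {(query, subject): score}. Return the best subject per query."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Toy alignment scores with hypothetical protein names.
human_vs_mouse = {
    ("humA", "musA"): 900, ("humA", "musB"): 300,
    ("humB", "musB"): 500, ("humC", "musB"): 450,
}
mouse_vs_human = {
    ("musA", "humA"): 890, ("musB", "humB"): 510, ("musB", "humC"): 440,
}
pairs = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
```

In the toy data humC also hits musB best, but musB prefers humB, so humC is left unpaired; a case like this is where the paralog distinctions above become relevant.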
Multiple sequence alignment (CLUSTALW)
Progressive tree alignment
Jalview
Exercise 2-1: REGULATORY GENOMICS
Pyruvate Carboxylase as example
Ensembl Biomart
1.1 For the human transcript NM_000920 (pyruvate carboxylase), find the official gene
symbol, the number of exons, the Ensembl transcript ID, the Ensembl gene ID, the 3'UTR
sequence as a FASTA file, and the length of the 3'UTR.
microRNA target prediction
1.2 Is there a sequence within the 3'UTR of PC that is complementary to positions 2-8
of the sequence of microRNA hsa-mir-182?
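The check asked for in 1.2 can be sketched as follows: take positions 2-8 of the mature microRNA, build the reverse complement, and search the 3'UTR for it. The microRNA and UTR strings below are invented placeholders; the real sequences would be taken from miRBase and from the Ensembl BioMart export. The function names are likewise illustrative.

```python
def seed_target(mirna):
    """DNA site complementary to microRNA positions 2-8 (1-based), written 5'->3'."""
    seed = mirna.upper().replace("T", "U")[1:8]
    complement = {"A": "T", "C": "G", "G": "C", "U": "A"}
    return "".join(complement[base] for base in reversed(seed))


def seed_sites(mirna, utr):
    """0-based start positions of perfect seed matches in a 3'UTR given as DNA or RNA."""
    target = seed_target(mirna)
    utr = utr.upper().replace("U", "T")
    positions = []
    start = utr.find(target)
    while start != -1:
        positions.append(start)
        start = utr.find(target, start + 1)
    return positions


# Placeholder sequences, invented for illustration (not hsa-mir-182, not the PC 3'UTR).
toy_mirna = "UACGGAUCCGAUUACGGAUCAA"
toy_utr = "TTTGATCCGTAAAGATCCGTCC"
target = seed_target(toy_mirna)
sites = seed_sites(toy_mirna, toy_utr)
utr_length = len(toy_utr)
```

The length of the 3'UTR requested in 1.1 falls out of the same data as the length of the sequence string once the FASTA header line has been removed.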
UCSC genome browser
1.3 Find the positions of the transcription start site and the transcription end of pyruvate
carboxylase (NM_000920) in the hg19 assembly.
Exercise 2-1: REGULATORY GENOMICS
Find splicing signals
1.4 Get the sequences (+10bp/-10bp) around the intron-exon borders and the exon-intron
borders of pyruvate carboxylase using the UCSC Table Browser and Notepad++.
1.5 Construct a sequence logo and a frequency plot for both cases. Can you identify
(regulatory) sequence motifs?
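The windows asked for in 1.4 can also be cut programmatically once exon coordinates are known. The sketch below assumes a gene on the plus strand, a sequence held as a plain string, and exon coordinates that are 0-based, half-open and relative to that string; all of these are assumptions for illustration, and a minus-strand gene would need reverse complementing first. The toy sequence is invented.

```python
def border_windows(seq, exons, flank=10):
    """Return (exon_intron_windows, intron_exon_windows), each 2 * flank long.

    exons: sorted list of (start, end), 0-based half-open, plus strand.
    """
    exon_intron, intron_exon = [], []
    for i, (start, end) in enumerate(exons):
        # Exon-intron border: last `flank` exon bases + first `flank` intron bases.
        if i < len(exons) - 1 and end - flank >= 0 and end + flank <= len(seq):
            exon_intron.append(seq[end - flank:end + flank])
        # Intron-exon border: last `flank` intron bases + first `flank` exon bases.
        if i > 0 and start - flank >= 0 and start + flank <= len(seq):
            intron_exon.append(seq[start - flank:start + flank])
    return exon_intron, intron_exon


# Toy gene, invented for illustration: exon (12 bp), intron (21 bp), exon (12 bp).
toy_seq = "ATGGCCATGGCC" + "GTAAGTCCCCCCTTTTTTCAG" + "GCTGCTGCTGCT"
toy_exons = [(0, 12), (33, 45)]
donor_windows, acceptor_windows = border_windows(toy_seq, toy_exons)
```

With this layout the border always sits between positions 10 and 11 of each window, so the windows can be fed directly into a logo tool as aligned sequences.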
Regulatory motifs (transcription factor binding sites)
1.6 We know from chromatin immunoprecipitation (ChIP-seq) experiments in a mouse
cell line that the transcription factor Pparg binds near the pyruvate carboxylase
gene and hence potentially regulates its transcription (ppar.wig). Show the binding region as
a custom track in the UCSC genome browser and extract its sequence.
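Before uploading a wiggle file as a custom track it can help to see which region carries the signal. The sketch below reads the two wiggle layouts (fixedStep and variableStep) and merges consecutive positions whose value reaches a threshold. The toy track, its coordinates and the threshold are invented; nothing here reflects the actual contents of ppar.wig, and the records are assumed to be sorted by position.

```python
def parse_wig(lines):
    """Yield (chrom, start, end, value) with 1-based inclusive coordinates."""
    mode = chrom = None
    span = step = pos = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith(("fixedStep", "variableStep")):
            fields = line.split()
            mode = fields[0]
            options = dict(f.split("=", 1) for f in fields[1:])
            chrom = options["chrom"]
            span = int(options.get("span", 1))
            if mode == "fixedStep":
                pos = int(options["start"])
                step = int(options.get("step", 1))
            continue
        if mode == "variableStep":
            position, value = line.split()
            yield chrom, int(position), int(position) + span - 1, float(value)
        elif mode == "fixedStep":
            yield chrom, pos, pos + span - 1, float(line)
            pos += step


def enriched_regions(records, threshold):
    """Merge adjacent or overlapping records with value >= threshold."""
    regions = []
    for chrom, start, end, value in records:
        if value < threshold:
            continue
        if regions and regions[-1][0] == chrom and start <= regions[-1][2] + 1:
            regions[-1][2] = max(regions[-1][2], end)
        else:
            regions.append([chrom, start, end])
    return [tuple(r) for r in regions]


# Toy track, invented for illustration.
toy_wig = """track type=wiggle_0 name="toy"
fixedStep chrom=chr1 start=1001 step=25 span=25
1
2
8
9
1
"""
records = list(parse_wig(toy_wig.splitlines()))
peaks = enriched_regions(records, threshold=5)
```

The resulting coordinates can be pasted into the genome browser's position box to jump to the region and retrieve its DNA.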
Exercise 2-2: PROTEIN FUNCTION
Identify function/processes/pathways for a protein
2.1 What is the function of pyruvate carboxylase, and in which pathways and
processes is this enzyme involved?
Show pathway maps and find Enzyme ID (EC) using KEGG
Identify functional domains and Gene Ontology Annotation of the protein sequence
using Uniprot, Prosite, Pfam
Find orthologs and perform multiple sequence alignment
2.2 Find orthologous protein sequences in Mus musculus, Rattus norvegicus, and
Saccharomyces cerevisiae, perform a multiple sequence alignment using ClustalW, and
visualize it with Jalview.