Introduction to Genome Research
Bioinformatics Research Center
Institute of Biomedical Sciences
ACADEMIA SINICA
莊樹諄
www.sinica.edu.tw/~trees/bioinformatics
E-mail: [email protected]
90/4/9 pm
Academia Sinica Bioinformatics Research Center (BRC)
Outline
- Introduction
- Some Research Topics
- Related Links and Resources
- Bioinformatics Research Center (BRC)
Chromosome
Introduction
DNA Sequence
[Figure: a gene on the DNA sequence, drawn 5' to 3', made up of exons (coding regions) and introns. Other labels in the figure: DNA, RNA, mRNA (5' UTR, ORF, 3' UTR), cDNA (complementary DNA), Protein, Function.]
Introduction
DNA nucleotide:
- Phosphoric acid
- Deoxyribose
- Nitrogenous base
Nitrogenous bases:
- Purines: Adenine (A), Guanine (G)
- Pyrimidines: Cytosine (C), Thymine (T)
DNA sequence: A, C, G, T --- 4 letters
RNA sequence: A, C, G, U (Uracil) --- 4 letters
5' ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA 3'
3' TGGCACACCGTCACGTGTCCATAAACCGGTATCTGT 5'
Codon (3 letters) → Amino acid
ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA
4^3 = 64 ≥ 20 (64 possible codons for 20 amino acids)
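A minimal sketch of the base pairing shown on this slide, in Python. The helper name `complement` is an illustrative choice, not something from the slides. It derives the lower strand from the upper one and checks the 4^3 = 64 codon count by enumeration.

```python
from itertools import product

# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIRING = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the paired strand, written in the same left-to-right order."""
    return strand.translate(PAIRING)

top = "ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA"  # upper strand, 5' -> 3'
bottom = complement(top)                      # lower strand, reads 3' -> 5'
print(bottom)

# Every ordered triple of the 4 letters is a possible codon.
codons = ["".join(triple) for triple in product("ACGT", repeat=3)]
print(len(codons))  # 64
```

Reversing `bottom` would give the reverse complement, which is the lower strand read in its own 5' to 3' direction.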
Introduction
DNA sequence: A, C, G, T --- 4 letters
RNA sequence: A, C, G, U --- 4 letters
Amino acid sequence: --- 20 letters
Genetic code (first position at the 5' end, third position at the 3' end):

First (5')   Second: U   Second: C   Second: A   Second: G   Third (3')
U            Phe (F)     Ser (S)     Tyr (Y)     Cys (C)     U
U            Phe (F)     Ser (S)     Tyr (Y)     Cys (C)     C
U            Leu (L)     Ser (S)     Stop        Stop        A
U            Leu (L)     Ser (S)     Stop        Trp (W)     G
C            Leu (L)     Pro (P)     His (H)     Arg (R)     U
C            Leu (L)     Pro (P)     His (H)     Arg (R)     C
C            Leu (L)     Pro (P)     Gln (Q)     Arg (R)     A
C            Leu (L)     Pro (P)     Gln (Q)     Arg (R)     G
A            Ile (I)     Thr (T)     Asn (N)     Ser (S)     U
A            Ile (I)     Thr (T)     Asn (N)     Ser (S)     C
A            Ile (I)     Thr (T)     Lys (K)     Arg (R)     A
A            Met (M)     Thr (T)     Lys (K)     Arg (R)     G
G            Val (V)     Ala (A)     Asp (D)     Gly (G)     U
G            Val (V)     Ala (A)     Asp (D)     Gly (G)     C
G            Val (V)     Ala (A)     Glu (E)     Gly (G)     A
G            Val (V)     Ala (A)     Glu (E)     Gly (G)     G
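The genetic code table can also be held as a lookup structure. The sketch below is one possible encoding in Python: the 64 one-letter symbols are listed in U, C, A, G order for each codon position, and `*` stands for Stop (both are conventions chosen here, not taken from the slides). It shows how 64 codons collapse onto 20 amino acids plus Stop.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid per codon, in the order UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first position U
    "LLLLPPPPHHQQRRRR"   # first position C
    "IIIMTTTTNNKKSSRR"   # first position A
    "VVVVAAAADDEEGGGG"   # first position G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy: how many codons encode each amino acid (or Stop).
degeneracy = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))                                   # 64
print(CODON_TABLE["AUG"])                                 # M
print(degeneracy["L"], degeneracy["W"], degeneracy["*"])  # 6 1 3
```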
Introduction
6-frame translations
5'3' Frame 1 aagctgatcgatcgattttagatagagaaaaaact
K L I D R F - I E K K
5'3' Frame 2 aagctgatcgatcgattttagatagagaaaaaact
S - S I D F R - R K N
5'3' Frame 3 aagctgatcgatcgattttagatagagaaaaaact
A D R S I L D R E K T
3'5' Frame 1 agttttttctctatctaaaatcgatcgatcagctt
S F F S I - N R S I S
3'5' Frame 2 agttttttctctatctaaaatcgatcgatcagctt
V F S L S K I D R S A
3'5' Frame 3 agttttttctctatctaaaatcgatcgatcagctt
F F L Y L K S I D Q L
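The six frames above can be reproduced with a short Python sketch. The function names (`reverse_complement`, `translate`, `six_frames`) are illustrative and are not the API of the ExPASy tool; the standard genetic code is assumed, and Stop is printed as `-` to match the slide.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
PAIRING = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Opposite strand, read in its own 5' -> 3' direction."""
    return seq.upper().translate(PAIRING)[::-1]

def translate(seq: str, offset: int = 0) -> str:
    """Translate complete codons starting at `offset`; Stop is shown as '-'."""
    seq = seq.upper()
    residues = [CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3)]
    return " ".join(residues).replace("*", "-")

def six_frames(seq: str) -> dict:
    """Three forward frames and three frames on the reverse complement."""
    frames = {}
    for strand, s in (("5'3'", seq), ("3'5'", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand} Frame {offset + 1}"] = translate(s, offset)
    return frames

for name, protein in six_frames("aagctgatcgatcgattttagatagagaaaaaact").items():
    print(name, protein)
```

Trailing bases that do not fill a complete codon are ignored, which is why each frame here yields 11 residues from 35 bases.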
Introduction
Gene: Exon & Intron
cDNA Database
- EST (Expressed Sequence Tags) DB
- HGI (Human Gene Index) DB
- UniGene DB
Introduction
Human Genome Sequencing (2/11/2001)
Draft: 61.0%
Finished: 32.5%
Total: 93.5%
[Figure: chromosome map; label: gap]
Introduction
Genome Database -- 3×10^9 bp
HTGS (High Throughput Genomic Sequences)
Phase 0: Single or few-pass reads of a single clone (not contigs).
Phase 1: Unfinished; may be unordered, unoriented contigs, with gaps.
Phase 2: Unfinished; ordered, oriented contigs, with or without gaps.
Phase 3: Finished; no gaps (with or without annotations).
Introduction
Size range (kb)   Contigs   Aggregate size (kb)   Percent of total
<30               44        666                   0.1%
30-100            479       32172                 4.9%
100-250           1628      260933                39.9%
250-500           421       144518                22.1%
500-1000          145       98623                 15.1%
>1000             43        116557                17.8%
Total             2760      653471                100.0%
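The "Percent of total" column can be rechecked from the other two columns. The Python sketch below uses the figures exactly as printed in the table; variable names are illustrative. Note that the six rows sum to 653469 kb, 2 kb less than the stated total of 653471 kb, which is presumably a rounding effect in the source data.

```python
# (size range in kb, number of contigs, aggregate size in kb), as in the table
rows = [
    ("<30", 44, 666),
    ("30-100", 479, 32172),
    ("100-250", 1628, 260933),
    ("250-500", 421, 144518),
    ("500-1000", 145, 98623),
    (">1000", 43, 116557),
]
STATED_TOTAL_KB = 653471  # total printed in the table

# Share of the total aggregate size contributed by each size range.
percentages = [round(100 * size_kb / STATED_TOTAL_KB, 1) for _, _, size_kb in rows]
total_contigs = sum(contigs for _, contigs, _ in rows)
summed_kb = sum(size_kb for _, _, size_kb in rows)

for (label, contigs, size_kb), pct in zip(rows, percentages):
    print(f"{label:>9} kb  {contigs:>5} contigs  {size_kb:>7} kb  {pct:>5.1f}%")
print(total_contigs, summed_kb)  # 2760 653469
```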
Outline
- Introduction
- Some Research Topics
- Related Links and Resources
- Bioinformatics Research Center (BRC)
Number of human genes
- Early estimate: 60,000~100,000
- By Ch22: ~45,000
- By EST: ~140,000
- By Ch22 & HGI-5.0: ~120,000 (with a 1.38-fold gene-richness correction and an extensive cleaning and assembly process)
- By Science (2/16/2001): ~30,000
- There are many more genes awaiting discovery within the sequence
Some Research Topics
Alternative Splicing
Genome Annotation
Gene Signature
Human Diversity
Human Genome: 3×10^9 bp
Genomic Sequence Variations
Single Nucleotide Polymorphism (SNP): 10^6-10^7 (gSNP)
- Gene, Coding Region: cSNP
- Gene, Non-coding Region: rSNP, iSNP
- Inter-genic Region: nSNP
Functional Variants (5%)
Gene-based SNPs
[Figure: Gene 1 and Gene 2 with exons, introns, and sites P1 and P2; SNPs are labeled by location: cSNP, iSNP, rSNP, nSNP.]
Human Diversity
SNP (Single Nucleotide Polymorphism)
cSNP (Coding SNP)
Example: acccgctcgtcgct tgt cggctaattgcgcgaat (codon tgt → C)
- Synonymous (silent): tgt → tgc, C → C
- Non-synonymous, conservative: tgt → tat, C → Y (C: polar, Y: polar)
- Non-synonymous, non-conservative: tgt → tgg, C → W (C: polar, W: nonpolar)
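A minimal Python sketch of the synonymous / non-synonymous distinction, assuming the standard genetic code. The function name `classify_csnp` is hypothetical. It only compares the encoded amino acids; the conservative / non-conservative call depends on a chosen amino acid property scheme and is left out.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_csnp(ref_codon: str, alt_codon: str) -> tuple:
    """Return (reference amino acid, variant amino acid, class) for one coding SNP."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("codons must be 3 letters long")
    if sum(a != b for a, b in zip(ref, alt)) != 1:
        raise ValueError("a SNP changes exactly one nucleotide")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind

for alt in ("tgc", "tat", "tgg"):
    print("tgt ->", alt, classify_csnp("tgt", alt))
```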
Human Diversity
SNP (Single Nucleotide Polymorphism)
cSNP (Coding SNP)
Purines (A/G) & Pyrimidines (C/T)
Transition: A ↔ G, C ↔ T
Transversion: A/G ↔ C/T
CD-CV: common disease, common variant.
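The transition / transversion rule is simple enough to state as code. This is a small Python sketch; the function name `substitution_type` is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base change on DNA letters A, C, G, T."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    # Transition: both bases are purines, or both are pyrimidines.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

print(substitution_type("A", "G"))  # transition
print(substitution_type("C", "T"))  # transition
print(substitution_type("A", "C"))  # transversion
```

Of the 12 possible single-base changes, 4 are transitions and 8 are transversions.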
Pseudogene
Ch22: 134 pseudogenes (134/679 ≈ 19.7%)
- Processed pseudogene (cDNA ⇒ genebank, 82% of 134 pseudogenes)
  a) Single block
  b) Lacks the characteristic intron-exon structure
- Spliced pseudogene: segments of duplicated gene families
Repetitive Sequence
Centromere
Telomere
Tandem Repeats
- Minisatellite (Variable Number Tandem Repeats, VNTR): 15~100 bp
- Microsatellite (Short Tandem Repeats, STR): 2~5 bp
- α-Satellite: at centromere
- Telomere repeats
Interspersed Repeats
- SINEs (Short Interspersed Elements): Alu, MIR, MER, LTR, PTR, …
- LINEs (Long Interspersed Elements): LINE1, LINE2, …
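As a rough illustration of what a microsatellite (STR) looks like in sequence data, the Python sketch below scans for a 2~5 bp unit repeated back to back. The function name, the `min_copies` threshold, and the example sequence are all made up for this illustration; real repeat finders handle mismatches and much longer inputs.

```python
import re

def find_microsatellites(seq: str, min_copies: int = 3) -> list:
    """Return (unit, copies, start) for perfect tandem repeats of a 2-5 bp unit."""
    # Lazy {2,5}? prefers the shortest unit, so CACACA is reported as CA x 3.
    pattern = re.compile(r"([ACGT]{2,5}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) > 1:  # skip single-letter runs such as AAAAAA
            hits.append((unit, len(match.group(0)) // len(unit), match.start()))
    return hits

# Hypothetical sequence: (CA)6 and (GATA)4 embedded in flanking bases.
example = "TTG" + "CA" * 6 + "GGT" + "GATA" * 4 + "CC"
print(find_microsatellites(example))  # [('CA', 6, 3), ('GATA', 4, 18)]
```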
Outline
- Introduction
- Some Research Topics
- Related Links and Resources
- Bioinformatics Research Center (BRC)
Related Links and Resources
- TIGR (The Institute for Genomic Research): http://www.tigr.org/
- NCBI (National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/
- Sanger: http://www.ensembl.org/
- Japan Science and Technology Corporation, Advanced Lifescience Information System (JST-ALIS): http://www-alis.tokyo.jst.go.jp/HGS/top.pl
Related Links and Resources
- Gene Prediction Programs
  http://www.bork.embl-heidelberg.de/genepredict.html
  http://linkage.rockefeller.edu/wli/gene/programs.html
- ExPASy Translate Tool: http://expasy.nhri.org.tw/tools/dna.html
- Bioinformatics Research Center, Academia Sinica: http://www.sinica.edu.tw/~trees/bioinformatics/bioinformatics.html
Outline
- Introduction
- Some Research Topics
- Related Links and Resources
- Bioinformatics Research Center (BRC)
[Figure: network layout; labels: Firewall, Local Server, Lab. 1, Lab. 2, Lab. 3]
CRASA: Complexity Reduction Algorithm for Sequence Analysis
- Genome Annotation
- Alternative Splicing
- SNP (Single Nucleotide Polymorphism)
Data: cDNA database; genome sequences (chromosomes 1~22, X, Y)
CRASA: Complexity Reduction Algorithm for Sequence Analysis
Environment
- PC clustering: 10 PCs (PIII-667), 1 server
- Win2000 (NT)
- HD: IDE with RAID support
- DB2
Algorithm
- Progressive processing: pyramid structure
- Pattern match
- Direct search
- Parallel processing
Parallel Processing
- Sorting & assembling: CPU bound
- Network I/O bound
- HD I/O bound
[Figure labels: Server, query, p1, p2, p3]
Bioinformatics
?
Computer Science
Biology