Data Mining for BioInformatics at Ewha CSE
Dec. 14, 2001
Hwan-Seung Yong
(Gene: ACTGAAAGGGCTCTCAAA)
Dept. of Computer Science & Engineering
Ewha Womans Univ.
BioInformatics and Computer Science
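• Computer: binary system (0/1), designed by humans
• Living things: base-4 system (A/G/C/T), designed by nature
• Advances in computer technology
– Data analysis + databases = data mining (at present)
– High-performance parallel computing technology
– Distributed processing and Web/XML technology
– Emergence of knowledge management technology
For BioInformatics
• Why humans built computers
– To uncover the secrets of life encoded in the base-4 system
– To challenge the realm of God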
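The contrast between the 0/1 alphabet and the A/G/C/T alphabet can be made concrete: each base carries two bits, so a DNA string packs into an integer. The following is a minimal sketch only. The digit assignment (A=0, C=1, G=2, T=3) and the function names `encode` and `decode` are assumptions for illustration; the slides do not define a mapping.

```python
# Assumed mapping: the slides do not say which base corresponds to which digit.
BASE_TO_DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}
DIGIT_TO_BASE = "ACGT"


def encode(seq: str) -> int:
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | BASE_TO_DIGIT[base]
    return value


def decode(value: int, length: int) -> str:
    """Unpack `length` bases from an integer produced by encode()."""
    bases = []
    for _ in range(length):
        bases.append(DIGIT_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


gene = "ACTGAAAGGGCTCTCAAA"  # the sequence shown on the title slide
packed = encode(gene)
print(packed.bit_length() <= 2 * len(gene))  # never more than two bits per base
print(decode(packed, len(gene)) == gene)     # the round trip restores the string
```

The length has to be passed to `decode` because leading "A" bases encode as zero bits and would otherwise be lost.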
BioInformatics and Computer Science
• BioInformatics
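– Development of DNA code readers (biotechnology) and alignment techniques
• The complete genome sequence has only just been assembled
– The next task is to find meaning (genes, etc.) in that sequence
– This is akin to recovering source code from a binary object
• Requires experts in disassemblers and reverse engineering
– Data mining is a key applicable technique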
Computer System: Binary Code → Assembly Code → Source Code
Living Things (Nature): DNA Sequence → Gene → Protein
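One step of the DNA-to-protein direction in this analogy can be sketched as codon translation, much as a disassembler maps opcodes to mnemonics. This is a hedged illustration and not a method taken from the slides. It assumes the standard genetic code and reading frame 0, and the name `translate` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 0, stopping at the first stop codon."""
    protein = []
    # Trailing bases that do not fill a whole codon are ignored.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGCCGTCGGGCCCCGGGGC"))  # the sequence from the closing slide
```

Real gene finding is much harder than this lookup, since the reading frame, gene boundaries, and introns are unknown, which is where the data mining techniques named above come in.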
Why Ewha CSE is appropriate for BioInformatics
• Recent focus of CSE’s Research Area
– As a BK Project Plan: Knowledge Engineering Framework
– Data Warehousing and OLAP
– Data Mining
– XML Technology
– Knowledge Engineering Enabling Technology
– Knowledge Engineering Application
• Electronic Commerce
• BioInformatics
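• Related research institutes at Ewha
– Graduate School of Molecular Life Sciences (BK)
– Korea Science and Engineering Foundation SRC (Center for Cell Signaling Research)
– Ministry of Information and Communication Computer Graphics/Virtual Reality Research Center
• Prior related work (direct)
– Development of a DNA search and automated analysis program for the Prosecutors' Office
– Development of a genetic information management system for the National Institute of Scientific Investigation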
Automated DNA Analysis Program
DNA band recognition and code registration program
DNA Locus Registration Interface
Data Warehousing, OLAP and Data Mining
• Data Warehousing and OLAP
– ETL Methodology (Extraction, Transformation and Loading)
– Data Warehouse Architecture
– OLAP Server Development
– Multidimensional Data Processing
– Metadata Handling
– Data Quality Control
• Data Mining
– Classification and Analysis of Data Mining Techniques
– Clustering Algorithm
– Association Algorithm
– Classification Algorithm
– CRM Application Based on Web Log Mining
– Text Mining for XML Data
XML and Supporting Technology
• XML Related Area
– XML Server Development
• Query Processing and Storage System
– XML Document Mining
• Knowledge Enabling Technology
– Multimedia High-Speed Network
– Component-Based Software Engineering
– Security
– Multimedia DBMS
– Natural Language Processing
– Computer Graphics and Virtual Reality
Research Requirements for BioInformatics
• Large volume of data, including multimedia data
• High-Performance Computing System
– Massively Parallel Processing Hardware and Software
• XML-related work is important
– For exchange of bio data
– Gene Annotation
• Web-based collaborative system
– Requires web-based interoperable applications and standards
– Distributed processing technique
• CORBA, SOAP, Microsoft .NET framework
• Data Mining
– For Gene Prediction, Functional Genomics
Bio Data Mining Research
• XML Standard for Bio Data
• Graphical User Interface for XML Data
• Data Converter to XML
– Convert existing bio data to an XML standard
– Convert between XML standards
• Integration Methodology with Existing DB
– SOAP (Simple Object Access Protocol)
– WSDL (Web Services Description Language)
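The "Data Converter to XML" item can be illustrated with a small FASTA-to-XML converter. This is a sketch under stated assumptions, not the project's converter. The element and attribute names (`sequences`, `sequence`, `description`, `residues`, `id`, `length`) are hypothetical and do not follow AGAVE, BSML, or GAME, and the helper names `parse_fasta` and `fasta_to_xml` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_fasta(text: str):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_xml(text: str) -> str:
    """Serialize FASTA records using hypothetical element names."""
    root = ET.Element("sequences")  # hypothetical schema, not a real standard
    for header, seq in parse_fasta(text):
        words = header.split()
        ident = words[0] if words else ""
        record = ET.SubElement(root, "sequence", id=ident, length=str(len(seq)))
        ET.SubElement(record, "description").text = header
        ET.SubElement(record, "residues").text = seq
    return ET.tostring(root, encoding="unicode")


# Toy record built from the title-slide sequence; the header text is made up.
example = ">gene1 sample record\nACTGAAAGG\nGCTCTCAAA\n"
print(fasta_to_xml(example))
```

A converter targeting one of the real formats would instead emit the element names defined by that format's DTD or schema.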
XML Standard for Bio Data
• Before
– FASTA format, GenBank format, GFF(General Feature Format)
• XML Format
– AGAVE (Architecture for Genomic Annotation, Visualization and Exchange)
• Developed by DoubleTwist, Inc.
• Released in June 2000
• Open-source license in August 2001
• AGAVE version 3.2 with Prophecy 3.0 in Sept. 2001
• See http://www.agavexml.org
• Genome XML Viewer by Labbook
– BSML
XML Standard for Bio Data
• BioXML Standard and GAME
– An open-source/free software organization dedicated to providing a set of standard XML formats for the exchange of biological data
• GAME (Genomic Annotation Markup Language)
– Created at BDGP (Berkeley Drosophila Genome Project)
– Current version 1.1 released in March 2000
– http://www.bioxml.org
– Follows the WikiWeb scheme
• Collaborative web site that can be edited by anyone
• Community documentation system
• Anyone can edit the shared web pages
Computer Theory and Security Laboratory
Whole genome sequence
annotation
Known gene
Unknown gene
• Sequence similarity
• Neural networks
• Hidden Markov models
Unknown gene prediction
Microarray data analysis
Phylogenetic prediction
Phylogeny inference
Phylogenetic analysis
Comparative genomics
Data mining tools
Two samples comparison
Phylogenetic Tree Visualization
• Tree drawing algorithms
• Graph drawing algorithms
Clustering
classification tools
Multiple samples comparison
New algorithm design
• Simulated annealing
• Other optimization techniques
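The "Sequence similarity" and "Clustering" items above can be illustrated with the simplest possible measure: the fraction of identical positions between pre-aligned sequences, and the pair a clustering procedure would merge first. This is a minimal sketch, assuming equal-length sequences with no gaps. The sample data and the names `identity` and `closest_pair` are made up for illustration and do not come from the slides.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def closest_pair(seqs):
    """Return (name1, name2, identity) for the most similar pair.

    This is the first merge an agglomerative clustering step would make.
    """
    best = max(
        combinations(sorted(seqs), 2),
        key=lambda pair: identity(seqs[pair[0]], seqs[pair[1]]),
    )
    return best[0], best[1], identity(seqs[best[0]], seqs[best[1]])


# Toy sequences, invented for this example.
samples = {"s1": "ACTGAAAG", "s2": "ACTGAAAC", "s3": "TTTGCCAG"}
print(closest_pair(samples))
```

Practical tools align the sequences first and use scoring matrices or probabilistic models such as the hidden Markov models listed above; plain identity is only the starting point.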
Open Source Project
• Open Bioinformatics Foundation
– http://www.open-bio.org
– Umbrella group for the various bio*.org groups
• bioxml.org, bioperl.org, biopython.org, biojava.org, biocorba.org
• biopathways.org
• bio-ensembl.org
– Annotation for human genome
– The First Bioinformatics Open Source Conference (BOSC'2001) was held in August 2001 in San Diego.
– Many Open System Activities
Vision and Future Prediction
• Ewha will
– Contribute to the bio data mining area
– Establish a bioinformatics institute or research center
– Build strong relationships with the bio industry
• Closing Comment
ATGCCGTCGGGCCCCGGGGC
=> "Thank You" expressed in base 4