Doug Raiford
Lesson 1


Biologists and Computer Scientists
Note the word “Scientists”
5/23/2017
Introduction
2

Wikipedia
 Computational biology encompasses bioinformatics
 Bioinformatics applies algorithms and statistical techniques to
the interpretation, classification and understanding of
biological datasets

NCBI
 Bioinformatics: Research, development or application of
computational tools and approaches for expanding the use of
biological, medical, behavioral or health data, including those to
acquire, store, organize, archive, analyze or visualize such data.
 Computational Biology: The development and application of
data-analytical and theoretical methods, mathematical
modeling and computational simulation techniques to the
study of biological, behavioral, and social systems

For the purposes of this course we are treating the
terms as synonymous


It’s All About the Data
Virtually every biological experiment requires
a processor and software


Human genetic material
comprises roughly 3 billion
base pairs
The sheer volume of data
requires computational and
storage techniques in
order to analyze it
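
To give a rough sense of the volume mentioned above, here is a back-of-the-envelope sketch in Python. It assumes one copy of a 3-billion-base sequence and two simple encodings (one text character per base, or two bits per base). The function name and the choice of encodings are illustrative assumptions, not part of the course material.

```python
# Back-of-the-envelope storage estimate for one genome-length sequence.
# Assumption: 3 billion bases (the figure quoted on the slide), stored once.

BASE_PAIRS = 3_000_000_000


def storage_bytes(n_bases: int, bits_per_base: int) -> int:
    """Bytes needed to store n_bases at a fixed number of bits per base (rounded up)."""
    total_bits = n_bases * bits_per_base
    return (total_bits + 7) // 8


# Plain text: one 8-bit character (A, C, G, or T) per base.
as_text = storage_bytes(BASE_PAIRS, 8)
# Packed: four possible bases fit in 2 bits each.
as_packed = storage_bytes(BASE_PAIRS, 2)

if __name__ == "__main__":
    print(f"text:   {as_text / 1e9:.2f} GB")
    print(f"packed: {as_packed / 1e9:.2f} GB")
```

Even the packed form is hundreds of megabytes for a single sequence, before any experimental data or annotation is attached to it.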




Can now identify which
genes are affected by a
disease or treatment
Thousands of genes per
experiment
Multiple experiments per
time-point
Multiple time-points



Data growing exponentially
Thousands of complete genomes
Each genome results in thousands of
experiments



Vast amounts of data
More data coming in daily
Sophisticated computational
techniques required
 Clustering
 Searches
 Optimizations
 Data mining
 Pattern recognition
 Classification

A little about me
 Work
 School

Moodle is the
primary page
 Weekly schedule
▪ When homeworks are
due
▪ When projects are due
▪ Links to quizzes,
projects, and
homeworks

Instructor website
 Syllabus
 Slides

Bioinformatics: Sequence
and Genome Analysis

Beginning Perl for
Bioinformatics





3:30 to 5:00 Tuesday and Thursday
Or by appointment
Social Science 412
Phone 406-243-5605
Email
 [email protected]

A little about myself

Try to get assignments in on time
 Letter grade for each day late

90 - 100   A
87 - 89    B+
80 - 86    B
77 - 79    C+
70 - 76    C
67 - 69    D+
60 - 66    D
00 - 59    F

Component            Undergrad   Graduate
Homework             10%         8%
Quizzes              25%         21%
Exams (3 of them)    30%         25%
Projects             35%         29%
Grad Project         NA          17%

Your work in this class needs to be
your own
 Overly similar work (to that of your
classmates or to content from the
web) will be considered to be the
result of copying

 First offense will result in a zero on the
assignment
 Second will be referred to the Dean of
Academic Affairs

Student Conduct Code
 http://life.umt.edu/vpsa/student_conduct.php

Let me know of any special
needs during this first week
 Letter from Disability Services for
Students (DSS)


Religious observances
Officially sanctioned, scheduled
University extracurricular activity
 opportunity to make up class
assignments or other graded
assignments


Improve the computer scientist’s understanding
of biological systems and problems
Improve the biologist’s understanding of the
science of computing and provide the
beginnings of a CS skill-set

Four Distinct Audiences
Computer Scientists
 Undergrad
 Grad
Biologists etc.
 Undergrad
 Grad

Computer scientists are all about the algorithms, implementations,
programming languages, design, etc.
Biologists mostly just want an introduction to programming
Undergrads
 High-level overview
Graduate Students
 Specific tools and skills that will aid them in research

Undergraduates






Some algorithms (even implement some)
New languages: Perl and R
Introduce programming concepts
Lots of practice programming (8 projects)
Lots of guidance from me
Graduate students
 Practice writing a grant (a draft and a final version)
 Practice writing a paper (a draft and a final version)
 Practice using several actual Bio Tools

All
 Team projects



Computer-science-wise
Not really anything new
More of an application of
existing techniques
 Dynamic programming
techniques
 Hidden Markov Models
 Exploratory data analysis
▪ Clustering
▪ Multivariate analysis
▪ Principal components analysis
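
As one illustration of how an existing technique from the list above gets applied to biological data, here is a minimal Python sketch of dynamic programming for scoring a global alignment of two sequences (the Needleman-Wunsch recurrence). The slide names only "dynamic programming techniques", so the choice of this particular algorithm, the function name, and the scoring values (match +1, mismatch -1, gap -1) are assumptions made for the example.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch recurrence).

    Scoring values are illustrative defaults, not taken from the course.
    """
    # prev[j] = best score aligning the already-processed prefix of a with b[:j].
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept at a time, which is enough when just the score is needed. Recovering the alignment itself would require keeping the full table for a traceback.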

Research
 Ph.D. generating publications

Employee in a company
 Drug company
 Genomics lab

Bioinformatician
 www.simplyhired.com

Techniques that are successful
in bioinformatics are the same
ones that are successful in other
data-intensive fields



Hunger, need for clean water
Global warming
Disease

Genetically engineered
crops
 Disease resistant
 Greater yields

Water treatment
 Genetically engineered
microbes
▪ Sewage treatment—
purification
▪ Clean oil spills

Plants consume CO2 and release O2
 But the carbon is released back into the
atmosphere over a period of
time
 Genetically engineered plants
could convert it into a stable form

Genetically enhanced
microbes convert CO2 back to
fuel
 Methanococcus jannaschii
 Takes CO2 and converts it to
methane


Test for increased risk
of certain cancers
Personalized medicine
 Leukemia
▪ Genetic profile resistant
to certain chemotherapy
 Increased risk of drug
reactions


Many drugs bind
to protein active
sites
Computational
techniques for
predicting drug
performance


Actually alter our genetic
code to treat a genetic
disorder
Or simply add a
disembodied gene to our
complement


What does it have to do with informatics?
Where do computer scientists fit in this
picture?
Role of computers and
computer scientists

Why biologists would attend




CS types good at the
data analysis
Must understand what
the data means
Don’t know what to look
for—what questions to
ask
Don’t speak the lingo
Haploid, Hypertonic, Hypotonic, Erythematous, Cilia, Cell membrane,
Nucleus, Lytic cycle, Gene, Biotic factors, Nulliparity, Hyperosmotic,
Natural selection, Fluid mosaic model, Solute, Homologous chromosome,
Ribosome, Mitochondria, Diffusion, Leucocytes, Photosynthesis,
Genetic variation, Organism, Plasma membrane, Cytoplasm,
Wagners disease, Meiosis, Habitat, Diploid, Cell, Youpon,
Concentration gradient, Ecosystem, Homeostasis, Mitosis, Osmosis,
Allele, Enzyme, Autotrophic, Egestion, Mitochondrion, Gamete,
Organisms, Nucleotide, Aminoacyl, Gene expression, Point mutation,
Duplication event




Biologists understand
the data
Don’t know how to
formulate the problem
in CS terms
Don’t know what magic
the CS types can bring
to the table
Don’t speak the lingo
Acyclic graph, Heap sort, Huffman coding, Adjacency-matrix,
Admissible vertex, Abstract data type, Algorithm, All pairs
shortest path, Euclidean distance, Hash, Tree, Linked list, Heap,
Complexity analysis, Recursion, Dynamic programming, Graph,
Hamiltonian path, Heuristic, Hidden Markov Model, Principal
components analysis, Isomorphic, Simplex algorithm,
Mahalanobis distance, Discrete event simulation, NP-complete,
Big O, Optimization problem, Polymorphism, Polynomial time,
Clustering, Classifying, Stack, Queue, Stochastic modeling,
Tail recursion, Binary tree, Self organizing map,
Shortest common string, Minimum spanning tree, Singular matrix,
Trie, Vertex cover


Won’t be a full-fledged bioinformatician
Will be able to contribute given
 close guidance
 practice and continued training and guidance


Biologists perform all steps

Determine problem to be solved given data
Determine which tool to utilize
Manually format data for input to tool
 Might involve data retrieval if utilizing repository data
Run tool
Analyze results
Biologist
Computer Scientist
Determine problem to be solved given data
Develop algorithmic approach
Implement algorithm (write code)
Format data for input to algorithm
 Might involve data retrieval if utilizing repository data
Biologist
Run code
Analyze results

CS types
 Provide beginnings of a biology
background
 Introduce some existing tools,
sources of data, and analysis
techniques

Biologists
 Introduce some existing tools,
sources of data, and analysis
techniques
 Provide some programming
essentials