Introduction to Computational Thinking
Vicky Chen
Fundamental Theorem of Informatics
Friedman CP. J Am Med Inform Assoc 2009;16:169-170
What Informatics Is Not
Friedman CP. J Am Med Inform Assoc 2009;16:169-170
Computational Thinking
Computational thinking is a way of solving
problems, designing systems, and understanding
human behavior that draws on concepts
fundamental to computer science. To flourish in
today's world, computational thinking has to be
a fundamental part of the way people think and
understand the world.
http://www.cs.cmu.edu/~CompThink/
Computational Thinking
• Analyzing and logically organizing data
• Data modeling, data abstractions, and
simulations
• Formulating problems so computers may assist
• Identifying, testing, and implementing possible
solutions
• Automating solutions via algorithmic thinking
• Generalizing and applying this process to other
problems
Algorithm
• A finite list of instructions that describes all
the steps required to perform a computation,
written in general language rather than in a
specific programming language
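As a small illustration, the sketch below states an algorithm in general language (as comments) and then implements it. Python is an assumption here, and the function name count_mutated and the example flags are made up for this sketch.

```python
# Algorithm in general language:
#   1. Start a counter at zero.
#   2. Look at each sample's mutation flag in turn.
#   3. If the flag is 1 (mutated), add one to the counter.
#   4. When no flags remain, report the counter.

def count_mutated(flags):
    """Return how many entries in flags are equal to 1."""
    count = 0
    for flag in flags:
        if flag == 1:
            count += 1
    return count

# Hypothetical mutation flags for five samples (1 = mutated, 0 = not mutated)
example_flags = [0, 1, 1, 0, 1]
print(count_mutated(example_flags))  # prints 3
```

The list of flags is finite and every step is spelled out, which is what makes the comment block an algorithm even before any code is written.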
Programming Steps
• Specification
– What the code should do
• Design
– Pseudocode
• Implement
– Programming
• Test
– Debugging
Data Type / Data Structure
• Integer
• Floating point
• Boolean
• Character
• String
• List
• Dictionary
• Hash Table
Data Types
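The basic data types could look like this in Python (an assumed language; the variable names are illustrative and the rate is a made-up value). Python has no separate character type, so a character is shown as a string of length one.

```python
sample_count = 779        # integer
mutation_rate = 0.042     # floating point (made-up value)
is_mutated = True         # Boolean
base = "A"                # character: a string of length 1 in Python
gene_symbol = "TP53"      # string

# type() reports the data type of a value
print(type(sample_count).__name__)   # int
print(type(mutation_rate).__name__)  # float
print(type(is_mutated).__name__)     # bool
print(type(gene_symbol).__name__)    # str
```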
List
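A list keeps items in order and lets them be read or changed by position. A minimal sketch, assuming Python and using example gene symbols:

```python
genes = ["TP53", "KRAS", "BRCA1"]   # example gene symbols

print(genes[0])       # first item (positions start at 0): TP53
print(len(genes))     # number of items: 3

genes.append("EGFR")  # add an item to the end
genes[1] = "PTEN"     # replace the item at position 1

print(genes[-1])      # last item: EGFR
print(genes)          # ['TP53', 'PTEN', 'BRCA1', 'EGFR']
```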
Dictionary / Hash Table
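A dictionary (Python's built-in hash table) stores key/value pairs and looks values up by key rather than by position. The sketch below assumes Python; the counts are made up.

```python
# Keys are gene symbols, values are made-up mutation counts
mutation_counts = {"TP53": 12, "KRAS": 7}

print(mutation_counts["TP53"])          # look up by key: 12

mutation_counts["EGFR"] = 3             # add a new key/value pair
print("KRAS" in mutation_counts)        # test whether a key exists: True
print(mutation_counts.get("BRCA1", 0))  # default for a missing key: 0

# Loop over all key/value pairs
for gene, count in mutation_counts.items():
    print(gene, count)
```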
Exercise 1
We have a matrix with mutation information for
different tumor samples.
How can this data be represented?
List of Lists
• Data is a sparse matrix
• Stores a lot of extra uninformative information
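One way to picture the list-of-lists representation, assuming Python and a tiny made-up matrix (sample names, genes, and values are hypothetical):

```python
genes = ["TP53", "KRAS", "EGFR", "PTEN"]
samples = ["sample_1", "sample_2", "sample_3"]

# One row per sample, one column per gene; 1 = mutated, 0 = not mutated
matrix = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]

# Does sample_3 carry a KRAS mutation?
row = samples.index("sample_3")
col = genes.index("KRAS")
print(matrix[row][col])  # 1

# Most cells are zeros that carry no information
total_cells = len(samples) * len(genes)
nonzero = sum(sum(r) for r in matrix)
print(nonzero, "of", total_cells, "cells are non-zero")  # 2 of 12 cells are non-zero
```

Even in this toy example only 2 of 12 stored values are informative, which is the sparsity problem the bullets describe.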
Dictionary
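A dictionary can store only the mutations that are present. The sketch below assumes Python and one possible layout (sample name mapped to a list of mutated genes); the slides may have used a different layout, and the data is made up.

```python
# Each sample maps to the list of genes that are mutated in it
mutations = {
    "sample_1": ["TP53"],
    "sample_2": [],
    "sample_3": ["KRAS"],
}

print(mutations["sample_3"])            # ['KRAS']
print("TP53" in mutations["sample_1"])  # True

# Only the informative entries are stored
stored = sum(len(gene_list) for gene_list in mutations.values())
print(stored)  # 2
```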
Opening Files
• Mutation matrix contains data on 2337 genes
and 779 samples
• Inputting data by hand is not feasible
• Data is usually read in and processed from files
Opening Files
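A minimal sketch of opening and reading a file, assuming Python. The file name and its tab-separated layout are hypothetical; the sketch first writes a tiny example file so that it runs on its own.

```python
import os
import tempfile

# Write a tiny example file so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "mutation_matrix.txt")
outfile = open(path, "w")
outfile.write("sample\tTP53\tKRAS\n")
outfile.write("sample_1\t1\t0\n")
outfile.write("sample_2\t0\t1\n")
outfile.close()

# Open the file for reading, read every line, then close it
infile = open(path, "r")
lines = infile.readlines()
infile.close()

print(len(lines))             # 3
print(lines[0].rstrip("\n"))  # the header line, without its newline
```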
Input and print
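A short sketch of reading user input and printing output, assuming Python. The helper ask_for_gene is a made-up name; it takes the reading function as a parameter only so that the sketch can run without a keyboard.

```python
def ask_for_gene(read=input):
    """Ask the user for a gene symbol and return it in upper case."""
    answer = read("Gene of interest: ")
    return answer.strip().upper()

# print accepts several values and separates them with spaces
print("Genes:", 2337, "Samples:", 779)

# Interactive use (commented out so the sketch runs unattended):
# gene = ask_for_gene()
# print("You entered", gene)
```

input always returns a string, so numeric input has to be converted explicitly, for example with int(...).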
For Loops
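A for loop repeats a block once for each item in a collection. A minimal sketch, assuming Python and made-up data:

```python
genes = ["TP53", "KRAS", "EGFR"]

# Visit each item of a list in order
for gene in genes:
    print(gene)

# range(5) yields 0, 1, 2, 3, 4
total = 0
for i in range(5):
    total += i
print(total)  # 10

# Visit each key/value pair of a dictionary
mutations = {"sample_1": ["TP53"], "sample_2": []}
for sample, mutated in mutations.items():
    print(sample, "has", len(mutated), "mutated gene(s)")
```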
While Loops
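A while loop repeats as long as its condition stays true. The sketch below, assuming Python and made-up flags, finds the position of the first mutated sample:

```python
# 1 = mutated, 0 = not mutated (made-up flags)
flags = [0, 0, 1, 0, 1]

index = 0
# Keep moving forward while we are inside the list and see no mutation
while index < len(flags) and flags[index] == 0:
    index += 1

print(index)  # 2
```

The condition must eventually become false; here index grows on every pass, so the loop stops at the first 1 or at the end of the list.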
Conditional Statements
• If, else if, else
• and
• or
• not
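The four constructs above can be combined as in this sketch, assuming Python (where "else if" is spelled elif). The function describe_sample and its messages are made up for illustration.

```python
def describe_sample(mutated_genes, gene_a, gene_b):
    """Describe which of two genes of interest are mutated in a sample."""
    has_a = gene_a in mutated_genes
    has_b = gene_b in mutated_genes
    if not mutated_genes:        # not: true when the list is empty
        return "no mutations recorded"
    elif has_a and has_b:        # and: both conditions must hold
        return "both genes mutated"
    elif has_a or has_b:         # or: at least one condition holds
        return "one of the two genes mutated"
    else:                        # none of the tests above matched
        return "neither gene mutated"

print(describe_sample(["TP53", "KRAS"], "TP53", "KRAS"))  # both genes mutated
print(describe_sample(["EGFR"], "TP53", "KRAS"))          # neither gene mutated
```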
Exercise 2
We have a dictionary that contains tumor
sample mutation information.
We want to print out a list of tumor samples
after receiving a mutated gene of interest from
the user.
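One possible solution sketch, assuming Python and a dictionary that maps each sample to its list of mutated genes. The layout, the data, and the name samples_with_mutation are assumptions, not the exercise's official answer.

```python
# Made-up data: sample name -> list of mutated genes
mutations = {
    "sample_1": ["TP53"],
    "sample_2": [],
    "sample_3": ["TP53", "KRAS"],
}

def samples_with_mutation(mutations, gene):
    """Return the samples whose list of mutated genes contains gene."""
    matches = []
    for sample, genes in mutations.items():
        if gene in genes:
            matches.append(sample)
    return matches

# In interactive use the gene would come from the user:
# gene = input("Mutated gene of interest: ").strip()
gene = "TP53"  # stands in for the user's answer
print(samples_with_mutation(mutations, gene))  # ['sample_1', 'sample_3']
```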
Opening Files Revisited
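With loops and conditionals available, a file can be processed line by line and turned directly into a dictionary. The sketch below assumes Python and a hypothetical tab-separated file with one header row of gene symbols and one row per sample; it writes a small example file first so that it runs on its own.

```python
import os
import tempfile

# Write a small example file (hypothetical layout and data)
path = os.path.join(tempfile.mkdtemp(), "mutation_matrix.txt")
with open(path, "w") as outfile:
    outfile.write("sample\tTP53\tKRAS\tEGFR\n")
    outfile.write("sample_1\t1\t0\t0\n")
    outfile.write("sample_2\t0\t0\t0\n")
    outfile.write("sample_3\t1\t1\t0\n")

mutations = {}
# "with" closes the file automatically when the block ends
with open(path) as infile:
    header = infile.readline().rstrip("\n").split("\t")
    genes = header[1:]                  # skip the "sample" column label
    for line in infile:
        fields = line.rstrip("\n").split("\t")
        sample = fields[0]
        flags = fields[1:]
        mutated = []
        for gene, flag in zip(genes, flags):
            if flag == "1":             # values read from a file are strings
                mutated.append(gene)
        mutations[sample] = mutated

print(mutations)
```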
Data Extraction from Files
• Many files will contain extra information
• Focus on extracting only pertinent data
• Applicable to many types of data
– Natural language documents (e.g. articles)
– Sequence data (e.g. FASTA files)
– Files from databases (e.g. NCBI Gene, TCGA)
– Etc.
Regular Expressions
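Regular expressions describe text patterns, which helps when pulling pertinent data out of a longer line. A minimal sketch using Python's re module (Python is an assumption); the header line, the sentence, and the patterns are made up for illustration.

```python
import re

# Pull one labelled value out of a FASTA-style header (hypothetical header)
header = ">sample_1 gene=TP53"
match = re.search(r"gene=(\w+)", header)
if match:
    print(match.group(1))  # TP53

# Find every token that looks like a gene symbol: a capital letter followed
# by at least two more capitals or digits (a rough, illustrative pattern)
text = "Mutations were found in TP53, KRAS and EGFR."
symbols = re.findall(r"\b[A-Z][A-Z0-9]{2,}\b", text)
print(symbols)  # ['TP53', 'KRAS', 'EGFR']
```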
Reusing Code
• Some code can be useful in multiple situations
• It is possible to just rewrite (or copy) the code
each time
– Less efficient
– Multiple locations to fix when debugging
Functions
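A function gives a piece of code a name so it can be reused instead of copied, leaving a single place to fix when debugging. A minimal sketch, assuming Python; parse_line is a made-up helper.

```python
def parse_line(line):
    """Split one tab-separated line into a sample name and its list of flags."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[1:]

# The same code is reused for different inputs
sample, flags = parse_line("sample_1\t1\t0\n")
print(sample, flags)   # sample_1 ['1', '0']

sample2, flags2 = parse_line("sample_2\t0\t1\t1\n")
print(sample2, flags2)  # sample_2 ['0', '1', '1']
```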
Exercise 3
We have a document containing human gene
information downloaded from NCBI.
We want to extract and store the Ensembl ID of
each gene with its corresponding gene symbol.
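One possible approach combines a function, a loop, and a regular expression. The sketch assumes Python and a simplified three-column, tab-separated layout (gene ID, symbol, cross-references containing an entry such as Ensembl:ENSG...); real NCBI downloads have more columns, so the column positions would need adjusting. The example lines, including the placeholder gene EXAMPLE1, are illustrative.

```python
import re

# Assumed form of the Ensembl cross-reference inside a line
ENSEMBL_PATTERN = re.compile(r"Ensembl:(ENSG\d+)")

def extract_ensembl_ids(lines):
    """Return a dictionary mapping each Ensembl gene ID to its gene symbol."""
    ensembl_to_symbol = {}
    for line in lines:
        if line.startswith("#"):     # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        symbol = fields[1]           # assumed position of the gene symbol
        xrefs = fields[2]            # assumed position of the cross-references
        match = ENSEMBL_PATTERN.search(xrefs)
        if match:                    # some genes have no Ensembl ID
            ensembl_to_symbol[match.group(1)] = symbol
    return ensembl_to_symbol

example_lines = [
    "#GeneID\tSymbol\tdbXrefs\n",
    "7157\tTP53\tHGNC:HGNC:11998|Ensembl:ENSG00000141510\n",
    "3845\tKRAS\tHGNC:HGNC:6407|Ensembl:ENSG00000133703\n",
    "0\tEXAMPLE1\t-\n",              # placeholder gene without an Ensembl ID
]

result = extract_ensembl_ids(example_lines)
print(result)
```

With a real file, the list example_lines would be replaced by an open file object, since a file can be looped over line by line in the same way.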