Computational Biology
Dr. Jens Allmer
MBG404 Lecture Slides Week 2
What is a computer?
What is a program?
What is a Computer?
A manipulator of data
 Four operations computers perform:
• input of data
• processing of data
• output of data
• storage of data
 Computers do not rate data; they manipulate it
 Garbage in = garbage out
The Basics
What is a computer?
Input of data
Storage of data
Processing of data
Output of data
The Basics
Terminology:
 Hardware: physical components of a computer; what
you can touch and feel
 Software: instructions the hardware uses
 Some hardware includes software
(drivers, firmware)
 Program: a set of instructions
 Operating system: a program that allows applications
to communicate with the computer; the interface
between the human and the computer
 Application: a program that runs cooperatively with
an operating system. What we use to “compute”
Operating Systems
What is an Operating System?
 Interprets input from mouse and keyboard
 Creates the Graphical User Interface
(GUI)
 Finds files and directories on the hard disk
(HDD)
 Creates the monitor images
 Gives meanings to the buttons
 Knows where components are and sends information
to the right place
 OS: software for your computer
Operating Systems
What an operating system does:
You → Applications (“Programs”) → Operating System → Your Computer’s Hardware
How the OS Works
A program is a collection of pieces:
 Primary piece is called the kernel
 Buttons in applications are objects
 “I must know my part of the process”
[Diagram: the Save button in OpenOffice Writer and the corresponding Save operation in Linux]
Installing Applications
Loaded software:
 Typically stored in C:\Program Files
 An application is in pieces: kernel and supporting
 Microsoft Office: 8 applications, 3,286 files
[Diagram: Microsoft Office inside Program Files]
Terminology
» Program
» Application
» Software
» Operating System
Programs in Biology
» Sequence Alignment
– BLAST
– FASTA
– ClustalW
– T-Coffee
– ...
» MS/MS Proteomics
– MASCOT
– OMSSA
– Sequest
– PEAKS
– ...
» Other more Targeted Purposes
Servers and Websites
» Example: NCBI
» Provide processing power (many computing nodes)
» Provide programs for you to use
» Provide GUI
» Is shared with other users (Queue)
Locally Run Programs
» Example: BLAST
» Only your PC computes
» Programs available on your computer (downloaded and installed)
» Some with, some without a GUI
» Not shared
» Additional features may be activated; different environment variables (e.g., for the sequence db) may be used
KNIME || RapidMiner || Orange
» Data mining
» Data visualization
» Data analysis
» Machine Learning
Microsoft Access or Open Office Base
» Databases
» Store and manipulate large amounts of data
» Organize and relate data
» Create reports
End Theory I
» Mind mapping 5 min
» Break 10 min
Practice I
» The command line interface
» Console
Console Programs
» Input
– Typing (responding to questions)
– Files
– Parameters
» Control
– Switches
– Parameters
– INI files
» Output
– Echoed to screen
– Files
Console Programs
» Program path/Program Title.Program Extension
– C:\test.exe
– C:/test.bat
– C:\folder1\folder2\test.exe
» Giving information to the program
– Any input on the command line follows the program
– C:/test.bat input goes here
– Input is separated by spaces
– Input can consist of
• parameters
• switches and
• switches with parameter
Input / Control
» Parameters
– Separated by spaces
– Following the program name
– C:\test.exe param1 param2
– May be anything
– Can contain file paths
• C:\test.exe c:\input.txt
– Paths with spaces need to be quoted
• C:\test.exe "C:\Documents and Settings\input.txt"
» Some operating systems or programs want to receive
switches before parameters are given
Control
» Switches
– Separated by spaces
– Often introduced with a prefix (/,-)
– Example
• C:\test.exe /? (this could display help information)
– Common forms: /X or -X (e.g. /?, -?, -h, -help)
– Example
• C:\test.exe -s1 -s2 (test.exe should take s1 and s2 into account)
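The split between switches and parameters described on these slides can be sketched in Python. This is only an illustration: the function name `split_command_line` and the rule "anything starting with / or - is a switch" are assumptions made for the example, and real programs often apply their own rules.

```python
import sys


def split_command_line(args):
    """Separate switches (prefixed with / or -) from plain parameters."""
    switches, parameters = [], []
    for arg in args:
        # Assumption: a leading "/" or "-" marks a switch; everything else,
        # including file paths such as c:\input.txt, counts as a parameter.
        if len(arg) > 1 and arg[0] in "/-":
            switches.append(arg[1:])
        else:
            parameters.append(arg)
    return switches, parameters


if __name__ == "__main__":
    # sys.argv[0] is the script itself; the input follows it, separated by spaces.
    found_switches, found_parameters = split_command_line(sys.argv[1:])
    print("switches:  ", found_switches)
    print("parameters:", found_parameters)
```

Saved under a hypothetical name such as args_demo.py, it could be tried with: python args_demo.py -s1 -s2 c:\input.txt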
Bring up Console
» Start – Run ‘cmd’ – OK
» Start – Programs – Accessories – Command Prompt
Console Commands
» X: + Enter
– Changes to the specified drive
» cd path
– Changes the current directory
– path may also be .. or ../ or ..\ (parent directory) or \ (root of the drive)
» copy from to
– Copies a file to a different location
– From may be a list of files concatenated by +
– copy f1+f2+f3 destination
Console Commands
» dir
– Lists files and directories
– Switches
• /A[D,H,S,R,A,-],
• /B, /C, /D, /L, /N, /O[N,S,E,D,G,-],
• /P, /Q, /S, /T[C,A,W], /W, /X, /4
» Playtime
– Sort the directory listing by time, latest one first
Console Commands
» md title
– Creates a directory named ‘title’
» del file_path
– Deletes the file specified by the path
» deltree folder_path || rmdir folder_path
– Deletes the directory specified by folder_path
– deltree exists only on older Windows versions; rmdir /s also removes a non-empty directory
Playtime
» Create a new directory
» Delete the newly created directory
Console Commands
» fc || comp
– What does it do?
– How can you use it?
» find
– How does it work?
– How can you control it?
Theory II
» Some cli commands and quirks
Output Redirection (Windows, *nix)
• Stores the output in a file
– ‘> file path’ creates a new file (or overwrites an existing one)
– ‘>> file path’ appends to an existing file or creates a new one
• Example
– C:\>ipconfig /all > res.txt
– C:\>route print >> res.txt
– Output from ipconfig and route will now be in C:\res.txt
• Streams and pipes
– Standard out (stdout) is redirected with 1> (or simply >)
– Standard error (stderr) is redirected with 2>
– B | A redirects stdout of B to stdin of A
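What > and >> do can also be reproduced from a script. The following Python sketch is an illustration under assumptions: CHILD is a made-up stand-in for a console program such as ipconfig, and `run_redirected` is a hypothetical helper name. It sends stdout to a file and captures stderr separately, roughly what 1> and 2> do on the console.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for a console program: writes one line to stdout and one to stderr.
CHILD = [sys.executable, "-c",
         "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]


def run_redirected(out_path, mode):
    """Run CHILD with stdout sent to out_path; mode 'w' mimics '>', 'a' mimics '>>'."""
    with open(out_path, mode) as out_file:
        # stderr is captured on its own, so it never ends up in the output file.
        result = subprocess.run(CHILD, stdout=out_file,
                                stderr=subprocess.PIPE, text=True)
    return result.stderr


with tempfile.TemporaryDirectory() as tmp:
    res = Path(tmp) / "res.txt"
    run_redirected(res, "w")        # like: program > res.txt
    err = run_redirected(res, "a")  # like: program >> res.txt
    lines = res.read_text().splitlines()

print(lines)  # two stdout lines: one from each run
print(err)    # the stderr text of the second run
```

Because the second run opens the file in append mode, the file holds the stdout line from both runs, while the stderr line stays out of it.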
Absolute Addressing
• Specify the complete path starting with the drive letter
• Example
– C:\windows\system32\ipconfig.exe
– ?>ipconfig /all >> "C:\Documents and Settings\jens\res.txt"
Relative Addressing
• Specify directions to reach the file
• The current directory is shown in the prompt before the >
• Example
– C:\Documents and Settings\jens> (abbreviated to ?> where ‘?’ can be any path)
• Directions
– Start from the current directory
– ../ goes to the parent directory
– dir/ goes into the child directory named dir
• Example
– ?>ipconfig /all >> "../New Folder/test.txt"
Addressing
• You can mix
– Relative addressing
– Absolute addressing
• Make sure to quote parameters that contain whitespace
– E.g.: "C:\Documents and Settings"
• The output redirection can appear anywhere on the command line
– E.g.: ipconfig /all >> res.txt
– E.g.: ipconfig >> res.txt /all
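How a relative path is turned into an absolute one can be checked with Python's ntpath module, which applies Windows path rules on any operating system. This is a minimal sketch; the helper name `resolve` is an assumption made for the example, and the paths are the ones used on the slides.

```python
import ntpath  # Windows path rules, usable on any platform


def resolve(current_dir, path):
    """Return the absolute Windows path that `path` refers to from `current_dir`."""
    # ntpath.join drops current_dir when `path` is already absolute;
    # ntpath.normpath then collapses ".." and unifies the slashes.
    return ntpath.normpath(ntpath.join(current_dir, path))


cwd = r"C:\Documents and Settings\jens"
relative = resolve(cwd, "../New Folder/test.txt")
absolute = resolve(cwd, r"C:\windows\system32\ipconfig.exe")

print(relative)  # C:\Documents and Settings\New Folder\test.txt
print(absolute)  # C:\windows\system32\ipconfig.exe
```

The relative path first steps up from jens to its parent directory and then down into New Folder; the absolute path ignores the current directory entirely.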
Running Java Programs
• Java programs are not executed directly
– They need the Java Virtual Machine (JVM)
– Thus Java needs to be started instead of the program
• Java programs often come as jar files
– The jar file can be passed to the JVM as a parameter
• Example
– java.exe -jar DNATranslator.jar
– Download from bioinformatics.allmer.de/tools
• Programs available in Java
– Many in all areas of biology
Running Java Programs
• Java jar files sometimes don’t declare a main class
– Then they can be run by giving the class path and the class to start
– java -cp test
– java -cp FastaEditor.jar fastaeditor.FastaEditorFrm
Other Executables
• Perl has long been popular
– Full and easy support in Linux (also in the terminal)
– Windows needs ActivePerl installed (http://www.activestate.com/activeperl)
– If installed, scripts are run similarly to Java programs
– perl script.pl
• Python
– Similar to Java and Perl
– Needs a Python interpreter, which must be installed
– python script.py
• Ruby, PHP, etc
– Similar to above
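The common pattern in these examples is that the interpreter is started and the program is handed to it as a parameter. A small Python sketch of that idea follows. The table LAUNCHERS and the function `launch_command` are hypothetical names, and the sketch assumes java, perl and python are installed and reachable from the console.

```python
import os

# Assumed mapping from file extension to the launcher used on the slides.
LAUNCHERS = {
    ".jar": ["java", "-jar"],
    ".pl": ["perl"],
    ".py": ["python"],
}


def launch_command(program, *params):
    """Build the command line needed to start `program` through its interpreter."""
    extension = os.path.splitext(program)[1].lower()
    # Files without a known interpreter (e.g. .exe) are started directly.
    launcher = LAUNCHERS.get(extension, [])
    return launcher + [program, *params]


print(launch_command("DNATranslator.jar"))    # ['java', '-jar', 'DNATranslator.jar']
print(launch_command("script.pl", "in.txt"))  # ['perl', 'script.pl', 'in.txt']
```

The resulting list could be passed to subprocess.run to actually start the program.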
Interpreter Languages
• Java, Python, Ruby, Perl, ...
[Diagram: Operating System executes Interpreter; Interpreter executes Program]
End Theory II
» Mind mapping 5 min
» 10 min break
Practice II
» CMD
• Create a file from some console output (>)
– ipconfig /all > ipout.txt
• Use the find command to find something in the output (<)
– find /I "searchstring" < ipout.txt
• Forward the information from one command to the next (piping, |)
– dir | find /I "searchstring"
• Create two text files with some small differences and compare them
– fc t1.txt t2.txt | find "searchstring"
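For comparison, the piping step can be imitated in Python by connecting the stdout of one process to the stdin of another. The sketch is only an illustration: PRODUCER and FILTER are made-up stand-ins for dir and for a case-insensitive find, and `pipe` is a hypothetical helper name.

```python
import subprocess
import sys

# Stand-ins: PRODUCER plays the role of 'dir', FILTER the role of 'find /I "txt"'.
PRODUCER = [sys.executable, "-c",
            "print('Notes.TXT'); print('image.png'); print('data.txt')"]
FILTER = [sys.executable, "-c",
          "import sys\n"
          "for line in sys.stdin:\n"
          "    if 'txt' in line.lower():\n"
          "        sys.stdout.write(line)"]


def pipe(producer, consumer):
    """Connect stdout of producer to stdin of consumer, like 'producer | consumer'."""
    first = subprocess.Popen(producer, stdout=subprocess.PIPE)
    second = subprocess.Popen(consumer, stdin=first.stdout,
                              stdout=subprocess.PIPE, text=True)
    first.stdout.close()  # the consumer now holds the only read end of the pipe
    output, _ = second.communicate()
    first.wait()
    return output.splitlines()


matches = pipe(PRODUCER, FILTER)
print(matches)  # ['Notes.TXT', 'data.txt']
```

The filter lower-cases each line before comparing, which is why Notes.TXT matches as well.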
Console Commands
• <, >, >>
• Pipe character: |
• PATH variable
PATH
• set PATH=%PATH%;<PATHYOUNEED> (no spaces around the =)
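The PATH variable is a list of directories separated by semicolons on Windows, and the console searches them in order for the program you type. A minimal Python sketch of extending such a list and of checking whether a program is found follows. The function `extend_path` and the directory C:\Java\bin are assumptions made for the example; a real Java installation will be located elsewhere.

```python
import os
import shutil


def extend_path(current_path, new_dir, sep=";"):
    """Return current_path with new_dir appended, skipping duplicates."""
    entries = [entry for entry in current_path.split(sep) if entry]
    if new_dir not in entries:
        entries.append(new_dir)
    return sep.join(entries)


# Hypothetical directories, used only to show the string handling.
example = extend_path(r"C:\Windows\system32;C:\Windows", r"C:\Java\bin")
print(example)

# shutil.which searches the real PATH of this process; it returns None
# when no program named java is found there.
print("java found at:", shutil.which("java"))
print("PATH separator on this system:", os.pathsep)
```

If shutil.which reports None, Java is either not installed or its directory is not yet on the PATH.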
Java on the Path?
• Is Java on the path?
• If not put it on the path.
• Find out java version (installed)
• Find out java version (jar file)
Run a Class File
• Download runClass.class from mbg404
• Open it in notepad
• Search for runClass (name of the file without .class)
• The part ‘mbg404/’ in front of the first ‘runClass’ entry you find is the package (folder/directory) the class must be in so that you are able to run it
• Create a folder mbg404
• Move runClass.class into mbg404 folder
• Execute the class
– java mbg404.runClass
– What is the output?
Running Java Programs
• Download the DNATranslator from:
– http://www.biolnk.com
• Run the DNATranslator by
– Double clicking
– Creating a shortcut to it (Right click create shortcut ..)
– Running it from the console
Batch
• Batch Files
– Use notepad to write a text file
– Change the extension of the text file to .bat
– How?
• On the console using rename
– Open the file in notepad and just type: dir
– Save and close the file
– Execute the file by double clicking
– Execute the file from the console
– Write a script that
• Clears the screen
• Creates a new folder
• Stores the output of ipconfig in the new folder
• Searches for "10." in the output file
End of Practice II
Term Project
• Groups (3 persons);
– Register the groups with the assistants now
Term Project
• MSA Tools
– Choose 10 tools (preferably try to find new ones not in the list)
• Tcoffee, muscle, mafft, kalign, clustalo, clustalw
• Multalin, mview, dialign, probcons, clustalw2
• Dbclustal, ugene, geneious, clustalx, webprank
• Marna, fsa, compass, marna, paganmsa, phylogibbs
• Mummals, alignm, amap, cobalt, mavid
• Phylo, mcoffee, decipher, baliphy, saga
– Deadline: 05.03.2017
– We will assign three tools to each group on a first-come, first-served (FCFS) basis
Term Project
• 1st real task
• Three-page review of your group's three MSA tools
• 12 pt Arial or similar
• Read the rough writing guidelines beforehand
– It would be good to learn more about writing in addition
» Make sure to use Grammarly
– http://gram.ly/y6RL
• Deadline: 19.03.2017
Term Project
• 2nd real task
• Research article about the results you achieved with the
given data
• Reread the rough writing guidelines beforehand
– Apply our suggestions for the review
• Deadline: 23.04.2017
Term Project
• Presentation
• 16.05.2017
• All groups must present to get credit
• Each group has 10 min max
• Concentrate on methodology, results, and discussion
Term Project
• Final Term Paper
• Deadline: 28.05.2017
• Make sure to incorporate all criticism we gave during the
presentation
Data
• DNA and Protein reference data
• Each group has to run all tools on all data.
• Therefore
• At least one program for DNA
• At least one program for protein
• One program (must) for protein and DNA alignment
Comparison
• SuiteMSA scoring will be used
• Download the tutorial and try it on the SuiteMSA example datasets
• Compare each program's output against the reference alignment (the true alignment, which is known; think of it like the marker in an SDS gel)
• Compute scores and compare
Other Options to CMD and Batch
• Windows Script Host
– http://en.wikipedia.org/wiki/Windows_Script_Host
• PowerShell (real programming)
– http://en.wikipedia.org/wiki/Windows_PowerShell
• Too much for this course, but if you are interested ...