Computational Biology
Dr. Jens Allmer
MBG404 Lecture Slides, Week 2
What is a computer? What is a program?

What is a Computer?
A manipulator of data
Four operations computers perform:
• input of data
• processing of data
• output of data
• storage of data
Computers do not rate data; they manipulate it
Garbage in = garbage out

The Basics
What is a computer?
• Input of data
• Storage of data
• Processing of data
• Output of data

The Basics
Terminology:
• Hardware: physical components of a computer; what you can touch and feel
• Software: instructions the hardware uses
  – Some hardware includes software (drivers, firmware)
• Program: a set of instructions
• Operating system: a program that allows applications to communicate with the computer; the interface between the human and the computer
• Applications: programs that run cooperatively with an operating system; what we use to “compute”

Operating Systems
What is an Operating System?
• Interprets input from mouse and keyboard
• Creates the Graphical User Interface (GUI)
• Finds files and directories on the hard disk (HDD)
• Creates the monitor images
• Gives meanings to the buttons
• Knows where components are and sends information to the right place
OS: software for your computer

Operating Systems
What an operating system does:
• You
• Applications (“Programs”)
• Operating System
• Your Computer’s Hardware

How the OS Works
A program is a collection of pieces:
• Primary piece is called the kernel
• Buttons in applications are objects
• “I must know my part of the process”
[Diagram: Save, OpenOffice Writer, Linux, Save]

Installing Applications
Loaded software:
• Typically stored in C:\Program Files
• An application is in pieces: kernel and supporting pieces
• Microsoft Office: 8 applications, 3,286 files
[Figure: Microsoft Office Program Files]

Terminology
» Program
» Application
» Software
» Operating System

Programs in Biology
» Sequence Alignment
  – BLAST
  – FASTA
  – ClustalW
  – T-Coffee
  – ...
» MS/MS Proteomics
  – MASCOT
  – OMSSA
  – Sequest
  – PEAKS
  – ...
» Other, more targeted purposes

Servers and Websites
» Example: NCBI
» Provide processing power (many computing nodes)
» Provide programs for you to use
» Provide a GUI
» Are shared with other users (queue)

Locally Run Programs
» Example: BLAST
» Only your PC computes
» Programs are available on your computer (downloaded and installed)
» Some with, some without a GUI
» Not shared
» Additional features may be activated
» Different environment variables (e.g., sequence db) may be used

KNIME || RapidMiner || Orange
» Data mining
» Data visualization
» Data analysis
» Machine learning

Microsoft Access or OpenOffice Base
» Databases
» Store and manipulate large amounts of data
» Organize and relate data
» Create reports

End Theory I
» Mind mapping 5 min
» Break 10 min

Practice I
» The command line interface
» Console

Console Programs
» Input
  – Typing (responding to questions)
  – Files
  – Parameters
» Control
  – Switches
  – Parameters
  – INI files
» Output
  – Echoed to screen
  – Files

Console Programs
» Program path/Program title.Program extension
  – C:\test.exe
  – C:/test.bat
  – C:\folder1\folder2\test.exe
» Giving information to the program
  – Any input on the command line follows the program
  – C:/test.bat input goes here
  – Input is separated by spaces
  – Input can consist of
    • parameters
    • switches and
    • switches with parameters

Input / Control
» Parameters
  – Separated by spaces
  – Following the program name
  – C:\test.exe param1 param2
  – May be anything
  – Can contain file paths
    • C:\test.exe c:\input.txt
  – Paths with spaces need to be quoted
    • C:\test.exe "c:\documents and settings\input.txt"
» Some operating systems or programs want to receive switches before parameters are given

Control
» Switches
  – Separated by spaces
  – Often introduced with a prefix (/, -)
  – Example
    • C:\test.exe /?
      (This could display help information)
  – /X or -X (e.g., /?, -?, -h, -help)
  – Example
    • C:\test.exe -s1 -s2
      (test.exe should take s1 and s2 into account)

Bring up Console
» Start – Run ‘cmd’ – OK
» Start – Programs – Accessories – Command Prompt

Console Commands
» X: + Enter
  – Changes to the specified drive
» cd path (.. || \ || ../ || ..\)
  – Changes the current directory
» copy from to
  – Copies a file to a different location
  – ‘from’ may be a list of files concatenated by +
  – copy f1+f2+f3 destination

Console Commands
» dir
  – Lists files and directories
  – Switches
    • A[D,H,S,R,A,-]
    • B, C, D, L, N, O[N,S,E,D,G,-]
    • P, Q, S, T[C,A,W], W, X, 4
» Playtime
  – Sort the directory listing by time, latest one first

Console Commands
» md title
  – Creates a directory named ‘title’
» del file_path
  – Deletes the file specified by the path
» deltree folder_path || rmdir folder_path
  – Deletes the directory specified in folder_path

Playtime
» Create a new directory
» Delete the newly created directory

Console Commands
» fc || comp
  – What does it do?
  – How can you use it?
» find
  – How does it work?
  – How can you control it?

Theory II
» Some CLI commands and quirks

Output Redirection (Windows, *nix)
• Stores the output in a file
  – Creates a new file (overwriting an existing one): ‘> file path’
  – Appends to an existing or new file: ‘>> file path’
• Example
  – C:\>ipconfig /all > res.txt
  – C:\>route print >> res.txt
  – Output from ipconfig and route will now be in c:\res.txt
• Pipes
  – Standard out (stdout): 1>
  – Standard error (stderr): 2>
  – B | A redirects stdout of B to stdin of A

Absolute Addressing
• Specify the complete path starting with the drive letter
• Example
  – C:\windows\system32\ipconfig.exe
  – ?>ipconfig /all >> "C:\Documents and Settings\jens\res.txt"

Relative Addressing
• Specify directions to reach the file
• Current directory is specified before >
• Example
  – C:\Documents and Settings\jens> (abbreviated
    to ?> where ‘?’ can be any path)
• Directions
  – Start from the current directory
  – ../ go to the parent directory
  – dir/ go into a child directory
• Example
  – ?>ipconfig /all >> "../New Folder/test.txt"

Addressing
• You can mix
  – Relative addressing
  – Absolute addressing
• Make sure to quote parameters that contain whitespace
  – E.g.: "C:\Documents and Settings"
• The output redirection can be anywhere on the command line
  – E.g.: ipconfig /all >> res.txt
  – E.g.: ipconfig >> res.txt /all

Running Java Programs
• Java programs are not executed directly
  – They need the Java Virtual Machine (JVM)
  – Thus Java needs to be started instead of the program
• Java programs often come as jar files
  – These can be passed to the JVM as a parameter
• Example
  – java.exe -jar DNATranslator.jar
  – Download from bioinformatics.allmer.de/tools
• Programs available in Java
  – Many in all areas of biology

Running Java Programs
• Java jar files sometimes don’t have a main class
  – Then they can be run using class paths
  – java -cp test
  – java -cp FastaEditor.jar fastaeditor.FastaEditorFrm

Other Executables
• Perl has long been popular
  – Full and easy support in Linux (also in the terminal)
  – Windows needs ActivePerl installed (http://www.activestate.com/activeperl)
  – If installed, it is run similarly to Java
  – perl script.pl
• Python
  – Similar to Java and Perl
  – Needs a Python interpreter, which must be installed
  – python script.py
• Ruby, PHP, etc.
  – Similar to the above

Interpreter Languages
• Java, Python, Ruby, Perl, ...
• The operating system executes the interpreter; the interpreter executes the program

End Theory II
» Mind mapping 5 min
» Break 10 min

Practice II
» CMD

<
• Create a file from some console output
  – ipconfig /all > ipout.txt
• Use the find command to find something in the output
  – find /I "searchstring" < ipout.txt

|
• Forward the information from one command to the next (piping)
  – dir | find /I "searchstring"
• Create two text files with some small differences
  – fc t1.txt t2.txt | find "searchstring"

Console Commands
• <, >, >>
• Pipe character: |
• PATH variable

PATH
• PATH = %PATH%;<PATHYOUNEED>

Java on the Path?
• Is Java on the path?
• If not, put it on the path
• Find out the Java version (installed)
• Find out the Java version (jar file)

Run a Class File
• Download runClass.class from mbg404
• Open it in Notepad
• Search for runClass (name of the file without .class)
• The part ‘mbg404/’ in front of the first ‘runClass’ entry you find is the package (folder/directory) the file must be in so that you can run the class
• Create a folder mbg404
• Move runClass.class into the mbg404 folder
• Execute the class
  – java mbg404.runClass
  – What is the output?

Running Java Programs
• Download the DNATranslator from:
  – http://www.biolnk.com
• Run the DNATranslator by
  – Double clicking
  – Creating a shortcut to it (right click, create shortcut ...)
  – Running it from the console

Batch
• Batch Files
  – Use Notepad to write a text file
  – Change the extension of the text file to .bat
  – How?
    • On the console using rename
  – Open the file in Notepad and just type: dir
  – Save and close the file
  – Execute the file by double clicking
  – Execute the file from the console
  – Write a script that
    • Clears the screen
    • Creates a new folder
    • Stores the output of ipconfig in the new folder
    • Searches for «10.» in the output file

End of Practice II

Term Project
• Groups (3 persons)
  – Register the groups with the assistants now

Term Project
• MSA Tools
  – Choose 10 tools (preferably try to find new ones not in the list)
    • Tcoffee, muscle, mafft, kalign, clustalo, clustalw
    • Multalin, mview, dialign, probcons, clustalw2
    • Dbclustal, ugene, geneious, clustalx, webprank
    • Marna, fsa, compass, marna, paganmsa, phylogibbs
    • Mummals, alignm, amap, cobalt, mavid
    • Phylo, mcoffee, decipher, baliphy, saga
  – Deadline: 05.03.2017
  – We will assign three tools to each group on a first-come, first-served basis

Term Project
• 1st real task
• Three-page review of your group's 3 MSA tools
• 12 pt Arial or similar
• Read the rough writing guidelines beforehand
  – It would be good to learn more about writing in addition
    » Make sure to use Grammarly
      – http://gram.ly/y6RL
• Deadline: 19.03.2017

Term Project
• 2nd real task
• Research article about the results you achieved with the given data
• Reread the rough writing guidelines beforehand
  – Apply our suggestions for the review
• Deadline: 23.04.2017

Term Project
• Presentation
• 16.05.2017
• All groups must present to get credit
• Each group has 10 min max
• Concentrate on methodology, results, and discussion

Term Project
• Final Term Paper
• Deadline: 28.05.2017
• Make sure to incorporate all criticism we gave during the presentation

Data
• DNA and protein reference data
• Each group has to run all tools on all data.
• Therefore
  – At least one program for DNA
  – At least one program for protein
  – One program (must) for protein and DNA alignment

Comparison
• SuiteMSA scoring will be used
• Download the tutorial and try it on the SuiteMSA example datasets
• One program's output vs. the reference alignment (this is the true alignment that you know; think of it like the marker in an SDS gel)
• Compute scores and compare

Other Options to CMD and Batch
• Windows Script Host
  – http://en.wikipedia.org/wiki/Windows_Script_Host
• PowerShell (real programming)
  – http://en.wikipedia.org/wiki/Windows_PowerShell
• Too much for this course, but if you are interested ...
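
Python, introduced earlier as an interpreter language, is one more option besides CMD and batch files. The following is only a rough sketch of the Practice II script idea (create a folder, store a command's output in it, search the output for a string). The function names store_output and find_lines, the folder name, and the stand-in command are assumptions made for illustration; they are not part of the course material.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def store_output(command, out_file):
    """Run a command and write its stdout to out_file (like `command > out_file`)."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    out_file.parent.mkdir(parents=True, exist_ok=True)  # like `md` for the new folder
    out_file.write_text(result.stdout)
    return out_file


def find_lines(path, needle, ignore_case=True):
    """Return the lines of a text file that contain needle (like `find /I`)."""
    lines = Path(path).read_text().splitlines()
    if ignore_case:
        return [ln for ln in lines if needle.lower() in ln.lower()]
    return [ln for ln in lines if needle in ln]


if __name__ == "__main__":
    # Stand-in command so the sketch runs on any OS (assumption);
    # on Windows one could pass ["ipconfig", "/all"] instead.
    command = [
        sys.executable,
        "-c",
        "print('IPv4 Address: 10.0.0.5'); print('Mask: 255.0.0.0')",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        out = store_output(command, Path(tmp) / "newfolder" / "ipout.txt")
        for line in find_lines(out, "10."):
            print(line)
```

Save it as a .py file and run it the same way as any other script shown above (python script.py). Passing the command as a list of arguments avoids the quoting issues that paths with spaces cause on the command line.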