Introduction to Python for Biologists

Katerina Taškova (1), Jean-Fred Fontaine (1,2)
(1) Faculty of Biology, Johannes Gutenberg-Universität Mainz, Mainz, Germany
(2) Genomics and Computational Biology, Kernel Press, Mainz, Germany
https://cbdm.uni-mainz.de/mb17
March 21, 2017

Table of Contents

- Introduction
- Running code
- Literals and variables
- Numeric types
- Strings
- Exercise
- Lists, tuples and ranges
- Sets and dictionaries
- Convert and copy
- Loops
- Exercise
- Functions
- Branching
- Exercise
- Regular Expressions
- Exercise
- Annexes

Introduction

What is Python?
Python is a general-purpose programming language
- created by Guido van Rossum (1991)
- high-level (abstraction from the details of the computer)
- interpreted (needs an interpreter software)

Python design philosophy
- code readability
- syntax brevity

Python is widely used for Biology
- rich built-in features
- powerful scientific extensions
- plotting capabilities

Structured programming I

- Instructions are executed sequentially, one per line
- Conditional statements allow selective execution of code blocks
- Loops allow repeated execution of code blocks
- Functions allow on-demand execution of code blocks

Structured programming II

    instruction 1              # 1st instruction (hashtag # starts comments)
                               # blank line
    repeat 20 times            # 2nd instruction (loop starts a block)
        instruction a          # block defined by indentation (spaces or tabs)
        instruction b          # 2nd instruction in block
                               # blank line
    if n>10                    # 3rd instruction (conditional statement)
        instruction a          # 1st instruction in block
        instruction b          # 2nd instruction in block
                               # blank line
                               # blank line
    # backslashes join lines
    instruction 3 \            # 3rd instruction, part 1
    instruction 3              # 3rd instruction, part 2
                               # blank line
    # Expressions in (), {}, or [] can span multiple lines
    instruction 4 (1, 2, 3     # 4th instruction, part 1
                   4, 5, 6)    # 4th instruction, part 2

Namespace

Variables are names associated with data
- e.g. a=2 assigns value 2 to variable a

Functions are names associated with specific code blocks
- built-in functions are available (see list on slide 100)
- e.g. print(a) will display '2' on the screen

The user namespace is the set of names available to the user
- users can define new names of variables and functions in their namespace
- imported modules can add names of variables and functions in the user namespace

Object-oriented programming

Data is organized in classes and objects
- a class is a template defining what objects can store and do
- an object is an instance of a class
- objects have attributes to store data and methods to do actions
- object namespaces are different from the user namespace

Example: class "Human" is defined as:
- has a name (an attribute "name")
- has an age (an attribute "age")
- can introduce itself (a method "who")

Example with 1 existing Human object P1:

    P1.name = "Mary"    # assigns value to attribute name
    P1.age = 26         # assigns value to attribute age
    P1.who()            # displays "My name is Mary I am 26!"
    who()               # error! not in the user namespace

Modules

Modules can add functionalities to Python
- e.g. classes and functions

Examples of available modules:
- NumPy for scientific computing
- Matplotlib for plotting
- BioPython for Biology

Modules have to be imported into the code

    # import datetime module in its own namespace
    import datetime
    datetime.date.today()    # 2017-03-16
    today()                  # error!

    # import functions log2 and log10 from module math
    # in current namespace
    from math import log2, log10
    log10(1)                 # equal 0

Running code

Running code I

From a terminal by using the interactive Python shell

    $ python3    # opens Python shell
    a=2          # assigns 2 to a
    b=3          # assigns 3 to b
    exit()       # closes Python shell

From a terminal by running a script file
- e.g. let's say myscript.py is a script file (a simple text file) and it contains: print("hello world!")

    $ python3 myscript.py    # runs python3 and the script
    hello world!             # result of the script on the terminal

Running code II

From Jupyter Notebook
- web-based graphical interface
- manage cells of code or text
- see execution results on the same notebook
- save/open notebooks

Documentation and messages I

Documentation and help:
- https://docs.python.org/3
- use the built-in help() function, e.g. help(print) to display help for function print()
- see the help menu or search online

Examples of error messages

    # Forgetting quotes
    print(Hello world)
    #   File "<stdin>", line 2
    #     print(Hello world)
    #                     ^
    # SyntaxError: invalid syntax

Documentation and messages II

    # Spelling mistakes
    prin("Hello world")
    # Traceback (most recent call last):
    #   File "<stdin>", line 2, in <module>
    # NameError: name 'prin' is not defined

    # Wrong line break within a string
    print("Hello
    World")
    #   File "<stdin>", line 2
    #     print("Hello
    #                ^
    # SyntaxError: EOL while scanning string literal

Literals and variables

Numeric and strings literals I

    # Numeric literals
    12
    -123
    1.6E3    # means 1600

    # Strings literals
    'A string'                   # A string
    'A "string"'                 # A "string"
    "A 'string'"                 # A 'string'
    '''Three single quotes'''    # Three single quotes
    """Three double quotes"""    # Three double quotes
    'A \'string\''               # A 'string' (backslash escape sequence)
    r'A \'string\''              # A \'string\' (raw string)

Python stores literals in objects of corresponding classes (class int for integers, float for floating point, and str for strings)

Numeric and strings literals II

Printing numeric and strings literals

    print(12)     # 12
    print(1+2)    # 3

    print('Hello World')    # Hello World

    print('Hello World', 1+2)              # Hello World 3
    print('Hello World', 1+2, sep='-')     # Hello World-3
    print('Hello World', 1+2, sep='\t')    # Hello World    3 (\t: tab, \n: newline)

    print('AB', end='')    # AB (avoid newline at the end)
    print('CD')            # ABCD

    print('Max is', 12, 'and Min is', 3)    # Max is 12 and Min is 3

Variables I

Variables are names used to access objects
- first character is a letter (not a digit)
- no space characters allowed
- case-sensitive (variable name var is not Var)
- prefer alphanumeric characters (e.g. abc123)
- avoid accents, non-alphanumeric, non-English characters
- underscores may be used (e.g. abc_123)

The following keywords cannot be used as variable names (Python 3.7 and later; the list is also available as keyword.kwlist). Note that print and exec are functions in Python 3, not keywords.

    False, None, True, and, as, assert, async, await, break
    class, continue, def, del, elif, else, except, finally, for
    from, global, if, import, in, is, lambda, nonlocal, not
    or, pass, raise, return, try, while, with, yield

Variables II

    # Numeric types
    a=2         # a is assigned an int object of value 2
    print(a)    # prints the object assigned to a (2)
    b=a         # b is assigned the same object as a (2)
    print(b)    # 2
    a=5         # a is assigned a new object of value 5
    print(a)    # 5
    print(b)    # 2 (b is still assigned to object of value 2)

    # Strings
    c1='a'
    print(c1)   # a
    myName125 = 'abc'

Numeric types

Numeric types I

    type(7)          # <class 'int'> (integer number)
    type(8.25)       # <class 'float'> (floating point)
    type(4.52e-3)    # <class 'float'> (floating point)

    # Operators (special built-in functions)
    1 + 3     # 4 (addition)
    4 - 1     # 3 (subtraction)
    3 * 2     # 6 (multiplication)
    9 / 2     # 4.5 (division)
    9 // 2    # 4 (integer division)
    9 % 2     # 1 (integer division remainder)
    2**3      # 8 (exponent)

    # Lowest to highest operators precedence (equal if on same line)
    +, -           # Addition, Subtraction
    *, /, //, %    # Multiplication, Divisions, Remainder
    +x, -x         # Positive, Negative
    **             # Exponentiation

Numeric types II

    # Built-in functions
    abs(-2.58)    # 2.58 (absolute value of x)
    round(2.5)    # 2 (round to closest integer; ties go to the even integer)

    # With variables
    a = 1          # 1
    b = 1 + 1      # 2
    c = a + b      # 3
    d = a+c*b      # 7 (precedence of * over +)
    d = (a+c)*b    # 8 (use parentheses to break precedence)

    # Short notations (valid for +, -, *, /, ...)
    a += 1    # a = a + 1
    a *= 5    # a = a * 5
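The precedence rules and short notations listed above can be checked with a few lines of plain Python. This is only a minimal sketch: the variable names (without_parentheses, counter, and so on) are illustrative choices, not part of the course material.

```python
# Minimal sketch: checking operator precedence and augmented assignment.
# All names below are illustrative; only built-in operators are used.

a = 1
b = 1 + 1
c = a + b

without_parentheses = a + c * b    # * binds tighter than +
with_parentheses = (a + c) * b     # parentheses force the addition first

# Unary minus binds less tightly than **, so -2**2 is -(2**2)
negated_power = -2 ** 2
power_of_negative = (-2) ** 2

# Augmented assignment rebinds the name to the result of the operation
counter = 1
counter += 1    # counter = counter + 1
counter *= 5    # counter = counter * 5

# Integer division and remainder together reconstruct the dividend
quotient, remainder = 9 // 2, 9 % 2

print(without_parentheses, with_parentheses)    # 7 8
print(negated_power, power_of_negative)         # -4 4
print(counter)                                  # 10
print(quotient * 2 + remainder)                 # 9
```

When in doubt about precedence, adding parentheses costs nothing and makes the intent explicit.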
    # Special float values
    float('NaN')    # nan (Not a Number)
    float('Inf')    # inf (positive infinity); float('-Inf') gives -inf (negative infinity)

Strings

Sequence types

Text sequence type:
- Strings: immutable sequences of characters

Basic sequence types:
- Lists: mutable sequences
- Tuples: immutable sequences
- Ranges: immutable sequences of numbers

Sequence operations:
- All sequence types support common sequence operations (slide 98)
- Mutable sequence types support specific operations (slide 99)

Strings I

    # Quotes
    'A string'                   # A string
    'A "string"'                 # A "string"
    "A 'string'"                 # A 'string'
    '''Three single quotes'''    # Three single quotes
    """Three double quotes"""    # Three double quotes

    # Escape sequences (see annexes)
    "A single quote '"     # A single quote '
    'A single quote \''    # A single quote '
    "A tabulation \t"
    "A newline \n"

- See other escape sequences in slide 97
- Triple quoted strings may span multiple lines; all associated whitespace will be included in the string literal

Strings II

    # Operators
    'pipe' + 'tte'     # = 'pipette' (concatenation)
    'A'*7              # = 'AAAAAAA' (replication)
    'A'*3 + 'C'*2      # = 'AAACC'
    'A' + str(2.0)     # = 'A2.0' (convert number then concatenate)

    # Built-in functions
    len('A string of characters')    # 22 (length in characters)
    type('a')                        # <class 'str'> (string)

    # Slices [start:end:step] (0 is index of first character)
    "ABCDEFG"[2:5]      # 'CDE' (F at index 5 excluded)
    "ABCDEFG"[:5]       # 'ABCDE' (from beginning)
    "ABCDEFG"[5:]       # 'FG' (to the end)
    "ABCDEFG"[-2:]      # 'FG' (-2 from the end: to the end)
    "ABCDEFG"[0:5:2]    # 'ACE' (every second letter with step=2)

Strings methods I

Strings are immutable: new objects are created for changes

    seq = "ACGtCCAgTnAGaaGT"

    # Case
    seq.capitalize()    # 'Acgtccagtnagaagt'
    seq.casefold()      # 'acgtccagtnagaagt' (eszett => "ss")
    seq.lower()         # 'acgtccagtnagaagt' (eszett => eszett)
    seq.swapcase()      # 'acgTccaGtNagAAgt'
    seq.upper()         # 'ACGTCCAGTNAGAAGT'

    # Search and replace
    seq.count('a')              # 2 (case sensitive)
    seq.count('G', 0, 4)        # 1 (slice start and end indexes)
    seq.endswith('GT')          # True
    seq.endswith('G', 0, 4)     # False (slice start and end indexes)
    seq.find('GtC')             # 2 (1st hit index, -1 otherwise)
    seq.replace("aa", "tt")     # 'ACGtCCAgTnAGttGT' (case sensitive)
    seq.replace("A", "x", 2)    # 'xCGtCCxgTnAGaaGT' (2 first hits only)

Strings methods II

    seq = "ACGtCCAgTnAGaaGT"

    # Is functions
    seq.isalnum()      # True  (Are all characters alphanumeric?)
    seq.isalpha()      # True  (Are all characters alphabetic?)
    seq.islower()      # False (Are all characters lowercase?)
    seq.isnumeric()    # False (Are all numeric characters?)
    seq.isspace()      # False (Are all whitespace characters?)
    seq.isupper()      # False (Are all characters uppercase?)

    # Join and split
    "-".join(["A", "B"])    # 'A-B'
    "-".join(seq)           # 'A-C-G-t-C-C-A-g-T-n-A-G-a-a-G-T'
    seq.partition("aa")     # ('ACGtCCAgTnAG', 'aa', 'GT'): a tuple
    seq.split("aa")         # ['ACGtCCAgTnAG', 'GT']: a list
    '1\n2'.splitlines()     # ['1', '2'] (split at line boundaries \r, \n)

Strings methods III

    seq = "ACGtCCAgTnAGaaGT"

    # Deleting
    seq.lstrip()    # remove leading whitespace characters
    seq.rstrip()    # remove trailing whitespace characters
    seq.strip()     # remove whitespace characters from both ends

    seq.lstrip("AC")    # 'GtCCAgTnAGaaGT' (remove C's or A's)
    seq.lstrip("CA")    # 'GtCCAgTnAGaaGT' (remove C's or A's)
    seq.lstrip("C")     # 'ACGtCCAgTnAGaaGT' (no impact)
    # same for rstrip but from the right and strip from both ends

    # Simple parsing of text lines from CSV files
    line.strip().split(',')    # remove newline and split CSV (\t if TSV)

    # translate (case sensitive)
    table = str.maketrans('atcg', 'tagc')    # map characters by index
    seq.lower().translate(table)             # 'tgcaggtcantcttca'

Exercise

Create the following directory structure:

    Dokumente
        python
            notebooks
            data

Jupyter Notebook
- File: Literals.ipynb
- URL: https://cbdm.uni-mainz.de/mb17
- Download the file into the notebooks folder

Lists, tuples and ranges

Lists I

A List is an ordered collection of objects

    List1 = []    # an empty list

    List1 = ['b', 'a', 1, 'cat', 'K', 'dog', 'F']
    List1[0]     # 'b' (access item of index 0)
    List1[1]     # 'a' (access item of index 1)
    List1[-1]    # 'F' (access the last item)
    List1[-2]    # 'dog' (access the second last item)

    # Slices [start:end:step]
    List1[2:5]      # [1, 'cat', 'K'] (index 5 excluded)
    List1[:5]       # ['b', 'a', 1, 'cat', 'K']
    List1[5:]       # ['dog', 'F']
    List1[-2:]      # ['dog', 'F']
    List1[0:5:2]    # ['b', 1, 'K']

Lists II

    # Built-in functions
    List2 = [1, 2, 3, 4, 5]
    len(List1)    # 7 (length = 7 items)
    max(List2)    # 5
    min(List2)    # 1
    sum(List2)    # 15

    # List methods
    List2 = []                # empty list
    List2.append(1)           # [1]
    List2.append('A')         # [1, 'A']
    List2.extend(['B', 2])    # [1, 'A', 'B', 2]
    List2.pop(2)              # [1, 'A', 2]
    List2.insert(3, 'A')      # [1, 'A', 2, 'A'] (insert 'A' at index 3)
    List2.index('A')          # 1 (index of the 1st 'A')
    List2.count('A')          # 2 (number of 'A')
    List2.reverse()           # ['A', 2, 'A', 1]

Lists III

    # sorting
    List3 = [5, 3, 4, 1, 2]
    sorted(List3)    # [1, 2, 3, 4, 5] (build a new sorted list)
    List3            # [5, 3, 4, 1, 2] (List3 not changed)
    List3.sort()     # modifies the list in-place
    List3            # [1, 2, 3, 4, 5] (.sort() did modify List3!)

    # nested list / 2D lists / tables
    myList = [['b', 'a'],
              [1, 'cat']]    # a list of 2 lists
    myList[0]                # returns the first list ['b', 'a']
    myList[0][0]             # 'b' (1st item of the 1st list)
    myList[0][1]             # 'a' (2nd item of the 1st list)
    myList[1]                # returns the 2nd list [1, 'cat']
    myList[1][0] = 10        # [['b', 'a'], [10, 'cat']]

Lists IV

    myList = [['b', 'a'],
              [10, 'cat']]

    for sublist in myList:       # loop over sublists
        for value in sublist:    # loop over values
            print(value)         # print 1 value per line
    # b
    # a
    # 10
    # cat

    for sublist in myList:                 # loop over sublists
        new_sublist = map(str, sublist)    # convert each item to string
        print('\t'.join(new_sublist))      # print as TSV table
    # b     a
    # 10    cat

Tuples and ranges

A Tuple is an ordered, immutable collection of objects

    Tuple1 = ()                                       # empty tuple
    Tuple1 = ('b', 'a', 1, 'cat', 'K', 'dog', 'F')    # defined tuple

    Tuple1[0]      # 'b'
    Tuple1[1:3]    # ('a', 1) (index 3 excluded)

Ranges

    # Range(start, stop[, step])
    range(10)                  # range(0, 10) => no nice print method
    list(range(10))            # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    list(range(0, 30, 5))      # [0, 5, 10, 15, 20, 25]
    list(range(0, -5, -1))     # [0, -1, -2, -3, -4]
    list(range(0))             # []
    list(range(1, 0))          # []

Sets and dictionaries

Sets I

A Set is a mutable unordered collection of objects

    S0 = set()                 # an empty set
    S0 = {'a', 1}              # a new set of 2 items
    S1 = {'a', 1, 'b', 'R'}    # a new set of 4 items
    S2 = {'a', 1, 'b', 'S'}    # a new set of 4 items
    len(S0)                    # 2

    # Operators
    'R' in S1            # True
    'R' not in S2        # True
    S1 - S2              # in S1 but not in S2 => {'R'}
    S1 | S2              # in S1 or in S2 => {1, 'a', 'S', 'R', 'b'}
    S1 & S2              # in S1 and in S2 => {1, 'b', 'a'}
    S1 ^ S2              # in S1 or in S2 but not in both => {'R', 'S'}
    S0 <= S1             # S0 is subset of S1 => True
    S1 >= S2             # S1 is superset of S2 => False
    S1 >= S0             # True
    S0.isdisjoint(S1)    # False

Sets II

    # Methods
    S0.copy()           # return a new set with a shallow copy of S0
    S0.add(item)        # add element item to the set
    S0.remove(item)     # remove element item from the set (KeyError if absent)
    S0.discard(item)    # remove element item from the set if present
    S0.pop()            # remove and return an arbitrary element
    S0.clear()          # remove all elements from the set

Dictionaries I

A Dictionary is a mutable indexed collection of objects (indexed by unique keys). Since Python 3.7, dictionaries preserve insertion order.

    d = {}                          # empty dictionary
    d = {'A': "ALA", 'C': "CYS"}    # dictionary with 2 items
    d['A']                          # 'ALA'
    d['C']                          # 'CYS'
    d['H'] = "HIS"                  # add new item
    d                               # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS'}
    del d['A']                      # {'C': 'CYS', 'H': 'HIS'}

    'C' in d        # True (key 'C' is in d)
    'A' not in d    # True (key 'A' is not in d anymore)

Dictionaries II

    d[key]                       get value by key
    d[key] = val                 set value by key
    del d[key]                   delete item by key
    d.clear()                    delete all items
    len(d)                       number of items
    d.copy()                     make a shallow copy
    d.keys()                     return a view of all keys
    d.values()                   return a view of all values
    d.items()                    return a view of all items (key, value)
    d.update(d2)                 add all items from dictionary d2
    d.get(key[, val])            get value by key if it exists, otherwise val
    d.setdefault(key[, val])     like d.get(key, val), also set d[key]=val if key not in d
    d.pop(key[, default])        remove key and return its value, return default otherwise
    d.popitem()                  remove an item and return it as a (key, value) tuple
                                 (the last inserted item since Python 3.7)

Table: Functions for dictionaries

Convert and copy

Converting types I

Many Python functions are sensitive to the type of data. For example, you cannot concatenate a string with an integer:

    sign = 'You are ' + 21 + '-years-old'         # error!!
    sign = 'You are ' + str(21) + '-years-old'    # OK
    sign                                          # 'You are 21-years-old'

    # convert to int (from str or float)
    int('2014')      # 2014 (from a string)
    int(3.141592)    # 3 (from a float, truncated toward zero)

    # convert to float (from str or int)
    float('1.99')    # 1.99 (from a string)
    float(5)         # 5.0 (from an integer)

Converting types II

    # convert to str (from int, float, list, tuple, dict and set)
    str(3.141592)        # '3.141592'
    str([1, 2, 3, 4])    # '[1, 2, 3, 4]'

    # convert a sequence type to another
    # (str, list, tuple, and set functions)
    new_set = set(old_list)        # list to set
    new_tuple = tuple(old_list)    # list to tuple
    new_set = set("Hello")         # string to set {'H', 'o', 'e', 'l'}
    new_list = list("Hello")       # string to list ['H', 'e', 'l', 'l', 'o']

Copy I

Assignments (=) do not copy objects, they create bindings between a target and an object.

    # Numeric types (immutable)
    a = 1        # a binds the object 1
    b = a        # b binds the object 1
    b = b + 1    # b binds a new object created by the sum
    a            # 1
    b            # 2

    # Strings (immutable)
    a = "Hello"                       # a binds the object "Hello"
    b = a                             # b binds the object "Hello"
    a = a.replace('o', 'o World!')    # a binds a new object
    a                                 # 'Hello World!'
    b                                 # 'Hello'

Copy II

For collections that are mutable or contain mutable items, a shallow copy is sometimes needed so one can change one copy without changing the other.
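The difference between a plain binding, a shallow copy, and a deep copy shows up most clearly when the collection contains mutable items. The sketch below is only an illustration: the nested dictionary codons and the names alias, shallow, and deep are hypothetical and not part of the course material.

```python
import copy

# Hypothetical nested data: one-letter amino acid code -> list of codons
codons = {'A': ['GCT', 'GCC'], 'C': ['TGT']}

alias = codons                   # binding: same object, nothing is copied
shallow = codons.copy()          # new dict, but the inner lists are shared
deep = copy.deepcopy(codons)     # new dict and new inner lists

codons['H'] = ['CAT']            # add a key to the original dict
codons['A'].append('GCA')        # mutate an inner list in place

print('H' in alias)      # True: alias is the same dict as codons
print('H' in shallow)    # False: the shallow copy is a separate dict
print(shallow['A'])      # ['GCT', 'GCC', 'GCA']: the inner list is shared
print(deep['A'])         # ['GCT', 'GCC']: the deep copy is fully independent
```

A shallow copy is enough when the items are immutable (numbers, strings, tuples of immutables); a deep copy is the safer choice for nested lists or dictionaries.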
    # Dictionary (mutable)
    d1 = {'A': "ALA", 'C': "CYS"}   # d1 binds the object
    d2 = d1                         # d2 binds the object
    d2['H'] = "HIS"                 # add item to the object
    d1   # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS'}
    d2   # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS'}

    d2 = d1.copy()                  # d2 binds a shallow copy of the object
    d2['P'] = "PRO"                 # add item to the copied object
    d1   # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS'}
    d2   # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS', 'P': 'PRO'}

Copy III

    # List (mutable)
    l1 = ['A', 'H', 'C']
    l2 = l1
    l2.append('P')
    l1   # ['A', 'H', 'C', 'P']
    l2   # ['A', 'H', 'C', 'P']

    l2 = l1[:]       # shallow copy by assigning a slice of the whole list
    l2.append('V')
    l1   # ['A', 'H', 'C', 'P']
    l2   # ['A', 'H', 'C', 'P', 'V']

Copy IV

Convert types to get copies

    new_list = list(old_list)     # shallow copy
    new_dict = dict(old_dict)     # shallow copy
    new_set = set(old_list)       # copy list as a set
    new_tuple = tuple(old_list)   # copy list as a tuple

The copy module

    import copy
    copy.copy(x)       # shallow copy of x
    copy.deepcopy(x)   # deep copy of x, including embedded objects

Loops

For loop I

    # For items in a list
    for person in ['Isabel', 'Kate', 'Michael']:
        print("Hi", person)
    # Hi Isabel
    # Hi Kate
    # Hi Michael

    # For items in a dictionary
    seq = ''                        # an empty string
    d = {'A': "ALA", 'C': "CYS"}    # a dictionary with 2 keys
    for k in d.keys():              # loop over the keys
        seq += d[k]                 # append value to seq
    print(seq)                      # ALACYS (keys are visited in insertion order)

For loop II

    # For items in a string
    for c in 'abc':
        print(c)
    # a
    # b
    # c

    # For items in a range
    for n in range(3):
        print(n)
    # 0
    # 1
    # 2

    # For items from any iterator
    for n in iterator:
        print(n)

Enumerate

    # loop getting index and value
    RNAs = ['miRNA', 'tRNA', 'mRNA']
    for i, rna in enumerate(RNAs):
        print(i, rna)
    # 0 miRNA
    # 1 tRNA
    # 2 mRNA

    # loop over 2 lists
    RNAtypes = ['micro', 'transfer', 'messenger']
    for i, t in enumerate(RNAtypes):
        r = RNAs[i]
        print(i, t, r)
    # 0 micro miRNA
    # 1 transfer tRNA
    # 2 messenger mRNA

While loop

    i = 0
    value = 1
    while value < 200:
        i += 1
        value *= i
        print(i, value)
    # 1 1
    # 2 2
    # 3 6
    # 4 24
    # 5 120
    # 6 720

Exercise

URL: https://cbdm.uni-mainz.de/mb17
Jupyter Notebook file: Sequences.ipynb (download the file into the notebooks folder)
Data file: shrub_dimensions.csv (download the file into the data folder)

Functions

Functions I

    from random import choice        # import function 'choice'

    # Simple function
    def kmerFixed():                 # define function kmerFixed
        print("ACGTAGACGC")          # print predefined string

    kmerFixed()                      # display 'ACGTAGACGC'

    # Returning a value
    def kmer10():                    # define function kmer10
        seq = ""                     # define an empty string
        for count in range(10):      # repeat 10 times
            seq += choice("CGTA")    # add 1 random nt to string
        return seq                   # return string

    newKmer = kmer10()               # call the function and store its result in a variable
    print(newKmer)                   # e.g. 'ACGGATACGC'

Functions II

    # One parameter
    def kmer(k):                     # define kmer with 1 param. k
        seq = ""
        for count in range(k):       # k is used to define the range
            seq += choice("CGTA")
        return seq

    print(kmer(k=4))   # e.g. 'TACC'
    print(kmer(20))    # e.g. 'CACAATGGGTACCCCGGACC'
    print(kmer(0))     # '' (empty string)
    print(kmer())      # TypeError: kmer() missing 1 required
                       # positional argument: 'k'

Functions III

    # Function with more parameters and default values
    def generic_kmer(alphabet="ACGT", k=10):
        seq = ""
        for count in range(k):
            seq += choice(alphabet)
        return seq

    generic_kmer("AB12", 15)             # e.g. '112AA1A12AA1121'
    generic_kmer("AB12")                 # e.g. '1AA1B1BA2A'
    generic_kmer(k=20)                   # e.g. 'GTGGGCTTGTGCCCTGCACT'
    generic_kmer()                       # e.g. 'CTTGCCGGGA'
    generic_kmer(k=8, alphabet="#$%&")   # e.g. '$$#&%$%$'

Name spaces I

Variable and function names defined globally can be seen in functions: this is the global namespace.

    a = 10               # global variable

    def my_function():
        print(a)         # will use the global variable

    my_function()        # 10 (the global a)
    print(a)             # 10 (the global a)

Name spaces II

Names defined within a function cannot be seen outside: the function has its own namespace.

    a = 10               # global variable

    def my_function():
        a = 1            # local variable defined by assignment
        b = 2            # local variable defined by assignment
        print(a)

    my_function()        # 1 (the local a)
    print(a)             # 10 (the global a)
    print(b)             # NameError: name 'b' is not defined

Name spaces III

Use parameters and returned values to get and set variables outside the namespace.

    a = 10                   # global variable

    def my_function(val):    # local variable val
        b = 2
        val = val + b
        return val

    print(a)                 # 10 (the global a)
    print(my_function(a))    # 12
    print(a)                 # 10 (the global a unchanged)

    c = my_function(a)       # set val to 10 and assign 10+2 to c
    print(c)                 # 12 (the value returned by the function)
    print(a)                 # 10 (global a was unchanged)

    a = my_function(a)       # change global a with value 10+2

Branching
Truth Value Testing I

Any object can be tested for truth value. The following values are considered false (all other values are considered true):

- None
- False
- a zero value: e.g. 0 or 0.0
- an empty sequence or mapping: e.g. '', (), [], {}

Operations and built-in functions that have a Boolean result return 0 or False for false and 1 or True for true, unless otherwise stated. One exception: the Boolean operations "and" and "or" return one of their operands.

Boolean Operations I

A Boolean is equal to True or False.

- a and b (true if a and b are true, false otherwise)
- a or b (true if a or b is true (one alone or both), false otherwise)
- a ^ b (true if either a or b is true (not both), false otherwise)
- not b (true if b is false, false otherwise)

Boolean Operations II

All example tests below return True unless otherwise specified.

    # let's set values of 3 variables (single "=" symbol)
    a = True
    b = False
    c = True

    # simple tests using two "=" symbols (==)
    a == True
    b == False
    c == True

Boolean Operations III

    # let's set values of 3 variables (one "=" symbol)
    a = True
    b = False
    c = True

    # order is irrelevant
    (a or b) == (b or a)
    (a and b) == (b and a)

    # neutral (whatever value of a)
    (a or False) == a
    (a and True) == a

    # always the same (whatever value of a)
    (a and False) == False
    (a or True) == True
Boolean Operations IV

    # let's set values of 3 variables (one "=" symbol)
    a = True
    b = False
    c = True

    # precedence "==" > "not" > "and" > "or"
    (a and b or c) == ((a and b) or c)
    (not a == b) == (not (a == b))

    # equivalent expressions
    ((a or b) or c) == (a or (b or c)) == (a or b or c)
    (a or a or a) == a
    (b and b and b) == b

    b and b and b == b   # False and False and True => False !!

    # distributivity (parentheses are needed because "==" binds tighter than "and" and "or")
    (a and (b or c)) == ((a and b) or (a and c))
    (a or (b and c)) == ((a or b) and (a or c))

Comparisons

    # Operations
    <                     # strictly less than
    <=                    # less than or equal
    >                     # strictly greater than
    >=                    # greater than or equal
    ==                    # equal (two symbols =)
    math.isclose(a, b)    # equal for floating points a and b
    !=                    # not equal
    is                    # object identity
    is not                # negated object identity
    x < y <= z            # is equivalent to "x < y and y <= z"

Comparisons between objects of the same class are supported if the operator is defined for the class.
Different numerical types can be compared: e.g. 2 < 4.56
Floating-point numbers should not be compared for exact equality: they have limited precision and cannot exactly represent numbers with an infinite expansion such as 1/3 = 0.33333...
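As a small illustration of the floating-point remark, the sketch below compares an exact test with `math.isclose()`. The values are chosen only for the example; `rel_tol` and `abs_tol` are the tolerance parameters of `math.isclose()` in the standard library.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)                # False: 0.1, 0.2 and 0.3 are only approximated in binary
print(math.isclose(total, 0.3))    # True: equal within the default relative tolerance (1e-09)

# Close to zero a relative tolerance is not enough: an absolute tolerance is needed
diff = total - 0.3
print(math.isclose(diff, 0.0))                 # False
print(math.isclose(diff, 0.0, abs_tol=1e-9))   # True
```

The choice of tolerance depends on the data; the values above are arbitrary.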
Conditionals

IF-ELIF-ELSE

    seq = 'ATGAnnATG'
    if 'n' in seq:
        print("sequence contains undefined bases (n)")
    elif 'x' in seq:
        print("sequence contains unknown bases x but not n")
    else:
        print("no undefined bases in sequence")
    # sequence contains undefined bases (n)

ELIF and ELSE are optional; multiple ELIF are possible.

Exercise

URL: https://cbdm.uni-mainz.de/mb17
Jupyter Notebook file: Conditionals.ipynb (download the file into the notebooks folder)

Regular Expressions

RE: Regular Expressions I

Regular expressions (called REs, or regexes, or regex patterns) are a powerful language for matching text patterns (re module).

In Python a regular expression search is typically written as:

    match = re.search(expression, string)

The re.search() function takes a regular expression pattern and a string and searches for that pattern within the string. If the search is successful, re.search() returns a Match object (class re.Match, shown as '_sre.SRE_Match' in Python versions before 3.7); otherwise it returns None.

RE: Regular Expressions II

    import re                                   # import re module
    text = 'an example word:cat!!'              # Example string
    match = re.search(r'word:\w\w\w', text)     # Search a pattern
    if match:
        print('found', match.group())           # found word:cat
    else:
        print('did not find')

In the pattern string, \w codes for a word character (letter, digit or underscore).
The 'r' at the start of the pattern string designates a Python "raw" string, which passes backslashes through without change.

RE: Basic Patterns

    Pattern       Match
    a, X, 9, <    ordinary characters match themselves exactly
    .             a period matches any single character except newline
    \w            matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]
    \W            matches any non-word character
    \b            boundary between word and non-word
    \s            a single whitespace character: space, newline, return, tab, form feed [ \n\r\t\f]
    \S            matches any non-whitespace character
    \t \n \r      tab, newline, return
    \d            decimal digit [0-9]
    ^             circumflex (top hat) matches the start of a string
    $             dollar matches the end of a string
    \             inhibits the "specialness" of a character. So, for example, use \. to match a period

Table: Regular expressions: basic patterns

RE: Basic examples I

The basic rules of RE search for a pattern within a string are:

- The search proceeds through the string from start to end, stopping at the first match found
- All of the pattern must be matched, but not all of the string
- If match = re.search(pat, text) is successful, match is not None and in particular match.group() is the matching text

RE: Basic examples II

    match = re.search(r'iii', 'piiig')        # found
    match.group() == "iii"                    # True

    match = re.search(r'igs', 'piiig')        # not found
    match is None                             # True

    match = re.search(r'..g', 'piiig')        # found
    match.group() == "iig"                    # True

    match = re.search(r'\d\d\d', 'p123g')     # found
    match.group() == "123"                    # True

    match = re.search(r'\w\w\w', '@@abcd!!')  # found
    match.group() == "abc"                    # True

RE: Repetitions I

Repetitions are defined using +, *, ? and { }

- + means 1 or more occurrences of the pattern to its left, e.g. i+ = one or more i's
- * means 0 or more occurrences of the pattern to its left
- ? means 0 or 1 occurrence of the pattern to its left
- curly brackets specify an exact number (or range) of repetitions, e.g. A{5} for 5 A letters, A{6,10} for 6 to 10 A letters

Leftmost and Largest: first the search finds the leftmost match for the pattern, and second it tries to use up as much of the string as possible, i.e. + and * go as far as possible (they are said to be "greedy").
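The "leftmost and largest" rule can be checked on a short sequence. This is a minimal sketch: the DNA fragment is made up, and only `re.search()` and `group()` from the standard `re` module are used.

```python
import re

seq = 'CAAGAAAAAT'   # made-up DNA fragment

# Leftmost: the first run of A's wins even though a longer run follows
print(re.search(r'A+', seq).group())         # AA

# Largest: a bounded repetition takes as many A's as it is allowed to
print(re.search(r'A{3,4}', seq).group())     # AAAA

# * and ? accept zero occurrences, so A* matches the empty string at position 0
print(repr(re.search(r'A*', seq).group()))   # ''
print(re.search(r'CA?', seq).group())        # CA
```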
RE: Repetitions II

    # simple repetitions
    re.search(r'pi+', 'piiig').group()        # piii
    re.search(r'pi?', 'ap').group()           # p
    re.search(r'pi?', 'apii').group()         # pi
    re.search(r'pi*', 'ap').group()           # p
    re.search(r'pi*', 'apii').group()         # pii
    re.search(r'pi{3}', 'apiiiii').group()    # piii
    re.search(r'i+', 'piigiiii').group()      # ii (1st hit only)

    # 3 digits possibly separated by whitespaces (\s*)
    re.search(r'\d\s*\d\s*\d', 'xx1 2 3xx').group()   # "1 2 3"
    re.search(r'\d\s*\d\s*\d', 'xx12 3xx').group()    # "12 3"
    re.search(r'\d\s*\d\s*\d', 'xx123xx').group()     # "123"

RE: Sets of characters I

Square brackets indicate a set of characters.

- [ABC] matches 'A' or 'B' or 'C'.
- The codes \w, \s etc. work inside square brackets too, with the one exception that dot (.) just means a literal dot.
- A dash indicates a range, or itself if put at the end: [a-z] for lowercase alphabetic characters, [a-zA-Z] for alphabetic characters, [AB-] for A, B or dash.
- A circumflex (^) at the start inverts the set: [^AB] for any character except A or B.

RE: Sets of characters II

    text = 'purple alice-b@google.com monkey dishwasher'
    match = re.search(r'\w+@\w+', text)
    if match:
        print(match.group())   ## 'b@google'

    match = re.search(r'[\w.-]+@[\w.-]+', text)
    if match:
        print(match.group())   ## 'alice-b@google.com'

RE: Functions I

RE module functions:

- re.match() returns a Match object if an occurrence is found at the beginning of the string, None otherwise
- re.search() returns a Match object for the 1st occurrence, None if not found
- re.findall() returns a list of matched substrings, an empty list if not found
- re.finditer() returns an iterator over Match objects of the occurrences, an empty iterator if not found

Match object methods:

- match.start() returns the start index
- match.end() returns the end index
- match.span() returns the start and end index in a tuple
- match.group() returns the matched string

RE: Functions II

    import re
    seq = "RPAPPDRAPDQX"    # A sequence
    expr = 'A.{1,2}D'       # A and D separated by 1 or 2 characters

    match = re.search(expr, seq)
    if match:
        print(
            match.start(),                      # start index
            match.end(),                        # end index
            match.span(),                       # start and end index
            match.group(),                      # the matched string
            seq[match.start():match.end()],     # the matched string
            sep=' - ')
    # 2 - 6 - (2, 6) - APPD - APPD

RE: Functions III

    import re
    seq = "RPAPPDRAPDQX"    # A sequence
    expr = 'A.{1,2}D'       # A and D separated by 1 or 2 characters

    match = re.match(expr, seq)         # Not found at beginning
    print(match)                        # None

    matches = re.findall(expr, seq)     # Found 2 occurrences
    print(matches)                      # ['APPD', 'APD']

    matches = re.finditer(expr, seq)    # Found 2 occurrences
    for m in matches:                   # Iterate over Match objects
        print(m.span(), m.group())      # Use each Match object
    # (2, 6) APPD
    # (7, 10) APD

RE: Group Extraction

Groups are defined with parentheses. On a successful search:

- match.group(): the whole match text
- match.group(1): match text of the 1st left parenthesis
- match.group(2): match text of the 2nd left parenthesis
- ...

    import re
    text = 'purple alice-b@google.com monkey dishwasher'
    match = re.search(r'([\w.-]+)@([\w.-]+)', text)
    if match:
        print(match.group())     ## 'alice-b@google.com'
        print(match.group(1))    ## 'alice-b'
        print(match.group(2))    ## 'google.com'

RE: Group Extraction and Findall

If the pattern includes a single set of parentheses, then findall() returns a list of strings corresponding to that single group.
If the pattern includes 2 or more parenthesis groups, then instead of returning a list of strings, findall() returns a list of tuples. Each tuple represents one match of the pattern, and inside the tuple is the group(1), group(2) ... data.

    text = 'alice@google.com, monkey bob@abc.com dishwasher'
    tuples = re.findall(r'([\w\.-]+)@([\w\.-]+)', text)
    print(tuples)   # [('alice', 'google.com'), ('bob', 'abc.com')]

    for t in tuples:
        print(t[0], t[1], sep=' | ')
    # alice | google.com
    # bob | abc.com

RE: Options

The re functions take options to modify the behavior of the pattern match. The option flag is added as an extra argument to search(), findall() etc., e.g. re.search(pat, text, re.IGNORECASE).

- IGNORECASE ignores upper/lowercase differences for matching
- DOTALL allows dot (.) to match newline; normally it matches anything but newline. Note that \s (whitespace) includes newlines
- MULTILINE allows ^ and $ to match the start and end of each line within a string made of many lines. Normally they just match the start and end of the whole string.

Greedy vs. Non-Greedy

.* or .+ return the largest match (they are "greedy"); to get the shortest matches instead, use .*? or .+?

    string = '<b>foo</b> and <i>so on</i>'    # string with xml tags

    matches = re.findall(r'<.*>', string)     # <.*>
    print(matches)   # ['<b>foo</b> and <i>so on</i>']    # got all string

    matches = re.findall(r'<.*?>', string)    # <.*?>
    print(matches)   # ['<b>', '</b>', '<i>', '</i>']     # got each tag

Substitution

re.sub(expression, replacement, string)

    text1 = 'alice@google.com and bob@abc.net'
    text2 = re.sub(r'\.\w+', r'.de', text1)
    print(text2)
    # alice@google.de and bob@abc.de

\1, \2 ... in replacement refer to match group(1), group(2) ...

    text1 = 'alice@google.com and bob@abc.com'
    text2 = re.sub(r'([\w\.-]+)@([\w\.-]+)',    # Expression
                   r'\2@\1',                    # Replacement string
                   text1)                       # Input string
    print(text2)
    ## google.com@alice and abc.com@bob

Exercise

URL: https://cbdm.uni-mainz.de/mb17
Jupyter Notebook file: Regex.ipynb (download the file into the notebooks folder)
Data file: sequences.tsv (download the file into the data folder)

Annexes

References

- Python documentation: https://docs.python.org
- Online tutorials (Python 2 or 3): Google's Python Class, ProgrammingForBiologists.org

Escape sequences

    Escape sequence   Meaning
    \newline          Backslash and newline ignored
    \\                Backslash (\)
    \'                Single quote (')
    \"                Double quote (")
    \a                ASCII Bell (BEL)
    \b                ASCII Backspace (BS)
    \f                ASCII Formfeed (FF)
    \n                ASCII Linefeed (LF)
    \r                ASCII Carriage Return (CR)
    \t                ASCII Horizontal Tab (TAB)
    \v                ASCII Vertical Tab (VT)
    \ooo              Character with octal value ooo
    \xhh              Character with hex value hh

Table: Escape sequences
Common Sequence Operations

    Operation               Result
    x in s                  True if an item of s is equal to x, else False
    x not in s              False if an item of s is equal to x, else True
    s + t                   the concatenation of s and t
    s * n or n * s          equivalent to adding s to itself n times
    s[i]                    ith item of s, origin 0
    s[i:j]                  slice of s from i to j
    s[i:j:k]                slice of s from i to j with step k
    len(s)                  length of s
    min(s)                  smallest item of s
    max(s)                  largest item of s
    s.index(x[, i[, j]])    index of the first occurrence of x in s (at or after index i and before index j)
    s.count(x)              total number of occurrences of x in s

Table: Sequence operations sorted in ascending priority. s and t are sequences of the same type, n, i, j and k are integers and x is an arbitrary object that meets any type and value restrictions imposed by s.

Operations on mutable sequence types

    Operation               Result
    s[i] = x                item i of s is replaced by x
    s[i:j] = t              slice of s from i to j is replaced by the contents of the iterable t
    del s[i:j]              same as s[i:j] = []
    s[i:j:k] = t            the elements of s[i:j:k] are replaced by those of t
    del s[i:j:k]            removes the elements of s[i:j:k] from the list
    s.append(x)             appends x to the end of the sequence (same as s[len(s):len(s)] = [x])
    s.clear()               removes all items from s (same as del s[:])
    s.copy()                creates a shallow copy of s (same as s[:])
    s.extend(t) or s += t   extends s with the contents of t (for the most part the same as s[len(s):len(s)] = t)
    s *= n                  updates s with its contents repeated n times
    s.insert(i, x)          inserts x into s at the index given by i (same as s[i:i] = [x])
    s.pop([i])              retrieves the item at i and also removes it from s
    s.remove(x)             removes the first item from s where s[i] == x
    s.reverse()             reverses the items of s in place

Table: s is an instance of a mutable sequence type, t is any iterable object and x is an arbitrary object that meets any type and value restrictions imposed by s

Built-in functions

    Function    Description
    abs()       Return the absolute value of a number.
    all()       Return True if all elements of the iterable are true (or if the iterable is empty).
    any()       Return True if any element of the iterable is true. If the iterable is empty, return False.
    ascii()     Return a string containing a printable representation of an object (escape non-ASCII characters).
    bin()       Convert an integer number to a binary string.
    bool()      Convert a value to a Boolean.
    chr()       Return the string representing a character.
    dict()      Create a new dictionary.
    dir()       Return the list of names in the current local scope.
    float()     Convert a string or a number to floating point.
    format()    Convert a value to a "formatted" representation.
    help()      Invoke the built-in help system.
    hex()       Convert an integer number to a hexadecimal string.

Table: Python built-in functions
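A few of the built-in functions listed above can be tried directly. This is an illustrative sketch only; the input values are arbitrary and not taken from the course material.

```python
# Flag each base of a made-up sequence as G/C (True) or not (False)
gc_flags = [base in 'GC' for base in 'GCGC']

print(abs(-3.5))               # 3.5
print(all(gc_flags))           # True
print(all([]))                 # True  (an empty iterable has no false element)
print(any([]))                 # False
print(bin(10))                 # 0b1010
print(hex(255))                # 0xff
print(bool(''))                # False (empty string)
print(chr(65))                 # A
print(float('1.99'))           # 1.99
print(format(0.4567, '.2f'))   # 0.46
print(ascii('Taškova'))        # 'Ta\u0161kova'
```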