Advanced introduction to Python (draft)
Fredrik Wahlberg
Scripted Hello World
Python 2.7.5+ (default, Sep 19 2013, 13:48:49)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print "Hello World"
Hello World
>>>
File helloworld.py:
# Comment
print "Hello World"

$ python helloworld.py
Hello World
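On Unix the same script can also be made directly executable with a shebang line; this is a small added variant, not shown on the slide:

#!/usr/bin/env python
# helloworld.py
print "Hello World"

$ chmod +x helloworld.py
$ ./helloworld.py
Hello World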
Math
>>> 1+1
2
>>> 2345234523452345+234587347826948376
236932582350400721
>>> 123*123
15129
>>> 10**3
1000
>>> 2.8**3
21.951999999999995
>>> 10-4
6
>>> 6+4
10
>>> 9/3
3
>>> 9/2   # integer division
4
>>> -9/2
-5
>>> 13%2
1
>>> 13%5
3
>>> -13%5
2
>>> 2*2
4
>>> 32*32
1024
>>> 256*256
65536
>>> 65536*65536
4294967296
>>> 2**32
4294967296
>>> 2**33
8589934592
>>> 2**34
17179869184
>>> 2**100
1267650600228229401496703205376L
>>> 2.2345*.234
0.522873
>>> (2 + 4j)*3j
(-12+6j)
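The 9/2 and -9/2 examples show Python 2's integer division: dividing two ints floors the result. A short added sketch (standard Python 2 behaviour, not from the slide) of how to get true division instead:

>>> 9 / 2.0
4.5
>>> float(9) / 2
4.5
>>> from __future__ import division
>>> 9 / 2
4.5
>>> 9 // 2
4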
Data types
>>> list = []
>>> list
[]
>>> list = ['a', 'b', 1, 2]
>>> list
['a', 'b', 1, 2]
>>> list[2]
1
>>> list[:2]
['a', 'b']
>>> list[2:]
[1, 2]
>>> list[1:3]
['b', 1]
>>> list.append('c')
>>> list.append('d')
>>> list
['a', 'b', 1, 2, 'c', 'd']
>>> dict = {}
>>> dict
{}
>>> dict['a'] = 'b'
>>> dict['c'] = 10
>>> dict
{'a': 'b', 'c': 10}
>>> dict['some_list'] = [1, 2, 3]
>>> dict['another_dictionary'] = {'mykey': 'mydata', 1: 2}
>>> dict
{'a': 'b', 'c': 10, 'some_list': [1, 2, 3], 'another_dictionary': {1: 2, 'mykey': 'mydata'}}
>>>
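A few more everyday list and dictionary operations, continuing the session above (added for reference; the slide stops at the repr output):

>>> len(list)
6
>>> 'c' in list
True
>>> dict.keys()                    # order follows the dict's internal layout
['a', 'c', 'some_list', 'another_dictionary']
>>> dict['another_dictionary']['mykey']
'mydata'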
Comparison
Python 2.7.5+ (default, Sep 19 2013, 13:48:49)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 1==1
True
>>> 1==2
False
>>> 1!=2
True
>>> 2+2==4
True
>>> not 2+2==4
False
>>> 'abc' < 'ABC'
False
>>> 'abc' > 'ABC'
True
>>> 'abc' == 'ABC'
False
>>> 'abc' == 'abc'
True
>>>
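Strings compare lexicographically by character code, which is why 'abc' > 'ABC': lowercase letters have higher ASCII codes than uppercase ones. A quick added check:

>>> ord('a'), ord('A')
(97, 65)
>>> 'Z' < 'a'
True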
Control flow & functions
if 2+2 == 4:
    print "2+2=4"
elif 2+2 == 3:
    print "2+2=3"
else:
    print "2+2=?"

i = 0
while i < 100:
    i = i + 1

for i in range(100):
    print i

>>> def somefunction(a, b):
...     print a+b
...
>>> somefunction(1, 2)
3
>>>
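somefunction above only prints its result; functions usually return values, and parameters can have defaults. A small added sketch (standard Python, not on the slide):

>>> def add(a, b=10):
...     return a + b
...
>>> add(1, 2)
3
>>> add(5)
15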
Modules and docstrings
Python 2.7.5+ (default, Sep 19 2013, 13:48:49)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> print nltk.__doc__
The Natural Language Toolkit (NLTK) is an open source Python library
for Natural Language Processing.  A free online book is available.
(If you use the library for academic research, please cite the book.)
Steven Bird, Ewan Klein, and Edward Loper (2009).
Natural Language Processing with Python.  O'Reilly Media Inc.
http://nltk.org/book
@version: 2.0.4
>>> print nltk.model.NgramModel.__doc__
A processing interface for assigning a probability to the next word.
>>> dir(nltk.model)
['NgramModel', '__builtins__', '__doc__', '__file__', '__name__',
'__package__', '__path__', 'api', 'ngram']
>>> print nltk.model.NgramModel.__init__.__doc__
Create an ngram language model to capture patterns in n consecutive
words of training text.  An estimator smooths the probabilities derived
from the text and may allow generation of ngrams not seen during training.

    >>> from nltk.corpus import brown
    >>> from nltk.probability import LidstoneProbDist
    >>> est = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
    >>> lm = NgramModel(3, brown.words(categories='news'), estimator=est)
    >>> lm
    <NgramModel with 91603 3-grams>
    >>> lm._backoff
    <NgramModel with 62888 2-grams>
    >>> lm.entropy(['The', 'Fulton', 'County', 'Grand', 'Jury', 'said',
    ... 'Friday', 'an', 'investigation', 'of', "Atlanta's", 'recent',
    ... 'primary', 'election', 'produced', '``', 'no', 'evidence',
    ... "''", 'that', 'any', 'irregularities', 'took', 'place', '.'])
    ... # doctest: +ELLIPSIS
    0.5776...

:param n: the order of the language model (ngram size)
:type n: int
:param train: the training text
:type train: list(str) or list(list(str))
:param pad_left: whether to pad the left of each sentence with an (n-1)-gram of empty strings
:type pad_left: bool
:param pad_right: whether to pad the right of each sentence with an (n-1)-gram of empty strings
:type pad_right: bool
:param estimator: a function for generating a probability distribution
:type estimator: a function that takes a ConditionalFreqDist and returns a ConditionalProbDist
:param estimator_args: Extra arguments for estimator.
    These arguments are usually used to specify extra properties for the
    probability distributions of individual conditions, such as the number
    of bins they contain.
    Note: For backward-compatibility, if no arguments are specified, the
    number of bins in the underlying ConditionalFreqDist are passed to
    the estimator as an argument.
:type estimator_args: (any)
:param estimator_kwargs: Extra keyword arguments for the estimator
:type estimator_kwargs: (any)
>>>
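Besides printing __doc__ directly, the interactive help() function renders the same docstrings in a pager (standard Python; an added note, not from the slide):

>>> help(nltk.model.NgramModel)    # shows the class and __init__ docstrings quoted above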
Classes
File Complex.py:
class Complex:
    def __init__(self, realpart, imagpart):
        self.r = realpart
        self.i = imagpart
    def add(self, other):
        self.r = self.r + other.r
        self.i = self.i + other.i
    def __repr__(self):
        return "(" + str(self.r) + ", " + str(self.i) + "j)"

Python 2.7.5+ (default, Sep 19 2013, 13:48:49)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Complex import Complex
>>> x = Complex(3.0, -4.5)
>>> x.r, x.i
(3.0, -4.5)
>>> x
(3.0, -4.5j)
>>> y = Complex(2.5, 1)
>>> y
(2.5, 1j)
>>> x.add(y)
>>> x
(5.5, -3.5j)
>>>
NLTK
Python 2.7.5+ (default, Sep 19 2013, 13:48:49)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('gutenberg')
[nltk_data] Downloading package 'gutenberg' to
[nltk_data]     /home/fredrik/nltk_data...
[nltk_data]   Unzipping corpora/gutenberg.zip.
True
>>> nltk.download('brown')
[nltk_data] Downloading package 'brown' to /home/fredrik/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
True
>>> nltk.download()
showing info http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml
True
>>>
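Once downloaded, a corpus is available under nltk.corpus; a brief added example (exact file and category listings depend on the corpus version):

>>> from nltk.corpus import gutenberg, brown
>>> gutenberg.fileids()                  # names of the bundled Project Gutenberg texts
>>> brown.categories()                   # includes 'news', used on the next slides
>>> brown.words(categories='news')[:10]  # the first few tokens of the news category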
NLTK: N-gram
Python 2.7.5+ (default, Sep 19 2013, 13:48:49)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk.corpus import brown
>>> data = brown.sents(categories='news')
>>> data = data[:len(data)/2]
>>> from nltk.model import NgramModel
>>> from nltk import MLEProbDist
>>> trigram = NgramModel(3, data, estimator=MLEProbDist)
>>>
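The model can then be asked for next-word probabilities; NLTK 2.x documents NgramModel as "assigning a probability to the next word", so a sketch looks like the following (the method name prob(word, context) is assumed from that interface, and the value depends on which half of the corpus was used for training):

>>> w1, w2, w3 = data[0][0], data[0][1], data[0][2]
>>> trigram.prob(w3, [w1, w2])    # P(w3 | w1 w2) under the MLE estimate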
Getting the data
def Pml(ngram, word, context):
    # Maximum likelihood estimate of P(word | context) from an NgramModel
    context = tuple(context)
    if (context + (word,) in ngram._ngrams) or (ngram._n == 1):
        fd = ngram[context].freqdist()  # get the data container: the frequency distribution for this context
        i = float(fd[word])             # number of occurrences of `word` after this context
        N = float(fd.N())               # total number of words observed after this context
        return i/N
    else:
        return 0
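A usage sketch for the helper, assuming trigram and data from the NLTK: N-gram slide (the probability value depends on the training half of the corpus):

>>> sent = data[0]                     # first training sentence
>>> Pml(trigram, sent[2], sent[0:2])   # ML estimate of P(w3 | w1 w2)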
Links and resources
The internet is full of good resources:
http://docs.python.org/2/tutorial/
http://pyvideo.org/
Book:
Python Essential Reference, David M. Beazley