Double MSc in Human Language Science and Technology

HLST COURSE CATALOGUE
2007/8
CSA5002 – Corpora and Statistical Methods
Lecturer(s): Dr. Albert Gatt
Semester: I
ECTS Credits: 6
Contact Hours: 42
Method of Assessment: Test (85%), Coursework (15%)
Aims
This course provides a grounding in (i) foundational statistical methods, especially probability,
information theory, and statistical inference and (ii) corpus design, annotation and construction
and the use of these to:
• Conduct linguistic research, whose aim is to test empirical hypotheses about language and
make generalisations;
• Build Natural Language Processing systems (e.g. parsers, thesauri, generators) which differ
from traditional rule-based or “symbol-processing” systems in that their core is a statistical
language model derived from corpus data.
Syllabus
This course will be divided into three parts.
Part I deals with introductory material and some of the mathematical background. An important
aspect of this is to provide students with exposure to existing corpora and also existing tools for
corpus-based research, corpus building, and corpus annotation. Another important aspect is the
use of the web as corpus.
Part II focuses in detail on particular areas of corpus-based research in NLP, and the methods
used, including:
• Research on words, word distributions, word frequencies and collocations
• Semantic similarity and corpus-derived thesauri
• N-gram language models for parsers and generators
• Machine-learning techniques (both statistical and “rule-based”, where the latter involves the class
of rule learners that infer symbolic/production rules from annotated corpora)
Part III aims to provide a more comprehensive picture of state-of-the-art NLP research using
corpora, including:
• Statistical Parsing: An overview of recent work in this area, covering TAG grammars and
the RASP parser (Carroll et al.); parsers trained on treebanks (Charniak, Collins).
• Statistical Generation: This will mainly cover statistical language realisers, which take as
input a semantic form, and output a natural language expression. Recent work in this area
includes the overgeneration-and-ranking approach (Knight, Langkilde-Geary, Varges). Some
recent work that applies statistical techniques to less “surface-oriented” issues (including
content determination for NLG systems) will also be covered.
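As a rough illustration of the n-gram language models listed under Part II, the following Python sketch estimates bigram probabilities by maximum likelihood from a tokenised corpus. The function name, the sentence-boundary markers and the toy corpus are assumptions made for this example only; a practical model would also need smoothing for unseen bigrams.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Estimate maximum-likelihood bigram probabilities from tokenised sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        # Pad with hypothetical sentence-boundary markers.
        padded = ["<s>"] + list(tokens) + ["</s>"]
        # Every token except the last can serve as the context of a bigram.
        context_counts.update(padded[:-1])
        bigram_counts.update(zip(padded[:-1], padded[1:]))

    def probability(previous, word):
        # P(word | previous) = count(previous, word) / count(previous)
        if context_counts[previous] == 0:
            return 0.0
        return bigram_counts[(previous, word)] / context_counts[previous]

    return probability


# Toy corpus, for illustration only.
toy_corpus = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
]
bigram_probability = train_bigram_model(toy_corpus)
print(bigram_probability("the", "dog"))
```

Unseen bigrams receive probability zero here, which is exactly the weakness that smoothing techniques are designed to address.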
Bibliography
• Key papers by Carroll, Charniak, Collins, Knight, Langkilde-Geary, Varges
CSA5003 – Finite State Machinery and Computational Morphology
Lecturer(s): Dr. Gertjan van Noord, University of Groningen
Semester: II
ECTS Credits: 4
Contact Hours: 28
Method of Assessment: Coursework (100%)
Linguistic Morphology studies the internal structure of words. The main issues that have to be
considered are (a) how a word is segmented into its component parts, (b) which parts are common
to different forms of the same word and (c) how the parts interact with each other to define the
particular nature of a given wordform. Computational morphology attempts to shed light on these
issues by building computational models. For the most part these models are based on Finite State
Automata of different kinds.
The aim of this course is to present linguistic issues and then provide examples of computational
approaches to the area.
Use will be made of the FSA Utilities toolbox developed in Groningen: a collection of utilities to
manipulate regular expressions, finite-state automata and finite-state transducers. Manipulations
include automata construction from regular expressions, determinization (both for finite-state
acceptors and finite-state transducers), minimization, composition, complementation, intersection,
Kleene closure, etc. Various visualization tools are available to browse finite-state automata.
Interpreters are provided to apply finite automata. Finite automata can also be compiled into
stand-alone C programs.
Texts:
• Lauri Karttunen, Kimmo Koskenniemi, Gertjan van Noord. Special issue: Finite State
Methods in Natural Language Processing. Natural Language Engineering. Volume 9,
Part 1, March 2003.
CSA5004 – Unification Grammar
Lecturer(s): Dr. Shuly Wintner, University of Haifa
Semester: II
ECTS Credits: 5
Contact Hours: 35 Hrs
Method of Assessment: Coursework (100%)
Description:
The course introduces the foundations of some of the major formalisms used in computational
linguistics nowadays, providing both the linguistic motivation and the necessary mathematical
infrastructure.
Syllabus:
• Context-free grammars
  - Basics: strings, grammars, derivations, languages, trees
  - Properties of CFGs
  - The (in)adequacy of CFGs for describing natural languages
• Extending CFGs: feature structures
  - Motivation
  - Properties: features, values, variables, paths, reentrancy
  - Subsumption and unification
  - Representing lists, trees and graphs
• Unification grammars
  - Adding features to rules
  - Multi-AVMs, forms, derivations, languages, trees
  - Internalizing categories
• Linguistic examples
  - Imposing subject-verb agreement
  - Case control
  - Subcategorization
  - Unbounded dependencies
  - Coordination
  - Typed feature structures
• The expressiveness of unification grammars
  - Grammars for trans-context-free languages
  - Turing equivalence
  - The mathematics of feature structures
• Computational processing of unification grammars
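To give a flavour of the unification operation named in the syllabus, here is a minimal Python sketch. It assumes a deliberately simplified representation that is not taken from the course materials: feature structures are nested dictionaries, atomic values are strings, and reentrancy (shared substructures) and variables are not modelled. The function and feature names are hypothetical.

```python
def unify(fs1, fs2):
    """Return the unification of two feature structures, or None if they clash."""
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)  # shallow copy, so the inputs are left unchanged
        for feature, value in fs2.items():
            if feature in result:
                unified = unify(result[feature], value)
                if unified is None:
                    return None  # incompatible values somewhere below this feature
                result[feature] = unified
            else:
                result[feature] = value
        return result
    # Atomic values (or an atom against a structure) unify only if identical.
    return fs1 if fs1 == fs2 else None


# Subject-verb agreement: the verb demands a third-person singular subject.
subject = {"cat": "NP", "agr": {"num": "sg"}}
verb_requirement = {"agr": {"num": "sg", "per": "3"}}
plural_requirement = {"agr": {"num": "pl"}}

print(unify(subject, verb_requirement))
print(unify(subject, plural_requirement))
```

The first call merges the compatible information from both structures; the second fails because the two values for the number feature clash.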
Textbooks:
• Shuly Wintner & Nissim Francez, Unification Grammars (forthcoming)
• Stuart M. Shieber. Constraint-Based Grammar Formalisms. MIT Press, 1992
• Bob Carpenter, The Logic of Typed Feature Structures, Cambridge 1992
CSA5005 – Practical Dialogue Systems
Lecturer(s): Dr. Matthew Montebello & Mr. Michael Rosner
Semester: I & II
ECTS Credits: 5
Contact Hours: 35
Method of Assessment: Test (80%), Coursework (20%)
This course will investigate the computational aspects of dialogue systems.
The first part of the course is largely devoted to the underlying computational
infrastructure, offering a comprehensive introduction to the syntax,
semantics and features of Prolog, a well-known logic programming language that
has been used extensively in a wide variety of AI application areas. Teaching is
organised around a series of carefully chosen laboratory exercises.
The second part of the course surveys the main types of dialogue system and identifies specific
concepts and programming techniques for building a practical system. This will be developed
using Definite Clause Grammars, a simple and widely used formalism built on top of Prolog.
Method of Assessment:
Test: (80%)
Coursework: (20%)
Please note that during the September resit session, assignment marks obtained during the first
sit will be retained.
Textbooks:
• L. Sterling and E. Shapiro. The Art of Prolog (2nd Edition). MIT Press 1994. ISBN 0-262-19338-8.
• Callear D. Prolog Programming for Students. DP PULL. 1994 ISBN 1-85805-093-6
• Pereira, F. & Shieber, S. Prolog and Natural Language Analysis, CSLI Publications and
http://www.mtome.com/Publications/PNLA/prolog-digital.pdf
• Michael McTear, Spoken Dialogue Technology, ISBN 1852336722, Springer, 2004
CSA5006 – Logic, Representation and Inference
Lecturer(s): Mr. Michael Rosner
Semester: 2
ECTS Credits: 4
Contact Hours: 28
Method of Assessment: Test (75%), Coursework (25%)
This course introduces techniques for tackling the following issues:
• What is semantic representation?
• What is the relationship between semantic representation and logic?
• What mechanisms are required to associate semantic representations with expressions of
natural language?
• How can we use logical representations of natural language expressions to automate the
process of drawing inferences?
We will approach them by developing program modules that handle the key concepts of
representation and inference, including:
• First Order Logic
• Lambda Calculus
• Underspecified Representations
• Propositional Inference
• First Order Inference
Textbooks:
• Patrick Blackburn and Johan Bos, Representation and Inference for Natural Language,
Stanford: CSLI Publications, 2005
LIN2080 - Discourse Pragmatics I: Introduction to Discourse Analysis & Conversational Pragmatics
Lecturer(s): Mr. Paul A. Falzon
Semester: I and II
ECTS Credits: 6
Contact Hours: 42 Hrs
Method of Assessment: Test (33%), Assignment (67%)
Module 1: Introduction to Discourse Analysis*
Learning Objectives
The unit is designed to provide students with an understanding of the fundamentals of Discourse
Analysis as well as an appreciation of the broad scope encompassed by the discipline.
Content Covered
Discourse Analysis is presented as a tool for the study of spoken and written language. The broad
scope of Discourse Analysis is discussed in terms of the plurality of meanings attached to
underpinning notions such as discourse, text and context. The course covers a range of
approaches to the study of discourse including Critical Discourse Analysis, Discourse Analysis of
the Birmingham School, Speech Act Theory, Textual Discourse Analysis, Narratology,
Interactional Sociolinguistics, Variation Analysis and the Ethnography of Communication.
The course will mention, but not develop, other discourse analytic approaches, e.g.
Ethnomethodological Conversation Analysis and Pragmatics.
Reading List:
• Coulthard, M. (Ed.) (1994). Advances in written text analysis. London: Routledge.
• Schiffrin, D. (1994). Approaches to discourse. Oxford: Blackwell.
• Stillar, G. F. (1998). Analyzing everyday texts: Discourse, rhetoric, and social
perspectives. Thousand Oaks, CA: Sage.
• Stubbs, M. (1996). Text and corpus analysis: Computer assisted studies of language and
culture. Oxford: Blackwell.
• Van Dijk, T. A. (Ed.) (1997). Discourse studies: A multidisciplinary introduction:
Vol. 1. Discourse as structure and process. Thousand Oaks, CA: Sage.
• Van Dijk, T. A. (Ed.) (1997). Discourse studies: A multidisciplinary introduction:
Vol. 2. Discourse as social interaction. Thousand Oaks, CA: Sage.
Module 2: Conversational Pragmatics I*
Learning Objectives
The course is designed to reach the following aims:
• To introduce students to different approaches to the analysis of conversation
• To introduce students to current developments in conversational pragmatics
Content Covered
The first part of the course will focus mainly on the Theoretical Component. A combination of
lectures, seminars and tutorials is employed.
The course introduces students to three of the more influential approaches to the study of
conversation, namely Ethnomethodological Conversation Analysis (CA), Discourse Analysis of
the Birmingham School and Clark’s socio-cognitive Theory of Language. Students are required
to give brief presentations on current issues in conversational pragmatics and its application to the
analysis of a range of conversational domains.
Reading List:
• Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press.
• Hutchby, I., & Wooffitt, R. (1998). Conversation analysis: Principles, practices and
applications. Cambridge: Polity Press.
• Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the
organization of turn-taking for conversation. Language, 50: 696-735.
• Stenström, A. B. (1994). An Introduction to spoken interaction. London: Longman.
• Ten Have, P. (1999). Doing conversation analysis: A practical guide. London: Sage.
Module 3: Conversational Pragmatics II
Learning Objectives
The course is designed to reach the following aims:
• To introduce students to more complex aspects of conversational pragmatics
• To increase students’ awareness of both intercultural and intracultural variation in
conversational practice
• To enable students to undertake systematic research into the nature of conversational practice
Content Covered
The second part of the course will focus on the Practical Component.
The course builds on knowledge acquired in Conversational Pragmatics I. Topics covered include
discourse markers, repair, openings, closings, preference organization, intersubjectivity and the
next-turn proof procedure. Data collection methods and related ethical considerations, transcription
procedures, and methods of data analysis are covered in the practical component. Students carry
out a study project involving data collection, transcription and analysis.
Reading List:
• Antaki, C., & Widdicombe, S. (Eds.). (1998). Identities in talk. London: Sage.
• Atkinson, J. M., & Heritage, J. (Eds.). (1984). Structures of social action: Studies in
conversation analysis. Cambridge: Cambridge University Press.
• Button, G., & Lee, J. R. E. (Eds.). (1987). Talk and social organisation. Clevedon, Avon:
Multilingual Matters.
• Hutchby, I., & Wooffitt, R. (1998). Conversation analysis: Principles, practices and
applications. Cambridge: Polity Press.
• Ten Have, P. (1999). Doing conversation analysis: A practical guide. London: Sage.
*N.B. When parts of the study-unit, with the express permission of the lecturer, are taken on
their own by students whose area of study is not Linguistics, they shall be registered as
follows:
LIN2180 Introduction to Discourse Analysis (2 ECTS, assessed by test)
LIN2280 Conversational Pragmatics I (2 ECTS, assessed by test)
LIN2380 Conversational Pragmatics II (2 ECTS, assessed by test)
BIT5103 - Introduction to Computer Science I
Lecturer(s): Dr. Gordon Pace and Dr. John Abela
Semester: I & II
ECTS Credits: 5
Tutorials / Practicals: 4 Hrs
Lectures: 25 Hrs
Method of Assessment: Assignment (10%), Test (90%)
Mathematics of Discrete Structures
Lecturer: Dr. Gordon Pace
This part of the course aims primarily to introduce the basic mathematical tools that are
required for the formal and rigorous treatment of the various aspects of computing.
The importance of formal reasoning is emphasised in the course, concentrating on syntax, and
formal proofs. The course also explains various mathematical notions and structures that will be
used in later courses.
Syllabus:
• Propositional Calculus
• Predicate Calculus
• Set theory
• Relations and Functions
• Natural Numbers and cardinality
• Graph theory
Algorithms and Data Structures
Lecturer: Dr. John Abela
The aim of the second part is to introduce the concepts of algorithm and data structure,
highlighting the relation which exists between the two. These concepts are introduced in a
gradual fashion, proceeding from abstract principles to concrete examples. Correctness and
efficiency will be emphasized as the main properties of algorithms.
In the first part of the course a number of algorithms will be discussed, with emphasis on sorting
and searching. Abstract data types (ADTs) will be formally defined and illustrated with case
studies for list, stack, queue, priority queues and heaps, and the ADT table. The structure of
binary trees and associated algorithms will be investigated. In the second part of the course, the
‘Big O’ notation will be introduced as a formal framework for describing resource use (i.e. time
and space) of an algorithm. Further topics covered are: graphs and their associated searching and
traversal algorithms, hashing techniques, AVL trees, 2-3 trees, B-trees.
Reading List:
• Mark Allen Weiss, Data Structures and Algorithm Analysis, Benjamin Cummings.
• David Harel, Algorithmics: The Spirit of Computing, Addison-Wesley.
• A.V. Aho, J.E. Hopcroft, J.D. Ullman, Data Structures and Algorithms.
BIT5201 A.I. as Representation and Search
Lecturer(s): Mr. Sandro Spina / Kristian Guillaumier
Semester: II
ECTS Credits: 5
Tutorials / Practicals: 4 Hrs
Lectures: 25 Hrs
Method of Assessment: Assignment (30%), Test (70%)
Programs which apparently exhibit intelligent behaviour (for example, winning a game of
chess) usually employ some sort of AI technique. This course will focus on the basic elements of
AI, namely knowledge representation and search strategies. AI is intimately linked to the
representation of a given problem domain. The role of representation is to capture the essential
features of a problem domain and make that information accessible to the problem-solving
procedure. State space strategies are used to enumerate a number of solutions to a given problem
domain. The validity of this enumeration is manifest in the apparent "intelligence" of these
algorithms. The course is divided into the following three main sections:
• Knowledge Representation
• Strategies for State Space Search
• Heuristic Search
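As an illustration of state space search, the following Python sketch performs an uninformed breadth-first search over states generated on demand. The function names and the toy number puzzle (reach 10 from 1 by adding one or doubling) are assumptions chosen for this example and are not taken from the course materials.

```python
from collections import deque


def breadth_first_search(start, is_goal, successors):
    """Return a shortest path of states from start to a goal state, or None."""
    frontier = deque([start])
    parent = {start: None}  # also serves as the set of visited states
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            # Walk the parent links backwards to reconstruct the path.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for next_state in successors(state):
            if next_state not in parent:
                parent[next_state] = state
                frontier.append(next_state)
    return None


# Toy problem: reach 10 from 1 using the moves "add one" and "double".
def number_moves(n):
    return [n + 1, n * 2]


solution = breadth_first_search(1, lambda n: n == 10, number_moves)
print(solution)
```

Because the frontier is a first-in, first-out queue, states are expanded in order of their distance from the start, so the first goal reached lies on a shortest path. A heuristic search would instead order the frontier by an estimate of the remaining cost.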
Textbooks
• George F Luger. Artificial Intelligence, Structures and Strategies for Complex Problem
Solving. Addison Wesley
• Russell, Norvig. Artificial Intelligence A Modern Approach. Prentice Hall.
CSA2010 – Compiling Techniques
Lecturer(s): Mr. Sandro Spina
Semester: 1st
ECTS Credits: 4
Tutorials / Practicals: 8 Hrs
Lectures: 20 Hrs
Compilers translate code from a source to a target language, the latter usually being a lower level
language. The main aim of this course is to equip students with the necessary knowledge required
to understand how modern compilers work. On a more practical note, students will build a
compiler for a small imperative programming language as part of the assignment.
The materials provided will be based on the Java programming language; however, students can
opt to work with other programming languages such as C or Haskell. The course will cover
compilation both to JVM bytecode and native code. Apart from the usual topics associated with
compiling theory the course will also offer introductions to the areas of compiler correctness and
hardware compilers.
Topics covered include:
• grammars
• lexers
• parsers
• abstract syntax
• type systems (checking, derivations, type inference, etc.)
• syntax-directed translation
• code generation and analysis (JVM, native)
• register allocation
• optimisation
• compiler correctness
• hardware compilers
Method of Assessment:
Test: (70%)
Coursework: (30%)
Assessment for this unit consists of a written exam worth 70% and an assignment worth 30% of
the final mark. For resit sessions, the written exam again accounts for 70%. The 30% assignment
mark can either be retained from the first sit or replaced by a new assignment submission,
according to the preference of the student.
Textbooks:
• Aho, Sethi, Ullman. Compilers: Principles, Techniques, and Tools.
• Andrew S. Appel, Modern Compiler Implementation in Java, Cambridge University Press,
1998. ISBN 0-521-60764-7. http://www.cs.princeton.edu/%7Eappel/modern/java/
CSA5007 – Formal Methods & Automata
Lecturer(s): Dr. Gordon Pace
Semester: 2
ECTS Credits: 4
Tutorials / Practicals: 7 Hrs
Lectures: 21 Hrs
Method of Assessment: Test (75%), Coursework (75%)
This course takes a theoretical approach to the formal treatment of languages and automata (or
machines) to recognise languages. The aims are not only to instill the basic notions of languages,
grammars and automata using formal mathematical notation but also to provide a practical
perspective.
An assignment will be given involving the design of a parser based on the mathematical results.
Syllabus:
• Finite State Transducers
• Formal languages and grammars.
• Regular languages: regular grammars, finite-state automata, regular expressions.
• Context-free languages: context-free grammars, pushdown automata.
• Closure properties of regular and context-free languages.
• Normal forms for grammars.
• Recognition algorithms for grammars.
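As a small illustration of the finite-state automata in the syllabus, the Python sketch below simulates a deterministic finite automaton stored as a transition table. The function name and the example automaton (binary strings containing an even number of 1s) are assumptions made for this example only.

```python
def accepts(transitions, start_state, accepting_states, text):
    """Run a deterministic finite automaton over text and report acceptance."""
    state = start_state
    for symbol in text:
        key = (state, symbol)
        if key not in transitions:
            return False  # no transition defined: the input is rejected
        state = transitions[key]
    return state in accepting_states


# Example automaton: accepts binary strings with an even number of 1s.
EVEN_ONES = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

print(accepts(EVEN_ONES, "even", {"even"}, "1001"))
print(accepts(EVEN_ONES, "even", {"even"}, "1011"))
```

The language recognised here is regular, so it could equally be described by a regular expression or a regular grammar.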
The Resit will be in the form of one exam together with a possible resubmission of coursework if
failed at first sit.
Textbooks:
• K. Beesley & L. Karttunen, Finite State Morphology, ISBN 1-57586-434-7, CSLI
Press, 2003
• V.J. Rayward-Smith, A First Course in Formal Language Theory, McGraw-Hill
Computer Science Series, 1995.
• John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman, Introduction to Automata
Theory, Languages, and Computation (second edition), Addison-Wesley, ISBN
0201441241, 2001
BIT5105 - Programming in JAVA and Problem Solving Techniques
Lecturer(s): Dr. V. Nezval and Mr. J. Galea
Semester: 2
ECTS Credits: 5
Contact Hours: 35
This unit covers both the Java Language and important algorithms and data structures applied to
solving practical problems in the lab.
Emphasis will be placed on writing efficient and correctly structured programs. Java language
topics will include the structure of a Java program, compilation and execution, the concept of
classes and objects, data types, assignment, basic I/O using streams, if and switch statements,
loops, methods, arrays, strings, arrays of classes, utility classes, and the concept of applets with
awt and swing classes.
Practical problem solving will be based on the use and application of basic algorithms in
user-written programs, both during practical sessions guided by a tutor and through a set of
assignments to be worked out independently at home, as well as problems to be solved in the
laboratory and assessed by a tutor. The load and difficulty will increase gradually as the unit
progresses.
Method of Assessment:
Coursework (100%)
Textbook
Deitel and Deitel, Java, How to Program, Prentice Hall
BIT5205 - Databases and their Implementations
Lecturer(s): Mr. Joseph Vella
Semester: 2
ECTS Credits: 5
Contact Hours: 35
The unit starts with an introduction to databases and Database Management Systems (DBMS) in
the context of their role in Computer Information Systems. A quick summary of major
developments in databases, DBMSs and related computing artifacts is also presented, for
example the development of CODASYL, the ANSI/SPARC generalisation of databases and DBMSs,
and the emergence of the relational model. The main sub-systems expected in any DBMS are also
explained.
The first effort of this unit is the understanding of data models and an introduction to a language
to model database schemas at an abstract level. This language is graphical in its representation of
models and is independent of any implementation or physical details – the favourite of this unit is
Chen's notation (and its derivatives).
The second effort is an introduction of a database model that is popular with the majority of
implementations - Codd's relational model. The initial part concerns understanding the relational
data model. We then study various languages that interact over the relational model: the relational
algebra and Structured Query Language (SQL). We also study how a database schema, specified
in an ERM diagram is converted into a set of SQL data definition constructs (e.g. CREATE
TABLE commands). Related to the relational database model is our concern to control data
redundancy in a database design; consequently, we study Codd's original normal forms and their
later refinements.
The third part of the unit describes practical facets of making the DBMS use the available
resources efficiently (e.g. RAM, HDs, communication networks, tapes). These include data
sharing, query processing, and sophisticated data definition and manipulation languages. An
important further part is the emphasis on a multi-tier implementation of a computer information
system (three tiers for presentation, business and data processing) and on how, and with what
tools, software developers can design, implement and test these tiers.
Method of Assessment:
Coursework (20%)
Exam (80%)
Textbooks:
• R Elmasri & S Navathe, Fundamentals of Database Systems, Addison-Wesley
• R Earp & S Bagui, Learning SQL, Addison-Wesley
CSA5008 - Introduction to Bioinformatics
Lecturer: Dr. John Abela
Semester: 1
ECTS Credits: 6
Contact Hours: 42
Method of Assessment: Exam (70%), Coursework (30%)
This course deals with the storage, processing, retrieval, analysis, and understanding of biological
information. This information is usually protein or DNA sequences. The aim of the course is to
show that analysis of these sequences leads to a much fuller understanding of many biological
processes, allowing drug designers, scientists, pharmaceutical and biotechnology companies to
determine, for example, new drug targets or to predict whether a particular drug is applicable to all
patients. Students will be introduced to the basic concepts behind Bioinformatics and
Computational Biology tools.
The first part of the course deals with string processing and analysis algorithms. Topics covered
include:
• Formal Languages
• String edit distance
• Suffix trees
• Multiple string comparison
• Indexing
• String searching
• String matching
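As an illustration of the string edit distance topic, the Python sketch below computes the Levenshtein distance between two strings by dynamic programming, assuming unit cost for each insertion, deletion and substitution. The function name and the example strings are assumptions for this example; sequence alignment in practice typically uses scoring schemes with non-uniform costs.

```python
def edit_distance(source, target):
    """Levenshtein distance between two strings, with unit edit costs."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]  # deleting the first i characters of source
        for j, target_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (source_char != target_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))
print(edit_distance("ACGT", "AGT"))
```

Only two rows of the dynamic-programming table are kept at any time, so memory use grows with the length of the target string rather than with the product of both lengths.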
The second part of the course deals with applying the above algorithms in Bioinformatics. Topics
covered include:
• Protein and DNA sequences
• Alignment algorithms
• Sequence classification
• AI techniques applied to sequence analysis
• The protein folding prediction problem
Textbooks:
• Algorithms on Strings, Trees, and Sequences. Dan Gusfield.
• Protein Bioinformatics: An Algorithmic Approach to Sequence and Structure Analysis.
Eidhammer, Jonassen, and Taylor.
CSA3208 - Agent Technologies
Lecturer: Dr. Matthew Montebello & Mr. Charlie Abela
Semester: TBA
ECTS Credits: 6
Lectures: 42 Hrs
The first part of this course gives an overview of the state of the art in agent research and
technologies with reference to applications in a variety of domains including: Internet-based
information systems, adaptive (customizable) software systems, autonomous mobile and
immobile robots, data mining and knowledge discovery, smart systems (smart homes, smart
automobiles, etc.), decision support systems, and intelligent design and manufacturing
systems. The second part will concentrate on employment of such software agents to practical
and intelligent applications. It will build on issues covered in the first part with particular
interest in areas of agent application like electronic commerce, recommendation systems,
auctions, information retrieval over the WWW, and other commercial and cutting-edge
scenarios. Some of the topics covered are: basics (history, subject matter), software
architecture, properties and models of agents, agent inter connectors and agent systems, aspect
models, mobility, co-ordination and security, architecture types for agent-based application
systems, commercial agent application, standardization efforts, web services, ontologies,
mark-up languages, semantic web and future directions.
Method of Assessment:
Test: (70%)
Coursework: (30%)
Textbooks:
• N.R. Jennings & M.J. Wooldridge (Editors), Agent Technology, (1998), Springer Verlag, ISBN: 3540635912
• D.N. Chorafas, Agent Technology Handbook, (2000), McGraw-Hill, ISBN: 0070119236
• R. Murch & T. Johnson, Intelligent Software Agents, (1998), Prentice Hall, ISBN: 0130110213
Website:
http://staff.um.edu.mt/mmon1/lectures/csa3210/
CSA3212 – User-Adaptive Systems
Lecturer: Dr. Christopher Staff
Semester: TBA
ECTS Credits: 6
Lectures: 42 Hrs
User-Adaptive Systems are systems that are able to discover, represent, and manipulate, user
interests and requirements as users navigate and search through an information space, and then
adapt the organisation and presentation of information accordingly. This study-unit
explores the history of user-adaptive systems and delves into essential components of
user-adaptive systems: user modelling, information and knowledge representation, information
retrieval, adaptation techniques, and hypertext systems.
Method of Assessment:
Test: (100%)
Main textbooks (recommended):
• Baeza-Yates, R., and Ribeiro-Neto, B. (1999). Modern Information Retrieval. Addison-Wesley.
• Brusilovsky, P. (1996) Methods and techniques of adaptive hypermedia. In User
Modeling and User Adapted Interaction, 6 (2-3), pp. 87-129. Available on-line at:
http://www.contrib.andrew.cmu.edu/~plb/UMUAI.ps
• Berners-Lee, T., Hendler, J., and Lassila, O. (2001), The Semantic Web. In Scientific
American, May 2001. Available on-line at:
http://www.scientificamerican.com/issue.cfm?issuedate=May-01
• Balasubramanian, V. (1994). A State of the Art Review on Hypermedia Issues and
Applications. Available on-line at
http://citeseer.nj.nec.com/balasubramanian94state.html.
CSA5009 Information Extraction
Lecturers: Mr. Angelo Dalli
Semester: I
ECTS Credits: 6
Contact Hours: 42
Method of Assessment: Coursework (100%)
Information Extraction is an important area of modern Natural Language Processing and
Information Retrieval, enabling computers to identify named entities, numbers, and other types of
data automatically, generally from unstructured data. This course will cover:
• Text Classification
• Information Extraction Techniques
• Links with the Semantic Web
• MUC and TREC systems
• Named Entity Recognition
• Anaphora Resolution
• Multi-Source IE
• Multi-Lingual IE
• Simple Question Answering
• Simple Discourse Analysis
Various examples and approaches from MUC and TREC systems will be examined. Some
practical examples using the University of Sheffield's GATE system will complement the
theoretical aspects of this course.
Texts:
• Soumen Chakrabarti. 2002. “Mining the Web: Discovering Knowledge from Hypertext
Data”. Morgan Kaufmann, ISBN 978-1558607545.
CSA5010 Text Data Mining / Clustering
Lecturers: Mr. Angelo Dalli
Semester: II
ECTS Credits: 6
Contact Hours: 42 Hrs
Method of Assessment: Test (100%)
Text Data Mining / Clustering is an exciting area of Human Language Processing and Business
Intelligence, enabling new insights to be gained from unstructured data. Aspects of handling large
unstructured datasets will be discussed, together with appropriate tools and techniques necessary
for the handling of such datasets, including the UIMA architecture. Text classification will be
treated briefly and compared and contrasted with clustering approaches. Various clustering
techniques will be covered, ranging from simple perceptron and winnow algorithms to more
advanced techniques. This course will cover the following topics:
• Data Handling and Preparation Issues
• UIMA Architecture
• Linear and Non-Linear Classification
• Binary and Multi-Class Classification
• Differences between Classification and Clustering
• Use of Clustering for Author Identification
• Feature Selection Techniques
• Perceptron and Winnow Algorithms
• Commonalities with Neural Networks
• Decision Trees
• Support Vector Machines
• kNN Clustering
• Kernel Methods
• Naive Bayes
Some practical examples using tools such as the WEKA toolkit will complement the theoretical
aspects of this course, together with practical examples using the University of Sheffield's GATE
system.
Texts:
• Ian Witten, Eibe Frank. 2005. “Data Mining: Practical Machine Learning Tools and
Techniques (Second Edition)”. Morgan Kaufmann. ISBN 0-12-088407-0
• Jiawei Han, Micheline Kamber. 2005. “Data Mining: Concepts and Techniques (Second
Edition)”. Morgan Kaufmann. ISBN 978-1558609013
BIT5307 - Speech Technology with Digital Signal Processing
Lecturers: Prof. Paul Micallef
Semester: 2
ECTS Credits: 5
Tutorials / Practicals: None
Lectures: 30 Hrs
Assessment: Test (60%), Coursework (40%)
The aim of this unit is to introduce the student to basic techniques for handling speech signals and
to the higher level issues of speech technology. The topics will include:
Introduction to Speech Technology
• Speech and Hearing; Vocal Cords and Pitch; Vocal System; Articulatory Model
• Phones; Formants of Phonemes
Speech Analysis
• Time Waveform; The relationship between time information and frequency information
• Pitch Period, Harmonics; Frequency Spectrum
• Introduction to Digital Signal Processing; Sampling and Aliasing
• The Linear Predictive Coding Model
• The Spectral Envelope
• Segmentation of Speech; Acoustic Parameters
Speech Synthesis
• Segment concatenation; Harmonic Model; LPC Model; Problems of Noise
• PSOLA and MBROLA; Intonation and Intonation Modelling
Text-to-Speech Synthesis
• The Grapheme to Phoneme Problem; Rule Based and Neural based Solutions
• The Bilingual Problem; Analysis of broad phrases; Phonetic Assembly
• Duration and Stress
Speech Corpora
• Need for annotated corpora; Spoken Corpora Types; Methods used for Annotation
• Relation between Annotation and Recognition
Speech Recognition
• Speech parameters used for recognition
• Tools available: the statistical approach: Hidden Markov Model, Neural nets
• Problems of background noise; Problems of variability
Reading List:
• W. and J. Holmes, Speech Synthesis and Recognition, Taylor & Francis (2001), ISBN:
0748408576
• L. Rabiner and B-H. Juang, Fundamentals of speech recognition, Englewood Cliffs, NJ: PTR
Prentice Hall, 1993
CSA5011 Seminar
Lecturers: Various
ECTS Credits: 4
Contact Hours: 28
Method of Assessment: Coursework (75%), Presentation (25%)
This study unit aims to give the student the opportunity to research in depth and deliver a critical
analysis of a specialized topic. In the process, the unit should enhance the student’s ability to
research and report in a professional scientific manner. A choice of topics will be offered by
lecturers to the students.
Students taking the unit will be assigned a topic, accompanied by a series of readings by the
lecturer. It is expected that the student will research the area by studying the given material,
supported by additional papers and books that the student is expected to discover as part of his or
her research. Regular meetings with the student’s supervisor will ensure that the research is duly
carried out.
At the end of the unit, the student will be expected to submit a detailed and professional
scientific report, which should take the form of a literature review. Furthermore, the student is
also expected to deliver a presentation of his or her findings.