HLST COURSE CATALOGUE 2007/8

CSA5002 – Corpora and Statistical Methods

Lecturer(s): Dr. Albert Gatt
Semester: I
ECTS Credits: 6
Contact Hours: 42
Method of Assessment: Test (85%), Coursework (15%)

Aims

This course provides a grounding in (i) foundational statistical methods, especially probability, information theory, and statistical inference, and (ii) corpus design, annotation and construction, and the use of these to:

- Conduct linguistic research whose aim is to test empirical hypotheses about language and make generalisations;
- Build Natural Language Processing systems (e.g. parsers, thesauri, generators) which differ from traditional rule-based or “symbol-processing” systems in that their core is a statistical language model derived from corpus data.

Syllabus

This course will be divided into three parts.

Part I deals with introductory material and some of the mathematical background. An important aspect of this is to provide students with exposure to existing corpora and to existing tools for corpus-based research, corpus building, and corpus annotation. Another important aspect is the use of the web as a corpus.

Part II focuses in detail on particular areas of corpus-based research in NLP and the methods used, including:

- Research on words, word distributions, word frequencies and collocations
- Semantic similarity and corpus-derived thesauri
- N-gram language models for parsers and generators
- Machine-learning techniques (both statistical and “rule-based”, where the latter involves the class of rule learners that infer symbolic/production rules from annotated corpora)

Part III aims to provide a more comprehensive picture of state-of-the-art NLP research using corpora, including:

- Statistical Parsing: An overview of recent work in this area, covering TAG Grammars and the RASP parser (Carroll et al.); parsers trained on treebanks (Charniak, Collins).
- Statistical Generation: This will mainly cover statistical language realisers, which take a semantic form as input and output a natural language expression. Recent work in this area includes the overgeneration-and-ranking approach (Knight, Langkilde-Geary, Varges).
- Some recent work that applies statistical techniques to less “surface-oriented” issues (including content determination for NLG systems)

Bibliography

Key Papers by Carroll, Charniak, Collins, Knight, Langkilde-Geary, Varges

CSA5003 – Finite State Machinery and Computational Morphology

Lecturer(s): Dr. Gertjan van Noord, University of Groningen
Semester: II
ECTS Credits: 4
Contact Hours: 28
Method of Assessment: Coursework (100%)

Linguistic Morphology studies the internal structure of words. The main issues that have to be considered are (a) how a word is segmented into its component parts, (b) which parts are common to different forms of the same word and (c) how the parts interact with each other to define the particular nature of a given wordform. Computational morphology attempts to shed light on these issues by building computational models. For the most part these models are based on Finite State Automata of different kinds.

The aim of this course is to present linguistic issues and then provide examples of computational approaches to the area. Use will be made of the FSA Utilities toolbox developed in Groningen: a collection of utilities to manipulate regular expressions, finite-state automata and finite-state transducers. Manipulations include automata construction from regular expressions, determinization (both for finite-state acceptors and finite-state transducers), minimization, composition, complementation, intersection, Kleene closure, etc. Various visualization tools are available to browse finite-state automata. Interpreters are provided to apply finite automata. Finite automata can also be compiled into stand-alone C programs.
Texts:

Lauri Karttunen, Kimmo Koskenniemi, Gertjan van Noord. Special issue: Finite State Methods in Natural Language Processing. Natural Language Engineering. Volume 9, Part 1, March 2003.

CSA5004 – Unification Grammar

Lecturer(s): Dr. Shuly Wintner, University of Haifa
Semester: II
ECTS Credits: 5
Contact Hours: 35 Hrs
Method of Assessment: Coursework (100%)

Description: The course introduces the foundations of some of the major formalisms used in computational linguistics nowadays, providing both the linguistic motivation and the necessary mathematical infrastructure.

Syllabus:

- Context-free grammars
- Basics: strings, grammars, derivations, languages, trees
- Properties of CFGs
- The (in)adequacy of CFGs for describing natural languages
- Extending CFGs: feature structures
- Motivation
- Properties: features, values, variables, paths, reentrancy
- Subsumption and unification
- Representing lists, trees and graphs
- Unification grammars
- Adding features to rules
- Multi-AVMs, forms, derivations, languages, trees
- Internalizing categories
- Linguistic examples
- Imposing subject-verb agreement
- Case control
- Subcategorization
- Unbounded dependencies
- Coordination
- Typed feature structures
- The expressiveness of unification grammars
- Grammars for trans-context-free languages
- Turing equivalence
- The mathematics of feature structures
- Computational processing of unification grammars

Textbooks:

Shuly Wintner & Nissim Francez, Unification Grammars (forthcoming)
Stuart M. Shieber. Constraint-Based Grammar Formalisms. MIT Press, 1992
Bob Carpenter, The Logic of Typed Feature Structures, Cambridge 1992

CSA5005 – Practical Dialogue Systems

Lecturer(s): Dr. Matthew Montebello & Mr. Michael Rosner
Semester: I & II
ECTS Credits: 5
Contact Hours: 35
Method of Assessment: Test (80%), Coursework (20%)

This course will investigate the computational aspects of dialogue systems.
The first part of the course is largely devoted to the underlying computational infrastructure and offers a comprehensive introduction to the syntax, semantics and features of Prolog, a well-known logic programming language that has been used extensively in a wide variety of AI application areas. Teaching is organised around a series of carefully chosen laboratory exercises.

The second part of the course identifies the main types of dialogue system, together with specific concepts and programming techniques for building a practical system. This will be developed using Definite Clause Grammars, a simple and widely used formalism built on top of Prolog.

Method of Assessment: Test (80%), Coursework (20%)

Please note that during the September Resit Sessions, assignment marks obtained during the first sit will be retained.

Textbooks:

L. Sterling and E. Shapiro. The Art of Prolog (2nd Edition). MIT Press 1994. ISBN 0-262-19338-8.
Callear D. Prolog Programming for Students. DP PULL. 1994. ISBN 1-85805-093-6
Pereira, F. & Shieber, S. Prolog and Natural Language Analysis, CSLI Publications and http://www.mtome.com/Publications/PNLA/prolog-digital.pdf
Michael McTear, Spoken Dialogue Technology, ISBN 1852336722, Springer, 2004

CSA5006 – Logic, Representation and Inference

Lecturer(s): Mr. Michael Rosner
Semester: 2
ECTS Credits: 4
Contact Hours: 28
Method of Assessment: Test (75%), Coursework (25%)

This course introduces techniques for tackling the following issues:

- What is semantic representation?
- What is the relationship between semantic representation and logic?
- What mechanisms are required to associate semantic representations with expressions of natural language?
- How can we use logical representations of natural language expressions to automate the process of drawing inferences?

We will approach them by developing program modules that handle the key concepts of representation and inference, including:
- First Order Logic
- Lambda Calculus
- Underspecified Representations
- Propositional Inference
- First Order Inference

Method of Assessment: Test (75%), Coursework (25%)

Textbooks:

Patrick Blackburn and Johan Bos, Representation and Inference for Natural Language, Stanford: CSLI Publications, 2005

LIN2080 - Discourse Pragmatics I: Introduction to Discourse Analysis & Conversational Pragmatics

Lecturer(s): Mr. Paul A. Falzon
Semester: I and II
ECTS Credits: 6
Contact Hours: 42 Hrs
Method of Assessment: Test (33%), Assignment (67%)

Module 1: Introduction to Discourse Analysis*

Learning Objectives

The unit is designed to provide students with an understanding of the fundamentals of Discourse Analysis as well as an appreciation of the broad scope encompassed by the discipline.

Content Covered

Discourse Analysis is presented as a tool for the study of spoken and written language. The broad scope of Discourse Analysis is discussed in terms of the plurality of meanings attached to underpinning notions such as discourse, text and context. The course covers a range of approaches to the study of discourse including Critical Discourse Analysis, Discourse Analysis of the Birmingham School, Speech Act Theory, Textual Discourse Analysis, Narratology, Interactional Sociolinguistics, Variation Analysis and the Ethnography of Communication. The course will mention, but not develop, other discourse analytic approaches, e.g. Ethnomethodological Conversation Analysis and Pragmatics.

Reading List:

Coulthard, M. (Ed.) (1994). Advances in written text analysis. London: Routledge.
Schiffrin, D. (1994). Approaches to discourse. Oxford: Blackwell.
Stillar, G. F. (1998). Analyzing everyday texts: Discourse, rhetoric, and social perspectives. Thousand Oaks, CA: Sage.
Stubbs, M. (1996). Text and corpus analysis: Computer assisted studies of language and culture. Oxford: Blackwell.
Van Dijk, T. A. (Ed.) (1997). Discourse studies: A multidisciplinary introduction: Vol. 1.
Discourse as structure and process. Thousand Oaks, CA: Sage.
Van Dijk, T. A. (Ed.) (1997). Discourse studies: A multidisciplinary introduction: Vol. 2. Discourse as social interaction. Thousand Oaks, CA: Sage.

Module 2: Conversational Pragmatics I*

Learning Objectives

The course is designed to reach the following aims:

- To introduce students to different approaches to the analysis of conversation
- To introduce students to current developments in conversational pragmatics

Content Covered

The first part of the course will focus mainly on the Theoretical Component. A combination of lectures, seminars and tutorials is employed. The course introduces students to three of the more influential approaches to the study of conversation, namely Ethnomethodological Conversation Analysis (CA), Discourse Analysis of the Birmingham School and Clark’s socio-cognitive Theory of Language. Students are required to give brief presentations on current issues in conversational pragmatics and its application to the analysis of a range of conversational domains.

Reading List:

Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press.
Hutchby, I., & Wooffitt, R. (1998). Conversation analysis: Principles, practices and applications. Cambridge: Polity Press.
Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50: 696-735.
Stenström, A. B. (1994). An Introduction to spoken interaction. London: Longman.
Ten Have, P. (1999). Doing conversation analysis: A practical guide. London: Sage.
Module 3: Conversational Pragmatics II

Learning Objectives

The course is designed to reach the following aims:

- To introduce students to more complex aspects of conversational pragmatics
- To increase students’ awareness of both intercultural and intracultural variation in conversational practice
- To enable students to undertake systematic research into the nature of conversational practice

Content Covered

The second part of the course will focus on the Practical Component. The course builds on knowledge acquired in Conversational Pragmatics I. Topics covered include discourse markers, repair, openings, closings, preference organization, intersubjectivity and next-turn proof procedure. Data collection methods and related ethical considerations, transcription procedures, and methods of data analysis are covered in the practical component. Students carry out a study project involving data collection, transcription and analysis.

Reading List:

Antaki, C., & Widdicombe, S. (Eds.). (1998). Identities in talk. London: Sage.
Atkinson, J. M., & Heritage, J. (Eds.). (1984). Structures of social action: Studies in conversation analysis. Cambridge: Cambridge University Press.
Button, G., & Lee, J. R. E. (Eds.). (1987). Talk and social organisation. Clevedon, Avon: Multilingual Matters.
Hutchby, I., & Wooffitt, R. (1998). Conversation analysis: Principles, practices and applications. Cambridge: Polity Press.
Ten Have, P. (1999). Doing conversation analysis: A practical guide. London: Sage.

*N.B.
When parts of the study-unit, with the express permission of the lecturer, are taken on their own by students whose area of study is not Linguistics, they shall be registered as follows:

- LIN2180 Introduction to Discourse Analysis (2 ECTS, assessed by test)
- LIN2280 Conversational Pragmatics I (2 ECTS, assessed by test)
- LIN2380 Conversational Pragmatics II (2 ECTS, assessed by test)

BIT5103 - Introduction to Computer Science I

Lecturer(s): Dr. Gordon Pace and Dr. John Abela
Semester: I & II
ECTS Credits: 5
Tutorials / Practicals: 4 Hrs
Lectures: 25 Hrs
Method of Assessment: Assignment (10%), Test (90%)

Mathematics of Discrete Structures

Lecturer: Dr. Gordon Pace

This part of the course aims primarily to introduce the basic mathematical tools that are required for the formal and rigorous treatment of the various aspects of computing. The importance of formal reasoning is emphasised in the course, concentrating on syntax and formal proofs. The course also explains various mathematical notions and structures that will be used in later courses.

Syllabus:

- Propositional Calculus
- Predicate Calculus
- Set theory
- Relations and Functions
- Natural Numbers and cardinality
- Graph theory

Algorithms and Data Structures

Lecturer: Dr. John Abela

The aim of the second part is to introduce the concepts of algorithm and data structure, highlighting the relation which exists between the two. These concepts are introduced in a gradual fashion, proceeding from abstract principles to concrete examples. Correctness and efficiency will be emphasized as the main properties of algorithms.

In the first part of the course a number of algorithms will be discussed, with emphasis on sorting and searching. Abstract data types (ADTs) will be formally defined and illustrated with case studies for list, stack, queue, priority queues and heaps, and the ADT table. The structure of binary trees and associated algorithms will be investigated.
In the second part of the course, the ‘Big O’ notation will be introduced as a formal framework for describing the resource use (i.e. time and space) of an algorithm. Further topics covered are: graphs and their associated searching and traversal algorithms, hashing techniques, AVL trees, 2-3 trees, B-trees.

Reading List:

Mark Allen Weiss, Data Structures and Algorithm Analysis, Benjamin Cummings.
David Harel, Algorithmics: The Spirit of Computing, Addison-Wesley.
A.V. Aho, J.E. Hopcroft, J.D. Ullman, Data Structures and Algorithms.

BIT5201 A.I. as Representation and Search

Lecturer(s): Mr. Sandro Spina / Kristian Guillaumier
Semester: II
ECTS Credits: 5
Tutorials / Practicals: 4 Hrs
Lectures: 25 Hrs
Method of Assessment: Assignment (30%), Test (70%)

Programs which apparently exhibit intelligent behaviour (for example, winning a game of chess) usually employ some sort of AI technique. This course will focus on the basic elements of AI, namely knowledge representation and search strategies. AI is intimately linked to the representation of a given problem domain. The role of representation is to capture the essential features of a problem domain and make that information accessible to the problem-solving procedure. State space strategies are used to enumerate a number of solutions to a given problem domain. The validity of this enumeration is manifest in the apparent "intelligence" of these algorithms.

The course is divided into the following three main sections:

- Knowledge Representation
- Strategies for State Space Search
- Heuristic Search

Textbooks:

George F Luger. Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Addison Wesley
Russell, Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall.

CSA2010 – Compiling Techniques

Lecturer(s): Mr. Sandro Spina
Semester: 1st
ECTS Credits: 4
Tutorials / Practicals: 8 Hrs
Lectures: 20 Hrs

Compilers translate code from a source to a target language, the latter usually being a lower level language.
The main aim of this course is to equip students with the knowledge required to understand how modern compilers work. On a more practical note (as part of the assignment), students will build a compiler for a small imperative programming language. The materials provided will be based on the Java programming language; however, students can opt to work with other programming languages such as C or Haskell. The course will cover compilation both to JVM bytecode and to native code. Apart from the usual topics associated with compiling theory, the course will also offer introductions to the areas of compiler correctness and hardware compilers.

Topics covered include:

- Grammars
- lexers
- parsers
- abstract syntax
- type systems (checking, derivations, type inference, etc.)
- syntax-directed translation
- code generation and analysis (JVM, native)
- register allocation
- optimisation
- compiler correctness
- hardware compilers

Method of Assessment: Test (70%), Coursework (30%)

The assessment for this unit consists of a written exam covering 70% and an assignment covering 30% of the final mark. For Resit sessions, the assessment will be a written exam covering 70%. The 30% assignment mark can either be retained from the first sit or replaced by a new assignment submission, according to the preference of the student.

Textbooks:

Aho, Sethi, Ullman. Compilers: Principles, Techniques, and Tools.
Andrew S. Appel, Modern Compiler Implementation in Java, Cambridge University Press, 1998. ISBN 0-521-60764-7. http://www.cs.princeton.edu/%7Eappel/modern/java/

CSA5007 – Formal Methods & Automata

Lecturer(s): Dr. Gordon Pace
Semester: 2
ECTS Credits: 4
Tutorials / Practicals: 7 Hrs
Lectures: 21 Hrs
Method of Assessment: Test (75%), Coursework (75%)

This course takes a theoretical approach to the formal treatment of languages and of the automata (or machines) that recognise them.
The aims are not only to instill the basic notions of languages, grammars and automata using formal mathematical notation but also to provide a practical perspective. An assignment will be given involving the design of a parser based on the mathematical results.

Syllabus:

- Finite State Transducers
- Formal languages and grammars.
- Regular languages: regular grammars, finite-state automata, regular expressions.
- Context-free languages: context-free grammars, pushdown automata.
- Closure properties of regular and context-free languages.
- Normal forms for grammars.
- Recognition algorithms for grammars.

The Resit will be in the form of one exam together with a possible resubmission of coursework if failed at first sit.

Textbooks:

K. Beesley & L. Karttunen, Finite State Morphology, ISBN 1-57586-434-7, CSLI Press, 2003
V.J. Rayward-Smith, A First Course in Formal Language Theory, McGraw-Hill Computer Science Series, 1995.
John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman, Introduction to Automata Theory, Languages, and Computation (second edition), Addison-Wesley, ISBN 0201441241, 2001

BIT5105 - Programming in JAVA and Problem Solving Techniques

Lecturer(s): Dr. V. Nezval and Mr. J. Galea
Semester: 2
ECTS Credits: 5
Contact Hours: 35

This unit covers both the Java language and important algorithms and data structures applied to solving practical problems in the lab. Emphasis will be placed on writing efficient and correctly structured programs. Java language topics will include the structure of a Java program, compilation and execution, the concept of classes and objects, data types, assignment, basic I/O using streams, if and switch statements, loops, methods, arrays, strings, arrays of classes, utility classes, and the concept of applets with awt and swing classes.
Practical problem solving will be based on the use and application of basic algorithms in user-written programs, both during practical sessions guided by a tutor and through a set of assignments to be worked out independently at home, together with problems to be solved in the laboratory and assessed by a tutor. A gradual increase of load and difficulty will be adopted as the unit progresses.

Method of Assessment: Coursework (100%)

Textbook:

Deitel and Deitel, Java, How to Program, Prentice Hall

BIT5205 - Databases and their Implementations

Lecturer(s): Mr. Joseph Vella
Semester: 2
ECTS Credits: 5
Contact Hours: 35

The unit starts with an introduction to databases and Database Management Systems (DBMS) in the context of their role in Computer Information Systems. A quick summary of major developments in databases, DBMSs and related computing artifacts is also presented, e.g. the development of CODASYL, the ANSI/SPARC generalisation of databases and DBMSs, and the emergence of the relational model. The main sub-systems expected in any DBMS are also explained.

The first effort of this unit is the understanding of data models and an introduction to a language to model database schemas at an abstract level. This language is graphical in its representation of models and is independent of any implementation or physical details; the notation favoured in this unit is Chen's notation (and its derivatives).

The second effort is an introduction to a database model that is popular with the majority of implementations: Codd's relational model. The initial part concerns understanding the relational data model. We then study various languages that interact over the relational model: the relational algebra and Structured Query Language (SQL). We also study how a database schema, specified in an ERM diagram, is converted into a set of SQL data definition constructs (e.g. CREATE TABLE commands).
Related to the relational database model is our concern to control data redundancy in a database design; consequently, we study Codd's original normal forms and their later refinements.

The third part of the unit describes practical facets of getting the DBMS to make efficient use of the available resources (e.g. RAM, HDs, communication networks, tapes). These include data sharing, query processing, and sophisticated data definition and manipulation languages. Another important part is the emphasis on a multi-tier implementation of a computer information system (three tiers for presentation, business and data processing) and on how, and with what tools, software developers can design, implement and test these tiers.

Method of Assessment: Coursework (20%), Exam (80%)

Textbooks:

R Elmasri & S Navathe, Fundamentals of Database Systems, Addison-Wesley
R Earp & S Bagui, Learning SQL, Addison-Wesley

CSA5008 - Introduction to Bioinformatics

Lecturer: Dr. John Abela
Semester: 1
ECTS Credits: 6
Contact Hours: 42
Method of Assessment: Exam (70%), Coursework (30%)

This course deals with the storage, processing, retrieval, analysis, and understanding of biological information. This information usually takes the form of protein or DNA sequences. The aim of the course is to show that analysis of these sequences leads to a much fuller understanding of many biological processes, allowing drug designers, scientists, and pharmaceutical and biotechnology companies to determine, for example, new drug targets or to predict whether a particular drug is applicable to all patients. Students will be introduced to the basic concepts behind Bioinformatics and Computational Biology tools.

The first part of the course deals with string processing and analysis algorithms. Topics covered include:

- Formal Languages
- String edit distance
- Suffix trees
- Multiple string comparison
- Indexing
- String searching
- String matching

The second part of the course deals with applying the above algorithms in Bioinformatics.
Topics covered include:

- Protein and DNA sequences
- Alignment algorithms
- Sequence classification
- AI techniques applied to sequence analysis
- The protein folding prediction problem

Textbooks:

Algorithms on Strings, Trees, and Sequences. Dan Gusfield.
Protein Bioinformatics: An Algorithmic Approach to Sequence and Structure Analysis. Eidhammer, Jonassen, and Taylor.

CSA3208 - Agent Technologies

Lecturer: Dr. Matthew Montebello & Mr. Charlie Abela
Semester: TBA
ECTS Credits: 6
Lectures: 42 Hrs

The first part of this course gives an overview of the state of the art in agent research and technologies with reference to applications in a variety of domains including: Internet-based information systems, adaptive (customizable) software systems, autonomous mobile and immobile robots, data mining and knowledge discovery, smart systems (smart homes, smart automobiles, etc.), decision support systems, and intelligent design and manufacturing systems.

The second part will concentrate on the employment of such software agents in practical and intelligent applications. It will build on issues covered in the first part, with particular interest in areas of agent application like electronic commerce, recommendation systems, auctions, information retrieval over the WWW, and other commercial and cutting-edge scenarios.

Some of the topics covered are: basics (history, subject matter), software architecture, properties and models of agents, agent interconnectors and agent systems, aspect models, mobility, co-ordination and security, architecture types for agent-based application systems, commercial agent application, standardization efforts, web services, ontologies, mark-up languages, semantic web and future directions.

Method of Assessment: Test (70%), Coursework (30%)

Textbooks:

N.R. Jennings & M.J. Wooldridge (Editors), Agent Technology, (1998), Springer Verlag, ISBN: 3540635912
D.N. Chorafas, Agent Technology Handbook, (2000), McGraw-Hill, ISBN: 0070119236
R. Murch & T.
Johnson, Intelligent Software Agents, (1998), Prentice Hall, ISBN: 0130110213

Website: http://staff.um.edu.mt/mmon1/lectures/csa3210/

CSA3212 – User-Adaptive Systems

Lecturer: Dr. Christopher Staff
Semester: TBA
ECTS Credits: 6
Lectures: 42 Hrs

User-Adaptive Systems are systems that are able to discover, represent, and manipulate user interests and requirements as users navigate and search through an information space, and then adapt the organisation and presentation of information accordingly. This study-unit explores the history of user-adaptive systems and delves into their essential components: user modelling, information and knowledge representation, information retrieval, adaptation techniques, and hypertext systems.

Method of Assessment: Test (100%)

Main textbooks (recommended):

Baeza-Yates, R., and Ribeiro-Neto, B. (1999). Modern Information Retrieval. Addison-Wesley.
Brusilovsky, P. (1996). Methods and techniques of adaptive hypermedia. In User Modeling and User Adapted Interaction, 6 (2-3), pp. 87-129. Available on-line at: http://www.contrib.andrew.cmu.edu/~plb/UMUAI.ps
Berners-Lee, T., Hendler, J., and Lassila, O. (2001). The Semantic Web. In Scientific American, May 2001. Available on-line at: http://www.scientificamerican.com/issue.cfm?issuedate=May-01
Balasubramanian, V. (1994). A State of the Art Review on Hypermedia Issues and Applications. Available on-line at http://citeseer.nj.nec.com/balasubramanian94state.html.

CSA5009 Information Extraction

Lecturers: Mr. Angelo Dalli
Semester: I
ECTS Credits: 6
Contact Hours: 42
Method of Assessment: Coursework (100%)

Information Extraction is an important area of modern Natural Language Processing and Information Retrieval, enabling computers to identify named entities, numbers, and other types of data automatically, generally from unstructured data.
This course will cover:

- Text Classification
- Information Extraction Techniques
- Links with the Semantic Web
- MUC and TREC systems
- Named Entity Recognition
- Anaphora Resolution
- Multi-Source IE
- Multi-Lingual IE
- Simple Question Answering
- Simple Discourse Analysis

Various examples and approaches from MUC and TREC systems will be examined. Some practical examples using the University of Sheffield's GATE system will complement the theoretical aspects of this course.

Texts:

Soumen Chakrabarti. 2002. “Mining the Web: Discovering Knowledge from Hypertext Data”. Morgan Kaufmann, ISBN 978-1558607545.

CSA5010 Text Data Mining / Clustering

Lecturers: Mr. Angelo Dalli
Semester: II
ECTS Credits: 6
Contact Hours: 42 Hrs
Method of Assessment: Test (100%)

Text Data Mining / Clustering is an exciting area of Human Language Processing and Business Intelligence, enabling new insights to be gained from unstructured data. Aspects of handling large unstructured datasets will be discussed, together with appropriate tools and techniques necessary for the handling of such datasets, including the UIMA architecture. Text classification will be treated briefly and compared and contrasted with clustering approaches. Various clustering techniques will be covered, ranging from simple perceptron and winnow algorithms to more advanced techniques.

This course will cover the following topics:

- Data Handling and Preparation Issues
- UIMA Architecture
- Linear and Non-Linear Classification
- Binary and Multi-Class Classification
- Differences between Classification and Clustering
- Use of Clustering for Author Identification
- Feature Selection Techniques
- Perceptron and Winnow Algorithms
- Commonalities with Neural Networks
- Decision Trees
- Support Vector Machines
- kNN Clustering
- Kernel Methods
- Naive Bayes

Some practical examples using tools such as the WEKA toolkit will complement the theoretical aspects of this course, together with practical examples using the University of Sheffield's GATE system.
Texts:

Ian Witten, Eibe Frank. 2005. “Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)”. Morgan Kaufmann. ISBN 0-12-088407-0
Jiawei Han, Micheline Kamber. 2005. “Data Mining: Concepts and Techniques, Second Edition”. Morgan Kaufmann. ISBN 978-1558609013

BIT5307 - Speech Technology with Digital Signal Processing

Lecturers: Prof. Paul Micallef
Semester: 2
ECTS Credits: 5
Tutorials / Practicals: None
Lectures: 30 Hrs
Assessment: Test (60%), Coursework (40%)

The aim of this unit is to introduce the student to basic techniques for handling speech signals and to the higher level issues of speech technology. The topics will include:

Introduction to Speech Technology: Speech and Hearing; Vocal Cords and Pitch; Vocal System; Articulatory Model; Phones; Formants of Phonemes

Speech Analysis: Time Waveform; The relationship between time information and frequency information; Pitch Period, Harmonics; Frequency Spectrum; Introduction to Digital Signal Processing; Sampling and Aliasing; The Linear Predictive Coding Model; The Spectral Envelope; Segmentation of Speech; Acoustic Parameters

Speech Synthesis: Segment concatenation; Harmonic Model; LPC Model; Problems of Noise; PSOLA and MBROLA; Intonation and Intonation Modelling

Text-to-Speech Synthesis: The Grapheme to Phoneme Problem; Rule Based and Neural based Solutions; The Bilingual Problem; Analysis of broad phrases; Phonetic Assembly; Duration and Stress

Speech Corpora: Need for annotated corpora; Spoken Corpora Types; Methods used for Annotation; Relation between Annotation and Recognition

Speech Recognition: Speech parameters used for recognition; Tools available: the statistical approach (Hidden Markov Model, Neural nets); Problems of background noise; Problems of variability

Reading List:

W. and J. Holmes, Speech Synthesis and Recognition, Taylor & Francis (2001), ISBN: 0748408576
L. Rabiner and B-H.
Juang, Fundamentals of speech recognition, Englewood Cliffs, NJ: PTR Prentice Hall, 1993

CSA5011 Seminar

Lecturers: Various
ECTS Credits: 4
Contact Hours: 28
Method of Assessment: Coursework (75%); Presentation (25%)

This study unit aims to give the student the opportunity to research in depth and deliver a critical analysis of a specialized topic. In the process, the unit should enhance the student’s ability to research and report in a professional scientific manner.

A choice of topics will be offered by lecturers to the students. Students taking the unit will be assigned a topic, accompanied by a series of readings by the lecturer. It is expected that the student will research the area by studying the given material, supported by additional papers and books that the student is expected to discover as part of his or her research. Regular meetings with the student’s supervisor will ensure that the research is duly carried out.

At the end of the unit, the student will be expected to submit a detailed and professional scientific report, which should take the form of a literature review. Furthermore, the student is also expected to deliver a presentation of his or her findings.