COMP1942
Project: Nursery Application
Prepared by Raymond Wong
Presented by Raymond Wong
raywong@cse
Dataset

Two real datasets

First dataset “training.txt”
    Contains information about nursery applications in a city
    9,945 records
    8 attributes
    1 additional Boolean attribute called “success” indicating whether the nursery application is successful or not

Second dataset “test.txt”
    3,015 records
    8 attributes
    No additional Boolean attribute
Dataset

8 attributes
    parent-occupation
    child-nursery
    form-of-the-family
    no-of-children
    housing-condition
    finance-standing-of-the-family
    social-condition
    health-condition
Dataset

Boolean attribute

success
Objective

To predict whether each nursery application in the second dataset is successful or not
Project

There are three phases in this project
Project Deadlines

Due Date for Phase 1
    24 Feb, 2017 9am
Due Date for Phase 2
    21 April, 2017 9am
Due Date for Phase 3
    5 May, 2017 9am
Tasks to be done

Phase 1
    To generate an Excel file from two raw files together with attribute names
Phase 2
    To write a design report for this project
    To list 5 possible data mining models you want to try
Phase 3
    To follow the design report in Phase 2
    To predict whether each nursery application in the second dataset is successful or not by using a data mining tool (XLMiner)
    To write a final report
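
The Phase 1 task (attaching attribute names to the raw records so they open cleanly in a spreadsheet) could be sketched as below. This is only an illustration under stated assumptions: the layout of the raw files is not described in these slides, so the sketch assumes one record per line with comma-separated fields and no header row. The function name `add_header` is hypothetical, and the output is CSV text that Excel can open, not a native Excel workbook.

```python
import csv
import io

# Attribute names as listed in the slides.
ATTRIBUTES = [
    "parent-occupation",
    "child-nursery",
    "form-of-the-family",
    "no-of-children",
    "housing-condition",
    "finance-standing-of-the-family",
    "social-condition",
    "health-condition",
]


def add_header(raw_text, has_label, delimiter=","):
    """Return CSV text with a header row prepended to the raw records.

    Assumption: each raw line is one record whose fields are separated by
    `delimiter`; the real files may use a different layout.
    """
    # training.txt carries the extra "success" column, test.txt does not.
    header = ATTRIBUTES + (["success"] if has_label else [])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in csv.reader(io.StringIO(raw_text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(
                f"expected {len(header)} fields, got {len(row)}: {row}"
            )
        writer.writerow(row)
    return out.getvalue()
```

For the real files, one would read “training.txt” with `has_label=True` and “test.txt” with `has_label=False`, then save each result with a `.csv` extension and open it in Excel. The field-count check is there to catch a wrong delimiter guess early.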
Grading Policy

Phase 1
    Excel file (10%) (via CASS)
Phase 2
    Design Report (30%) (in class)
Phase 3
    Final Report (40%) (in class)
    Predicted Attribute Files for the Second Real Dataset (20%) (via CASS)
Mark Deduction

No late submission is allowed for Phase 1 and Phase 2
Late submission is allowed for Phase 3

Number of Days Late    Deduction (out of 100 marks)
1                      10
2                      30
3                      70
4 or above             100
Grading Scheme

Phase 1
    Excel file
        Very easy to obtain full scores!
Phase 2
    Design Report
        Specify the data mining models clearly (e.g., what model you are using and what exact set of parameters you are using)
        Note: There are no STANDARD answers.
Phase 3
    Final Report
        Follow the design report
        Write observations clearly
        Analyze the results in a “logical” way
        Note: There are no STANDARD answers.
    Predicted Attribute Files
        We will compare your predicted files with our files.
        Note: There is a STANDARD answer.
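
The slides do not say how the predicted files are compared with the reference files. One plausible reading, assumed here purely for illustration, is the fraction of records whose predicted “success” value matches the reference value. The function name `accuracy` and the list-of-labels input are hypothetical.

```python
def accuracy(predicted, reference):
    """Fraction of positions where the predicted label equals the reference.

    Assumption: both arguments are equal-length sequences holding one
    label per record, in the same record order.
    """
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not reference:
        raise ValueError("label lists must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)
```

Because “test.txt” has no “success” attribute, a check like this can only be run on records whose outcome is known, for example a portion of “training.txt” held back from model building.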