BIS 541
2015/2016 Summer
Homework 3
1. A database has four transactions. Let minsupport = 60% and minconfidence = 80%.
TID   Items bought
100   K, A, D, B
200   D, A, C, E, B
300   C, A, B, E
400   B, A, D
Parts a and b are to be done by hand calculation.
a) Find all frequent itemsets using the Apriori algorithm.
b) List all strong association rules.
c) Find the frequent itemsets and strong rules using RapidMiner.
Start with the given minsupport and minconfidence. Then experiment by increasing and
decreasing minsupport and minconfidence, and report what happens to the frequent
itemsets and strong rules.
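A minimal Python sketch of the Apriori procedure on the four transactions above can serve as a cross-check for the hand calculation and the RapidMiner output. It is an illustration under stated assumptions, not a prescribed solution method: the function names `support`, `apriori`, and `strong_rules` are hypothetical, and support is taken as the fraction of transactions containing an itemset, with a rule counted as strong when its confidence meets or exceeds minconfidence.

```python
from itertools import combinations

# Transactions from question 1 (TID -> items bought).
transactions = {
    100: {"K", "A", "D", "B"},
    200: {"D", "A", "C", "E", "B"},
    300: {"C", "A", "B", "E"},
    400: {"B", "A", "D"},
}
MIN_SUPPORT = 0.60
MIN_CONFIDENCE = 0.80


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets."""
    items = sorted(set().union(*transactions.values()))
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def strong_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # confidence(lhs -> rest) = support(itemset) / support(lhs)
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


frequent = apriori(transactions, MIN_SUPPORT)
for lhs, rhs, conf in strong_rules(frequent, MIN_CONFIDENCE):
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

Changing `MIN_SUPPORT` and `MIN_CONFIDENCE` and rerunning mirrors the experiment requested in part c.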
2. In order to predict how a school district would have scored when accounting for poverty
and other income measures, the Cincinnati Enquirer gathered data from various sources. An overall
score for the passage percentages of students in each district is computed, based on test
scores for math, science, language and so on. The percentage of a school district's students on
Aid for Dependent Children (ADC), the percentage who qualify for free or reduced-price
lunches, and the average income of each district are also available in the data file Enquirer.xls.
a) Estimate a linear regression of passing percentages on the three explanatory variables.
b) What are SST, SSR, SSE, the coefficient of determination, and the estimated variance of the error?
c) Do the explanatory variables explain the variability in passage rates? Test, at the 95%
confidence level, the null hypothesis that the variables taken together do not explain the
variability in passage rates.
d) Perform the same analysis with stepwise regression and comment on the explanatory
power of each of the input variables.
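For reference, the quantities named in parts b and c are commonly defined as below. This is a sketch under assumed notation that the assignment does not fix: $n$ is the number of districts, $k = 3$ the number of explanatory variables, $y_i$ the observed passing percentage, $\hat{y}_i$ the fitted value, and $\bar{y}$ the sample mean. SSR is taken here as the regression sum of squares and SSE as the error sum of squares; some textbooks swap these two labels, so check against the course notes.

```latex
\begin{align*}
\text{SST} &= \sum_{i=1}^{n} (y_i - \bar{y})^2, &
\text{SSR} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, &
\text{SSE} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \\
\text{SST} &= \text{SSR} + \text{SSE}, &
R^2 &= \frac{\text{SSR}}{\text{SST}}, &
s^2 &= \frac{\text{SSE}}{n - k - 1}, \\
F &= \frac{\text{SSR}/k}{\text{SSE}/(n - k - 1)} \sim F_{k,\, n-k-1} \text{ under } H_0 .
\end{align*}
```

Under this convention, $H_0$ states that all $k$ slope coefficients are zero, and it is rejected at the 5% significance level when $F$ exceeds the corresponding critical value of the $F_{k,\,n-k-1}$ distribution.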
3. Consider a real-life situation that interests you: a university registration system, a
hospital information system, or any other business problem.
Answer the following in at most two pages.
a) Describe the environment in about 50-60 words.
b) Describe very briefly the database where the relevant attributes are stored.
c) Define three data mining problems on that database, requiring three different
functionalities such as association, classification, clustering, etc. Clearly state the
importance of each problem for the organization.
d) Describe the variables in the database to be used in the solution of these problems.
e) Are there any data problems such as missing data, outliers, or integration
inconsistencies?
f) Do you define new variables that do not exist in the database to solve these problems? Do
you apply any transformations? For what reason do you make these transformations?
g) What are your target and output variables if you are to solve a classification or
estimation problem?
h) Suppose the problem is solved successfully. Describe the implementation of the
solution in the environment. What are some possible impacts of the data mining solution? Can
you imagine any unanticipated events after implementation?