USC Viterbi School of Engineering
INF 553: Foundations and Applications of Data Mining
Units: 3
Term - Day - Time:
Fall 2015 - TT - 9:30-10:50am (section 32423D)
Fall 2015 - TT - 5:00-6:20pm (section 32444D)
Location: KAP 163
Instructor: Yao-Yi Chiang
Office: AHF B55C
Office Hours: Tuesday after class
Contact Info: [email protected], 213-740-7618
Instructor: Wensheng Wu
Office: GER 204
Office Hours: TBD
Contact Info: [email protected]
Course Producer: Pooja Anand
Office: TBD
Office Hours: TBD
Contact Info: [email protected]
Grader: Siddharth Mahendra Dasani
Office: TBD
Office Hours: TBD
Contact Info: [email protected]
Catalogue Course Description
Data mining and machine learning algorithms for analyzing very large data sets. Emphasis on Map
Reduce. Case studies.
Expanded Course Description
Data mining is a foundational piece of the data analytics skill set. At a high level, it allows
the analyst to discover patterns in data, and transform it into a usable product. The
course will teach data mining algorithms for analyzing very large datasets. It will have an
applied focus, in that it is meant for preparing students to utilize topics in data mining to
solve real-world problems.
Recommended Preparation: INF 550, INF 551 and INF 552. Knowledge of probability, linear
algebra, basic programming, and machine learning.
A basic understanding of engineering principles is required, including basic programming
skills; familiarity with the Python language is desirable. Most assignments are designed
for the Unix environment; basic Unix skills will make programming assignments much
easier. Students will need sufficient mathematical background, including probability,
statistics, and linear algebra. Some knowledge of machine learning is helpful, but not
required.
Course Notes
The course will be run as a lecture class with student participation strongly encouraged. There are weekly
readings and students are encouraged to do the readings prior to the discussion in class. All of the course
materials, including the readings, lecture slides, and homework assignments, will be posted online.
Technological Proficiency and Hardware/Software Required
Students are expected to know how to program in a language such as Python. Students are also expected
to have their own laptop or desktop computer where they can install and run software to do the weekly
homework assignments.
Required Readings and Supplementary Materials
• A. Rajaraman, J. Leskovec and J. D. Ullman, Mining of Massive Datasets
o Cambridge University Press, 2012.
o Available free at: http://infolab.stanford.edu/~ullman/mmds.html
In addition to the textbook, students may be given additional reading materials such as
research papers. Students are responsible for all assigned readings.
Description and Assessment of Assignments
Homework Assignments
There will be 5 homework assignments. The assignments must be done individually. Each assignment is
graded on a scale of 0-100 and the specific rubric for each assignment is given in the assignment.
Grading Breakdown
Quizzes: There will be weekly quizzes based on the material from the week before. There is no mid-term
for this class.
Homework: There will be 5 homework assignments based on the topics of the class each week.
Final Exam: There is a final exam at the end of the semester covering all of the material covered in the class.
Class Participation: Students are expected to come to class and participate in the class discussions and
on the discussion board.
Grading Schema:
Quizzes                30%
Homework               40%
Final                  25%
Class Participation     5%
__________________________________________
Total                 100%
Grades will range from A through F. The following is the breakdown for grading:
94-100 = A
90-93 = A-
87-89 = B+
84-86 = B
80-83 = B-
77-79 = C+
74-76 = C
70-73 = C-
67-69 = D+
64-66 = D
60-63 = D-
Below 60 is an F
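As a worked illustration of the arithmetic, the weights and letter cutoffs above can be combined as in the following Python sketch. The names `course_total` and `letter_grade` are hypothetical, not part of any course tooling. The sketch assumes every component is scored on a 0-100 scale and that a fractional total falls into the band whose lower bound it reaches (so 93.5 would be an A-); the syllabus does not state how fractional totals are handled.

```python
# Weights from the Grading Schema, as fractions of the final grade.
WEIGHTS = {"quizzes": 0.30, "homework": 0.40, "final": 0.25, "participation": 0.05}

# Lower bound of each letter band from the grading breakdown, highest first.
CUTOFFS = [(94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
           (77, "C+"), (74, "C"), (70, "C-"), (67, "D+"), (64, "D"), (60, "D-")]


def course_total(scores):
    """Weighted total on a 0-100 scale; `scores` maps component name to a 0-100 score."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total):
    """Return the letter for a total; no rounding is applied (an assumption)."""
    for lower, letter in CUTOFFS:
        if total >= lower:
            return letter
    return "F"


# Hypothetical component scores, for illustration only.
example = {"quizzes": 90, "homework": 85, "final": 80, "participation": 100}
total = course_total(example)
print(total, letter_grade(total))
```

With these made-up scores the components contribute 27, 34, 20 and 5 points respectively, which lands in the B band.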
Assignment Submission Policy
Homework assignments are due at 11:59pm on the due date and should be submitted in Blackboard. You
can submit homework up to one week late, but you will lose 20% of the possible points for the
assignment. After one week, the assignment cannot be submitted.
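The late policy can be read as the following arithmetic. This is a minimal sketch under stated assumptions: the 20% penalty is taken from the possible points (20 points on a 0-100 assignment), lateness is counted in whole days, and the function name `late_adjusted_score` is hypothetical. Other readings of the policy are possible.

```python
def late_adjusted_score(raw_score, days_late, possible=100):
    """Score after the late penalty.

    Assumptions: on time -> no penalty; 1 to 7 days late -> lose 20% of the
    possible points; more than 7 days late -> not accepted (returns None).
    """
    if days_late <= 0:
        return raw_score
    if days_late > 7:
        return None  # after one week the assignment cannot be submitted
    penalty = 0.20 * possible
    return max(0, raw_score - penalty)  # never drop below zero


print(late_adjusted_score(90, 0))  # submitted on time
print(late_adjusted_score(90, 3))  # submitted three days late
print(late_adjusted_score(90, 8))  # submitted past the one-week window
```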
Course Schedule: A Weekly Breakdown
Week | Topic | Readings | Homework | Instructor
1 (8/24) | Introduction to Data Mining, MapReduce | Ch1: Data Mining and Ch2: Large-Scale File Systems and Map-Reduce | | Wu
2 (8/31) | MapReduce (cont.) | Ch2: Large-Scale File Systems and Map-Reduce | | Wu
3 (9/7) | Frequent itemsets and Association rules | Ch6: Frequent itemsets, Ch3: Finding Similar Items (section 3.5: Distance Measures) | Homework 1 assigned | Chiang
4 (9/14) | Frequent itemsets and Association rules | Ch6: Frequent itemsets | | Chiang
5 (9/21) | Shingling, Minhashing, Locality Sensitive Hashing | Ch3: Finding Similar Items | Homework 1 due, Homework 2 assigned | Wu
6 (9/28) | Shingling, Minhashing, Locality Sensitive Hashing | Ch3: Finding Similar Items | | Wu
7 (10/5) | Recommendation Systems: Content-based and Collaborative Filtering | Ch9: Recommendation systems, additional readings | | Chiang
8 (10/12) | Recommendation Systems: Content-based and Collaborative Filtering | Ch9: Recommendation systems | Homework 2 due, Homework 3 assigned | Chiang
9 (10/19) | Clustering | Ch7: Clustering | |
10 (10/26) | Link Analysis: PageRank, Web spam and TrustRank, Random Walks with Restarts | Ch5: Link Analysis | Homework 3 due, Homework 4 assigned | Wu
11 (11/2) | Analysis of Massive Graphs (Social Networks) | Ch10: Analysis of Social Networks | | Wu
12 (11/9) | Analysis of Massive Graphs (Social Networks) | Ch10: Analysis of Social Networks | | Chiang
13 (11/16) | Web Advertising | Ch8: Advertising on the Web | Homework 4 due, Homework 5 assigned | Chiang
14 (11/23) | Mining data streams | Ch4: Mining data streams | | Wu
15 (11/30) | Mining data streams, Course Summary | | Homework 5 due | Wu
Final (TBD 12/9-12/11) | Final Exam | | | Chiang/Wu
Statement on Academic Conduct and Support Systems
Academic Conduct
Plagiarism – presenting someone else’s ideas as your own, either verbatim or recast in your own
words – is a serious academic offense with serious consequences. Please familiarize yourself with
the discussion of plagiarism in SCampus in Section 11, Behavior Violating University Standards
https://scampus.usc.edu/1100-behavior-violating-university-standards-and-appropriate-sanctions.
Other forms of academic dishonesty are equally unacceptable. See additional
information in SCampus and university policies on scientific misconduct,
http://policy.usc.edu/scientific-misconduct.
Discrimination, sexual assault, and harassment are not tolerated by the university. You are
encouraged to report any incidents to the Office of Equity and Diversity http://equity.usc.edu or to
the Department of Public Safety
http://capsnet.usc.edu/department/department-public-safety/online-forms/contact-us.
This is important for the safety of the whole USC
community. Another member of the university community – such as a friend, classmate, advisor,
or faculty member – can help initiate the report, or can initiate the report on behalf of another
person. The Center for Women and Men http://www.usc.edu/student-affairs/cwm/ provides 24/7
confidential support, and the sexual assault resource center webpage http://sarc.usc.edu describes
reporting options and other resources.
Support Systems
A number of USC’s schools provide support for students who need help with scholarly
writing. Check with your advisor or program staff to find out more. Students whose primary
language is not English should check with the American Language Institute
http://dornsife.usc.edu/ali, which sponsors courses and workshops specifically for international
graduate students. The Office of Disability Services and Programs
http://sait.usc.edu/academicsupport/centerprograms/dsp/home_index.html provides certification
for students with disabilities and helps arrange the relevant accommodations. If an
officially declared emergency makes travel to campus infeasible, USC Emergency Information
http://emergency.usc.edu will provide safety and other updates, including ways in which
instruction will be continued by means of Blackboard, teleconferencing, and other technology.