Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
02/11/16 Syllabus–Dersİzlencesi Ê www.SadiEvrenSEKER.com->Courses->DataMining, IstanbulCommerceUniversity Ê http://sadievrenseker.com/wp/?p=558 Ê Slide’lar: http://web.engr.illinois.edu/~hanj/bk3/bk3_slidesindex.htm VeriMadenciliği-Giriş ŞadiEvrenŞEKER NecessityIs theMotherofInvention Kaynaklar Ê Dataexplosionproblem Ê DataMining:ConceptsandTechniques,Third Edition,JiaweiHan,MichelineKamber,JianPei Ê DataMining:PracticalMachineLearningTools andTechniques,Third...IanH.Witten,EibeFrank, MarkA.Hall Ê Automateddatacollectiontools,widelyuseddatabasesystems, computerizedsociety,andtheInternetleadtotremendousamountsof dataaccumulatedand/ortobeanalyzedindatabases,data warehouses,WWW,andotherinformationrepositories Ê Wearedrowningindata,butstarvingforknowledge! Ê Solution:Datawarehousinganddatamining Ê Datawarehousingandon-lineanalyticalprocessing(OLAP) Ê Mininginterestingknowledge(rules,regularities,patterns,constraints) fromdatainlargedatabases EvolutionofDatabaseTechnology Ê 1960s: Ê Datacollection,databasecreation,IMSandnetworkDBMS Ê 1970s: Ê Relationaldatamodel,relationalDBMSimplementation Ê TeddCodd(1923-2003) EvolutionofDatabaseTechnology Ê 1990s: Ê Datamining,datawarehousing,multimediadatabases Ê Webdatabases(..,Amazon) Ê 2000s Ê Streamdatamanagementandmining Ê Datamininganditsapplications Ê StructuredEnglishQueryLanguage(SEQUEL),SQL Ê 1980s: Ê Advanceddatamodels(extended-relational,OO,deductive,etc.) Ê Application-orientedDBMS(spatial,scientific,engineering,etc.) Ê Webtechnology(XML,dataintegration)andglobalinformation systems 1 02/11/16 WhyDataMining? —PotentialApplications WhatIsDataMining? Ê Datamining(knowledgediscoveryfromdata) Ê Extractionofinteresting(non-trivial,implicit,previouslyunknownand potentiallyuseful)patternsorknowledgefromhugeamountofdata (interestingpatterns?) Ê Datamining:amisnomer?(erro de nome) Ê Dataanalysisanddecisionsupport Ê Marketanalysisandmanagement Ê Targetmarketing,customerrelationshipmanagement(CRM),marketbasket analysis,crossselling,marketsegmentation Ê Riskanalysisandmanagement Ê Alternativenames Ê Knowledgediscovery(mining)indatabases(KDD),knowledgeextraction, data/patternanalysis,dataarcheology,datadredging,information harvesting,businessintelligence,etc. Ê Watchout:Iseverything“datamining”? Ê (Deductive)queryprocessing. Ê ExpertsystemsorsmallML/statisticalprograms Ê Forecasting,customerretention,improvedunderwriting,qualitycontrol, competitiveanalysis Ê Frauddetectionanddetectionofunusualpatterns(outliers) Ê OtherApplications Ê Textmining(newsgroup,email,documents)andWebmining Ê Medicaldatamining Ê Bioinformaticsandbio-dataanalysis MarketAnalysisandManagement Example1:MarketAnalysisandManagement Ê Wheredoesthedatacomefrom? Ê Creditcardtransactions,loyaltycards,discountcoupons, customercomplaintcalls,plus(public)lifestylestudies Ê Cross-marketanalysis—Findassociations/co- relationsbetweenproductsales,&predictbasedon suchassociation Ê Customerprofiling—Whattypesofcustomersbuy Ê Targetmarketing whatproducts(clusteringorclassification) Ê Findclustersof“model”customerswhosharethesame Ê Customerrequirementanalysis characteristics:interest,incomelevel,spendinghabits,etc., Ê Identifythebestproductsfordifferentcustomers Ê Determinecustomerpurchasingpatternsovertime Ê Predictwhatfactorswillattractnewcustomers Example3: FraudDetection&MiningUnusualPatterns Example2: CorporateAnalysis&RiskManagement Ê Financeplanningandassetevaluation Ê cashflowanalysisandprediction(featuredevelopment) Ê contingentclaimanalysistoevaluateassets Ê Approaches: (componente do ativo) Ê cross-sectionalandtimeseriesanalysis(trendanalysis,etc.) Ê Resourceplanning Ê UnsupervisedLearning:Clustering Ê SupervisedLearning:NeuronalNetworks Ê summarizeandcomparetheresourcesandspending Ê Competition Ê monitorcompetitorsandmarketdirections Ê modelconstructionforfrauds Ê outlieranalysis Ê groupcustomersintoclassesandaclass-basedpricingprocedure Ê setpricingstrategyinahighlycompetitivemarket 2 02/11/16 Applications:Healthcare,retail,creditcard service,telecomm. Ê Autoinsurance:ringofcollisions DataMiningandKnowledgeDiscovery(KDD)Process Pattern Evaluation Ê Datamining—coreof knowledgediscovery process Ê Moneylaundering:suspiciousmonetarytransactions Ê Medicalinsurance Ê Professionalpatients,ringofdoctors,andringofreferences Data Mining Task-relevant Data Ê Unnecessaryorcorrelatedscreeningtests Ê Telecommunications:phone-callfraud Data Warehouse Data Selection Mart Ê Phonecallmodel:destinationofthecall,duration,timeofdayorweek.Analyze patternsthatdeviatefromanexpectednorm Data Cleaning Ê Retailindustry(venderavarejo) Data Integration Ê Analystsestimatethat38%ofretailshrinkisduetodishonestemployees Ê Anti-terrorism Databases StepsofaKDDProcess(1) StepsofaKDDProcess(2) Ê Choosingfunctionsofdatamining Ê Learningtheapplicationdomain Ê summarization,classification,regression,association,clustering Ê relevantpriorknowledgeandgoalsofapplication Ê Creatingatargetdataset:dataselection Ê Choosingtheminingalgorithm(s) Ê Datacleaningandpreprocessing:(maytake60%ofeffort!) Ê Datamining:searchforpatternsofinterest Ê Understanddata(statistics) Ê Patternevaluationandknowledgepresentation Ê visualization,transformation,removingredundantpatterns,etc. Ê Datareductionandtransformation Ê Useofdiscoveredknowledge Ê Findusefulfeatures,dimensionality/variablereduction,invariant representation OverviewofDataMiningMethods KDDSample Who is associated with Group X, and what is the nature of their association? Crime DBs Accessed Retrieved Information Standing Information Requests Ê AutomatedExploration/Discovery Visualization Ê Tell me when something related to my situation changes Who is associated with Sam Jones, and what is the nature of their association? Active Agents Are there any other interesting relationships I should know about? Suspect Profiles Crisis Watch Intel Analyst Command History DC Traffic Alert Ê FEMA data coordinator coworkers Overlay the traffic density Intelink data sources Ê Middleware Data Mining FBIS databases Text OIT databases receiver environment Ê Mediator/Broker Geospatial Structured source environments f(x) Ê Explanation/Description ... Internet data sources x1 e.g..forecastinggrosssalesgivencurrentfactors regression,neuralnetworks,geneticalgorithms Text Imagery OIA databases x2 Ê Prediction/Classification Ê Discovered Knowledge KDD Process e.g..discoveringnewmarketsegments distanceandprobabilisticclusteringalgorithms Ê x e.g..characterizingcustomersbydemographics andpurchasehistory inductivedecisiontrees, if age > 35 associationrulesystems and income < $35k Focus is on induction of a model from specific examples then ... 3 02/11/16 DataMiningMethods DataMiningMethods AutomatedExplorationandDiscovery PredictionandClassification Ê Distance-basednumericalclustering Ê metricgroupingofexamples(KNN) Ê graphicalvisualizationcanbeused Income f(x) Ê Functionapproximation(curvefitting) x Ê Classification(conceptlearning,patternrecognition) A Ê Methods: Ê Age Ê Bayesianclustering Ê searchforthenumberofclasseswhichresultinbestfit ofaprobabilitydistributiontothedata Ê Ê Ê x2 Statisticalregression Artificialneuralnetworks Geneticalgorithms Nearestneighbouralgorithms DataMiningMethods O1 O2 x1 Ê SupervisedLearning Ê UnsupervisedLearning B I1 I2 I3 I4 Clustering Ê Findgroupsofsimilardataitems Generalization “Grouppeoplewithsimilar travelprofiles” Ê Statisticaltechniquesrequiresome Ê Theobjectiveoflearningistoachievegood definitionof“distance”(e.g. betweentravelprofiles)while conceptualtechniquesuse backgroundconceptsandlogical descriptions generalizationtonewcases,otherwisejustusea look-uptable. Ê Generalizationcanbedefinedasamathematical interpolationorregressionoverasetoftraining points: Ê George,Patricia Ê Jeff,Evelyn,Chris Ê Rob Uses: Ê Demographicanalysis Clusters Technologies: Ê Self-OrganizingMaps f(x) Ê ProbabilityDensities x Ê ConceptualClustering 22 Classification CS590D AssociationRules Ê Findwaystoseparatedataitems Ê WeknowXandYbelongtogether, findotherthingsinsamegroup Ê Indicatesignificanceofeachdependency Ê Englishornon-english? Ê Requires“trainingdata”:Data Ê Peoplewhopurchasefishare Ê Bayesianmethods Ê DomesticorForeign? itemswheregroupisknown “Findgroupsofitems commonlypurchased together” Ê Identifydependenciesinthedata: Ê XmakesYlikely “Routedocumentstomost likelyinterestedparties” intopre-definedgroups extraordinarilylikelyto purchasewine Ê PeoplewhopurchaseTurkey Uses: Uses: Ê Profiling tool produces Technologies: Ê Generatedecisiontrees(resultsare Groups humanunderstandable) 23 Date/Time/Register 12/6 13:15 2 12/6 13:16 3 Fish N Y Turkey Cranberries Wine Y Y Y N N Y … … … Technologies: classifier Ê NeuralNets areextraordinarilylikelyto purchasecranberries Ê Targetedmarketing Training Data Ê AIS,SETM,Hugin,TETRADII CS590D 24 CS590D 4 02/11/16 SequentialAssociations DeviationDetection “Findcommonsequencesof warnings/faultswithin10 minuteperiods” Ê Findeventsequencesthatare unusuallylikely Ê Requires“training”eventlist, known“interesting”events Ê Warn2onSwitchCpreceded Ê Mustberobustinthefaceof byFault21onSwitchB additional“noise”events Ê Fault17onanyswitch Uses: precededbyWarn2onany switch Ê Failureanalysisandprediction Ê Findunexpectedvalues,outliers Ê “Findunusualoccurrences inIBMstockprices” Uses: Ê Failureanalysis Ê Anomalydiscoveryforanalysis Sample date 58/07/04 59/01/06 59/04/04 73/10/09 Event Market closed 2.5% dividend 50% stock split not traded Occurrences 317 times 2 times 7 times 1 time Technologies: Technologies: Ê Dynamicprogramming(Dynamic timewarping) Ê “Custom”algorithms 25 Time Switch Event 21:10 B Fault 21 21:11 A Warn 2 21:13 C Warn 2 21:20 A Fault 17 CS590D DataMiningandBusinessIntelligence Increasing potential to support business decisions Making Decisions Data Presentation Visualization Techniques Data Mining Information Discovery End User Data Analyst housestofinddistributionpatterns Ê Maximizingintra-classsimilarity&minimizinginterclasssimilarity Ê Outlieranalysis Ê Outlier:Dataobjectthatdoesnotcomplywiththegeneralbehaviorof Ê Similarity-basedanalysis discrimination wetregions Ê Frequentpatterns,association,correlationandcausality Ê SmokingàCancer(Correlationorcausality?) conceptsforfutureprediction DBA Ê Classlabelisunknown:Groupdatatoformnewclasses,e.g.,cluster Ê Sequentialpatternmining,periodicityanalysis Ê Multidimensionalconceptdescription:Characterizationand Ê Constructmodels(functions)thatdescribeanddistinguishclassesor Ê Clusteranalysis Ê Trendanddeviation:e.g.,regressionanalysis .022561 CS590D Ê Classificationandprediction DataMiningFunctionalities(2) Ê Noiseorexception? Spread .022561 .022561 Ê Generalize,summarize,andcontrastdatacharacteristics,e.g.,dryvs. Business Analyst Data Warehouses / Data Marts OLAP, MDA Data Sources Paper, Files, Information Providers, Database Systems, OLTP Ê Trendandevolutionanalysis Close Volume 369.50 314.08 369.25 313.87 Market Closed 370.00 314.50 DataMiningFunctionalities(1) Data Exploration Statistical Analysis, Querying and Reporting thedata Ê clustering/classificationmethods Date 58/07/02 Ê Statisticaltechniques 58/07/03 58/07/04 Ê visualization 58/07/07 26 Ê E.g.,classifycountriesbasedonclimate,orclassifycarsbasedongas mileage Ê Predictsomeunknownormissingnumericalvalues AreAllthe“Discovered”PatternsInteresting? Ê Dataminingmaygeneratethousandsofpatterns:Notallofthemare interesting Ê Suggestedapproach:Human-centered,query-based,focusedmining Ê Interestingnessmeasures Ê Apatternisinterestingifitiseasilyunderstoodbyhumans,validonnewortest datawithsomedegreeofcertainty,potentiallyuseful,novel,orvalidatessome hypothesisthatauserseekstoconfirm Ê Objectivevs.subjectiveinterestingnessmeasures Ê Objective:basedonstatisticsandstructuresofpatterns,e.g.,support,confidence, etc. Ê Subjective:basedonuser’sbeliefinthedata,e.g.,unexpectedness,novelty, actionability,etc. 5 02/11/16 CanWeFindAllandOnlyInterestingPatterns? DataMining:ConfluenceofMultipleDisciplines Database Technology Ê Findalltheinterestingpatterns:Completeness Ê Canadataminingsystemfindalltheinterestingpatterns? Statistics Ê Heuristicvs.exhaustivesearch Ê Associationvs.classificationvs.clustering Ê Searchforonlyinterestingpatterns:Anoptimizationproblem Machine Learning Data Mining Visualization Ê Canadataminingsystemfindonlytheinterestingpatterns? Ê Approaches Ê Firstgeneralallthepatternsandthenfilterouttheuninterestingones. Algorithm Other Disciplines Ê Generateonlytheinterestingpatterns—miningqueryoptimization DataMining: ClassificationSchemes Ê Generalfunctionality Ê Descriptivedatamining Ê Predictivedatamining DataMiningfromdifferentperspectives Ê Datatobemined Ê Object-oriented/relational,spatial,time-series,text,multi-media,heterogeneous, legacy,WWW Ê Knowledgetobemined Ê Characterization,discrimination,association,classification,clustering,trend/ Ê Differentviewsleadtodifferentclassifications Ê Kindsofdatatobemined Ê Kindsofknowledgetobediscovered Ê Kindsoftechniquesutilized Ê Kindsofapplicationsadapted deviation,outlieranalysis,etc. Ê Multiple/integratedfunctionsandminingatmultiplelevels Ê Techniquesutilized Ê Database-oriented,datawarehouse,machinelearning,statistics,visualization,etc. Ê Applicationsadapted Ê Retail,telecommunication,banking,fraudanalysis,bio-datamining,stockmarket analysis,textmining,Webmining,etc. PrimitivesthatDefine aDataMiningTask Primitive1: Task-RelevantData Ê Task-relevantdata Ê Databaseordatawarehousename Ê Typeofknowledgetobemined Ê Backgroundknowledge Ê Patterninterestingnessmeasurements(?) Ê Visualization/presentationofdiscoveredpatterns Ê Databasetablesordatawarehousecubes Ê Conditionfordataselection Ê Relevantattributesordimensions Ê Datagroupingcriteria 6 02/11/16 Primitive2: TypesofKnowledgetoBeMined Ê Characterization(Categories) Ê Discrimination Ê Association Ê Classification/prediction Ê Clustering Ê Outlieranalysis Ê Otherdataminingtasks Primitive3: BackgroundKnowledge Ê Schemahierarchy(taxonomy) Ê E.g.,street<city<province_or_state<country Ê Set-groupinghierarchy Ê E.g.,{20-39}=young,{40-59}=middle_aged Ê Operation-derivedhierarchy Ê emailaddress:[email protected] login-name<department<university<country Ê Rule-basedhierarchy Ê low_profit_margin(X)<=price(X,P1)andcost(X,P2)and(P1- P2)<$50 Primitive4: MeasurementsofPatternInterestingness Primitive5: PresentationofDiscoveredPatterns Ê Simplicity Ê Differentbackgrounds/usagesmayrequiredifferentformsof Ê e.g.,(association)rulelength,(decision)treesize Ê Certainty Ê e.g.,confidence,classificationreliabilityoraccuracy,certaintyfactor, rulestrength,rulequality,discriminatingweight,etc. Ê Utility Ê potentialusefulness,e.g.,support(association),noisethreshold (description) Ê Novelty Ê notpreviouslyknown,surprising(usedtoremoveredundantrules) WhyDataMiningQueryLanguage? Ê Automatedvs.query-driven? Ê Findingallthepatternsautonomouslyinadatabase?— unrealisticbecausethepatternscouldbetoomanybut uninteresting Ê Dataminingshouldbeaninteractiveprocess Ê Userdirectswhattobemined Ê Usersmustbeprovidedwithasetofprimitivestobeusedto communicatewiththedataminingsystem Ê Incorporatingtheseprimitivesinadataminingquerylanguage Ê Moreflexibleuserinteraction Ê Foundationfordesignofgraphicaluserinterface Ê Standardizationofdataminingindustryandpractice representation Ê E.g.,rules,tables,crosstabs,pie/barchart,etc. Ê Concepthierarchyisalsoimportant Ê Discoveredknowledgemightbemoreunderstandable whenrepresentedathighlevelofabstraction Ê Interactivedrillup/down,pivoting,slicinganddicing providedifferentperspectivestodata Ê Differentkindsofknowledgerequiredifferentrepresentation: association,classification,clustering,etc. Architecture:TypicalDataMiningSystem Graphical User Interface Pattern Evaluation Data Mining Engine Know ledge -Base Database or Data Warehouse Server data cleaning, integration, and selection Database Data World-Wide Other Info Repositories Warehouse Web 7 02/11/16 StateofCommercial/ResearchPractice RelatedTechniques:OLAP On-LineAnalyticalProcessing Ê Increasinguseofdataminingsystemsinfinancial Ê On-LineAnalyticalProcessingtoolsprovidetheabilityto community,marketingsectors,retailing Ê Stillhavemajorproblemswithlarge,dynamicsets ofdata(needbetterintegrationwiththedatabases) Ê COTSdataminingpackagesperformspecializedlearning onsmallsubsetofdata Ê Mostresearchemphasizesmachinelearning;little posestatisticalandsummaryqueriesinteractively (traditionalOn-LineTransactionProcessing(OLTP)databases maytakeminutesorevenhourstoanswerthesequeries) Ê Advantagesrelativetodatamining Ê Canobtainawidervarietyofresults Ê Generallyfastertoobtainresults Ê Peopleachievingresultsarenotlikelytoshare Ê Disadvantagesrelativetodatamining Ê Usermust“asktherightquestion” Ê Generallyusedtodeterminehigh-levelstatisticalsummaries, ratherthanspecificrelationshipsamonginstances OLAP:On-LineAnalyticalProcessing IntegrationofDataMiningandData Warehousing emphasisondatabaseside(especiallytext) knowledge Ê Dataminingsystems,DBMS,Datawarehousesystems OLAPFunctionality Profit Values coupling Ê Dimensionselection Ê slice&dice Sales Region OLAP cube Year by Month Product Class by Product Name Ê Nocoupling,loose-coupling,semi-tight-coupling,tight-coupling Ê Rotation Ê allowschangeinperspective Ê Filtration Ê valuerangeselection Ê Interactiveminingmulti-levelknowledge Ê Necessityofminingknowledgeandpatternsatdifferentlevelsof abstractionbydrilling/rolling,pivoting,slicing/dicing,etc. Ê Hierarchies Ê drill-downstolowerlevels Ê roll-upstohigherlevels Ê Integrationofmultipleminingfunctions Ê Characterizedclassification,firstclusteringandthenassociation IntegrationofDataMiningand DataWarehousing AnOLAMArchitecture Mining query Mining result Layer4 User Interface User GUI API OLAM Engine OLAP Engine Layer3 OLAP/OLAM Data Cube API Meta Data Databases Ê Dataminingsystems,DBMS,Datawarehousesystemscoupling Ê Nocoupling,loose-coupling,semi-tight-coupling,tight-coupling Ê On-lineanalyticalminingdata Ê integrationofminingandOnlineAnalyticalProcessing(OLAP)technologies Layer2 MDDB Filtering&Integration Ê On-lineanalyticalminingdata Ê integrationofminingandOLAPtechnologies MDDB Ê Necessityofminingknowledgeandpatternsatdifferentlevelsof abstractionbydrilling/rolling,pivoting,slicing/dicing,etc. Database API Filterin g Data cleaning Data Warehouse Data integration Ê Interactiveminingmulti-levelknowledge Layer1 Data Repository Ê Integrationofmultipleminingfunctions Ê Characterizedclassification,firstclusteringandthenassociation 8 02/11/16 CouplingDataMiningwith DB/DWSystems Miningmethodology Ê Nocoupling—flatfileprocessing,notrecommended Ê Miningdifferentkindsofknowledgefromdiversedatatypes, e.g.,bio,stream,Web Ê Loosecoupling Ê Performance:efficiency,effectiveness,andscalability Ê FetchingdatafromDB/DW Ê Semi-tightcoupling—enhancedDMperformance Ê Patternevaluation:theinterestingnessproblem Ê ProvideefficientimplementafewdataminingprimitivesinaDB/DW Ê Incorporationofbackgroundknowledge system,e.g.,sorting,indexing,aggregation,histogramanalysis,multiway join,precomputationofsomestatfunctions Ê (constraints,taxonomy) Ê Handlingnoiseandincompletedata(preprocessing) Ê Tightcoupling—Auniforminformationprocessingenvironment Ê DMissmoothlyintegratedintoaDB/DWsystem,miningqueryisoptimized Ê Parallel,distributedandincrementalminingmethods basedonminingquery,indexing,etc. Ê Integrationofthediscoveredknowledgewithexistingone: knowledgefusion Summary Ê Datamining:discoveringinterestingpatternsfromlargeamountsofdata(DB) Ê Userinteraction Ê Anaturalevolutionofdatabasetechnology,ingreatdemand,withwide Ê Dataminingquerylanguagesandad-hocmining applications Ê Expressionandvisualizationofdataminingresults Ê Interactiveminingofknowledgeatmultiplelevelsof Ê AKDDprocessincludesdatacleaning,dataintegration(DataWarehouse),data abstraction selection(DataMart),transformation,datamining,patternevaluation,and knowledgepresentation Ê Miningcanbeperformedinavarietyofinformationrepositories Ê Applicationsandsocialimpacts Ê Domain-specificdatamining&invisibledatamining Ê Dataminingfunctionalities:characterization,discrimination,association, classification,clustering,outlierandtrendanalysis,etc. Ê Protectionofdatasecurity,integrity,andprivacy Ê Subjective,requiresexpertknowledge Ê Dataminingsystemsandarchitectures ABriefHistoryofDataMiningSociety ConferencesandJournalsonDataMining Ê 1989IJCAIWorkshoponKnowledgeDiscoveryinDatabases Ê KnowledgeDiscoveryinDatabases(G.Piatetsky-ShapiroandW.Frawley,1991) Ê 1991-1994WorkshopsonKnowledgeDiscoveryinDatabases Ê AdvancesinKnowledgeDiscoveryandDataMining(U.Fayyad,G.Piatetsky-Shapiro, P.Smyth,andR.Uthurusamy,1996) Ê 1995-1998InternationalConferencesonKnowledgeDiscoveryinDatabasesand DataMining(KDD’95-98) Ê JournalofDataMiningandKnowledgeDiscovery(1997) Ê ACMSIGKDDconferencessince1998andSIGKDDExplorations Ê Moreconferencesondatamining Ê PAKDD(1997),PKDD(1997),SIAM-DataMining(2001),(IEEE)ICDM(2001),etc. Ê ACMSIGKDDInt.Conf.on KnowledgeDiscoveryin DatabasesandDataMining (KDD) n Journals: n Data Mining and Knowledge Discovery (DAMI or DMKD) n IEEE Trans. On Knowledge and Data Eng. (TKDE) n KDD Explorations n ACM Trans. on KDD Ê SIAMDataMiningConf.(SDM) Ê (IEEE)Int.Conf.onDataMining (ICDM) Ê Conf.onPrinciplesandpractices ofKnowledgeDiscoveryandData Mining(PKDD) Ê Pacific-AsiaConf.onKnowledge DiscoveryandDataMining (PAKDD) Ê ACMTransactionsonKDDstartingin2007 9 02/11/16 WheretoFindReferences?—DBLP,CiteSeer,Google Ê DataminingandKDD(SIGKDD:CDROM) Ê Conferences:ACM-SIGKDD,IEEE-ICDM,SIAM-DM,PKDD,PAKDD,etc. Ê Journal:DataMiningandKnowledgeDiscovery,KDDExplorations,ACMTKDD DataWarehousingandOLAP Technology Ê Databasesystems(SIGMOD:ACMSIGMODAnthology—CDROM) Ê Conferences:ACM-SIGMOD,ACM-PODS,VLDB,IEEE-ICDE,EDBT,ICDT,DASFAA Ê Journals:IEEE-TKDE,ACM-TODS/TOIS,JIIS,J.ACM,VLDBJ.,Info.Sys.,etc. Ê AI&MachineLearning Ê Conferences:Machinelearning(ML),AAAI,IJCAI,COLT(LearningTheory),CVPR,NIPS, etc. Ê Journals:MachineLearning,ArtificialIntelligence,KnowledgeandInformationSystems, IEEE-PAMI,etc. www.SadiEvrenSEKER.com 10