Download Veri Madenciliği - Giriş

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Cluster analysis wikipedia , lookup

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
02/11/16
Syllabus–Dersİzlencesi
Ê  www.SadiEvrenSEKER.com->Courses->DataMining,
IstanbulCommerceUniversity
Ê  http://sadievrenseker.com/wp/?p=558
Ê  Slide’lar:
http://web.engr.illinois.edu/~hanj/bk3/bk3_slidesindex.htm
VeriMadenciliği-Giriş
ŞadiEvrenŞEKER
NecessityIs
theMotherofInvention
Kaynaklar
Ê  Dataexplosionproblem
Ê  DataMining:ConceptsandTechniques,Third
Edition,JiaweiHan,MichelineKamber,JianPei
Ê  DataMining:PracticalMachineLearningTools
andTechniques,Third...IanH.Witten,EibeFrank,
MarkA.Hall Ê  Automateddatacollectiontools,widelyuseddatabasesystems,
computerizedsociety,andtheInternetleadtotremendousamountsof
dataaccumulatedand/ortobeanalyzedindatabases,data
warehouses,WWW,andotherinformationrepositories
Ê  Wearedrowningindata,butstarvingforknowledge!
Ê  Solution:Datawarehousinganddatamining
Ê  Datawarehousingandon-lineanalyticalprocessing(OLAP)
Ê  Mininginterestingknowledge(rules,regularities,patterns,constraints)
fromdatainlargedatabases
EvolutionofDatabaseTechnology
Ê  1960s:
Ê  Datacollection,databasecreation,IMSandnetworkDBMS
Ê  1970s:
Ê  Relationaldatamodel,relationalDBMSimplementation
Ê  TeddCodd(1923-2003)
EvolutionofDatabaseTechnology
Ê  1990s:
Ê  Datamining,datawarehousing,multimediadatabases
Ê  Webdatabases(..,Amazon)
Ê  2000s
Ê  Streamdatamanagementandmining
Ê  Datamininganditsapplications
Ê  StructuredEnglishQueryLanguage(SEQUEL),SQL
Ê  1980s:
Ê  Advanceddatamodels(extended-relational,OO,deductive,etc.)
Ê  Application-orientedDBMS(spatial,scientific,engineering,etc.)
Ê  Webtechnology(XML,dataintegration)andglobalinformation
systems
1
02/11/16
WhyDataMining?
—PotentialApplications
WhatIsDataMining?
Ê  Datamining(knowledgediscoveryfromdata)
Ê  Extractionofinteresting(non-trivial,implicit,previouslyunknownand
potentiallyuseful)patternsorknowledgefromhugeamountofdata
(interestingpatterns?)
Ê  Datamining:amisnomer?(erro de nome)
Ê  Dataanalysisanddecisionsupport
Ê  Marketanalysisandmanagement
Ê  Targetmarketing,customerrelationshipmanagement(CRM),marketbasket
analysis,crossselling,marketsegmentation
Ê  Riskanalysisandmanagement
Ê  Alternativenames
Ê  Knowledgediscovery(mining)indatabases(KDD),knowledgeextraction,
data/patternanalysis,dataarcheology,datadredging,information
harvesting,businessintelligence,etc.
Ê  Watchout:Iseverything“datamining”?
Ê  (Deductive)queryprocessing.
Ê  ExpertsystemsorsmallML/statisticalprograms
Ê  Forecasting,customerretention,improvedunderwriting,qualitycontrol,
competitiveanalysis
Ê  Frauddetectionanddetectionofunusualpatterns(outliers)
Ê  OtherApplications
Ê  Textmining(newsgroup,email,documents)andWebmining
Ê  Medicaldatamining
Ê  Bioinformaticsandbio-dataanalysis
MarketAnalysisandManagement Example1:MarketAnalysisandManagement
Ê  Wheredoesthedatacomefrom?
Ê  Creditcardtransactions,loyaltycards,discountcoupons,
customercomplaintcalls,plus(public)lifestylestudies
Ê  Cross-marketanalysis—Findassociations/co-
relationsbetweenproductsales,&predictbasedon
suchassociation
Ê  Customerprofiling—Whattypesofcustomersbuy
Ê  Targetmarketing
whatproducts(clusteringorclassification)
Ê  Findclustersof“model”customerswhosharethesame
Ê  Customerrequirementanalysis
characteristics:interest,incomelevel,spendinghabits,etc.,
Ê  Identifythebestproductsfordifferentcustomers
Ê  Determinecustomerpurchasingpatternsovertime
Ê  Predictwhatfactorswillattractnewcustomers
Example3:
FraudDetection&MiningUnusualPatterns
Example2:
CorporateAnalysis&RiskManagement
Ê  Financeplanningandassetevaluation
Ê  cashflowanalysisandprediction(featuredevelopment)
Ê  contingentclaimanalysistoevaluateassets
Ê  Approaches:
(componente do ativo)
Ê  cross-sectionalandtimeseriesanalysis(trendanalysis,etc.)
Ê  Resourceplanning
Ê  UnsupervisedLearning:Clustering
Ê  SupervisedLearning:NeuronalNetworks
Ê  summarizeandcomparetheresourcesandspending
Ê  Competition
Ê  monitorcompetitorsandmarketdirections
Ê  modelconstructionforfrauds
Ê  outlieranalysis
Ê  groupcustomersintoclassesandaclass-basedpricingprocedure
Ê  setpricingstrategyinahighlycompetitivemarket
2
02/11/16
Applications:Healthcare,retail,creditcard
service,telecomm.
Ê  Autoinsurance:ringofcollisions
DataMiningandKnowledgeDiscovery(KDD)Process
Pattern Evaluation
Ê  Datamining—coreof
knowledgediscovery
process
Ê  Moneylaundering:suspiciousmonetarytransactions
Ê  Medicalinsurance
Ê  Professionalpatients,ringofdoctors,andringofreferences
Data Mining
Task-relevant Data
Ê  Unnecessaryorcorrelatedscreeningtests
Ê  Telecommunications:phone-callfraud
Data Warehouse
Data
Selection Mart
Ê  Phonecallmodel:destinationofthecall,duration,timeofdayorweek.Analyze
patternsthatdeviatefromanexpectednorm
Data Cleaning
Ê  Retailindustry(venderavarejo)
Data Integration
Ê  Analystsestimatethat38%ofretailshrinkisduetodishonestemployees
Ê  Anti-terrorism
Databases
StepsofaKDDProcess(1)
StepsofaKDDProcess(2)
Ê  Choosingfunctionsofdatamining
Ê  Learningtheapplicationdomain
Ê  summarization,classification,regression,association,clustering
Ê  relevantpriorknowledgeandgoalsofapplication
Ê  Creatingatargetdataset:dataselection
Ê  Choosingtheminingalgorithm(s)
Ê  Datacleaningandpreprocessing:(maytake60%ofeffort!)
Ê  Datamining:searchforpatternsofinterest
Ê  Understanddata(statistics)
Ê  Patternevaluationandknowledgepresentation
Ê  visualization,transformation,removingredundantpatterns,etc.
Ê  Datareductionandtransformation
Ê  Useofdiscoveredknowledge
Ê  Findusefulfeatures,dimensionality/variablereduction,invariant
representation
OverviewofDataMiningMethods
KDDSample
Who is associated with Group
X, and what is the nature of their
association?
Crime DBs
Accessed
Retrieved Information
Standing Information
Requests
Ê  AutomatedExploration/Discovery
Visualization
Ê 
Tell me when something
related to my situation
changes
Who is associated
with Sam Jones, and
what is the nature of
their association?
Active Agents
Are there any other interesting
relationships I should know
about?
Suspect Profiles
Crisis
Watch
Intel
Analyst
Command History
DC
Traffic
Alert
Ê 
FEMA
data
coordinator
coworkers
Overlay the traffic
density
Intelink
data sources
Ê 
Middleware
Data Mining
FBIS
databases
Text
OIT
databases
receiver
environment
Ê 
Mediator/Broker
Geospatial
Structured
source
environments
f(x)
Ê  Explanation/Description
...
Internet
data sources
x1
e.g..forecastinggrosssalesgivencurrentfactors
regression,neuralnetworks,geneticalgorithms
Text
Imagery
OIA
databases
x2
Ê  Prediction/Classification
Ê 
Discovered
Knowledge
KDD
Process
e.g..discoveringnewmarketsegments
distanceandprobabilisticclusteringalgorithms
Ê 
x
e.g..characterizingcustomersbydemographics
andpurchasehistory
inductivedecisiontrees, if age > 35
associationrulesystems
and income < $35k
Focus is on induction of a model
from specific examples
then ...
3
02/11/16
DataMiningMethods
DataMiningMethods
AutomatedExplorationandDiscovery
PredictionandClassification
Ê  Distance-basednumericalclustering
Ê  metricgroupingofexamples(KNN)
Ê  graphicalvisualizationcanbeused
Income
f(x)
Ê  Functionapproximation(curvefitting)
x
Ê  Classification(conceptlearning,patternrecognition)
A
Ê  Methods:
Ê 
Age
Ê  Bayesianclustering
Ê  searchforthenumberofclasseswhichresultinbestfit
ofaprobabilitydistributiontothedata
Ê 
Ê 
Ê 
x2
Statisticalregression
Artificialneuralnetworks
Geneticalgorithms
Nearestneighbouralgorithms
DataMiningMethods
O1 O2
x1
Ê  SupervisedLearning
Ê  UnsupervisedLearning
B
I1 I2
I3
I4
Clustering
Ê  Findgroupsofsimilardataitems
Generalization
“Grouppeoplewithsimilar
travelprofiles”
Ê  Statisticaltechniquesrequiresome
Ê  Theobjectiveoflearningistoachievegood
definitionof“distance”(e.g.
betweentravelprofiles)while
conceptualtechniquesuse
backgroundconceptsandlogical
descriptions
generalizationtonewcases,otherwisejustusea
look-uptable.
Ê  Generalizationcanbedefinedasamathematical
interpolationorregressionoverasetoftraining
points:
Ê  George,Patricia
Ê  Jeff,Evelyn,Chris
Ê  Rob
Uses:
Ê  Demographicanalysis
Clusters
Technologies:
Ê  Self-OrganizingMaps
f(x)
Ê  ProbabilityDensities
x
Ê  ConceptualClustering
22
Classification
CS590D
AssociationRules
Ê  Findwaystoseparatedataitems
Ê  WeknowXandYbelongtogether,
findotherthingsinsamegroup
Ê  Indicatesignificanceofeachdependency
Ê  Englishornon-english?
Ê  Requires“trainingdata”:Data
Ê  Peoplewhopurchasefishare
Ê  Bayesianmethods
Ê  DomesticorForeign?
itemswheregroupisknown
“Findgroupsofitems
commonlypurchased
together”
Ê  Identifydependenciesinthedata:
Ê  XmakesYlikely
“Routedocumentstomost
likelyinterestedparties”
intopre-definedgroups
extraordinarilylikelyto
purchasewine
Ê  PeoplewhopurchaseTurkey
Uses:
Uses:
Ê  Profiling
tool produces
Technologies:
Ê  Generatedecisiontrees(resultsare
Groups
humanunderstandable)
23
Date/Time/Register
12/6 13:15 2
12/6 13:16 3
Fish
N
Y
Turkey Cranberries Wine
Y
Y
Y
N
N
Y
…
…
…
Technologies:
classifier
Ê  NeuralNets
areextraordinarilylikelyto
purchasecranberries
Ê  Targetedmarketing
Training Data
Ê  AIS,SETM,Hugin,TETRADII
CS590D
24
CS590D
4
02/11/16
SequentialAssociations
DeviationDetection
“Findcommonsequencesof
warnings/faultswithin10
minuteperiods”
Ê  Findeventsequencesthatare
unusuallylikely
Ê  Requires“training”eventlist,
known“interesting”events
Ê  Warn2onSwitchCpreceded
Ê  Mustberobustinthefaceof
byFault21onSwitchB
additional“noise”events
Ê  Fault17onanyswitch
Uses:
precededbyWarn2onany
switch
Ê  Failureanalysisandprediction
Ê  Findunexpectedvalues,outliers
Ê  “Findunusualoccurrences
inIBMstockprices”
Uses:
Ê  Failureanalysis
Ê  Anomalydiscoveryforanalysis
Sample date
58/07/04
59/01/06
59/04/04
73/10/09
Event
Market closed
2.5% dividend
50% stock split
not traded
Occurrences
317 times
2 times
7 times
1 time
Technologies:
Technologies:
Ê  Dynamicprogramming(Dynamic
timewarping)
Ê  “Custom”algorithms
25
Time Switch Event
21:10
B
Fault 21
21:11
A
Warn 2
21:13
C
Warn 2
21:20
A
Fault 17
CS590D
DataMiningandBusinessIntelligence
Increasing potential
to support
business decisions
Making
Decisions
Data Presentation
Visualization Techniques
Data Mining
Information Discovery
End User
Data
Analyst
housestofinddistributionpatterns
Ê  Maximizingintra-classsimilarity&minimizinginterclasssimilarity
Ê  Outlieranalysis
Ê  Outlier:Dataobjectthatdoesnotcomplywiththegeneralbehaviorof
Ê  Similarity-basedanalysis
discrimination
wetregions
Ê  Frequentpatterns,association,correlationandcausality
Ê  SmokingàCancer(Correlationorcausality?)
conceptsforfutureprediction
DBA
Ê  Classlabelisunknown:Groupdatatoformnewclasses,e.g.,cluster
Ê  Sequentialpatternmining,periodicityanalysis
Ê  Multidimensionalconceptdescription:Characterizationand
Ê  Constructmodels(functions)thatdescribeanddistinguishclassesor
Ê  Clusteranalysis
Ê  Trendanddeviation:e.g.,regressionanalysis
.022561
CS590D
Ê  Classificationandprediction
DataMiningFunctionalities(2)
Ê  Noiseorexception?
Spread
.022561
.022561
Ê  Generalize,summarize,andcontrastdatacharacteristics,e.g.,dryvs.
Business
Analyst
Data Warehouses / Data Marts
OLAP, MDA
Data Sources
Paper, Files, Information Providers, Database Systems, OLTP
Ê  Trendandevolutionanalysis
Close Volume
369.50
314.08
369.25
313.87
Market Closed
370.00
314.50
DataMiningFunctionalities(1)
Data Exploration
Statistical Analysis, Querying and Reporting
thedata
Ê  clustering/classificationmethods Date
58/07/02
Ê  Statisticaltechniques
58/07/03
58/07/04
Ê  visualization
58/07/07
26
Ê  E.g.,classifycountriesbasedonclimate,orclassifycarsbasedongas
mileage
Ê  Predictsomeunknownormissingnumericalvalues
AreAllthe“Discovered”PatternsInteresting?
Ê  Dataminingmaygeneratethousandsofpatterns:Notallofthemare
interesting
Ê  Suggestedapproach:Human-centered,query-based,focusedmining
Ê  Interestingnessmeasures
Ê  Apatternisinterestingifitiseasilyunderstoodbyhumans,validonnewortest
datawithsomedegreeofcertainty,potentiallyuseful,novel,orvalidatessome
hypothesisthatauserseekstoconfirm
Ê  Objectivevs.subjectiveinterestingnessmeasures
Ê  Objective:basedonstatisticsandstructuresofpatterns,e.g.,support,confidence,
etc.
Ê  Subjective:basedonuser’sbeliefinthedata,e.g.,unexpectedness,novelty,
actionability,etc.
5
02/11/16
CanWeFindAllandOnlyInterestingPatterns?
DataMining:ConfluenceofMultipleDisciplines
Database
Technology
Ê  Findalltheinterestingpatterns:Completeness
Ê  Canadataminingsystemfindalltheinterestingpatterns?
Statistics
Ê  Heuristicvs.exhaustivesearch
Ê  Associationvs.classificationvs.clustering
Ê  Searchforonlyinterestingpatterns:Anoptimizationproblem
Machine
Learning
Data Mining
Visualization
Ê  Canadataminingsystemfindonlytheinterestingpatterns?
Ê  Approaches
Ê  Firstgeneralallthepatternsandthenfilterouttheuninterestingones.
Algorithm
Other
Disciplines
Ê  Generateonlytheinterestingpatterns—miningqueryoptimization
DataMining:
ClassificationSchemes
Ê  Generalfunctionality
Ê  Descriptivedatamining
Ê  Predictivedatamining
DataMiningfromdifferentperspectives
Ê  Datatobemined
Ê  Object-oriented/relational,spatial,time-series,text,multi-media,heterogeneous,
legacy,WWW
Ê  Knowledgetobemined
Ê  Characterization,discrimination,association,classification,clustering,trend/
Ê  Differentviewsleadtodifferentclassifications
Ê  Kindsofdatatobemined
Ê  Kindsofknowledgetobediscovered
Ê  Kindsoftechniquesutilized
Ê  Kindsofapplicationsadapted
deviation,outlieranalysis,etc.
Ê  Multiple/integratedfunctionsandminingatmultiplelevels
Ê  Techniquesutilized
Ê  Database-oriented,datawarehouse,machinelearning,statistics,visualization,etc.
Ê  Applicationsadapted
Ê  Retail,telecommunication,banking,fraudanalysis,bio-datamining,stockmarket
analysis,textmining,Webmining,etc.
PrimitivesthatDefine
aDataMiningTask
Primitive1:
Task-RelevantData
Ê  Task-relevantdata
Ê  Databaseordatawarehousename
Ê  Typeofknowledgetobemined
Ê  Backgroundknowledge
Ê  Patterninterestingnessmeasurements(?)
Ê  Visualization/presentationofdiscoveredpatterns
Ê  Databasetablesordatawarehousecubes
Ê  Conditionfordataselection
Ê  Relevantattributesordimensions
Ê  Datagroupingcriteria
6
02/11/16
Primitive2:
TypesofKnowledgetoBeMined
Ê  Characterization(Categories)
Ê  Discrimination
Ê  Association
Ê  Classification/prediction
Ê  Clustering
Ê  Outlieranalysis
Ê  Otherdataminingtasks
Primitive3:
BackgroundKnowledge
Ê  Schemahierarchy(taxonomy)
Ê  E.g.,street<city<province_or_state<country
Ê  Set-groupinghierarchy
Ê  E.g.,{20-39}=young,{40-59}=middle_aged
Ê  Operation-derivedhierarchy
Ê  emailaddress:[email protected]
login-name<department<university<country
Ê  Rule-basedhierarchy
Ê  low_profit_margin(X)<=price(X,P1)andcost(X,P2)and(P1-
P2)<$50
Primitive4:
MeasurementsofPatternInterestingness
Primitive5:
PresentationofDiscoveredPatterns
Ê  Simplicity
Ê  Differentbackgrounds/usagesmayrequiredifferentformsof
Ê  e.g.,(association)rulelength,(decision)treesize
Ê  Certainty
Ê  e.g.,confidence,classificationreliabilityoraccuracy,certaintyfactor,
rulestrength,rulequality,discriminatingweight,etc.
Ê  Utility
Ê  potentialusefulness,e.g.,support(association),noisethreshold
(description)
Ê  Novelty
Ê  notpreviouslyknown,surprising(usedtoremoveredundantrules)
WhyDataMiningQueryLanguage?
Ê  Automatedvs.query-driven?
Ê  Findingallthepatternsautonomouslyinadatabase?—
unrealisticbecausethepatternscouldbetoomanybut
uninteresting
Ê  Dataminingshouldbeaninteractiveprocess
Ê  Userdirectswhattobemined
Ê  Usersmustbeprovidedwithasetofprimitivestobeusedto
communicatewiththedataminingsystem
Ê  Incorporatingtheseprimitivesinadataminingquerylanguage
Ê  Moreflexibleuserinteraction
Ê  Foundationfordesignofgraphicaluserinterface
Ê  Standardizationofdataminingindustryandpractice
representation
Ê  E.g.,rules,tables,crosstabs,pie/barchart,etc.
Ê  Concepthierarchyisalsoimportant
Ê  Discoveredknowledgemightbemoreunderstandable
whenrepresentedathighlevelofabstraction
Ê  Interactivedrillup/down,pivoting,slicinganddicing
providedifferentperspectivestodata
Ê  Differentkindsofknowledgerequiredifferentrepresentation:
association,classification,clustering,etc.
Architecture:TypicalDataMiningSystem
Graphical User Interface
Pattern Evaluation
Data Mining Engine
Know
ledge
-Base
Database or Data
Warehouse Server
data cleaning, integration, and selection
Database
Data World-Wide Other Info
Repositories
Warehouse
Web
7
02/11/16
StateofCommercial/ResearchPractice
RelatedTechniques:OLAP
On-LineAnalyticalProcessing
Ê  Increasinguseofdataminingsystemsinfinancial
Ê  On-LineAnalyticalProcessingtoolsprovidetheabilityto
community,marketingsectors,retailing
Ê  Stillhavemajorproblemswithlarge,dynamicsets
ofdata(needbetterintegrationwiththedatabases)
Ê  COTSdataminingpackagesperformspecializedlearning
onsmallsubsetofdata
Ê  Mostresearchemphasizesmachinelearning;little
posestatisticalandsummaryqueriesinteractively
(traditionalOn-LineTransactionProcessing(OLTP)databases
maytakeminutesorevenhourstoanswerthesequeries)
Ê  Advantagesrelativetodatamining
Ê  Canobtainawidervarietyofresults
Ê  Generallyfastertoobtainresults
Ê  Peopleachievingresultsarenotlikelytoshare
Ê  Disadvantagesrelativetodatamining
Ê  Usermust“asktherightquestion”
Ê  Generallyusedtodeterminehigh-levelstatisticalsummaries,
ratherthanspecificrelationshipsamonginstances
OLAP:On-LineAnalyticalProcessing
IntegrationofDataMiningandData
Warehousing
emphasisondatabaseside(especiallytext)
knowledge
Ê  Dataminingsystems,DBMS,Datawarehousesystems
OLAPFunctionality
Profit Values
coupling
Ê  Dimensionselection
Ê  slice&dice
Sales
Region
OLAP
cube
Year
by Month
Product Class
by Product Name
Ê  Nocoupling,loose-coupling,semi-tight-coupling,tight-coupling
Ê  Rotation
Ê  allowschangeinperspective
Ê  Filtration
Ê  valuerangeselection
Ê  Interactiveminingmulti-levelknowledge
Ê  Necessityofminingknowledgeandpatternsatdifferentlevelsof
abstractionbydrilling/rolling,pivoting,slicing/dicing,etc.
Ê  Hierarchies
Ê  drill-downstolowerlevels
Ê  roll-upstohigherlevels
Ê  Integrationofmultipleminingfunctions
Ê  Characterizedclassification,firstclusteringandthenassociation
IntegrationofDataMiningand
DataWarehousing
AnOLAMArchitecture
Mining query
Mining result
Layer4
User Interface
User GUI API
OLAM
Engine
OLAP
Engine
Layer3
OLAP/OLAM
Data Cube API
Meta Data
Databases
Ê  Dataminingsystems,DBMS,Datawarehousesystemscoupling
Ê  Nocoupling,loose-coupling,semi-tight-coupling,tight-coupling
Ê  On-lineanalyticalminingdata
Ê  integrationofminingandOnlineAnalyticalProcessing(OLAP)technologies
Layer2
MDDB
Filtering&Integration
Ê  On-lineanalyticalminingdata
Ê  integrationofminingandOLAPtechnologies
MDDB
Ê  Necessityofminingknowledgeandpatternsatdifferentlevelsof
abstractionbydrilling/rolling,pivoting,slicing/dicing,etc.
Database API
Filterin
g
Data cleaning
Data
Warehouse
Data integration
Ê  Interactiveminingmulti-levelknowledge
Layer1
Data Repository
Ê  Integrationofmultipleminingfunctions
Ê  Characterizedclassification,firstclusteringandthenassociation
8
02/11/16
CouplingDataMiningwith
DB/DWSystems
Miningmethodology
Ê  Nocoupling—flatfileprocessing,notrecommended
Ê  Miningdifferentkindsofknowledgefromdiversedatatypes,
e.g.,bio,stream,Web
Ê  Loosecoupling
Ê  Performance:efficiency,effectiveness,andscalability
Ê  FetchingdatafromDB/DW
Ê  Semi-tightcoupling—enhancedDMperformance
Ê  Patternevaluation:theinterestingnessproblem
Ê  ProvideefficientimplementafewdataminingprimitivesinaDB/DW
Ê  Incorporationofbackgroundknowledge
system,e.g.,sorting,indexing,aggregation,histogramanalysis,multiway
join,precomputationofsomestatfunctions
Ê  (constraints,taxonomy)
Ê  Handlingnoiseandincompletedata(preprocessing)
Ê  Tightcoupling—Auniforminformationprocessingenvironment
Ê  DMissmoothlyintegratedintoaDB/DWsystem,miningqueryisoptimized
Ê  Parallel,distributedandincrementalminingmethods
basedonminingquery,indexing,etc.
Ê  Integrationofthediscoveredknowledgewithexistingone:
knowledgefusion
Summary
Ê  Datamining:discoveringinterestingpatternsfromlargeamountsofdata(DB)
Ê  Userinteraction
Ê  Anaturalevolutionofdatabasetechnology,ingreatdemand,withwide
Ê  Dataminingquerylanguagesandad-hocmining
applications
Ê  Expressionandvisualizationofdataminingresults
Ê  Interactiveminingofknowledgeatmultiplelevelsof
Ê  AKDDprocessincludesdatacleaning,dataintegration(DataWarehouse),data
abstraction
selection(DataMart),transformation,datamining,patternevaluation,and
knowledgepresentation
Ê  Miningcanbeperformedinavarietyofinformationrepositories
Ê  Applicationsandsocialimpacts
Ê  Domain-specificdatamining&invisibledatamining
Ê  Dataminingfunctionalities:characterization,discrimination,association,
classification,clustering,outlierandtrendanalysis,etc.
Ê  Protectionofdatasecurity,integrity,andprivacy
Ê  Subjective,requiresexpertknowledge
Ê  Dataminingsystemsandarchitectures
ABriefHistoryofDataMiningSociety
ConferencesandJournalsonDataMining
Ê  1989IJCAIWorkshoponKnowledgeDiscoveryinDatabases
Ê  KnowledgeDiscoveryinDatabases(G.Piatetsky-ShapiroandW.Frawley,1991)
Ê  1991-1994WorkshopsonKnowledgeDiscoveryinDatabases
Ê  AdvancesinKnowledgeDiscoveryandDataMining(U.Fayyad,G.Piatetsky-Shapiro,
P.Smyth,andR.Uthurusamy,1996)
Ê  1995-1998InternationalConferencesonKnowledgeDiscoveryinDatabasesand
DataMining(KDD’95-98)
Ê  JournalofDataMiningandKnowledgeDiscovery(1997)
Ê  ACMSIGKDDconferencessince1998andSIGKDDExplorations
Ê  Moreconferencesondatamining
Ê  PAKDD(1997),PKDD(1997),SIAM-DataMining(2001),(IEEE)ICDM(2001),etc.
Ê  ACMSIGKDDInt.Conf.on
KnowledgeDiscoveryin
DatabasesandDataMining
(KDD)
n 
Journals:
n 
Data Mining and
Knowledge Discovery
(DAMI or DMKD)
n 
IEEE Trans. On
Knowledge and Data
Eng. (TKDE)
n 
KDD Explorations
n 
ACM Trans. on KDD
Ê  SIAMDataMiningConf.(SDM)
Ê  (IEEE)Int.Conf.onDataMining
(ICDM)
Ê  Conf.onPrinciplesandpractices
ofKnowledgeDiscoveryandData
Mining(PKDD)
Ê  Pacific-AsiaConf.onKnowledge
DiscoveryandDataMining
(PAKDD)
Ê  ACMTransactionsonKDDstartingin2007
9
02/11/16
WheretoFindReferences?—DBLP,CiteSeer,Google
Ê  DataminingandKDD(SIGKDD:CDROM)
Ê  Conferences:ACM-SIGKDD,IEEE-ICDM,SIAM-DM,PKDD,PAKDD,etc.
Ê  Journal:DataMiningandKnowledgeDiscovery,KDDExplorations,ACMTKDD
DataWarehousingandOLAP
Technology
Ê  Databasesystems(SIGMOD:ACMSIGMODAnthology—CDROM)
Ê  Conferences:ACM-SIGMOD,ACM-PODS,VLDB,IEEE-ICDE,EDBT,ICDT,DASFAA
Ê  Journals:IEEE-TKDE,ACM-TODS/TOIS,JIIS,J.ACM,VLDBJ.,Info.Sys.,etc.
Ê  AI&MachineLearning
Ê  Conferences:Machinelearning(ML),AAAI,IJCAI,COLT(LearningTheory),CVPR,NIPS,
etc.
Ê  Journals:MachineLearning,ArtificialIntelligence,KnowledgeandInformationSystems,
IEEE-PAMI,etc.
www.SadiEvrenSEKER.com
10