Adventures in Data Mining
Kimberly Kirkpatrick
Kansas State University

What is Data Mining?
The process of knowledge discovery in databases
Data mining should be hypothesis driven
This is not the same as data dredging or data fishing!

Why Data Mining?
Flexibility in data analysis
◦ Scaling issues
May be able to use one data set for many purposes
◦ Grant applications
◦ Modelling/curve fitting
◦ Generate new research questions

Data Collection: Time-event codes
Time stamp in ms
841.005 1564.005 1650.005 2901.005 3666.005 3856.005
15409.005 19075.005 20331.005 21975.005 47126.006 47277.006
47391.006 47495.006 47598.006 55217.006 55268.006 59765.005
59959.010 60793.005 62070.005 62326.005 62377.005 62411.005
63585.005 64494.005 64882.005 65873.005 66514.005 66741.005
69959.020 69959.013 70059.023 70477.005 106429.005 108570.006
108702.006 109337.010 112883.005 113133.005 119337.020 119337.013
119387.023 120100.005
Event codes
Head entry into food cup = 005
Drinking from water tube = 006
Tone on = 010
Tone off = 020
Food on = 013
Food off = 023

Value of Data Mining: Scaling Issues
Temporal Conditioning
[Diagram labels: Delay, US, US]
Extract times of responses of interest (005)
Take the difference of those times
IRT1 = 1564 - 841
IRT2 = 1650 - 1564
[Figure: probability vs. interresponse time (s), shown on a linear axis (0 to 300 s) and on a logarithmic axis (0.01 to 1000 s). Kirkpatrick & Church (2003)]

Value of Data Mining: Scaling Issues
[Figure: probability density vs. interresponse time (s) on a logarithmic axis (0.01 to 1000 s), distinguishing bout IRTs from pause IRTs. Kirkpatrick & Church (2003)]

Value of Data Mining
May be able to use one data set for many purposes
◦ Grant applications
◦ Generate new research questions
[Figure, panel B: change-over time (s) vs. reinforcer magnitude. Galtress, Garcia and Kirkpatrick (2012)]

The Dark Side of Data Mining
Time consuming (500-2000 data files/study)
Resource intensive
Requires high-level programming and data analysis skills

Summary and Conclusions
Consider sharing tools for analysis
[email protected]

Russ Church, Ana Garcia, Tiffany Galtress, Yuci Gou, Andrew Marshall, My rats
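
The interresponse time (IRT) calculation from the Temporal Conditioning slide (extract the times of the responses of interest, then take the difference of those times) can be sketched in Python. This is a minimal illustration, not the analysis code used in the cited studies. It assumes each record is a string of the form TIME.CODE, with the time in ms before the decimal point and a three-digit event code after it. The function names parse_time_event and interresponse_times are hypothetical. The sample is the first eleven records from the Data Collection slide.

```python
from typing import List, Tuple


def parse_time_event(token: str) -> Tuple[int, str]:
    """Split a 'TIME.CODE' record into (time in ms, event code).

    Assumption: the record is kept as text, so the three-digit code
    ('005', '010', ...) is not altered by floating-point conversion.
    """
    time_part, code = token.split(".")
    return int(time_part), code


def interresponse_times(tokens: List[str], code: str = "005") -> List[int]:
    """Return differences (ms) between successive events with the given code."""
    events = [parse_time_event(t) for t in tokens]
    # Keep only the times of the response of interest (005 = head entry).
    times = [t for t, c in events if c == code]
    # Difference between each response time and the one before it.
    return [later - earlier for earlier, later in zip(times, times[1:])]


# First eleven records from the Data Collection slide.
sample = [
    "841.005", "1564.005", "1650.005", "2901.005", "3666.005", "3856.005",
    "15409.005", "19075.005", "20331.005", "21975.005", "47126.006",
]

irts_ms = interresponse_times(sample)
print(irts_ms)  # [723, 86, 1251, 765, 190, 11553, 3666, 1256, 1644]
```

The first two values reproduce IRT1 = 1564 - 841 and IRT2 = 1650 - 1564 from the slide, and the final record (code 006, drinking) is ignored. The results are in ms, so they would need dividing by 1000 before plotting against an axis in seconds as in the figures.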