- Motivation
- Dataset
- Project Scope
- Data Warehousing
- Data Mining
- Conclusion
- Learning Experience
- References

Motivation
Research 1: Public school funding is the largest program in the state budget, receiving more than 40% of the state's General Fund resources. The 2014–15 state budget includes more than $45 billion in General Fund resources. (Source: California Department of Education)
Research 2: Since 1980, higher education spending has decreased by 13 percent in inflation-adjusted dollars, whereas spending on California's prisons and associated correctional programs has skyrocketed by 436 percent. (Source: Huffington Post, California)

Dataset
California state government financial data reported by counties, cities and districts, with billions of records in each file. The dataset details the expenditures, revenues and state income of all departments, with income generated in the form of fees, penalties and taxes.
https://bythenumbers.sco.ca.gov/
Dataset attributes: Value, County, Year, City, Department, Sub-Dept., District, Financial Class

Project Scope
To give important financial information on the government's funding and income, broken down by region and department.
Target users: citizens, taxpayers, students, businesses, non-profit organizations

Data Mart
- What is the state income by county, city and district?
- Which business categories and sub-departments are responsible for the maximum income collection?
- What are the expenditures for a particular department?
- How much has your county spent on public safety in the past 4 years?

Data Warehousing
- Original data: three different files in .csv format
- Handled missing values and listed the attributes required for our project
- Performed data integration and reduced the data to the relevant records
- Created extra attributes and an identifier to relate and associate the three datasets
- Loaded the tables into a MySQL database

- Large dataset with approximately 20 billion records
- Departments with invalid or blank values were eliminated to keep our records consistent
- Removed the least-needed sub-categories
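The preparation steps listed above (keeping the required attributes, dropping records with blank values, and adding an identifier that ties the source files together) might look roughly like the sketch below. The column names, the `record_id` and `financial_class` attributes, and the sample rows are all hypothetical; the actual project schema is not shown in these slides.

```python
import csv
import io

# Hypothetical column names; the real attribute list is not given in the slides.
REQUIRED = ("county", "department", "year", "amount")


def clean_rows(rows):
    """Keep only the required attributes and drop rows with blank values."""
    cleaned = []
    for row in rows:
        values = {key: (row.get(key) or "").strip() for key in REQUIRED}
        if all(values.values()):
            cleaned.append(values)
    return cleaned


def integrate(datasets):
    """Merge the cleaned datasets, tagging each record with the file it came
    from and a sequential identifier (both are assumed extra attributes)."""
    merged = []
    next_id = 1
    for financial_class, rows in datasets.items():
        for row in clean_rows(rows):
            merged.append(
                {"record_id": next_id, "financial_class": financial_class, **row}
            )
            next_id += 1
    return merged


def read_csv_text(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample rows for illustration only; not taken from the dataset.
sample = {
    "Expenditures": read_csv_text(
        "county,department,year,amount,notes\n"
        "Sacramento,Public Safety,2013,100,x\n"
        "Yolo,,2013,50,y\n"
    ),
    "Revenues": read_csv_text(
        "county,department,year,amount\n"
        "Sacramento,Public Safety,2013,120\n"
    ),
}
records = integrate(sample)
```

In this sketch the row with a blank department is dropped and the unused `notes` column is discarded, so `records` holds one expenditure record and one revenue record that could then be loaded into a database table.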
The primary focus was on income and expenditure.

Demo
Link to our Data Mart: http://athena.ecs.csus.edu/~appanap/

Q. A star schema has what type of relationship between a dimension and fact table?
a) Many to many
b) One to one
c) One to many
d) All of the above
Answer: One to many

Data Mining
- A classification algorithm is used to classify the counties and departments into Loss and Profit classes
- Prediction is used to decide value ranges for the year 2014
- Used the three combined datasets from our Data Mart application
- Kept the attributes required for mining in CSV format
- Converted numeric columns to nominal, i.e. values to ranges

Challenges
- Algorithm selection for the dataset
- Large dataset
- Creating classes for important categories

Classification Tree: J48
Tools: Weka, Tableau
- Used data from 2010 to 2013 as training data
- Used data from 2014 as test data to check prediction accuracy

The prediction is almost accurate: the predicted classes for 2014 have approximately the same values and ranges as the actual values.

Using Tableau, we were able to visualize and research the following:
- What are the top 5 revenue- and expenditure-generating counties?
- Financial data comparison between Sacramento and Los Angeles
- Funds distribution for flood control
- Government expenditures on public facilities such as roads and parking facilities
- Government expenses on public health
- Comparison of government expenses on prisons and education

Q. Which of the following is not a data mining functionality?
a) Characterization and Discrimination
b) Classification and regression
c) Selection and interpretation
d) Clustering and Analysis
Answer: Selection and interpretation

Conclusion
- Data Warehouse design: PHP, HTML/CSS, JavaScript
- Database: MySQL
- Data Mining tools: WEKA
- Data Visualization: Tableau

Learning Experience
- Learned how to design a data mart application
- Learned different data mining tools such as Weka, RapidMiner and Tableau
- Learned the practical use of classification algorithms such as J48 and Naïve Bayes, as well as the correlation matrix
- Teamwork and brainstorming really helped us resolve issues in the execution of our project

References
- California State Controller's Office, Government Financial Reports, https://bythenumbers.sco.ca.gov/browse?utf8=%E2%9C%93&page=1
- California Department of Education, http://www.cde.ca.gov/fg/fr/eb/
- California Drought, http://drought.ca.gov/topstory/top-story-58.html
- "California Spending More On Prisons Than Colleges, Report", http://www.huffingtonpost.com/2012/09/06/california-prisonscolleges_n_1863101.html
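For the Data Mining preprocessing described earlier (numeric values converted to nominal ranges, Loss and Profit classes, and 2010 to 2013 used for training with 2014 held out for testing), a minimal sketch is given below. It covers only data preparation, not the J48 classifier itself, which the project ran in Weka. The bin width, the field names, and the rule that "Profit" means revenue at least equal to expenditure are assumptions; the slides do not define them.

```python
def to_range(value, width=1_000_000):
    """Map a numeric amount to a nominal range label such as '2000000-3000000'.
    The bin width is an assumed value."""
    low = int(value // width) * width
    return f"{low}-{low + width}"


def profit_or_loss(revenue, expenditure):
    """Assumed class rule: 'Profit' when revenue covers expenditure."""
    return "Profit" if revenue >= expenditure else "Loss"


def to_nominal(record):
    """Replace numeric columns with range labels and attach the class label."""
    return {
        "county": record["county"],
        "year": record["year"],
        "revenue_range": to_range(record["revenue"]),
        "expenditure_range": to_range(record["expenditure"]),
        "class": profit_or_loss(record["revenue"], record["expenditure"]),
    }


def split_by_year(records, train_years=range(2010, 2014), test_year=2014):
    """Use 2010-2013 as training data and 2014 as test data."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] == test_year]
    return train, test


# Made-up figures for illustration only; not taken from the dataset.
rows = [
    {"county": "County A", "year": 2013, "revenue": 2_500_000, "expenditure": 3_100_000},
    {"county": "County A", "year": 2014, "revenue": 4_200_000, "expenditure": 3_900_000},
]
nominal = [to_nominal(r) for r in rows]
train, test = split_by_year(nominal)
```

Rows prepared this way could then be written out as CSV and loaded into a tool such as Weka for training and evaluation.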