Industrial Project (234313) Final Presentation
“App Analyzer”
Deliver the right apps users want! (VMware)
Students: Edward Khachatryan & Elina Zharikov
Supervisors: Yoel Calderon, Yan Aksenfeld

The Problem
The IT administrator doesn’t know which applications need to be managed.
[Diagram labels: Mirage Servers & Single Instance Stores; Network Optimized Synchronization & Streaming; Base layer; Application layer(s); Drivers; User profile; User data; Machine identity; Apps not installed by Mirage]

Goals
Find the optimal combination of Base and App layers for a given organization.
Produce reports for the administrator.
[Diagram labels: Finance Apps, Finance Desktops, Single Base Layer, HR Apps, HR Desktops, Windows 7, Antivirus, Common Apps, IT Apps, IT Desktops]

Methodology
Research clustering algorithms.
Connect to the Mirage database on SQL Server.
Parse UTF-encoded XML data.
Process and analyze the data.
Build custom reports.

Methodology (continued)
Research and choose the right set of tools:
◦ Python libraries:
    scikit-learn for clustering algorithms
    lxml for parsing UTF-encoded XML
    SQLAlchemy for SQL interaction
    pandas for gluing it all together
◦ Microsoft SQL Report Builder for custom reports
◦ VMware Mirage web interface for the GUI

Achievements
Quick and efficient data analysis: the desired results can be generated in just a few minutes.
User-friendly experience: a variety of reports can be produced in a few clicks.
Integration with the existing VMware Mirage platform.
A variety of parameters to customize the output.

Examples (six slides, images not reproduced)

Live demonstration…

Conclusions
DBSCAN is a fast clustering algorithm. It scales to large datasets and works well with Boolean vector data.
Instead of the usual Euclidean distance, it is better to use metrics intended for Boolean-valued vector spaces, such as Jaccard, Sokal-Sneath or Dice.
Using open source libraries saves a lot of valuable time.
Microsoft SQL Report Builder is a great WYSIWYG tool for building custom reports.

Progress Recap
31.3 – Kickoff meeting.
31.3-12.4 – Research period: reading materials on clustering algorithms.
12.4-19.4 – Installing Microsoft SQL Server, restoring a VMware Mirage database, querying and parsing the data from the database.
19.4-26.4 – Creating a filtering module to clean up the raw application list: uniting applications by their name, product ID or upgrade code, and filtering out unimportant applications. Finalizing the criteria for Base Layer apps.

Progress Recap (continued)
26.4-11.5 – Focusing on 4 clustering algorithms (K-Means, Agglomerative, DBSCAN, Birch), testing various parameters and metrics on different databases.
12.5 – Midway meeting.
12.5-19.5 – Continuing the aforementioned tests, focusing strictly on DBSCAN.
19.5-25.5 – Setting up and configuring a virtual machine running Windows Server with VMware Mirage and Microsoft SQL Server Reporting Services.

Progress Recap (continued)
25.5-7.6
◦ Learning to use Microsoft SSRS, the Report Builder tool and the Mirage web interface.
◦ Moving the Python IDE and SQL databases to the virtual machine.
◦ Exporting our results to SQL instead of CSV and text files.
◦ Building a sample report.
7.6-17.6 – Building custom reports according to the given guidelines.
18.6-27.6 – Improving the reports’ appearance, fixing bugs, parameterizing the Python code.
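
Illustration (not from the original slides)
The Conclusions slide recommends DBSCAN with a Boolean-vector metric such as Jaccard instead of Euclidean distance. The sketch below shows one way this could look with scikit-learn. It is not the project's actual code: the function name cluster_desktops, the toy install matrix, and the eps and min_samples values are all assumptions made for illustration. Each row stands for a desktop and each column for an application, with True meaning the application is installed.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances


def cluster_desktops(install_matrix, eps=0.4, min_samples=2):
    """Cluster desktops by installed-app similarity (hypothetical helper).

    install_matrix: rows are desktops, columns are applications.
    eps and min_samples are illustrative defaults, not tuned values.
    """
    X = np.asarray(install_matrix, dtype=bool)
    # Jaccard distance = 1 - |A and B| / |A or B| on Boolean vectors.
    distances = pairwise_distances(X, metric="jaccard")
    # Pass the precomputed distance matrix to DBSCAN.
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
    return model.fit_predict(distances)


# Toy data (assumed): three finance-like desktops, three HR-like desktops,
# and one desktop with an unusual mix of applications.
install_matrix = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
]

labels = cluster_desktops(install_matrix)
# DBSCAN marks points that belong to no cluster with the label -1.
print(labels)
```

With these toy values the first three desktops should share one label, the next three another, and the last desktop should be reported as noise (-1). Applications common to every member of a cluster would then be natural candidates for a shared App layer, which is the kind of grouping the Goals slide describes.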