Tracing and accounting of physical resources in the computer centre

August 2015

Author: Josip Domšić
Supervisor: Ulrich Schwickerath

CERN openlab Summer Student Report 2015

Project Specification

CERN is moving to large-scale virtualization of its more than 10’000 servers; all hypervisors and most virtual machines are centrally managed. However, some legitimate use cases cannot be covered by this scheme. Service managers can therefore request physical resources, which they then own. Central management of these resources is not strictly required. To trace and properly account for these resources, information from different sources needs to be combined. The aim of the project is to develop the required tools, ensure that they run regularly, and feed the data into the relevant accounting repositories.

Abstract

Within the project “tracing and accounting of physical resources in CERN data centre”, a fully automated way of collecting information about physical resources has been designed and implemented. The implementation is split into three phases. In the first phase, information is collected from a number of different databases: hardware, network, Elasticsearch, Puppet, and Foreman. In the second phase, the collected information is stored in a Django database. In the third phase, a web application aggregates and displays the accumulated data.

Table of Contents

Project Specification
Abstract
Table of Contents
1. Introduction
2. Technologies
3. Design
    1. Generate daily report
        Hardware database
        Network database
        Puppet and Foreman
        Generating report
    2. Accumulate daily data
    3. Hosting application data
4. Usage
5. Conclusion

1.
Introduction

The IT department is the organization responsible for managing computing resources at CERN, the European Organization for Nuclear Research. CERN offers computing infrastructure to scientists for running experiments, number crunching, validating hypotheses, and so on. All of these tasks are performed within the CERN computing infrastructure, which relies on databases, job schedulers, message queues and virtualization of hardware resources. To perform them efficiently, the IT department is divided into groups: databases, network, cloud, etc.

Computing needs increase with time. Existing equipment is therefore regularly replaced or renewed, and capacity is regularly increased. The procurement of new resources has roughly three steps:

- Order new hardware
- Run tests and rate the new hardware
- Assign the new machine to specific tasks and organizational groups

A resource can be assigned to experiments, directly to scientists, or to the IT department. Over time, a resource can change owner, tasks and/or configuration. Tracking this process manually is error-prone and can result in sub-optimal resource usage. The goal of this project is to generate daily reports, automate the statistics for all CERN hardware resources, and prevent unwanted events.

2. Technologies

The technologies used in the project are centred on the Python programming language and the Django web framework, the Linux cron tool, and different types of relational databases. Python is an object-oriented scripting language. It offers simple means of dealing with raw text and JSON documents. With the addition of the Django web framework, with its object management and native handling of SQL databases, Python becomes a powerful tool for building web pages. The Linux cron tool offers a simple infrastructure for running specific, periodic tasks at a given time.

3.
Design

The project follows the idea of microservices. Microservices are an infrastructural pattern for developing large applications: a large application is divided into smaller programs called services. All services are developed, tested and run separately, but they must support some interface for communication. The infrastructure for collecting information about hardware resources in the CERN data centre is divided into 3 parts:

- Generate daily report
- Accumulate daily data, per owner and current user of the machine
- Host statistical data on a server

The process is shown in the picture below and is explained in detail in the following chapters.

1. Generate daily report

Hardware database

The generator of daily raw data runs each day and collects data from 5 different databases. The hardware database supplies the starting point: the serial number. Together with the serial number, the generator collects the hostname, the number of physical and logical cores, and the HEP spec06 rating. The HEP spec06 rating is a CERN standard way of comparing machines' specifications. If something unusual happens while calculating the HEP spec06 rating or acquiring the number of physical or logical cores, the generator deals with it in the following manner:

- If the number of physical cores is missing, it is set to 0 and a warning is printed: “[WARNING] Number of physical cores is missing for #Serial Number”.
- If the number of logical cores is missing, it is set to the number of physical cores.
- If the HEP spec06 rating is missing, or set to 0, it is set to the number of logical cores times number_from_configuration.

Network database

The next step is collecting information from the network database. The following data is looked up by serial number:

- Owner of the machine
- Current user of the machine

If the current user is missing, it is set to the owner. If a serial number found in the network database is missing from the current report (that is, missing from the hardware database), a warning is printed: “[NETWORK] Serial number #serial_number not in hardware database.
Device name #device_name”.

Puppet and Foreman

The next step is collecting the management flags for Puppet and Foreman. All entries from the Puppet and Foreman databases are compared with the current report. If the hostnames match, the corresponding flag is set. For example, if a hostname from the network database is present in the Puppet database, the flag is_puppet is set to true. In addition to the management flags, the flags “SPARE” and “INCOMING” are generated from the machines’ host group field. Machines that are present in PuppetDB or ForemanDB but not in HardwareDB are written to the error file as:

    [PUPPET] Device missing #device_name
    [FOREMAN] Device missing #device_name

Generating report

All collected data is grouped into a JSON file. The filename is created from today’s date using the following pattern: %yyyy-%mm-%dd.json. The corresponding error file has the same name but a different ending: error. Reports are stored in the directory “/var/reports/hardware_resources”. From the collected flags, 3 additional flags are generated for each device:

- UNMANAGED flag
- STALE flag
- NOT_IN_PRODUCTION flag

The data is grouped by owner department and owner group in the following manner:

    owner_department:
        owner_group:
            user_group – user_department: [information about devices]
            ....
        ....
    ....

Example:

    A:
        A1:
            A-A1: ... (INFO)
            D-D9: ...
        A2: ...
    B: ....
    ....

Suggested ways of reading this raw data are:

    cat report.json | python -m json.tool | grep [expression]
    cat report | jgrep [expression]

2. Accumulate daily data

The aggregation of the daily data is in fact already done by the generator described above. Accumulation here therefore means saving the raw data into the database of the Django web application (SQLite). The script is called accumulate.py and runs every day at 6 am. It reads the raw data from the directory named in the configuration file, which is passed to the script as the first parameter. The second parameter is optional and selects, by date, the report file to be imported into the database.
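The command line described above could be handled roughly as in the following Python sketch. It is not the actual accumulate.py: the function names parse_args and report_path are hypothetical, the --date option is taken from the usage shown in this report, and falling back to today's date when no date is given is an assumption. The report directory is passed in explicitly, whereas the real script reads it from the configuration file.

```python
import argparse
import datetime


def parse_args(argv):
    """Parse 'accumulate.py CONFIG [--date YYYY-MM-DD]' (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="accumulate.py")
    parser.add_argument("config", help="path to the configuration file")
    parser.add_argument(
        "--date",
        # Convert the YYYY-MM-DD string into a datetime.date object.
        type=lambda s: datetime.datetime.strptime(s, "%Y-%m-%d").date(),
        default=None,
        help="date of the report to import (assumed default: today)",
    )
    return parser.parse_args(argv)


def report_path(report_dir, date=None):
    """Return the path of the daily JSON report for the given date."""
    date = date or datetime.date.today()  # assumption: no date means today
    return "{}/{}.json".format(report_dir.rstrip("/"), date.strftime("%Y-%m-%d"))


if __name__ == "__main__":
    args = parse_args(["configuration.conf", "--date", "2015-08-21"])
    print(report_path("/var/reports/hardware_resources", args.date))
```

Keeping the date optional lets the same script serve both the daily cron run and a manual re-import of an older report.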
Example:

    accumulate.py configuration.conf --date 2015-08-21

The structure of the data being saved is:

- not_in_production
- stale_machines
- unmanaged_machines
- number_of_machines
- hepspec_ratings
- logical_cores
- physical_cores
- user
- owner_group
- owner_department

The table name is hardware_resources, in the database configured in Django's settings.py.

3. Hosting application data

Let’s assume the application has been deployed on a server named “example.cern.ch”. The additional GET parameters are hw_details and hw_group_details.

- ?hw_details=[DEPARTMENT] shows all group statistics in the specified DEPARTMENT (e.g. IT, PH, ...)
- ?hw_group_details=[GROUP_NAME] shows current_user statistics in the specified DEPARTMENT and with the specified GROUP_NAME (e.g. PES, DB, ...)

Example:

    https://example.cern.ch/accounting?hw_details=A&hw_group_details=A1

The picture below shows example statistics for the 1st of August 2015.

The fields in the table are as follows:

- Department name
- Department group
- Current user (department-group)
- Number of machines
- Number of physical cores
- Number of logical cores
- Total HEP spec06 ratings (divided by 1000)
- Number of unmanaged machines and (in brackets) percentage of unmanaged machines (compared to number of machines)
- Number of stale machines and (in brackets) percentage of stale machines (compared to number of machines)
- Number of machines that are not in production and (in brackets) percentage of those machines (compared to total HEP spec06)

4. Usage

To generate useful data in the accounting web application, one needs to:

1. Run the generator, in a cron job or manually:

       generate_report.py path/to/file.conf

2. Move the data from the raw JSON into the database:

       accumulate.py path/to/file.conf [--date %Y-%m-%d]

3. Host the accounting application and make an HTTP query:

       http://example.cern.ch/accounting?hw_details=A&hw_group_details=A2

5.
Conclusion

By using microservices as an infrastructural pattern for building a larger application, this project has been structured so that it is easier to change in the future. It provides a fully automated way to monitor the usage of hardware resources. Monitoring of hardware resources can now be done on a daily basis, and unwanted events are minimized.
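As an illustration of the fallback rules listed under “Hardware database” in section 3, the following Python sketch applies them to a single machine record. It is not the generator's actual code: the function name apply_fallbacks, the dictionary keys and the parameter hepspec_per_core (standing in for number_from_configuration) are hypothetical, and the value 10.0 in the usage lines is arbitrary.

```python
def apply_fallbacks(record, hepspec_per_core):
    """Fill in missing core counts and HEP spec06 rating for one machine.

    `record` is a dict with the hypothetical keys 'serial_number',
    'physical_cores', 'logical_cores' and 'hepspec'; a missing value is None.
    `hepspec_per_core` stands in for number_from_configuration.
    Returns (cleaned_record, warnings).
    """
    cleaned = dict(record)  # work on a copy, leave the input untouched
    warnings = []

    # Rule 1: missing physical cores become 0 and a warning is recorded.
    if cleaned.get("physical_cores") is None:
        cleaned["physical_cores"] = 0
        warnings.append(
            "[WARNING] Number of physical cores is missing for "
            + str(cleaned["serial_number"])
        )

    # Rule 2: missing logical cores default to the physical core count.
    if cleaned.get("logical_cores") is None:
        cleaned["logical_cores"] = cleaned["physical_cores"]

    # Rule 3: a rating that is missing or 0 is estimated from logical cores.
    if not cleaned.get("hepspec"):
        cleaned["hepspec"] = cleaned["logical_cores"] * hepspec_per_core

    return cleaned, warnings


if __name__ == "__main__":
    machine = {"serial_number": "SN-1", "physical_cores": 8,
               "logical_cores": None, "hepspec": 0}
    print(apply_fallbacks(machine, hepspec_per_core=10.0))
```

The order of the rules matters: the logical core count is filled in before the rating, so the estimated rating always uses a defined value.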