Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 7. Importing and Exporting Data from Various DBs 인공지능연구실 G201549028 조찬연 Artificial Intelligence Laboratory Contents Understanding different types of files Learning about data files as database MySQL Excel MongoDB SQLite PostgreSQL Hive HBase Artificial Intelligence Laboratory Learning about data files as database R studio supports following database system by realated R packages Database system name Useful R packages/ function Text files Text data files MySQL RMySQL Excel Xlsx Mongo Rmongo SQLite RSQLite PostgreSQL RPostgreSQL HDFS RHDFS Hive Rhive HBase RHbase Big Data Analytics with R and Hadoop, 2013, p 81 Learning about data files as database Purpose of this chapter While dealing with the data analytics activities, we need to do data importing, loading, or exporting. Sometimes the same operations need to be iterated with R programming language. So, we can use the available R function for performing the same data activities. Artificial Intelligence Laboratory Understanding different types of files Importing the data into R - CSV, txt, .RDATA, .rda CSV (Comma Separated Values) txt (with Tab Separated Values) .RDATA (R’s native data format) .rda (R’s native data format) Artificial Intelligence Laboratory Understanding different types of files CSV CSV (Comma Separated Values) A comma-separated values (CSV) (also sometimes called characterseparated values) file stores tabular data (numbers and text) in plain-text form. Plain text means that the file is a sequence of characters, with no data that has to be interpreted as binary numbers. A CSV file consists of any number of records, separated by line breaks of some kind; each record consists of fields, separated by some other character or string, most commonly a literal comma or tab. Usually, all records have an identical sequence of fields. https://en.wikipedia.org/wiki/Comma-separated_values Understanding different types of files CSV - Import Artificial Intelligence Laboratory Understanding different types of files CSV - Export Artificial Intelligence Laboratory Understanding different types of files .RDATA .RDATA (R’s native data format) RDATA: Here, the .RDATA format is used by R for storing the workspace data for a particular time period. It is considered as image file. This will store/retrieve all of the data available in the workspace. example Artificial Intelligence Laboratory Understanding different types of files .RDATA - Export Artificial Intelligence Laboratory Understanding different types of files .rda .rda (R’s native data format) .rda: This is also R's native data format, which stores the specific data variable as per requirement. example Artificial Intelligence Laboratory Understanding different types of files .rda - Export Artificial Intelligence Laboratory Understanding MySQL MySQL is the world’s second most widely used relational database management system(RDBMS) and most widely used open-source RDBMS. Since both R and MySQL both are open source, they can be used for building the interactive web analytic applications. https://en.wikipedia.org/wiki/MySQL Understanding MySQL Installing MySQL on Linux Download suitable TAR file from www.mysql.com shell> groupadd mysql shell> useradd -r -g mysql mysql shell> cd /usr/local shell> tar zxvf /path/to/mysql-VERSION-OS.tar.gz shell> ln -s full-path-to-mysql-VERSION-OS mysql shell> cd mysql shell> chown -R mysql mysql shell> scripts/mysql_install_db --user=mysql shell> chown -R root shell> chown -R mysql data shell> bin/mysqld_safe --user=mysql shell> cp support-files/mysql.server /etc/init.d/mysql.server http://dev.mysql.com/doc/refman/5.6/en/binary-installation.html Understanding MySQL Installing RMySQL install.package(“RMySQL”) Artificial Intelligence Laboratory Understanding MySQL Learning to list the tables and their structure Artificial Intelligence Laboratory Understanding MySQL Importing the data into R Artificial Intelligence Laboratory Understanding MySQL Understanding data manipulation Artificial Intelligence Laboratory Understanding Excel Data manipulation, Exporting the data to excel Artificial Intelligence Laboratory Understanding MongoDB MongoDB is one of many cross-platform document-orineted databases. Classified as a NoSQL database, MongoDB eschews the traditional table-base relational database structure in favor of JSON-like documents with dynamic schemas, making the integration of data in certain types of applications easier and faster. Supported with C, C++, C#, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, and Scala. http://dev.mysql.com/doc/refman/5.6/en/binary-installation.html Understanding MongoDB Installing MongoDB on Linux Download suitable TAR file from www.mongodb.org shell> tar zxvf /path/to/mongoDB-VERSION-OS.tar.gz shell> ln -s full-path-to-mongoDB-VERSION-OS mongoDB shell> cd mongoDB shell> mkdir /data/db shell> ./mongod # start a Mongo console shell> ./mongo Artificial Intelligence Laboratory Understanding MongoDB Mapping SQL to MongoDB SQL MongoDB Database Database Table Collection Index Index Row Document Column Field Joining Embedding & Linking Artificial Intelligence Laboratory Understanding MongoDB Understanding data manipulation Artificial Intelligence Laboratory Understanding MongoDB Understanding data manipulation Artificial Intelligence Laboratory Understanding MongoDB Understanding data manipulation Artificial Intelligence Laboratory Understanding SQLite SQLite is a relational database management system contained in a C programming library. In contrast to other database management systems, SQLite is not a clientserver database engine. Rather, it is embedded into the end program. http://en.wikipedia.org/wiki/SQLite Understanding SQLite Installing SQLite Download suitable TAR file from www.SQLite.org shell> tar zxvf /path/to/SQLite-VERSION-OS.tar.gz shell> ln -s full-path-to-mongoDB-VERSION-OS SQLite shell> cd SQLite # start a SQLite console shell> ./SQLite3 Artificial Intelligence Laboratory Understanding SQLite Understanding data manipulation Artificial Intelligence Laboratory Understanding SQLite Understanding data manipulation Artificial Intelligence Laboratory Understanding PostgreSQL PostgreSQL, often simply “Postgres”, is an object-relational database management system(ORDBMS) with an emphasis on extensibility and standards-compliance. As a database server, its primary function is to store data securely, supporting best practices, and to allow for retrieval at the request of other software applications. It can handle workloads ranging from small single-machine applications to large Internet-facing applications with may concurrent users. Recent versions also provide replication of the database itself for availability and scalability. http://en.wikipedia.org/wiki/PostgreSQL Understanding PostgreSQL Installing PostgreSQL Create the file and add a line to /etc/apt/sources.list.d/pgdg.list deb http://apt.postgresql.org/pub/repos/apt trusty-pgdg main shell> wget -quiet -0 - http://www.postgresql.org/media/keys/ACCC4CF8.asc | \ sudo apt-key add sudo apt-get update shell> sudo apt-get install postgresql-9.4 # add user and start a postgresql console shell> su - postgres createuser shell> sudo -u postgres psql Artificial Intelligence Laboratory Understanding PostgreSQL Installing RPostgreSQL and importing the data shell> sudo apt-get install libpq-dev Artificial Intelligence Laboratory Understanding PostgreSQL Installing RPostgreSQL and importing the data Artificial Intelligence Laboratory Understanding PostgreSQL Installing RPostgreSQL and importing the data Artificial Intelligence Laboratory Understanding PostgreSQL Installing RPostgreSQL and importing the data Artificial Intelligence Laboratory Understanding Apache Hive Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, analysis. While initially developed by Facebook, Apache Hive is now used and developed by other companies such as Netflix. Amazon maintains a software fork of Apache Hive that is included in Amazon Elastic MapReduce on Amazon Web Services http://en.wikipedia.org/wiki/PostgreSQL Understanding Apache Hive Installing Apache Hive on Linux Download suitable TAR file from http://hive.apache.org shell> tar zxvf /path/to/apache-hive-VERSION-OS.tar.gz shell> ln -s full-path-to-apache-hive-VERSION-OS apacheHive shell> cd apacheHive shell> ./hiveserver2 Artificial Intelligence Laboratory Understanding Apache Hive Understanding data manipulation Artificial Intelligence Laboratory Understanding Apache HBase HBase is an open source, non-relational, distributes database modeled after Google’s BigTable and written in java. It is developed as part of Apache Software Foundation’s Apache Hadoop project and runs on top of HDFS, providing BitTablelike capabilities for Hadoop. http://en.wikipedia.org/wiki/PostgreSQL Understanding Apache HBase Installing Apache Hbase, thrift on Linux Download suitable TAR file from http://hbase.apache.org shell> tar zxvf /path/to/hbase-VERSION-OS.tar.gz shell> ln -s full-path-to-hbase-VERSION-OS hbase shell> cd hbase shell> ./start-hbase.sh Download suitable TAR file from http://thrift.apache.org shell> tar zxvf /path/to/thrift-VERSION.tar.gz shell> cd full-path-to-thrift-VERSION-OS shell> ./configure shell> make shell> make install Artificial Intelligence Laboratory Understanding Apache HBase Installing Apache Hbase, thrift on Linux shell> sudo apt-get install libboost-dev libboost-test-dev libboost-program-options-dev libboost-system-dev libboost-filesystem-dev libevent-dev automake libtool flex bison pkg-config g++ libssl-dev shell> wget http://ftp.debian.org/debian/pool/main/a/automake1.14/automake_1.14. 1-3_all.deb sudo dpkg -i automake_1.14.1-3_all.deb Download suitable TAR file from https://github.com/RevolutionAnalytics Artificial Intelligence Laboratory Understanding Apache HBase Importing data in Apache HBase Artificial Intelligence Laboratory