Title: Email Trust in a Mobile Cloud using Hadoop Framework
Group members: Sayan Kole, Jaya Chakladar
Group number: 1

Project goal
The goal of this project has been to learn and understand the MapReduce framework and to implement a system of parallel processing within the existing MobiCloud system. There has been only limited study of MapReduce in mobile computing environments. With the wide adoption of smartphones, ubiquitous web access and increasing user demand, it seems logical to use the computing capabilities of cell phones to achieve higher efficiency. Previous studies have mostly attempted to implement a Hadoop-like framework on the smartphones themselves. A major challenge is that a lightweight device such as a cell phone struggles to support a heavy framework like Hadoop, which is primarily designed to run in a data-farm-like setup. In the current MobiCloud environment, one advantage is that every mobile device is mapped to a virtual machine, so computation can be moved over to the virtual machine. At the outset of the project, we explored the possibility of implementing an application that is useful to mobile users and can also harness the computing power of the virtual machines. After a couple of iterations, we decided to explore the email trust application in a parallel processing environment. This document explains our design, the challenges we faced, and our proposed solutions.

Project Tasks
This is an exhaustive list of the tasks we performed in this project. We explored quite a bit, and had to redesign the project multiple times to arrive at workable solutions.
1. Install and configure Hadoop in MobiCloud
2. Create a UI web application
3. Create an application to synchronize the contact list with the virtual machines
4. Develop an application to search for documents
   a. Create a mapper function to map users to keywords
   b. Create a reducer function to sort the list of users by degree of match
   c. Create and update an Apache HDFS data store that the Hadoop master uses to access data
5. System testing, troubleshooting and regression testing
6. Delivery and demo

Project task allocations
  Task                                           Responsible
  Install and configure Hadoop in MobiCloud      Jaya & Sayan
  Develop UI web application                     Jaya
  Synchronize phone list with virtual machines   Sayan
  Search mapper algorithm                        Sayan
  Search reduction algorithm                     Jaya
  HDFS data store creation and updates           Jaya & Sayan
  Testing and problem resolution                 Jaya & Sayan
  Delivery and demo                              Jaya & Sayan

Used software and hardware
o Hadoop
o Database software (e.g. MySQL) or Apache HDFS
o 3 or 4 Android phones mapped to virtual machines in 2 different Linux boxes

Network setup and requirements
[Diagram: a web application to process requests, the Hadoop master application, an HDFS data store, and four VMs]

Technical details for each task
1. Install and configure Hadoop in MobiCloud
   a. Configure single-node clusters.
   b. Make one cluster the master and the other the slave.
   c. Configure both machines to use a common network, e.g. 192.168.0.x/24.
   d. SSH access:
      1. The master should be able to ssh to its own user account on the master.
      2. It should also be able to ssh to the slave machine.
      3. Add the hadoop@master public SSH key to the authorized_keys file of hadoop@slave.
      4. Finally, test the SSH setup by connecting as user hadoop from the master to the hadoop account on the slave.
   The master node runs the JobTracker, a TaskTracker, the NameNode and a DataNode; the slave node runs a DataNode and a TaskTracker. The JobTracker pushes work out to the TaskTracker nodes in the cluster, keeping the work as close to the data as possible. A TaskTracker accepts tasks such as Map and Reduce from the JobTracker. A DataNode holds the data to be processed. The NameNode keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept.
2. Create a UI web application
   a. The user interface of the application will be web based.
   b. The output will be shown back on the same web page.
3. Create an application to synchronize the contact list with the virtual machines
   a. Use the auto-sync feature to upload the contact list to the VM.
   b. Use Google Contacts for automatic synchronization.
4. Develop an application to search for documents
   a. The user requests a document search using some keywords; the input is handled in a webpage hosted as part of the web application.
   b. The web application hands the request over to the Hadoop master application.
   c. The master application accesses the contact list of the requesting user.
   d. For each entry in the contact list, it creates a job to map the user to the keywords.
   e. The reduce function creates a sorted list of users based on their degree of match.
   f. The result set is returned to the web application, which updates the web page displayed to the user.
5. Create and update the Apache HDFS distributed file storage system.
6. System testing, troubleshooting and regression testing
   a. Create test cases.
   b. Create test data.
   c. Execute tests and analyze the results.
   d. Troubleshoot problems, correct and retest.
7. Delivery and demo

Challenges faced during the project:
o Hadoop is a parallel processing framework, so one has to be careful about the choice of project to implement on it.
o The current email trust evaluation algorithm needs to find a path between two users based on criteria such as the number of hops and the trust ratings the originating user assigns to all the nodes along the path. Hadoop, on the other hand, creates a master-slave environment, so dependencies among the nodes are hard to bypass.

Solutions proposed:
We chose to break the project into two distinct parts:
o A user interface based on a single-node application
o A multi-node application for email trust, if feasible
The single-node Hadoop application is an attempt to replicate a user interface similar to Amazon's Elastic MapReduce.
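The mapper and reducer described in task 4 could be sketched as below. This is a minimal illustration, not the project's actual code: the record layout ("user<TAB>document text") and the scoring rule (count of query-keyword occurrences) are our assumptions.

```python
# Sketch of the search mapper and reducer from task 4, in the style of
# a Hadoop Streaming job (tab-separated records). Record layout and
# scoring rule are assumptions, not taken from the original design.
from collections import defaultdict

KEYWORDS = {"hadoop", "mapreduce", "trust"}  # example query keywords

def map_line(line, keywords=KEYWORDS):
    """Map one 'user<TAB>text' record to a (user, score) pair.

    The score is the number of keyword occurrences in the text;
    records with no match emit nothing."""
    user, _, text = line.rstrip("\n").partition("\t")
    score = sum(1 for word in text.lower().split() if word in keywords)
    return [(user, score)] if score > 0 else []

def reduce_pairs(pairs):
    """Sum scores per user and sort users by degree of match, best first."""
    totals = defaultdict(int)
    for user, score in pairs:
        totals[user] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # In a real Hadoop Streaming job the mapper and reducer run as
    # separate processes over stdin; here they are chained over a sample.
    sample = [
        "alice\thadoop and mapreduce on a phone",
        "bob\temail trust ratings",
    ]
    pairs = [p for line in sample for p in map_line(line)]
    for user, score in reduce_pairs(pairs):
        print(f"{user}\t{score}")
```

In an actual deployment these two functions would be split into a mapper script and a reducer script and submitted to the cluster with Hadoop Streaming.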
This single-node application enables a user to create a workflow. Each workflow consists of a data file, a mapper file and a reducer file. The user can either use the default Hadoop settings or a customized setting for the job. The system configures the Hadoop environment, creates the appropriate jobs, runs them, and returns the results to the user.

Future enhancements:
o Implement database access, since data will typically be stored in larger databases
o Implement Hadoop Streaming to support other programming languages
o Implement Hive and Pig, which provide SQL-like interfaces

Email trust algorithm:
Each user keeps a trust file rating every other user (0 is the self-rating; the 99 in A's file appears to mark a user that A has not rated directly):

  Trust file of A:   A=0  B=5  C=4  D=99
  Trust file of B:   A=5  B=0  C=4  D=7
  Trust file of C:   A=8  B=2  C=0  D=3
  Trust file of D:   A=6  B=8  C=1  D=0

Thus A derives a trust rating for D from its neighbors B and C using the formula:

  {Rating of A to B (5) * Rating of B to D (7) + Rating of A to C (4) * Rating of C to D (3)} / {max(5, 7) + max(4, 3)}
  = (5*7 + 4*3) / (7 + 4) = 47/11 ≈ 4

Problems running this job on Hadoop:
1. Since the trust rating a user gives to an unknown person depends on the other users' trust ratings of that person, the job is very difficult to parallelize.
2. The map function is handed to individual nodes, so there can be no shared variables.
3. Running on virtual machines makes the job harder.
4. It is very difficult to debug and find out where exactly a problem lies.
5. Transferring files from the local file system to HDFS is itself very time consuming, and frequent connection losses occur.
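The worked trust computation above can be reproduced with a short sketch. The function name and data layout are ours, and the report's "99" entry (A has no direct rating for D) is represented here as None:

```python
# Sketch of the neighbor-based trust formula from the report:
# sum(r(src,n) * r(n,dst)) / sum(max(r(src,n), r(n,dst))) over neighbors n.
# Function name and dictionary layout are our own choices.

def derived_trust(ratings, src, dst):
    """src's derived trust in dst, via src's directly rated neighbors.

    ratings[u][v] is u's direct trust rating of v; None means unknown."""
    num = den = 0
    for n, r_sn in ratings[src].items():
        if n in (src, dst) or r_sn is None:
            continue  # skip self, the target, and unrated neighbors
        r_nd = ratings[n].get(dst)
        if r_nd is None:
            continue  # neighbor has no rating for the target either
        num += r_sn * r_nd
        den += max(r_sn, r_nd)
    return num / den if den else None

ratings = {
    "A": {"A": 0, "B": 5, "C": 4, "D": None},  # None: A does not know D
    "B": {"A": 5, "B": 0, "C": 4, "D": 7},
    "C": {"A": 8, "B": 2, "C": 0, "D": 3},
    "D": {"A": 6, "B": 8, "C": 1, "D": 0},
}
print(derived_trust(ratings, "A", "D"))  # (5*7 + 4*3) / (7 + 4) = 47/11
```

The interdependency noted in problem 1 is visible here: computing A's rating of D needs B's and C's trust files, which is what makes a clean per-node map step hard.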