Title: Email Trust in a Mobile Cloud using Hadoop Framework
Group members: Sayan Kole, Jaya Chakladar
Group number: 1
Project goal
The goal of this project has been to learn and understand the MapReduce framework and implement a system of parallel processing in the existing MobiCloud system. There has been some limited study of the use of MapReduce in a mobile computing environment. With the wide usage of smartphones, web accessibility and increasing user demand, it almost seems logical to use the computing capabilities of cell phones to achieve higher efficiency. Previous studies have mainly been limited to implementing a Hadoop-like framework on smartphones. One of the major challenges has been that a lighter device like a cell phone struggles to support a heavy framework like Hadoop, which primarily runs in a data-farm-like setup. In the current MobiCloud environment, one of the advantages is that every mobile is mapped to a virtual machine, so computing can be moved over to the virtual machine. At the onset of the project, we tried to explore the possibility of implementing an application that is useful to mobile users and can also harness the computing power of the virtual machines. After a couple of iterations, we decided to explore the email trust application system in a parallel processing environment. This document explains our design, the challenges we faced and our proposed solutions to the problem.
Project Tasks
This is an exhaustive list of the tasks we performed in this project. We explored quite a bit, and had to redesign our project multiple times to arrive at the proposed solutions.
1. Install and configure Hadoop in MobiCloud
2. Create a UI web application
3. Create an application to synchronize contact list with virtual machines
4. Develop an application to search for documents
   a. Create a mapper function to map users with keywords
   b. Create a reduction function to sort list of users based on their degree of match
   c. Create and update an Apache HDFS data store that the Hadoop Master uses to access data.
5. System testing, troubleshooting and regression testing
6. Delivery and demo
Project task allocations

Task                                             Responsible
Install and configure Hadoop in MobiCloud        Jaya & Sayan
Develop UI web application                       Jaya
Synchronize phone list with virtual machines     Sayan
Search mapper algorithm                          Sayan
Search reduction algorithm                       Jaya
HDFS data store creation and updates             Jaya & Sayan
Testing and problem resolution                   Jaya & Sayan
Delivery and demo                                Jaya & Sayan
Used software and hardware
• Hadoop
• Data storage software, e.g. MySQL or Apache HDFS
• 3 or 4 Android phones mapped to virtual machines in 2 different Linux boxes
Network setup and requirements
[Diagram: a web application to process user requests, the Hadoop Master application with its HDFS data store, and the virtual machines (VMs) acting as worker nodes.]
Technical details for each task
1. Install and configure Hadoop in MobiCloud
a. Configure single-node clusters.
b. Make one cluster the master and the other the slave.
c. Configure both machines to use a common network, e.g. 192.168.0.x/24.
d. SSH access:
   1. The master should be able to ssh to its own user account (hadoop@master).
   2. It should also be able to ssh to the slave machine.
   3. Add the hadoop@master public SSH key to the authorized_keys file of hadoop@slave.
   4. Finally, test the SSH setup by connecting as user hadoop from the master to the user account hadoop on the slave.
The master node consists of the job tracker, task tracker, name node and data node.
The slave node consists of a data node and task tracker.
The job tracker pushes work out to task tracker nodes in the cluster and keeps the work as close to the data as possible.
A task tracker accepts tasks such as Map and Reduce from the job tracker.
The data node holds the data to be processed.
The name node keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept.
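For illustration only, the sketch below shows a client Configuration pointing at such a two-node cluster. The host name "master" and the port numbers are our assumptions (these settings normally live in conf/core-site.xml and conf/mapred-site.xml), not values taken from the actual MobiCloud deployment.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    // Hedged sketch: a client configuration for the master/slave cluster
    // described above. Host name and ports are illustrative assumptions.
    public class ClusterClientSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://master:54310");  // name node
            conf.set("mapred.job.tracker", "master:54311");      // job tracker

            try (FileSystem fs = FileSystem.get(conf)) {          // talks to the name node
                System.out.println("Connected to " + fs.getUri());
            }
        }
    }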
2. Create a UI web application
a. The user interface of the application will be web based
b. The output will be shown back in the same web page
3. Create an application to synchronize contact list with virtual machines
a. Use the autosync feature to upload the contact list to the VM.
b. Use Google Contacts to perform the automatic synchronization.
4. Develop an application to search for documents
a. The user requests a document search using some keywords. The input is
handled in a webpage hosted as part of the web application
b. The web application hands over the request to the Hadoop Master application
c. The master application accesses the contact list of the requesting user
d. For each entry in the contact list, it creates a job to map the user to the
keywords.
e. The reduction function creates a sorted list of users based on their degree of
match
f. The result set is returned to the web application to update the web page
displayed to the user.
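A minimal sketch of what such a mapper and reducer could look like is shown below. The input format (one line per contact: "user<TAB>document text"), the configuration key search.keywords and the class names are our own illustrative assumptions, not the delivered code; for brevity the reducer only counts keyword matches per user, whereas the actual reduction step described above would also order the users by that count.

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Mapper: emits (user, 1) once for every query keyword found in that user's text.
    // The search keywords are passed in through the job configuration.
    public class KeywordMatchMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final Set<String> keywords = new HashSet<>();
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void setup(Context context) {
            String query = context.getConfiguration().get("search.keywords", "");
            for (String k : query.toLowerCase().split("\\s+")) {
                if (!k.isEmpty()) keywords.add(k);
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed input format: "<user>\t<document text>"
            String[] parts = value.toString().split("\t", 2);
            if (parts.length < 2) return;
            Text user = new Text(parts[0]);
            for (String word : parts[1].toLowerCase().split("\\s+")) {
                if (keywords.contains(word)) {
                    context.write(user, ONE);       // one match for this user
                }
            }
        }
    }

    // Reducer: sums the matches per user to give the degree of match.
    class KeywordMatchReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text user, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) sum += c.get();
            context.write(user, new IntWritable(sum));
        }
    }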
5. Creation and updating of Apache HDFS distributed file storage system
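As an illustration of this step, the hedged sketch below loads local files into HDFS through the Java FileSystem API. It assumes the cluster configuration from task 1 is on the classpath, and the paths shown are placeholders we made up.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch: create the HDFS input directory and copy a local data file into it.
    public class HdfsLoadSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path input = new Path("/user/hadoop/search/input");   // placeholder path
            if (!fs.exists(input)) {
                fs.mkdirs(input);                                  // create the data store directory
            }
            fs.copyFromLocalFile(new Path("/tmp/contacts.txt"),    // placeholder local file
                                 input);                           // HDFS destination
            fs.close();
        }
    }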
6. System testing, troubleshooting and regression testing
a. Creation of test cases
b. Creation of test data
c. Execute tests, and analyze results
d. Troubleshoot problems, correct and retest.
7. Delivery and demo
Challenges faced during the project:
• Hadoop is a parallel processing framework, so one has to be careful about the choice of project that can be implemented.
• The current email trust evaluation algorithm requires finding a path between two users based on criteria such as the number of hops, the trust rating given by the original user to all the nodes along the path, etc.
• Hadoop, on the other hand, creates a master-slave environment, so dependencies amongst the nodes are hard to bypass.
Solutions proposed:
• We chose to break up the project into two distinct parts:
  o One is a user interface based on a single-node application.
  o The other is a multi-node application for email trust, if feasible.
• The single-node Hadoop application is an endeavor to replicate a user interface similar to Elastic MapReduce.
• It enables a user to create a workflow.
• Each workflow consists of a data file, a mapper file and a reducer file.
• The user can either use a default Hadoop setting or a customized setting for the job.
• The system configures the Hadoop environment, creates the appropriate jobs, runs the job and returns the result back to the user (a driver sketch follows below).
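The hedged sketch below shows the kind of driver the system might generate for one workflow, reusing the mapper and reducer classes from the earlier sketch. The class names, command-line arguments and settings are our assumptions, not the delivered code; in the real workflow the user can also override the Hadoop settings.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Sketch of a per-workflow driver: take the user's data file, mapper and
    // reducer, run the job and leave the result in an output directory that
    // the web application reads back.
    public class WorkflowDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("search.keywords", args.length > 2 ? args[2] : "");

            Job job = new Job(conf, "user workflow");
            job.setJarByClass(WorkflowDriver.class);
            job.setMapperClass(KeywordMatchMapper.class);    // mapper file chosen in the workflow
            job.setReducerClass(KeywordMatchReducer.class);  // reducer file chosen in the workflow
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));   // data file in HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // result directory

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }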
Future enhancements:
• Implement database access, since typically the data will be stored in bigger databases.
• Implement Hadoop Streaming to support other programming languages.
• Implement Hive and Pig, which provide SQL-like interfaces.
Email Trust Algorithm:

Trust file of A
User   Trust
A      0
B      5
C      4
D      99

Trust file of B
User   Trust
A      5
B      0
C      4
D      7

Trust file of C
User   Trust
A      8
B      2
C      0
D      3

Trust file of D
User   Trust
A      6
B      8
C      1
D      0
Thus A gives a trust rating to D based on the formula (over its neighbors):
{Rating of A to B (5) * Rating of B to D (7) + Rating of A to C (4) * Rating of C to D (3)} /
{max(5,7) + max(4,3)} = {5*7 + 4*3} / {7+4} = 47/11 ≈ 4.27, which rounds to 4.
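A minimal sketch of this aggregation in Java, using the numbers from the trust files above, is shown below. The way the ratings are held in memory and the printed rounding are our own assumptions; only the formula itself comes from the description above.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch: compute A's trust in D from A's ratings of its neighbours and each
    // neighbour's rating of D, using the formula stated above:
    //   sum(rating(A,N) * rating(N,D)) / sum(max(rating(A,N), rating(N,D)))
    public class TrustSketch {
        public static void main(String[] args) {
            // A's ratings of its neighbours (from the trust file of A).
            Map<String, Integer> aRates = new LinkedHashMap<>();
            aRates.put("B", 5);
            aRates.put("C", 4);

            // Each neighbour's rating of the target user D (from their trust files).
            Map<String, Integer> ratesD = new LinkedHashMap<>();
            ratesD.put("B", 7);
            ratesD.put("C", 3);

            double numerator = 0, denominator = 0;
            for (String neighbour : aRates.keySet()) {
                int aToN = aRates.get(neighbour);
                int nToD = ratesD.get(neighbour);
                numerator += aToN * nToD;                 // 5*7 + 4*3 = 47
                denominator += Math.max(aToN, nToD);      // 7 + 4 = 11
            }
            System.out.printf("Trust of A in D: %.2f%n", numerator / denominator); // ~4.27
        }
    }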
Problems running this job on Hadoop:
1. Since the trust rating a user gives to an unknown person depends on the other users' trust ratings of that person, it is very difficult to parallelize the job.
2. The map function is handed to individual nodes, so it cannot have shared variables.
3. Running on virtual machines makes the job harder.
4. It is very difficult to debug and find out where exactly the problem lies.
5. Transferring files from the local file system to the Hadoop DFS is itself very time consuming, and frequent connection losses occur.