Understanding Online Social Network Usage from a Network Perspective
By Nikolaos Zourmpakis | HY-558 | May 12, 2016
Summary
Online Social Networks (OSNs) are a major part of the online community, with more than half a
billion users. They form online communities among people with common interests, activities,
backgrounds and/or friendships. However, our understanding of which features attract users to an
OSN in the first place is quite limited.
There is ongoing interest not only from academia, but also from various other parties: ISPs who
want to improve their connectivity, researchers and developers trying to identify trends and
possible future designs, and the OSNs themselves, which want to improve and scale up their systems.
This study analyzes OSNs not through surveys or interviews (which are limited in scope), but by
extracting clickstreams from passively monitored network traffic of four different OSNs and
reverse engineering them into user interactions. The primary goals are to gather details about
feature popularity, session characteristics, impact on the network, user behavior and the dynamics
within OSN sessions.
In order to characterize an OSN session, some basic terms have to be established:
- The time between login and logout is an authenticated OSN session, while the time before the
login and after the logout is an offline OSN session.
- The overall time from one logout to the next is an OSN subsession.
- Most actions appear as request-response pairs (rr-pairs), which are classified as “active” or
“inactive” depending on whether they were triggered by a user’s click or by the interface itself.
Indirect requests associated with an active request are also considered active.
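As a minimal illustration of the session terminology, the Python sketch below splits an observation window into authenticated and offline time. It assumes a single known login and logout timestamp in seconds; the helper name `split_session_time` is hypothetical and not part of the study's tooling.

```python
def split_session_time(window_start, window_end, login, logout):
    """Return (authenticated, offline) seconds within an observation window.

    Hypothetical helper: assumes exactly one login/logout pair with
    window_start <= login <= logout <= window_end.
    """
    if not window_start <= login <= logout <= window_end:
        raise ValueError("expected window_start <= login <= logout <= window_end")
    # Authenticated OSN session: from login to logout.
    authenticated = logout - login
    # Offline OSN session: the time before the login plus the time after the logout.
    offline = (login - window_start) + (window_end - logout)
    return authenticated, offline


# A one-hour window with a login at minute 10 and a logout at minute 40.
print(split_session_time(0, 3600, 600, 2400))  # (1800, 1800)
```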
Data were collected with the cooperation of two ISPs in different geographical regions. Using an
HTTP analyzer framework, we extracted rr-pairs, kept only those belonging to OSN sessions, and
grouped them into sessions through the OSN cookies. Each rr-pair was then classified as active or
indirect and assigned to a category corresponding to an OSN feature. To validate the results, manual
traces were recorded and compared against the automatic classification.
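The grouping step can be sketched in Python as follows. The `RRPair` fields and the `group_by_cookie` helper are hypothetical simplifications: real traces carry many more HTTP fields, and each OSN uses its own cookie names.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class RRPair(NamedTuple):
    """Simplified request-response pair (fields are illustrative assumptions)."""
    timestamp: float
    uri: str
    osn_cookie: Optional[str]  # OSN session cookie value, None if absent
    response_bytes: int


def group_by_cookie(rr_pairs):
    """Group rr-pairs into candidate OSN sessions keyed by cookie value.

    Pairs without a cookie end up under the key None, so that a later step
    can attach them to a session or label them UNKNOWN.
    """
    sessions = defaultdict(list)
    for pair in sorted(rr_pairs, key=lambda p: p.timestamp):
        sessions[pair.osn_cookie].append(pair)
    return dict(sessions)


pairs = [
    RRPair(2.0, "/home.php", "abc", 5000),
    RRPair(1.0, "/profile.php", "abc", 3000),
    RRPair(1.5, "/static/logo.png", None, 800),
    RRPair(3.0, "/home.php", "xyz", 4500),
]
for cookie, session in group_by_cookie(pairs).items():
    print(cookie, [p.uri for p in session])
```

Sorting by timestamp first keeps each session's rr-pairs in click order, which later steps (login/logout detection, feature sequences) rely on.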
Given the complexity of OSNs, the analysis software had to be easily customizable and highly
flexible. Below, we break our methodology down into a number of parts:
OSN session handling: Tracking an OSN user relies mostly on the cookie(s) set by each OSN, whose
format is not standardized. Through them we can distinguish the authenticated from the offline
periods of a session. A login or logout can be identified by looking for the appropriate URI. If
HTTPS is involved, we augment our HTTP traces with flow traces to obtain an indication of a possible
login/logout. When a session started before or ended after our tracing period, we either search for
the cookie or assume that the last observed request marks the end of the session.
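A minimal sketch of login/logout detection, assuming hypothetical `/login` and `/logout` URI patterns and a per-request flag telling whether an authenticated OSN cookie was present. Real patterns differ per OSN and would be derived from manual traces; HTTPS flow traces are not modeled here.

```python
import re

# Hypothetical URI patterns; each OSN needs its own, derived from manual traces.
LOGIN_RE = re.compile(r"/login")
LOGOUT_RE = re.compile(r"/logout")


def authenticated_periods(requests):
    """Derive (start, end) authenticated periods for one user.

    `requests` is a time-ordered list of (timestamp, uri, has_auth_cookie)
    tuples. A period opens at a login URI, or at the first request carrying an
    authenticated cookie (the session started before the trace). It closes at
    a logout URI; if none is seen, the last observed request is taken as the end.
    """
    periods = []
    start = None
    for ts, uri, has_auth_cookie in requests:
        if LOGOUT_RE.search(uri):
            if start is not None:
                periods.append((start, ts))
                start = None
        elif start is None and (LOGIN_RE.search(uri) or has_auth_cookie):
            start = ts
    if start is not None:
        # No logout observed before the trace ended.
        periods.append((start, requests[-1][0]))
    return periods


trace = [
    (0, "/index.php", False),
    (10, "/login.php", True),
    (20, "/home.php", True),
    (50, "/logout.php", False),
    (70, "/login.php", True),
    (90, "/profile.php", True),
]
print(authenticated_periods(trace))  # [(10, 50), (70, 90)]
```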
Rr-pairs classification: Inspecting rr-pairs by URI or by the HTTP Referer header is a good way to
distinguish them and to build suitable patterns (the specifics differ for each OSN). If the cookie is
missing, an rr-pair can be classified along with the last known rr-pair; otherwise it is labeled
UNKNOWN. Misclassification, though possible, is unlikely. Finally, each rr-pair has to be labeled as
active or indirect.
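The classification rules can be sketched as below. This is one possible reading of the rules above, not the study's implementation: the category patterns are hypothetical, the URI is tried before the referrer, and a cookie-less request that matches nothing inherits the category of the last known rr-pair. The active/indirect decision is not modeled.

```python
import re

# Hypothetical category patterns; real ones are OSN-specific.
CATEGORY_PATTERNS = [
    (re.compile(r"/messages"), "messaging"),
    (re.compile(r"/profile"), "profile"),
    (re.compile(r"/photo"), "photos"),
    (re.compile(r"/home"), "home"),
]


def categorize(uri):
    """Return the first matching category for a URI, or None."""
    if uri is None:
        return None
    for pattern, category in CATEGORY_PATTERNS:
        if pattern.search(uri):
            return category
    return None


def classify(rr_pairs):
    """Label (uri, referrer, has_cookie) tuples with a feature category."""
    labels = []
    last_known = None
    for uri, referrer, has_cookie in rr_pairs:
        # Try the URI first, then the referrer.
        category = categorize(uri) or categorize(referrer)
        if category is None and not has_cookie:
            # Cookie missing: attach to the last known rr-pair, if any.
            category = last_known
        if category is None:
            category = "UNKNOWN"
        else:
            last_known = category
        labels.append(category)
    return labels


requests = [
    ("/home.php", None, True),
    ("/ajax/update", "/messages/thread", True),
    ("/static/x.js", None, False),
    ("/static/y.css", None, True),
]
print(classify(requests))  # ['home', 'messaging', 'messaging', 'UNKNOWN']
```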
Customization: This concerns the manual traces used to narrow the trace collection to the relevant
subset of traffic. Through them we were able to identify the various cookies and, in the case of
HTTPS, the corresponding handshakes, and to construct the patterns needed to trace active rr-pairs
back to user actions.
Validation: The logical next step was to apply our methodology to the manual traces to verify that
it works correctly. The results were as expected: requests (both active and indirect) were assigned
to the correct categories, with almost none guessed or labeled UNKNOWN.
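The validation step amounts to comparing the automatic labels with those known from a manual trace. The sketch below, with the hypothetical helper `validation_report`, reports the fraction of correctly labeled rr-pairs and the fraction left UNKNOWN.

```python
def validation_report(predicted, ground_truth):
    """Compare automatic labels against labels known from a manual trace."""
    if not ground_truth or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(ground_truth)
    correct = sum(p == t for p, t in zip(predicted, ground_truth))
    unknown = sum(p == "UNKNOWN" for p in predicted)
    return {"accuracy": correct / n, "unknown_share": unknown / n}


auto = ["home", "messaging", "profile", "UNKNOWN"]
manual = ["home", "messaging", "profile", "photos"]
print(validation_report(auto, manual))  # {'accuracy': 0.75, 'unknown_share': 0.25}
```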
The lesson learned from the above is that our approach is viable for a number of OSNs, the only
bottleneck being the adaptation of the manual traces to the special characteristics of each OSN.
Even after a major reorganization of an OSN, the existing patterns change only slightly. However,
the same principles cannot be applied to other Web 2.0 sites, since the growing adoption of HTTPS
for security reasons prevents us from acquiring such network traces.
Feature popularity differs by location and service, so we analyze it from different perspectives:
All OSN requests: There are drastic differences from one category to the next, showing how
difficult it is to approximate actual usage patterns with other techniques such as crawlers. The
impact is even more apparent in the distribution of bytes across categories. Rr-pairs related to
photos are generally few in number, but when high-quality pictures are uploaded, their bandwidth
demand becomes evident.
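To see why request counts and byte counts can tell different stories, the sketch below computes both shares per category from `(category, bytes)` tuples. The function name is hypothetical, and the numbers in the example are made up for illustration; they are not results from the study.

```python
from collections import Counter


def shares_by_category(rr_pairs):
    """Return {category: (request_share, byte_share)} from (category, bytes) tuples."""
    requests = Counter()
    volume = Counter()
    for category, nbytes in rr_pairs:
        requests[category] += 1
        volume[category] += nbytes
    total_requests = sum(requests.values())
    total_bytes = sum(volume.values())
    return {
        c: (requests[c] / total_requests, volume[c] / total_bytes if total_bytes else 0.0)
        for c in requests
    }


# Illustrative traffic: many small home requests, one large photo upload.
traffic = [("home", 1_000)] * 9 + [("photos", 91_000)]
print(shares_by_category(traffic))  # {'home': (0.9, 0.09), 'photos': (0.1, 0.91)}
```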
Differences across time and between users: The general behavior is as expected, with minor
differences here and there. Regarding users, the results show that they favor specific features
across OSNs, but also use some other features consistently across all ISPs and OSNs (for example,
usage of the profile category is homogeneously distributed across all OSNs and their subsessions).
When comparing the general traffic characteristics of OSN sessions to those of other Web services
in terms of size and duration, a couple of things stand out. Session sizes follow a heavy-tailed
distribution, implying that a small fraction of OSN sessions is responsible for most of the bytes on
the network. As for duration, the data confirm that most users spend most of their OSN time
authenticated. Multiple sessions per IP address were also common, probably because several
computers share a DSL line or several users share a computer.
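The heavy-tail observation can be checked by asking what share of all bytes comes from the largest sessions, and repeated sessions per IP by simple counting. Both helpers below (`top_share`, `sessions_per_ip`) are hypothetical, and the example values are illustrative only.

```python
from collections import Counter


def top_share(session_bytes, fraction):
    """Share of all bytes contributed by the largest `fraction` of sessions."""
    if not session_bytes or not 0 < fraction <= 1:
        raise ValueError("need sessions and 0 < fraction <= 1")
    ranked = sorted(session_bytes, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    k = max(1, round(fraction * len(ranked)))  # number of top sessions
    return sum(ranked[:k]) / total


def sessions_per_ip(sessions):
    """Count distinct sessions per client IP from (ip, session_id) tuples."""
    return Counter(ip for ip, _ in set(sessions))


sizes = [900, 20, 20, 20, 10, 10, 10, 5, 3, 2]  # illustrative bytes per session
print(top_share(sizes, 0.1))  # 0.9
print(sessions_per_ip([("10.0.0.1", "s1"), ("10.0.0.1", "s2"), ("10.0.0.2", "s3")]))
```

Rounding `fraction * len(ranked)` avoids off-by-one errors from floating-point products such as `0.3 * 10`.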
Lastly, we turn our focus to how users behave within a session, from two standpoints:
1. Active vs. inactive time: A user is considered inactive after a set period without activity (for example, 5 minutes). This also depends on the total duration of the session: shorter sessions are usually classified as active, while longer ones tend to end up as inactive. The most interesting point is what happens after inactive periods, where a preference for specific categories emerges (messaging after 5 minutes, home and offline after 10 minutes or more).
2. Feature sequence: User clickstreams tend to follow specific feature sequences, the most common being home following messaging and profile. Still, the dominant pattern is for users to keep using the same feature for a prolonged time before switching to another. As expected, messaging is the most time-consuming category, due to the time it takes to compose a message.
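Both within-session analyses can be sketched with a few lines of Python. The 300-second timeout mirrors the 5-minute example above; as a simplification, a gap longer than the timeout is counted entirely as inactive time. Transition counts include self-transitions, which capture users staying on the same feature. All names are hypothetical.

```python
from collections import Counter


def active_inactive_time(timestamps, timeout=300):
    """Split a session into (active, inactive) seconds.

    Gaps between consecutive clicks longer than `timeout` count as inactive;
    shorter gaps count as active.
    """
    active = inactive = 0.0
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > timeout:
            inactive += gap
        else:
            active += gap
    return active, inactive


def feature_transitions(categories):
    """Count (from, to) pairs of consecutive clicks, including self-transitions."""
    return Counter(zip(categories, categories[1:]))


print(active_inactive_time([0, 60, 120, 900, 960]))  # (180.0, 780.0)
clicks = ["home", "messaging", "messaging", "home", "profile", "home"]
print(feature_transitions(clicks).most_common(2))
```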
All in all, we presented a customizable methodology for identifying OSN sessions and user actions,
identified the features that matter most to users, and pointed out how OSN sessions differ from
those of other web services.