Proceedings of the 39th Hawaii International Conference on System Sciences - 2006
Genres of Spam: Expectations and Deceptions
Wendy L. Cukier, PhD
Ryerson University
[email protected]
Susan Cody, PhD
Ryerson University
[email protected]
Eva J. Nesselroth, MA Comm
Ryerson University
[email protected]
Abstract
This paper is a pilot study that explores how the
concept of genre can be applied to the massive set of
digital documents known as ‘spam’. The authors studied
300 spam messages collected over 15 weeks from a
university email system. Messages were coded based on
content, form and specific features as well as on the
manifest relationship to existing genres of communication.
The paper argues that spam is not a single genre but
many genres. For the most part, the genres evoked in
spam are adaptations of print to Internet, including
information artifacts, pamphlets, business cards, order
forms, bulletins, advertisements, and “Nigerian letters”.
With spam, however, the concept of genre operates at
several levels. Often, there is a contradiction between
the manifest genre and the underlying purposes. The
paper concludes that spam exploits genre by conforming
to known forms while at the same time breaching those
norms.
1. Introduction
Why examine spam? Spam is one of the principal
forms of communication on the Internet today. The rapid
growth of the Internet has promulgated a quick and
adaptive evolution in new forms of spam that take
advantage of the speed, breadth, and accessibility of the
medium. While much attention has been paid to the
prevention of spam, little analysis has been conducted on
the content or purposes of spam.
This paper applies the concept of genre to the analysis
of spam and identifies both the applicability and
limitations of current genre analysis to this electronic
communication category. Throughout this study, we ask
key questions regarding spam and genre. For example,
what is the importance of studying spam? How have
researchers examined spam? What characteristics of
spam have allowed it to elude more inquisitive study?
What contribution can a change of focus make to the
understanding of issues in genre theory and its
application to digital documents? How can this focus
help researchers map the evolution of genres within
digital documents?
Spam is typically treated as a single document set.
This is because it has been viewed primarily from the
perspective of the typical receiver in most deliberative
instances as a “constrained organizational actor” [27].
By definition, genre presupposes or depends upon
contextual or situational coherence, a community, or an
organization [49]. As Orlikowski, Yates, and Yoshioka
indicate, genres of communication are “socially
recognized types of communicative actions — such as
memos, meetings…that are habitually enacted by
members of the community to realize particular social
purposes” [49]. Genre analysis, according to these
authors, is a useful way to examine “how a community
communicates” [49]. Genres may also extend to the
formation of social relationships which can have a
significant effect on the quality of work [7].
Toms and Campbell argue: “Recognizing genre will
facilitate effective user-document interaction” and so a
particular “genre can be seen as an interface metaphor”
[60]. However, the overall character of spam works
against genre recognition. Individual spam genres
employ misdirection or disguise in purpose and/or
function in order to (1) bypass filters; (2) invite the
receiver’s attention; or (3), in the cases of some
advertisement types, to disarm the subject and overcome
resistance to selecting the product/service.
0-7695-2507-5/06/$20.00 (C) 2006 IEEE
Investigations have dealt with spam in terms of sector
representation, ratios and costs to organizations and,
more recently, researchers have taxonomized types of
spam that present threats to systems. The reasons for
these research choices are measurably evident.
According to MessageLabs, a US consultancy firm, spam
now accounts for around 65% of all email traffic [39].
For the European Commission (EC), the cost of spam to
Internet users worldwide amounts to some 10 billion
Euros per year [39]. A recent study estimated that loss
of productivity due to spam messages represents an
annual cost of about $1930 per employee [39]. In
another study conducted by the Radicati Group,
researchers found that 4% of corporate email users and
11% of consumer users had lost money to an email scam
[53]. As many as 39% had clicked on a link in a spam
message, and 13% of corporate users and 11% of
consumers had purchased a product advertised via spam
[53]. And because of the low cost of producing and
sending spam, responses from as few as .00001 per cent
of targets make this enterprise profitable [43].
Our study shows that genre analysis needs to
recognize that one genre can mask another and that it
may be necessary to penetrate the multiple layers of a
typical spam message to uncover the message’s true
intent. Thus, genre becomes a tool for deception.
Fraudulent emails arrive disguised as ordinary
communications which, at first glance, take the shape of
a recognizable form. For example, a spam “memo” may
evoke the expectations of “memo” in order to catch the
reader’s attention and clinch confidence. A memo is a
recognizable genre and common format of email
messages. The memo may in fact reveal itself to be an
advertisement for a product when further analysis of its
structural content is made. Yet, at the same time, the
memo-advertisement may in fact not be an advertisement
at all. Its real purpose may be to harvest email
addresses, ‘phish’ for consumer information, distribute a
virus, or perpetrate fraud.
2. Literature Review
Because this paper brings together several topics, we
have reviewed the literature on spam, genre, advertising
and the Internet to provide the relevant context for our
analysis.
2.1. Spam
Zeltsin defines spam as including all electronic
messages that are unsolicited or unwanted, sent to a large
number of users (in bulk), without regard to the identity
of the individual user, and usually having commercial
purposes [65]. These can also include viruses that spread
via email, or fraud and scam mechanisms. Early 2002
estimations showed that one out of twelve email
messages fits the description of spam. During 2002,
spam numbers escalated to an average frequency of one
out of three email messages. In 2003, Ferris Research
estimated that an Internet user takes an average of 4.4
seconds to handle a spam message and that
approximately 20 billion such spams are sent each day.
Cumulatively, handling spam approaches 25 million
hours per day [20].
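The cumulative figure follows from the two Ferris Research estimates quoted above. A minimal sketch of the arithmetic, with variable names chosen here purely for illustration:

```python
# Estimates quoted in the text from Ferris Research (2003).
SECONDS_PER_SPAM = 4.4            # average time a user spends handling one spam message
SPAM_PER_DAY = 20_000_000_000     # approximate number of spam messages sent each day

total_seconds = SECONDS_PER_SPAM * SPAM_PER_DAY
total_hours = total_seconds / 3600   # 3600 seconds in one hour

# Express the result in millions of hours per day.
print(f"{total_hours / 1e6:.1f} million hours per day")
```

The result is roughly 24.4 million hours, which is the basis for the "approaches 25 million hours per day" figure.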
Participants of the 2003 World Summit on the
Information Society (WSIS) recognized that spam is a
“significant and growing problem for users, networks
and the Internet as a whole” [61]. To date, most research
on spam has examined its negative impacts, technical
characteristics, regulatory issues and the technologies to
prevent it from overwhelming “legitimate”
communications. Many scholars are concentrating on
developing a web spam taxonomy in an effort to identify
instances of spam, to prevent spam and to counterbalance
the effect of spamming [12, 17, 26, 31, 51].
The literature explores emerging forms of spamming
and the ways in which spam is used to collect email
addresses, distribute viruses and perpetrate deceptive
marketing practices or fraud. For example, spam
masquerading as advertising or gibberish is often
designed to circumvent filters with the sole purpose of
“harvesting” email addresses. In “brute force” and
“dictionary” attacks, spam programs send spam to every
possible combination of letters at a domain, or to
common names and words [9].
Spammers continually discover new techniques to
send spam. Spam may mix content and use orthographic
inventions (e.g., ‘sec’s in lieu of ‘sex’) and gibberish
(e.g., Subject: “Pittsburgh pullover diorite chimera
bray”) to avoid lexical detection by filters. Viruses,
worms, and malware, such as Melissa, Love Bug and
MyDoom, also use spamming techniques to propagate
after a user unwittingly activates them. Viruses and
worms may install open proxies that can be used to relay
spam or to install software which transforms a computer
into a “zombie” (i.e., a computer owned by an
unsuspecting user, through which spam is sent).
Phishing attacks steal consumers’ personal identity
data and financial account credentials.
Social-engineering schemes use ‘spoofed’ emails to lead
consumers to counterfeit websites designed to trick
recipients into divulging financial data such as credit
card numbers, account usernames, passwords and social
security numbers. Hijacking brand names of banks,
e-retailers and credit card companies, phishers often
convince recipients to respond. Technical subterfuge
schemes plant crimeware onto PCs to steal credentials
directly, often using Trojan keylogger spyware.
Pharming crimeware misdirects users to fraudulent sites
or proxy servers, typically through DNS hijacking or
poisoning [4]. Phishing and scams are distributed as
spam, directly leading to identity theft and fraud.
Phishing spam increased 52 per cent in January, 2004.
The statistics show that the response rate to this type of
fraud is around five per cent [65].
While there is a growing body of research on online
advertising, surprisingly little attention has focused on
examining the types or functions of spam. Most of the
research to date on spam has been motivated primarily
by an interest in technological and regulatory issues. An
early study published by the Association for Computing
Machinery (ACM) classified 400 unique messages sent
to the AT&T and Lucent subdomains under study for
three months in 1997 [3]. The leading categories were
money-making opportunities (36%); and adult
entertainment including singles services, and sexually
oriented products or services (11%) [3]. Regulatory
issues were the main focus of the article.
More recently, the Federal Trade Commission (FTC)
analyzed a sample of 1000 items from 11 million
collected messages and again, investment/business
opportunities (20%), adult-oriented spam (18%), and
finance (17%) were the most common categories [19]
(See Table 1). For example, the FTC study noted, in
spite of the Direct Marketing Association guidelines
which stipulate that messages must provide instructions
for removal, only 36% of messages in the sample
provided these [59]. Further, comments from email
administrators suggest that many of these instructions
were likely faulty or deliberately misleading [19]. The
study also notes that fewer than 10% of the sample
identified the name, postal address, phone number, and
email address of the sender [19]. Similarly, Jacobsson
and Carlsson’s experiment with false email accounts
corroborates the failure of most spam emails to conform
to regulatory requirements including identifying the
sender and providing options to unsubscribe [32].
Again, the focus of these studies was the regulatory
compliance of mass mailings. Despite the preponderance
of these messages, research has been almost exclusively
forensic. Spam content, style, and genre have been
largely overlooked and few researchers have gone
beyond broadly categorizing the types of spam messages.
One exception is Orasan and Krishnamurthy’s
investigation of the linguistic characteristics of junk
email based on an analysis of 673 files, comparing them
to a corpus of leaflets extracted from the BNC [44].
They noted a number of linguistic differences including
shorter sentences, limited vocabulary and increased use
of personal pronouns such as “you” in spam [44]. The
researchers examined the occurrence of key words such
as: free, money, investment, credit, fast, Internet, email,
sex, weight and miracle [44]. However, the article is
primarily descriptive, drawing few implications.
2.2 Genre
2.2.1. Genre History. Northrop Frye, the Canadian
literary scholar, is probably the most cited source on
genre. In the book, Anatomy of Criticism, Frye proposes
that virtually all literature could be categorized according
to universal genres with defined structure, rules and
characteristics [25]. Genres are the literary conventions
or “codes” associated with particular forms — for
example, epic, tragedy, and allegory. Miller defines
genre as “typified rhetorical action based in recurrent
situations” [41]. Swales notes that genres have similar
structures, stylistic features, content and intended
audiences [58]. However, as Chandler notes in “An
Introduction to Genre Theory”, hierarchical taxonomy of
genres is not a neutral or “objective procedure” [10]. “A
genre is ultimately an abstract conception rather than
something that exists empirically in the world”, notes
Feuer [21]. Thus, one theorist’s genre may be another’s
sub-genre or even super-genre; and what is technique,
style, mode, formula or thematic grouping to one, may
be treated as a genre by another. Miller suggests that
“the number of genres in any society…depends on the
complexity and diversity of the society” [41].
Some have focused not only on exploring the
conventions of the form, but also on the “interpretive and
cultural-historical aspects of compound mediation that
are so important in understanding the use of documents”
[56]. In other words, in addition to considering the
questions ‘What is the purpose of this genre?’ and ‘What
material goes into one?’ the academic approach to genre
also explores the social and political dimensions of the
context from which the genre emerged [1, 7].
2.2.2. Genre and Advertisements. The study of
advertising and consumer behavior has a long history
among both marketing and communications scholars. In
1987, Holbrook and Batra explored the effects that
certain types of ads could produce on specific groups of
people [30]. Advertisement content and structure have
likewise been analyzed by Mick and by Mitchell and
Olson [40, 42]. Laskey, Day and Crask
argue that advertisements should be categorized not only
by the message content (i.e., what is said) but also by
structured format (i.e., how it is said) [36]. The
message’s construction is an important feature, as
articulated by Eldridge and by Wells, Burnett and
Moriarty [18, 63].
Scholars of advertising have
employed genre, narrative and other concepts as analytic
tools. Stern, for example, explored the use of classical
allegory in contemporary advertising [57].
Communications scholars have used genre to explore
various forms of media, for example, when examining
the advertising aimed at children [28]. Puto and Wells
distinguished information and transformation advertising
and Holbrook and Hirschman examined the role of
fantasies, feelings and fun in advertising [52, 31].
2.2.3. Genre and Electronic Communication. Genre
and genre repertoire have been proposed as analytic tools
for investigating the structuring of communicative
practices within a community. For example, Orlikowski
and Yates propose that genres of organizational
communications — memos, meetings, expense forms,
training seminars — are habitually enacted by members
of a community for particular social purposes [46].
Subsequently, they examined the communication
exchanged by a group of distributed knowledge workers
in a multiyear, inter-organizational project and suggest
that the group’s communicative practices evolved in
response to community norms, project events, time
pressure and media capabilities [45]. Building on
structuration theory, Orlikowski and Yates describe
iterative relationships between communications genres
and organizational practices [45]. This approach has
subsequently been applied in other contexts; for
example, Crowston and Williams initially analyzed 100
websites as a pilot project, and subsequently extended
their analysis to 1000 websites [13]. They conclude that
genres provide a useful tool for analyzing uses of the
Web [13]. In Crowston’s subsequent work, he notes the
limitations of top-down genre analysis using pre-existing
categories and explains that bottom-up analysis allows
for multi-dimensional definitions of genre as they
become apparent, thus providing more flexibility in the
face of limited forms [12]. Others have examined
personal home pages as an emerging genre. Structural
features included the presence of personal information,
formulaic welcome messages and iconographic technical
features. Davidson explores genres in the context of
medical information systems [15].
Herring et al. examine Weblogs in the context of traditional and new
media [29]. Antunes and Costa explore genres of
electronic meeting communications [6]. Akesson et al.
explore the genre of on-line newspapers examining
content, form and functionality [3]. Genre hierarchies,
embedded genre and genre systems, genre repertoires,
and genre change are discussed by Crowston and
Williams [13].
2.3. Advertising and the Internet
Scholars have only recently begun to examine issues
related to advertising on the Internet. For example,
Palmer conducted a genre-based analysis of target
advertisements or “netvertising” focusing on linguistic
characteristics [50]. Moreover, Shamdasani et al. look
at the effects of website reputation and of the
relevance between website content and banner ad
product category [54]. They differentiate between
high-involvement products which are relevance-driven
and low-involvement products which are reputationdriven. In addition, Kunz and Osborne analyze new
formats of advertising on the Internet with a focus on
banner and streaming media ads and their impact [35].
McMillan explores the array of forms and ways in which
online advertising differs from conventional
advertisements, particularly in terms of the compression
of the selling cycle, interactivity, intrusiveness, and the
capacity for personalization [38]. She builds a typology
of Internet advertising based on function such as brand
building messages, corporate communications, direct
response messages, and electronic transactions. Her
two-by-two matrix differentiates location (external/internal)
and purpose (communication/call to action) [38].
3. Our Study
Our sample comprises 300 spam items received in a
single organizational account over a limited period of
time. This set evaded the organization’s spam filters.
We analyzed the set using a series of levels, including
sector, heading, relationship between heading and
content, rhetorical purpose, content, structure, tone and
other features and forms characteristic of genre analysis.
We compare our results to previous studies on electronic
media and genre.
3.1. Purpose
This project represents the first phase of a longitudinal
study of spam in which we explore the potential
application of genre analysis. In this paper, we attempt
to situate spam in the context of traditional as well as
emerging electronic communication genres. We seek to
characterize the properties of spam based on a
temporally limited sample. Our principal purpose is to
provide an empirical snapshot of spam with the intention
of examining its uses of genre. Later phases of the
longitudinal study will analyze the evolution, adaptation
and uses of genre.
3.2. Data Collection
Spam messages that passed through the filter at a
large Canadian university were collected over a 15-week
period (February 21, 2005 – June 12, 2005). A total of
300 messages were collected for analysis. These were
messages that had passed through the spam filter process
and represent a tiny fraction of the spam received by the
university.
The university’s email server receives approximately
200,000 email messages each day. Of these, two thirds
are typically blocked by the server using Postfix and
another 10% are flagged as spam. The university uses a
multi-layered approach to the spam problem. First,
black-lists of known sources of spam or viruses are used
to block messages from high risk sites. Next, Postfix, the
mail transfer application, applies a basic filter and in
cases of viruses or spam attacks can be used to delete
messages of a particular type. Third, a virus check is
performed by the server. At the fourth level of filtration,
SpamAssassin, a Perl-based program, acts as a
spam filter. If an email fails some of these tests, it may
be sent to the recipient’s inbox with **SPAM** noted in
the subject line. However, only one quarter of the spam
used in this study was flagged with this subject header.
Also, the occurrence of flagged words (e.g., ‘penis’) will
lessen an email’s chances of reaching the inbox.
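The layered filtering described above can be sketched as a short pipeline. This is an illustrative sketch only, not the university's configuration: the host names, subject patterns, virus signature, word weights and threshold are all hypothetical placeholders, and the scoring step is a simplified stand-in for what a content filter such as SpamAssassin does.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Message:
    sender_host: str
    subject: str
    body: str


# Hypothetical placeholder values, chosen only for illustration.
BLACKLISTED_HOSTS = {"bulk.example.net"}
BLOCKED_SUBJECTS = {"mass mailing worm"}
VIRUS_SIGNATURES = ("FAKE-VIRUS-SIGNATURE",)
FLAGGED_WORDS = {"preapproved": 1.5, "winner": 1.0, "mortgage": 1.0}
FLAG_SCORE = 2.0  # assumed score at which the subject line is tagged


def filter_message(msg: Message) -> Optional[Message]:
    """Return the message (possibly tagged) or None if a layer blocks it."""
    # Layer 1: black-list of known spam or virus sources.
    if msg.sender_host in BLACKLISTED_HOSTS:
        return None
    # Layer 2: basic mail-transfer-level filter for messages of a known type.
    if msg.subject.lower() in BLOCKED_SUBJECTS:
        return None
    # Layer 3: virus check on the message body.
    if any(signature in msg.body for signature in VIRUS_SIGNATURES):
        return None
    # Layer 4: content scoring; high-scoring mail is delivered but tagged.
    text = f"{msg.subject} {msg.body}".lower()
    score = sum(weight for word, weight in FLAGGED_WORDS.items() if word in text)
    if score >= FLAG_SCORE:
        return replace(msg, subject=f"**SPAM** {msg.subject}")
    return msg


tagged = filter_message(Message("mail.example.org", "Winner", "You are preapproved"))
print(tagged.subject)
```

A message that clears every layer with a low score reaches the inbox untagged, which corresponds to the three quarters of the sample that carried no **SPAM** marker.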
Recognizing the range of factors that affect the
amount and type of spam an individual may receive
(Internet use habits, newsgroups, technical characteristics
of spam filters, etc.), we do not propose that this set of
messages is a representative sample of all possible
variations of spam sent to a broad range of recipients.
However, the researchers find that the overall
composition of the spam in the sample roughly reflects
the composition of spam recorded in much larger studies
[3]. Subsequent work will explore the differences
between the messages tracked by the filter and those
which were not.
3.3. Data Analysis
The methodology used combined both quantitative
and qualitative content analysis. Different approaches to
“reading” text are not mutually exclusive and applying
multiple perspectives to text may address the limitations
of individual techniques in isolation. While critical
theorists have tended to reject quantitative strategies for
determining the content or meaning of media messages
(given the importance of considering both the manifest
and latent meanings), even Kracauer granted that
quantitative studies might serve as a supplement to
qualitative analysis [34]. The sheer volume of mass
media texts poses problems in terms of heterogeneity as
well as quantity.
As a starting point and for the purpose of coding, we
used the predefined categories of spam from the FTC
study [19] combined with Crowston and Kwasnik’s
notions of facetted genre classification [12]. The coders
created new categories where none existed and for
hybrids (that appeared to be combinations of categories).
These messages were coded twice by three coders for
sector or subject, source information, function, “genre”,
format, subject line/body relationships, structure,
addressee, signer, action desired, tone and regulatory
compliance. The coding results were cross-checked for
inter-rater reliability. Subsets representing each sector
and genre were further analyzed using qualitative
discourse analysis to explore recurrent themes and
connotative use of language. Coding for sector or
subject drew on previous studies [12] with space
provided for ‘other’.
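The text does not name the statistic used to cross-check the coders. As one common option, the sketch below computes Cohen's kappa for a single pair of coders; the function name and the sample codes are hypothetical, and with three coders a pairwise average or a multi-rater statistic would be needed.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders over the same items."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(codes_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected agreement by chance, from each coder's marginal frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical genre codes assigned by two coders to six messages.
coder_a = ["memo", "memo", "letter", "gibberish", "memo", "letter"]
coder_b = ["memo", "letter", "letter", "gibberish", "memo", "letter"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.3f}")
```

Values near 1 indicate agreement well beyond chance, while values near 0 indicate agreement no better than chance.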
3.4. Findings
3.4.1. Categories of spam (sector or subject). As noted,
the most common categories of spam in our study
paralleled the categories in the FTC study. These are
listed in Table 1 below.
Financial and business
opportunities were the most common categories, defined
by sector or subject, followed by adult-oriented products
and services.
Table 1: Common categories of spam (defined by sector or subject)

                                                          Our Study (2005), N = 300       FTC (2003),
Category                          Subcategory             N     %      Total % per cat.   N = 1000
Financial                         Mortgage                26    8.7%
                                  Loans                    4    1.3%
                                  Other                   23    7.7%
Business Opportunity                                      24    8.0%   26.7%              20%
Adult-Oriented                    Adult-oriented          35   11.7%
                                  Male enhancement*       30   10.0%   22.7%              18%
Health (including prescriptions)                          65   21.7%   21.7%              10%
Computer hard/software                                    45   15.0%   15%                7%
Sales                                                      7    2.3%   2.3%               16%
Other                             Unknown                 23    7.7%
                                  Recruiting               5    1.7%
                                  Gaming                   5    1.7%
                                  Misc.                    4    1.3%
                                  News/sports              3    1.0%
                                  Politics                 1    0.3%   13.7%              10%

* Male enhancement advertisements could be classified as
adult-oriented or health. We have combined them with
adult-oriented for the purposes of comparison.
There were differences between the composition of
our sample and the much larger random sample used by
the FTC.
For example, our sample contained
considerably fewer product and service advertisements
and more health-related spam. This might have been a
reflection of individual Internet behavior (i.e., visiting
websites that create a ‘cookie’ which will attract specific
kinds of spam) as well as the filter used by the
organization.
3.4.2. Manifest “Genre” While we initially thought that
we would be able to define the spam genre through
content and discourse analysis, it became apparent that
spam is not a single genre but a collection of genres.
Among the spam, we found messages that resembled a
wide range of well-established document genres (see
Table 2).
The most common genre of spam, at least on the
surface, is a personalized memo which includes a
description of a product or service with an embedded
URL for more information. Almost sixty percent
(59.7%) of the spam was in this form.
The next most common form of spam appeared to be
a letter, which might describe a service, but more often
appeared to be a scam to obtain personal information. For
example, we found nine examples of spam versions of
“Nigerian Letters”.
Spam also took the form of “confirmations” of orders
or preapproved applications modeled on standard invoice
and purchase order forms. For example, “You have been
preapproved – your new application number 34”; “Your
new application number 56”; or “PGF ALERT: Purchase
Order Created for You”, again invoking a
well-established business communications form.
Testimonials, a common form of advertising and
direct mail, were used in 2% of the cases (e.g., “I have
always worried about the size of my penis…”). Only 2%
of the spam analyzed resembled conventional
promotional pamphlets and only 2.7% resembled
conventional display advertisements.
News bulletins (2.7%) such as announcements on
stock prices as well as warnings and announcements
(1.7%) such as “Remove all these popup messages
today!” and “Microsoft virus warning – September 8th”
were found. Newsletters (2%) and catalogs (1%) also
appeared. There were contest winners (e.g., “Winner –
winning notification”) reflecting a common form of
direct mail.
Table 2: Manifest “genres” of spam

Types of Genre                                      No. of data   %
Memo (with URL/link)                                179           59.7%
Letter                                               24            8.0%
Order confirmation, preapproved, application no.     24            8.0%
Gibberish                                            17            5.7%
Display advertisement                                 8            2.7%
News bulletin                                         8            2.7%
Newsletter                                            6            2.0%
Pamphlet (HTML)                                       6            2.0%
Testimonial                                           6            2.0%
Announcement/warning                                  5            1.7%
Press release                                         4            1.3%
Catalog                                               3            1.0%
Contest winner                                        2            0.7%
HTML code                                             2            0.7%
URL submission form                                   2            0.7%
Article                                               1            0.3%
Business card                                         1            0.3%
Form                                                  1            0.3%
Order forms/price list                                1            0.3%
Total                                               300          100%
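The percentage column of Table 2 can be recomputed from the counts. The short sketch below does this as a consistency check; the dictionary simply restates the counts reported in the table, and the variable names are ours.

```python
# Counts per manifest genre, as reported in Table 2.
genre_counts = {
    "Memo (with URL/link)": 179,
    "Letter": 24,
    "Order confirmation, preapproved, application no.": 24,
    "Gibberish": 17,
    "Display advertisement": 8,
    "News bulletin": 8,
    "Newsletter": 6,
    "Pamphlet (HTML)": 6,
    "Testimonial": 6,
    "Announcement/warning": 5,
    "Press release": 4,
    "Catalog": 3,
    "Contest winner": 2,
    "HTML code": 2,
    "URL submission form": 2,
    "Article": 1,
    "Business card": 1,
    "Form": 1,
    "Order forms/price list": 1,
}

total = sum(genre_counts.values())  # should equal the sample size of 300

# Share of each genre, formatted to one decimal place as in the table.
genre_percent = {genre: f"{100 * count / total:.1f}%" for genre, count in genre_counts.items()}

for genre, share in genre_percent.items():
    print(f"{genre:<50}{genre_counts[genre]:>5}{share:>8}")
```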
At the same time, we also found spam that was not
easily linked to existing genres, for example, “gibberish”
spam (5.7%) which consisted of combinations of words
and/or letters which appeared to be random; for example,
“gibbon perplexbarrett inactive briberyknead…”. This
form of communication is the most apparent example of
a new genre — gibberish spam is principally designed to
circumvent filters to confirm email addresses or to
deliver viruses to unsuspecting recipients.
3.4.3. Regulatory Compliance. Previous studies of
spam, as noted above, revealed that a small percentage of
spam complies with regulatory requirements found in the
United States. In our study, we found that less than 1%
of messages informed recipients of their right to
non-contact and just over one third (36.7%) included
non-contact information, some of it heavily disguised.
Sixty-three per cent of the messages we examined did not
comply with privacy requirements at all. Again this is, at
least in part, because many of the spam communications
were not what they initially appeared to be.
3.4.4. Unclear or Misleading Spam. Previous studies
have revealed that most spam messages are deceptive to
some degree. The FTC report on a sample of 1000
incidents of spam finds that 66% of messages contained
false “from” lines, “subject” lines, or message text [19].
We found that the subject line was indicative of the
contents of the message in more than one third (36.9%)
of the messages examined. These included subject lines
such as: “prescription drugs”, “replica rolex”, “job
offer”, “name brand software”. In another 30.6% of
cases, the subject line was loosely associated with the
content of the message without specifying up front what
was being sold. For example, the subject line “men’s
silent secret” preceded an email selling impotence drugs.
In 27.1% of the cases, the subject line was a teaser
apparently intended to arouse interest: “Good idea”,
“Fact or Fiction”, “Power, Possibilities, Opportunities”
are examples. In some cases (2.7%), the subject had no
relationship to the content (for example, promises of sex
were in the subject line while software was in the text) or
there was simply no subject line at all (2.7%).
There is other evidence of deceptive and misleading
spam. We were unable to accurately categorize the
genuine intent of all emails, but in 55% of the cases, the
provided links did not work when tested after the study
period, suggesting that they were in fact not what they
purported to be. Approximately 15% of the links within
the spam emails were live working links, and 30% of the
links redirected the user to another site.
Table 3: Different Subject Lines of Spam

Category of subject lines                     %
Subject line directly reflects content        36.9%
Subject line associated with content          30.6%
Subject line is a teaser                      27.1%
Subject line has no relation to content        2.7%
No subject line                                2.7%
Total                                        100%
3.4.5. Use of Language. One of the most intriguing
features of spam as advertisement (rather than spoof,
Trojan horse, virus, or phishing device), is its reliance on
extraordinary compression and exclusivity of text. Most
images are available only sequentially rather than
synchronously, accessed through the following steps of
text acceptance: 1) subject line, 2) message, and 3) link.
Spam advertisers use a multi-layered and gnomic textual
approach. They make mimetic use of email as a medium
of intimacy, interiorism, and breaking down of social
barriers (public/private, conscious/deliberate vs.
unconscious/impulsive domains, etc.). Spammers use
words that might be part of the realm of resistance to an
inquiry, objection, or serious consideration of a proposal
or purchase. They “neutralize” these words/concepts by
misappropriating or contextualizing them in the
“message” language. By classifying characteristics that
have so far escaped notice, researchers can contribute to
the understanding of genre and to the practice of genre
analysis.
Many spam messages use language as a technique
for thwarting spam filters. We found that 33% of
messages added language, quotations or “alphabet soup”.
For example, a spam entitled “Top meds bought online”,
with the text “Same medicine, different price!” and a
link, also included the following gnomic saying,
“television has brought murder back into the home —
back where it belongs.” A Google search on these
additional texts revealed a wide range of sources from
the Bible to quotations from Eleanor Roosevelt.
3.4.6. Evolving Genres. It is apparent that traditional
modes of correspondence have emerged in the electronic
medium of spam. While there is not enough space in this
paper to explore the use of language in more detail, it
must be noted that every genre of spam operates
according to a certain semiotic logic of recognizable
signs and indices. For example, the “Nigerian letter” is
an identifiable genre in print communications. In the
mid-1990s, there was an explosion of direct mail scams,
many originating or purporting to originate in Nigeria.
The proliferation of these letters was so great that the
United States Secret Service actually issued an “Advance
Fee Fraud Advisory” regarding Nigerian letters [61].
The Federal Trade Commission also identified the
“Nigerian letter” in its study of false claims in spam [19].
These letters are usually longer than other forms of
spam, often exceeding two printed pages. The language
in this genre of spam is usually ornate and pleading,
playing upon the recipient’s altruism, conscience, or
even greed. Generally, these letters ask the reader to
send money or banking information in order to release
funds from the estate of a deceased wealthy diplomat.
Often, these letters prey upon the reader’s vulnerability
and pity. For instance, one letter asks the reader to
sympathize with the ‘obnoxious’ treatment of women in
her country (Sierra Leone), since the writer was not
permitted to inherit her late husband’s estate because she
had no male children. Only with the support of the
reader, to whom she promised a portion, could she access
the funds. Another letter from Sudan asks the reader for
support in the aftermath of the “Attack of tsunami”. This
genre, like many others, is a carry-over from traditional
postal letters, but email affords the senders a much
broader pool of recipients as well as greatly reduced cost.
In addition, we saw evidence that the electronic format
was able to quickly adapt to changing current events.
3.4.7. Multiple Layers of Genre. Typical genre
analysis may fail to adequately decode the intent of spam
messages because spam uses recognizable cultural
markers or indices to frame the genre in a way that fools
the reader. The memo format is used to imply that the
recipient has a business or personal relationship with the
sender. Informal subject lines such as “Hi there” and
“Haven’t heard from you in awhile” are used to imply
the messages are coming from friends, though they may
contain adult content or ads for software. Very few of
the sampled emails actually resemble advertisements.
Therefore, the form of one genre is used in place of
another as a mask to fool the recipient.
However, the deceptive use of manifest genre goes far
beyond the masquerading of advertisements as memos,
letters, confirmation forms etc. In many cases, the memo
(which is actually an advertisement) is not really
intended to sell anything, but rather is meant to verify an
email address, collect personal financial information or
distribute a virus. Hence, there are multiple layers of
genre, each masking a further deception. Our
study provides further evidence for Crowston and
Kwasnik’s [12] view that top-down analysis of spam
using pre-existing categories is restrictive, and it
supports bottom-up analysis to account for emerging
forms.
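One way to picture the layering described above is to code each message on separate facets (the manifest genre, the apparent purpose, and the underlying purpose) and let categories emerge from the combinations actually observed rather than from a fixed list. The following is a minimal sketch under that assumption. The class `CodedMessage`, its field names, and the sample records are hypothetical and are not the coding scheme used in this study.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class CodedMessage:
    manifest_genre: str      # the form the message presents (memo, letter, ...)
    apparent_purpose: str    # what the message seems to do (e.g. advertise)
    underlying_purpose: str  # what it actually does (e.g. verify an address)

    def is_layered(self) -> bool:
        """True when the underlying purpose differs from the apparent one."""
        return self.apparent_purpose != self.underlying_purpose

def emergent_categories(messages):
    """Bottom-up grouping: count the facet combinations actually observed."""
    return Counter((m.manifest_genre, m.underlying_purpose) for m in messages)

# Hypothetical records for illustration only, not drawn from the study's sample.
sample = [
    CodedMessage("memo", "advertise", "verify email address"),
    CodedMessage("memo", "advertise", "advertise"),
    CodedMessage("letter", "request help", "collect financial information"),
    CodedMessage("memo", "advertise", "verify email address"),
]
groups = emergent_categories(sample)
layered = sum(m.is_layered() for m in sample)
```

Because the groups are keyed on observed facet pairs, a new combination (say, an order form that distributes a virus) appears as its own category without any change to the scheme.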
4. Conclusions and Implications
What makes the electronic spam message resistant to
current genre analysis is that its “communities” are
multiple, multi-layered, sometimes overt and covert, and
sometimes private and public. As such, a bulk
advertisement emailed indiscriminately is in some ways
similar to print “junk” or direct mail.
Because spam has been defined to include anything
from viruses to joke forwards from friends, seeking a
rigid definition of it (or, as Zeltsin suggests, of
“unsolicited” or of “bulk”) works against an open
inquiry and openness to “the evolution of the
phenomenon” [65]. Some bona fide new forms emerge
with hidden purposes. For example, “ruse” spam uses
lures, misdirection, or a string of nonsense to procure
actual email addresses, sorted out behind the ostensible
screen.
Our analysis, while preliminary, suggests that spam
covers a range of genres, serving a wide range of
purposes. While some characteristics are common to
many types of spam, there is more evidence to suggest
that spam memos, advertisements, letters, and contest
announcements are significantly different forms, even
though in many cases they may serve similar purposes.
Moreover, in the case of spam, “genre” is not fixed. One
person’s unwelcome spam may be another person’s
welcomed opportunity. To one “organizational actor”
[27] an apparent spam “theme” may be oppressive and
intrusive, suggesting that the organization is heedless of
members’ environmental “hygiene”. To another, the
recurrence of a theme may conjure a community of the
like-minded, a sub-culture.
Spam genres are actually hybrids. For the most part,
the messages resemble traditional genres in their
manifest form in order to increase the likelihood of
eliciting certain behaviors; however, the actual purposes
of the spam are often radically different from what they
seem to be. While spam clearly embraces a range of
genres, these operate on a variety of levels. Further work
focusing on analyzing the function, content and style of
spam may be valuable to understanding its position in the
“ecology” of genres of electronic communications.
5. References
[1] Agre, P.E., “Designing Genres for New Media: Social,
Economic, and Political Contexts”. 1997. From:
http://dlis.gseis.ucla.edu/people/pagre/genre.htm
[2] Åkesson, M., Ihlström, C. and Svensson, J., “Genre
Structured Design Patterns – The Case of Online Newspapers”,
Universiteit van Halmstad, 2003. From:
http://w3.msi.vxu.se/users/per/IRIS27/iris27-1106.pdf
[3] Anon., “What does spam advertise?”, Association for
Computing Machinery, Communications of the ACM,
Vol. 41, Iss. 8, New York, Aug 1998, 80.
[4] Anti-Phishing Working Group, 2005. From:
http://www.antiphishing.org
[5] Askehave, I. and Nielsen, A. E., “Digital genres: a
challenge to traditional genre theory”, Information Technology
& People, Vol. 18, No. 2, 2005, 120-141.
[6] Atunes, P. and Costa, C. J., “From Genre Analysis to the
Design of Meetingware”, Proceedings of the International
ACM SIGGROUP Conference on Supporting Group Work,
2003, 302.
[7] Bergquist, M. and Ljungberg, J., “Genres in Action:
Negotiating Genres in Practice”, Proceedings of The 32nd
Hawaii International Conference on System Sciences, Volume
2, Hawaii, 1999.
[8] Carliner, S. and Bosworth, T., “Genre: A Useful Construct
for Researching Online Communication for the Workplace”,
Information Design Journal, Vol. 12, Iss. 2, John Benjamins
Publishing, 2004, 124.
[9] Center for Democracy & Technology, “Why am I Getting
All this Spam? Unsolicited Commercial E-mail Research Six
Month Report”, March, 2003. From:
http://www.cdt.org/speech/spam/030319spamreport.shtml
[10] Chandler, D., “An Introduction to Genre Theory”. From:
www.aber.ac.uk/media/Documents/intgenre. Aug. 11, 1997.
[11] Cranor, L.F. and La Macchia, B.A., “Spam!”,
Communications of the ACM, Vol. 41, Iss. 8, August, 1998,
74. From: http://lorrie.cranor.org/pubs/spam/spam/htm
[12] Crowston, K. and Kwasnik, B.H., “A Framework for
Creating a Facetted Classification for Genres: Addressing
Issues of Multidimensionality”. Proceedings from The 37th
Hawaii International Conference on Systems Science, Hawaii,
January, 2004.
[13] Crowston, K. and Williams, M., “Reproduced and
Emergent Genres of Communication on the World Wide Web”,
The Information Society, Vol. 16, Iss. 20, 2000, 1.
[14] Crowston, K. and Williams, M., “The Effects of Linking
on Genres of Web Documents”, Presented at The Hawaii
International Conference on Systems Science, Hawaii, January
1999.
[15] Davidson, E. J., “Analyzing Genre of Organizational
Communication in Clinical Information Systems”, Information
Technology & People, Vol. 13, Iss. 3, West Linn, 2000,
196.
[16] Doring, N., “Personal Home Pages on the Web: A Review
of Research”, Journal of Computer Mediated Communication,
Vol 7, Iss. 3, 2002.
[17] Drucker, H., Wu, D., and Vapnik, V.N., “Support vector
machines for spam categorization”, IEEE Trans. Neural
Networks, Vol. 10, 1999, 1048.
[18] Eldridge, C., “The Role of Advertising” in Advertising’s
Role in Society, Wright, J.S., and Mertes, J.F. (Eds.), West
Publishing Co., St. Paul, 1974.
[19] Federal Trade Commission, “False Claims in Spam, A
Report by the FTC’s Division of Marketing Practices”, April
30, 2003. From:
http://www.ftc.gov/reports/spam/030429spamreport.pdf
[20] Ferris Research, http://www.ferris.com/
[21] Feuer, J., “Genre Study”, in Channels of Discourse,
Reassembled: Television and Contemporary Criticism, 2nd ed,
pg. 138, Chapel Hill, University of North Carolina Press, 1992.
[22] Firth, D. and Lawrence, C., “State of Research Review:
Genre Analysis in Information Systems Research”, Journal of
Information Technology Theory and Application (JITTA),
forthcoming.
[23] Firth, K., Shaw, P. and Cheng, H., “The Construction of
Beauty: A Cross-Cultural Analysis of Women’s Magazine
Advertising”, Journal of Communication, Vol. 55, Iss. 1, New
York, Mar 2005, 56.
[24] Freedman, A. and Medway, P., “Locating genre studies:
antecedents and prospects”, in Genre and the New Rhetoric,
Freedman, A. and Medway, P. (Eds.), Taylor and Francis,
London, 1994.
[25] Frye, N., The Anatomy of Criticism, Princeton University
Press, Princeton, New Jersey, 1957.
[26] Gyongyi, Z., and Garcia-Molina, H., “Web Spam
Taxonomy”, Technical Report TR 2004-25, Stanford
University, 2004.
[27] Hasselbladh, H., and Kallinikos, J., “The Project of
Rationalization: A Critique and Re-appraisal of
Neoinstitutionalism in Organization Studies”, Organization
Studies, Vol. 21, Iss. 4, 2000, 697.
[28] Hawkins, R.P., Pingree, S., “Television and Behaviour:
Ten years of scientific progress and implications for the
eighties: Vol. 2,” National Institute of Mental Health, 1982,
224-247.
[29] Herring, S.C., Scheidt, L.A., Bonus, S. and Wright, E.
“Bridging the Gap: A Genre Analysis of Weblogs”,
Proceedings of The 37th Hawaii International Conference on
System Sciences, 2004.
[30] Holbrook, M.B. and Batra, R., “Assessing the Role of
Emotions as Mediators of Consumer Responses to
Advertising”, Journal of Consumer Research, December 14,
1987, 404.
[31] Holbrook, M.B. and Hirschman, E.C., “The Experiential
Aspects of Consumption: Consumer Fantasies, Feelings and
Fun”, Journal of Consumer Research, September 9 1982, 132.
[32] Jacobsson, A., and Carlsson, B., “Privacy and Unsolicited
Commercial E-mail”, Proceedings of the Seventh Nordic
Workshop on Secure IT Systems, Gjövik, Norway, 2003.
[33] Jung, J., “An Empirical Study of Spam Traffic and the use
of DNS Black Lists”, Proceedings of the 4th ACM SIGCOMM
conference on Internet measurement, 2004.
[34] Kracauer, S., “The challenge of qualitative content
analysis”, Public Opinion Quarterly, Vol. 16, 1952, 631.
[35] Kunz, M. B., and Osborne, P., “What Impact will New
Standards have on Internet Advertising?”, Proceedings of
Society for Marketing Advances, New Orleans, LA, 2001.
[36] Laskey, H.A., Day, E., and Crask, M.R., Social
Communication in Advertising: Persons, Products, and Images
of Well-Being, Methuen, Toronto, 1989.
[37] Martin, B.A.S., Van Durme, J., Raulas, M. and Merisavo,
M., “Email Advertising: Exploratory Insights from Finland”,
Journal of Advertising Research, Vol. 43, Iss. 3, 2003, 293.
[38] McMillan, S., “Internet Advertising: One Face or Many?”,
in Internet Advertising: Theory and Research, Schumann, D.
and Thorson, E. (Eds.), 2nd ed., forthcoming.
[39] MessageLabs, 2005. From: http://www.messagelabs.com
[40] Mick, D.G., “Toward a Semiotic of Advertising Story
Grammars”, in Marketing Signs: New Directions in the Study
of Signs for Sale, Umiker-Sebeok, J. (Ed.), Mouton de Gruyter,
Berlin, 1987.
[41] Miller, C.R., “Genre as Social Action”, in Genre and the
New Rhetoric, Freedman and Medway (Eds.), Taylor & Francis,
London, 1994.
[42] Mitchell, A.A. and Olson, J.C., “Are Product Attribute
Beliefs the Only Mediator of Advertising Effects on Brand
Attitude?”, Journal of Marketing Research, 18 August, 1981,
318-332.
[43] Orange Coast IBM PC User Group.
[44] Orasan, C. and Krishnamurthy, R., “A Corpus-based
investigation of junk emails”, in Proceedings of Language
Resources and Evaluation Conference (LREC-2002), Las
Palmas, Spain, 2002.
[45] Orlikowski, W.J. and Yates, J., “Genre Repertoire: The
Structuring of Communicative Practices in Organizations”,
Administrative Science Quarterly. Vol. 39, Iss. 4, Ithaca,
December 1994, 541.
[46] Orlikowski, W.J. and Yates, J., “Genres of Organizational
Communication: A Structurational Approach to Studying
Communication and Media”, Academy of Management, The
Academy of Management Review, Vol. 17, Iss. 2, Briarcliff,
April 1992, 299.
[47] Orlikowski, W.J. and Yates, J. and Okamura, K.,
“Explicit and Implicit Structuring of Genres in Electronic
Communication: Reinforcement and Change of Social
Interaction”, Organization Science, Vol. 10, Iss. 1, Jan-Dec
1999, 83.
[48] Orlikowski, W.J. and Yates, J. and Okamura, K.,
“Constituting Genre Repertoires: Deliberate and Emergent
Patterns of Electronic Media Use”, Academy of Management
Journal, Best Paper Proceedings, Briarcliff Manor, 1995, 353.
[49] Orlikowski, W., Yates, J., and Yoshioka, T.,
Community-based interpretive schemes: Exploring the use of cyber
meetings with a global organization, MIT, Cambridge, 2000.
[50] Palmer, J.C., “Netvertising and ESP: Genre-Based
Analysis of Target Advertisements and its Application in the
Business English classroom”, Iberica, Vol. 1, 1999, 39.
[51] Pelletier, L., Almhana, P., and Choulakian V., “Adaptive
Filtering of Spam”, Communication Networks and Services
Research, 2004.
[52] Puto, C. P. and Wells, W.D., “Informational and
Transformational Advertising: The Differential Effects of
Time”, Advances in Consumer Research, Vol. 11, Provo, UT,
1984, 638.
[53] Radicati Group, Email Hygiene Survey Results, 2005. From:
http://www.radicati.com/email-survey2005.shtml
[54] Shamdasani, P., Stanaland, A., and Tan, J. “Location,
Location, Location: Insights for advertising placement on the
Web”, Journal of Advertising Research, Vol. 41, Iss. 4, 2001,
7.
[55] Spinuzzi, C., “Describing Assemblages: Genre Sets,
Systems, Repertoires, and Ecologies”, Computer Writing and
Research Lab, White paper #040505-2, Austin, Texas. May 5,
2004.
[56] Spinuzzi, C., Tracing Genres through Organizations: A
Sociocultural approach to Information Design, MIT Press,
Boston, 2003.
[57] Stern, B., “Other-speak: Classical Allegory and
Contemporary Advertising”, Journal of Advertising, Vol. 19,
Iss. 3, 1990, 14.
[58] Swales, J.M. Genre Analysis. English in Academic and
Research Settings, Cambridge University Press, Cambridge,
1990.
[59] The Direct Marketing Association. From:
http://www.the-dma.org/guidelines/onlineguidelines.shtml
[60] Toms, E.G. and Campbell, D.G., “Genre as Interface
Metaphor: Exploiting Form and Function in Digital
Environments”, Proceedings of The 32nd Hawaii International
Conference on System Sciences, January, Hawaii, 1999.
[61] United States Secret Service, “Public Awareness Advisory
Regarding ‘4-1-9’ or ‘Advance Fee Fraud’ Schemes”, From:
http://www.secretservice.gov/alert419.shtml
[62] Viser, V., “Thematics and Products in American Magazine
Advertising Containing Children, 1940-1950”, Communication
Quarterly, Vol. 47, Iss. 1, 1999, 118.
[63] Wells, W.D., Burnett, J. and Moriarty, S., Advertising
Principles and Practice, Prentice-Hall, Englewood Cliffs, NJ,
1989.
[64] World Summit on Information Society Declaration,
“Declaration of Principles: Building the Information Society: A
Global Challenge in the New Millennium”, World Summit on
the Information Society, December 12, 2003. From:
http://www.itu.int/wsis/docs/geneva/official/dop.html.
[65] Zeltsin, Z., “General Overview of Spam and Technical
Measures to Mitigate the Problem” ITU-T SG 17 Interim
Rapporteur Meeting November, 2004.