Trends in News Reporting
Galileo Namata, Yee Lin Tan
{namatag, yeelin}@cs.umd.edu
February 27, 2007
INTRODUCTION
News sources, including newspapers, websites, blogs, and news shows, have limited
space and time to discuss daily events. Because of this limit, referred to as the
“news hole,” news sources must be selective about what to cover and, in the case of
international news, which countries to cover. Consequently, there is a discrepancy in
how often different countries are mentioned relative to each other in news sources.
This in turn affects public opinion as well as trade, aid, and international
intervention in the case of conflicts [1]. In this paper, we expand on earlier work by
Ethan Zuckerman and explore this discrepancy using the Hierarchical Clustering
Explorer (HCE) tool [2] on the Global Attention Profile Dataset [3]. We focus on
identifying clusters and trends in news reporting with respect to other data about
each country (such as GDP, population, oil production and consumption, and number of
conflict casualties), and we comment on our experience using the HCE tool.
DATASET
The Global Attention Profile Dataset (GAP) is an automatically generated compilation of
the number of times a country is mentioned in select news media outlets. The data is
presented daily in world maps colored by the number of times each country is mentioned,
as well as in a series of web pages listing the raw value of each count in an HTML table.
Although there is data from 2001 and 2003, daily data is available only from
September 2004 onward. We focus on 2005 and 2006 because we have a full year’s worth
of data for each. There are 26 news sources in the 2006 data, though a number of
sources were added over time. To make the dataset easier to use, we converted the
full collection of raw data into a MySQL database that we can query for specific
groups and data. We also wanted to analyze the data in terms of other country
statistics, such as Gross Domestic Product (GDP), population, oil production and
consumption, and conflict deaths. For this we acquired data from the World Bank [4],
the CIA World Factbook [5], and Ploughshares [6].
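The paper does not give the database schema, so the following is only a minimal sketch of how such a store might be queried. It uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server; the table name, column names, and sample rows are hypothetical placeholders and not values from the GAP dataset.

```python
import sqlite3

# Hypothetical schema (an assumption): one row per (day, source, country)
# holding the number of mentions counted on that day.
def build_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mentions (day TEXT, source TEXT, country TEXT, n INTEGER)"
    )
    conn.executemany("INSERT INTO mentions VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn

def yearly_totals(conn, year):
    """Total mentions per (source, country) for one calendar year."""
    cur = conn.execute(
        "SELECT source, country, SUM(n) FROM mentions "
        "WHERE substr(day, 1, 4) = ? "
        "GROUP BY source, country ORDER BY source, country",
        (str(year),),
    )
    return cur.fetchall()

# Made-up placeholder rows, not real GAP counts.
sample_rows = [
    ("2005-01-01", "source_a", "X", 10),
    ("2005-01-02", "source_a", "X", 5),
    ("2005-01-01", "source_a", "Y", 2),
    ("2006-01-01", "source_a", "X", 7),
]
conn = build_db(sample_rows)
totals_2005 = yearly_totals(conn, 2005)
```

Aggregating daily counts into yearly totals per source and country yields the kind of per-source country profile that the analysis below compares.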
[1] http://h2odev.law.harvard.edu/ezuckerman/paper.pdf
[2] http://www.cs.umd.edu/hcil/hce/
[3] http://h2odev.law.harvard.edu/ezuckerman/
[4] http://www.worldbank.org/data
[5] https://www.cia.gov/cia/publications/factbook/
[6] http://www.ploughshares.ca/imagesarticles/ARC06/map.pdf
ANALYSIS
News Source Correlation – US News, You’ve Seen One, You’ve Seen Them All
Figure 1. News Source Correlation ranking, 2005.
Figure 2. News Source Correlation ranking, 2006.
Two of the largest news sources in the United States, Fox News and ABC News, are
consistently among the most correlated news sources. Associated Press (AP) and
Reuters, the two syndicated news sources, also correlate highly in 2006. Overall,
Figures 1 and 2 show that the coverage of the selected news sources is highly
correlated. This is surprising given that the news sources come in many different
forms, including search engine results, blogs, and two foreign newspapers.
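To make the idea of a correlation ranking concrete, here is a small sketch that scores every pair of sources by the Pearson correlation of their per-country mention counts and sorts the pairs. The paper does not state which correlation measure was used, so Pearson's r is an assumption, and the source names and counts are invented for illustration.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)

def rank_source_pairs(counts):
    """counts maps source -> per-country counts (same country order for all).

    Returns [((source_1, source_2), r), ...] sorted from most to least correlated.
    """
    pairs = [
        ((a, b), pearson(counts[a], counts[b]))
        for a, b in combinations(sorted(counts), 2)
    ]
    return sorted(pairs, key=lambda item: item[1], reverse=True)

# Hypothetical counts for four countries; not taken from the GAP dataset.
toy_counts = {
    "source_a": [100, 40, 10, 5],
    "source_b": [200, 80, 20, 10],
    "source_c": [5, 10, 40, 100],
}
ranking = rank_source_pairs(toy_counts)
```

In this toy input, source_b mentions every country exactly twice as often as source_a, so the two have identical coverage profiles and head the ranking even though their raw volumes differ.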
Figure 3. News Source Clustering by Country Coverage, 2006.
The dendrogram in Figure 3 further confirms that the coverage of most news sources is
very similar. The three major news players - BBC, New York Times, and CNN - are
distinctly different in coverage from the other news sources. Not surprisingly, the
three search engines - Google, Yahoo, and Alta Vista - also form a highly correlated
group that differs from the other news sources. This implies that consumers of Fox or
ABC News who are interested in more heterogeneous news coverage may want to sample
from different clusters of news media outlets, such as the BBC-New York Times-CNN
cluster or the search engine cluster.
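The grouping that a dendrogram displays can be sketched with a few lines of agglomerative clustering. The paper does not state which linkage method or distance measure was selected in HCE, so average linkage over the distance 1 - r (with r the Pearson correlation) is an assumption here, and the four source profiles are invented.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_1, cluster_2, distance)."""
    names = sorted(profiles)
    # Pairwise distance between individual sources: 1 - correlation.
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }

    def cluster_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters and record the merge height.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
    return merges

# Hypothetical per-country counts; not values from the GAP dataset.
toy_profiles = {
    "source_a": [100, 40, 10, 5],
    "source_b": [90, 45, 12, 4],
    "source_c": [5, 10, 40, 100],
    "source_d": [6, 12, 35, 90],
}
merges = average_linkage(toy_profiles)
```

Each recorded merge corresponds to one join in a dendrogram, and the merge distance is the height of that join. Sources that merge at a low height have near-identical coverage profiles, while a reader seeking varied coverage would pick sources that only meet near the top of the tree.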
Country Attribute Correlation – Sorry, You’re Not Rich Enough to be Mentioned
Figure 4. Coverage Correlation to GDP, 2005.
The primary correlation with news coverage is financial, whether measured by the GNI
or the GDP of a nation. Population is not a factor for most news sources, with the
exception of Xinhua and Hindustan, whose primary audiences, China and India, are two
of the most populous countries. Of all the news sources analyzed, BBC’s coverage is
the least correlated with GDP and with oil production and consumption. Instead,
population appears to have a more significant influence as far as BBC is concerned.
This is consistent with Zuckerman’s hypothesis that BBC has a tendency to give more
attention to former British colonies, many of which have large populations but small
GDPs.
Country Correlation - Country Coverage Linked
Figure 5. Country Clustering by News Coverage for 2005 and 2006.
Correlations between countries over the two-year range reveal two notable types of
trend. The first is the pairing of two highly correlated countries that either move in
unison or remain static across the years. An example is the pairing of the US and
Iraq: the ongoing war in Iraq has tied the news coverage of the two countries together
in both years and identified both as significant news contributors. The second is a
single country whose coverage spikes for a specific period. An example is the UK,
which received much more coverage in 2005 than in 2006, likely due to the London
bombings. Similarly, Lebanon received much more coverage in 2006, potentially due to
its conflict with Israel that year.
Conflict Coverage – The Deaths of the Poor Go Unnoticed in Conflicts
Figure 6. Conflict Countries Clustering by News Coverage in 2005.
Figure 7. Comparison of News Coverage to Conflict Severity by Country, 2005.
As the examples of Lebanon and the UK in the previous section show, news coverage can
increase due to conflict. Ideally this means that, despite the general propensity
toward covering countries with high GDP, international news will take note and adjust
coverage accordingly when violent conflict results in large losses of human life.
Some of the high-casualty countries did receive a lot of coverage, but that is not
always the case. Six of the eight countries with the highest number of deaths by
conflict (over 100,000 deaths) did not receive significant coverage. Of these six,
five are poor, low-visibility African countries. In contrast, six countries with
potentially higher-profile conflicts but fewer deaths overall received more attention.
This shows that the number of deaths does not map directly to increased coverage;
other factors related to the ability to create a news sensation play a greater role.
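One simple way to surface such mismatches is to rank countries separately by conflict deaths and by mentions and then flag those whose coverage rank falls well below their casualty rank. This is an illustrative sketch rather than the procedure behind Figure 7; the function names, the gap threshold, the country labels, and all numbers are hypothetical.

```python
def rank_desc(values):
    """Map each key to its rank, 1 = largest value; ties broken by key name."""
    ordered = sorted(values, key=lambda k: (-values[k], k))
    return {key: position + 1 for position, key in enumerate(ordered)}

def under_covered(deaths, mentions, min_gap=2):
    """Countries whose coverage rank trails their casualty rank by >= min_gap."""
    if set(deaths) != set(mentions):
        raise ValueError("both inputs must describe the same countries")
    death_rank = rank_desc(deaths)
    mention_rank = rank_desc(mentions)
    gaps = {c: mention_rank[c] - death_rank[c] for c in deaths}
    flagged = [c for c, gap in gaps.items() if gap >= min_gap]
    # Largest mismatch first, then alphabetical for a stable order.
    return sorted(flagged, key=lambda c: (-gaps[c], c))

# Made-up placeholder values, not figures from Ploughshares or GAP.
toy_deaths = {"A": 200000, "B": 150000, "C": 20000, "D": 5000}
toy_mentions = {"A": 300, "B": 9000, "C": 12000, "D": 15000}
flagged = under_covered(toy_deaths, toy_mentions)
```

Here country A has the most deaths but the fewest mentions, so it is the only one flagged at the default threshold.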
HCE TOOL CRITIQUE
The Hierarchical Clustering Explorer is an excellent tool for the exploration of
multidimensional data, and the capabilities it provides have a definite place among
data analysis tools. The tool, however, is still lacking in a few aspects compared to
more developed tools like Spotfire.
First, HCE has a number of bugs that cause occasional crashes. Immediately after
opening the tool and before loading any data, clicking through the various tabs in the
lower panel results in a crash. For example, repeatedly clicking into and out of the
Scatterplot Ordering tab results in an application error that shuts the program down.
Program-terminating errors also occur when attempting to perform ordering on very high
dimensional data in the Scatterplot Ordering tab. These errors could be addressed by
checks on the data that ensure large datasets do not cause an overflow or, at the
least, produce only a warning rather than an application crash. In situations where an
application error does occur, HCE should ideally provide informative error messages so
that users can take corrective measures.
A few additional features can greatly improve the tool. First, HCE does not remember
the option settings between analyses and does not provide a way to return to previous
views during the same analysis. Both of these functionalities would be very helpful when
users are exploring advanced features and may want to undo certain actions. Similarly, it
would be nice to be able to export all the visualizations provided by the tool, including
the Color Mosaic, Histogram Ordering, etc., to image files for use in presentations. The
tool currently does not provide much support for filtering of data. When importing large
datasets, there may be outliers or noise causing a number of the displays, such as the
scatterplots, to be skewed. The ability to remove those data points, instead of having to
process the data outside and then reload it, would be beneficial.
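Until such filtering exists in the tool, outliers have to be removed before import. The sketch below shows one common approach, the interquartile-range rule with the conventional factor of 1.5; the rule, the function name, and the sample points are assumptions and do not describe how HCE handles its input.

```python
import statistics

def iqr_filter(points, k=1.5):
    """Split (label, value) pairs into kept and dropped using the IQR rule.

    A value is dropped when it lies more than k * IQR below the first
    quartile or above the third quartile.
    """
    values = [value for _, value in points]
    if len(values) < 4:
        # Too few points to estimate quartiles meaningfully; keep everything.
        return list(points), []
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    kept = [(label, v) for label, v in points if low <= v <= high]
    dropped = [(label, v) for label, v in points if not low <= v <= high]
    return kept, dropped

# Hypothetical points; labels and values are placeholders.
toy_points = [("p1", 1), ("p2", 2), ("p3", 3), ("p4", 4), ("p5", 5), ("p6", 100)]
kept, dropped = iqr_filter(toy_points)
```

Returning the dropped points alongside the kept ones lets the analyst inspect what was removed, since in news data an extreme value may be the most interesting record rather than noise.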
With regard to the display, the dendrogram is unreadable for very similar items. The
ability to zoom into or resize the branches of the dendrogram would allow users to focus
in on the details of the clustering. For the Table View, it is hard to read across a whole
row when there are many attributes. Alternating colors between rows or pinning down
the ID column while moving the rest of the columns would save users from having to
keep track of each line or remember all the IDs. For large datasets, the ability to
search on a particular field would also be helpful in locating a data point in the
different visualizations such as the dendrogram, color mosaic, and the table view.
Additionally, the ability to adjust the color mapping range, which allows users to see
subtle differences in ranges of interest in the dendrogram, should be extended to the
Score Overview in the Scatterplot Ordering tab. This is essential for skewed data
distributions, where otherwise a large part of the Score Overview is filled with solid
green or red, indicating that most of the values are near the extremes.
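The effect of an adjustable mapping range can be illustrated with a short sketch: values are rescaled to a user-chosen interval and clamped at its ends before being turned into a color. The plain green-to-red gradient and the function names are assumptions for illustration and are not HCE's actual color scheme or code.

```python
def clamp_to_unit(value, low, high):
    """Rescale value from [low, high] to [0, 1], clamping anything outside."""
    if high <= low:
        raise ValueError("high must be greater than low")
    t = (value - low) / (high - low)
    return min(1.0, max(0.0, t))

def green_red(value, low, high):
    """Map value to an (r, g, b) tuple: green at low, red at high."""
    t = clamp_to_unit(value, low, high)
    return (round(255 * t), round(255 * (1 - t)), 0)

# Two nearby scores near the top of a skewed distribution (made-up values).
score_1, score_2 = 0.90, 0.95

# With the full range both scores map to nearly the same reddish color.
wide = (green_red(score_1, 0.0, 1.0), green_red(score_2, 0.0, 1.0))

# Narrowing the range to the region of interest spreads them apart.
narrow = (green_red(score_1, 0.8, 1.0), green_red(score_2, 0.8, 1.0))
```

Narrowing the range trades detail outside the interval, where everything saturates to pure green or red, for contrast inside it, which is the trade-off a user exploring a skewed Score Overview would want to control.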