Trends in News Reporting

Galileo Namata, Yee Lin Tan
{namatag, yeelin}@cs.umd.edu
February 27, 2007

INTRODUCTION

News sources, including newspapers, websites, blogs, and news shows, have limited space and time to discuss daily events. Because of this limit, referred to as the “news hole,” news sources must be selective about what to cover and, in the case of international news, which countries to cover. Consequently, there is a discrepancy in how often different countries are mentioned relative to each other in news sources. This in turn affects public opinion as well as trade, aid, and international intervention in the case of conflicts¹. In this paper, we expand on earlier work by Ethan Zuckerman and explore this discrepancy using the Hierarchical Clustering Explorer (HCE) tool² on the Global Attention Profile Dataset³. We focus on identifying clusters and trends in news reporting with respect to other data about each country (such as GDP, population, oil production and consumption, and number of conflict casualties), and we comment on our experience using the HCE tool.

DATASET

The Global Attention Profile (GAP) Dataset is an automatically generated compilation of the number of times each country is mentioned in select news media outlets. The data is presented daily as world maps colored by the number of times each country is mentioned, as well as in a series of web pages listing the raw counts in HTML tables. Although there are data from 2001 and 2003, the data is only available on a daily basis from September 2004 onward. We focus on the data from 2005 and 2006 since we have a full year’s worth of data for those years. There are 26 news sources listed in 2006, though a number of news sources have been added over time. To make the datasets easier to use, we converted the full collection of raw data into a MySQL database, which we can query for specific groups and data.
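
As a rough illustration of the kind of query such a database makes possible, the sketch below aggregates daily mention counts into yearly totals per country for one news source. It is only a sketch under stated assumptions: the table name `daily_mentions`, its columns, and the function `yearly_totals` are hypothetical, the rows are made-up toy values, and Python's built-in `sqlite3` module stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL database (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per (news source, country, day).
conn.execute(
    """
    CREATE TABLE daily_mentions (
        source     TEXT,
        country    TEXT,
        day        TEXT,     -- ISO date string, e.g. '2005-01-01'
        n_mentions INTEGER
    )
    """
)

# Made-up toy rows, for illustration only (not GAP data).
toy_rows = [
    ("source_a", "Country X", "2005-01-01", 10),
    ("source_a", "Country X", "2005-01-02", 20),
    ("source_a", "Country Y", "2005-01-01", 7),
    ("source_a", "Country X", "2006-01-01", 99),
    ("source_b", "Country X", "2005-01-01", 5),
]
conn.executemany("INSERT INTO daily_mentions VALUES (?, ?, ?, ?)", toy_rows)


def yearly_totals(conn, source, year):
    """Return [(country, total mentions)] for one source and year, largest first."""
    cursor = conn.execute(
        """
        SELECT country, SUM(n_mentions) AS total
        FROM daily_mentions
        WHERE source = ? AND substr(day, 1, 4) = ?
        GROUP BY country
        ORDER BY total DESC, country
        """,
        (source, str(year)),
    )
    return cursor.fetchall()


totals_2005 = yearly_totals(conn, "source_a", 2005)
```

Aggregating to one total per country, source, and year yields the kind of per-source country vectors that the analysis in the following sections compares.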

We also wanted to analyze the data in terms of other country statistics, such as Gross Domestic Product (GDP), population, oil production and consumption, and conflict deaths. For this we acquired data from the World Bank⁴, the CIA World Fact Book⁵, and Ploughshares⁶.

1 http://h2odev.law.harvard.edu/ezuckerman/paper.pdf
2 http://www.cs.umd.edu/hcil/hce/
3 http://h2odev.law.harvard.edu/ezuckerman/
4 http://www.worldbank.org/data
5 https://www.cia.gov/cia/publications/factbook/
6 http://www.ploughshares.ca/imagesarticles/ARC06/map.pdf

ANALYSIS

News Source Correlation – US News, You’ve Seen One, You’ve Seen Them All

Figure 1. News Source Correlation ranking, 2005.
Figure 2. News Source Correlation ranking, 2006.

Two of the largest news sources in the United States, Fox News and ABC News, are consistently among the most correlated news sources. Associated Press (AP) and Reuters, the two syndicated news sources, also correlate highly in 2006. Overall, Figures 1 and 2 show that the coverage of the selected news sources correlates highly with each other. This is surprising given that the news sources come in many different forms, including search engine results, blogs, and two foreign newspapers.

Figure 3. News Source Clustering by Country Coverage, 2006.

The dendrogram in Figure 3 further confirms that the coverage of most news sources is very similar. The three major news players, BBC, New York Times, and CNN, are distinctly different in coverage from the other news sources. Not surprisingly, the three search engines, Google, Yahoo, and Alta Vista, also form a highly correlated group that is different from the other news sources. This implies that consumers of Fox or ABC News who are interested in more heterogeneous news coverage may want to sample from different clusters of news media outlets, such as the BBC-New York Times-CNN cluster or the search engine cluster.
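
To make the notion of a correlation ranking between news sources concrete, the following sketch ranks every pair of sources by the correlation of their per-country mention counts. It assumes the Pearson correlation coefficient, which the text above does not specify. The function names and the toy counts are hypothetical and do not come from the GAP data.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def rank_source_pairs(counts):
    """Rank all pairs of sources by correlation, most correlated first.

    counts maps a source name to its list of per-country mention counts;
    every list must use the same country order.
    """
    pairs = [
        (pearson(counts[a], counts[b]), a, b)
        for a, b in combinations(sorted(counts), 2)
    ]
    return sorted(pairs, reverse=True)


# Made-up toy counts over four countries, for illustration only.
toy_counts = {
    "source_a": [120, 40, 10, 5],
    "source_b": [240, 80, 20, 10],  # proportional to source_a
    "source_c": [5, 10, 40, 120],   # reversed emphasis
}
ranking = rank_source_pairs(toy_counts)
```

Here `ranking[0]` is the pair `source_a` and `source_b`, whose counts are proportional and therefore perfectly correlated. A dissimilarity such as one minus the correlation could then feed a hierarchical clustering like the one shown in the dendrogram, although the exact settings used in HCE are not stated here.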

Country Attribute Correlation – Sorry, You’re Not Rich Enough to be Mentioned

Figure 4. Coverage Correlation to GDP, 2005.

The primary correlation with news coverage is financial, whether measured by the GNI or the GDP of a nation. Population is not a factor for most news sources, with the exception of Xinhua and Hindustan, whose primary audiences are in China and India, two of the most populous countries. Of all the news sources analyzed, BBC’s coverage is the least correlated with GDP and with oil production and consumption. Instead, population appears to have a more significant influence on BBC’s coverage. This is consistent with Zuckerman’s hypothesis that BBC tends to give more attention to former British colonies, many of which have large populations but small GDPs.

Country Correlation – Country Coverage Linked

Figure 5. Country Clustering by News Coverage for 2005 and 2006.

Correlations between countries taken over a two-year range reveal two notable types of trend. The first is the pairing of two highly correlated countries whose coverage either moves in unison or remains static across the years. An example is the pairing of the US and Iraq. The ongoing war in Iraq has tied together the news coverage of the US and Iraq for both years and marked both as significant contributors to the news. The second type of trend is one where a single country spikes in coverage for a specific period. An example is the UK coverage between 2005 and 2006. The UK received much more coverage in 2005, likely due to the London bombings. The same can be said of Lebanon, whose coverage is much higher in 2006, potentially due to the conflict with Israel that year.

Conflict Coverage – The Deaths of the Poor Go Unnoticed in Conflicts

Figure 6. Conflict Countries Clustering by News Coverage in 2005.
Figure 7. Comparison of News Coverage to Conflict Severity by Country, 2005.

As the examples of Lebanon and the UK in the previous section show, news coverage can increase due to conflict. Ideally this would mean that, despite the general propensity toward covering countries with high GDP, international news takes note and adjusts coverage accordingly when violent conflict results in large losses of human life. Some of the high-casualty countries did receive a lot of coverage, but that is not always the case. Six of the eight countries with the highest number of deaths by conflict (over 100,000 deaths) did not receive significant coverage. Of these six, five are poor, low-visibility African countries. In contrast, six countries with potentially higher-profile conflicts but fewer deaths overall received more attention. This shows that the number of human deaths does not map directly to increased coverage, and that other factors related to the ability to create a news sensation play a greater role.

HCE TOOL CRITIQUE

The Hierarchical Clustering Explorer is an excellent tool for the exploration of multidimensional data. The capabilities it provides definitely have their place among data analysis tools. The tool, however, is still lacking in a few respects compared to more developed tools like Spotfire.

First, HCE has a number of bugs that cause occasional crashes. Upon opening the tool and before loading any data, clicking through the various tabs in the lower panel results in a crash. For example, repeatedly clicking into and out of the Scatterplot Ordering tab results in an application error that shuts the program down. Program-terminating errors also occur when attempting to perform ordering on very high-dimensional data in the Scatterplot Ordering tab. These errors could be addressed by checking the data to make sure large datasets do not cause an overflow or, at the very least, result only in a warning and not an application crash.
In situations where an application error does occur, HCE should ideally provide informative error messages so that users can take corrective measures.

A few additional features would greatly improve the tool. First, HCE does not remember option settings between analyses and does not provide a way to return to previous views during the same analysis. Both capabilities would be very helpful when users are exploring advanced features and may want to undo certain actions. Similarly, it would be useful to be able to export all the visualizations provided by the tool, including the Color Mosaic, Histogram Ordering, etc., to image files for use in presentations.

The tool currently does not provide much support for filtering data. When importing large datasets, outliers or noise may skew a number of the displays, such as the scatterplots. The ability to remove those data points within the tool, instead of having to process the data externally and then reload it, would be beneficial.

With regard to the display, the dendrogram is unreadable for very similar items. The ability to zoom into or resize the branches of the dendrogram would allow users to focus on the details of the clustering. In the Table View, it is hard to read across a whole row when there are many attributes. Alternating colors between rows, or pinning the ID column in place while the rest of the columns scroll, would save users from having to keep track of each line or remember all the IDs. For large datasets, the ability to search on a particular field would also be helpful in locating a data point in the different visualizations, such as the dendrogram, the color mosaic, and the table view.

Additionally, the ability to adjust the color mapping range, which allows users to see subtle differences within ranges of interest in the dendrogram, should be extended to the Score Overview in the Scatterplot Ordering tab.
This is essential for skewed data distributions, where a large part of the Score Overview would otherwise be filled with solid green or red, indicating that most of the values are near the extremes.
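
The effect of an adjustable color mapping range can be sketched as follows. This is not HCE's implementation, which is not described here. It is a minimal illustration that assumes a green-to-black-to-red scale (the black midpoint is an assumption) and uses the hypothetical function names `clamp` and `score_to_rgb`.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def score_to_rgb(score, low, high):
    """Map a score to an (r, g, b) color within a user-chosen range.

    Scores at or below `low` are pure green, scores at or above `high`
    are pure red, and the midpoint of the range is black (assumed scale).
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    # Position of the clamped score within the range, from 0.0 to 1.0.
    t = (clamp(score, low, high) - low) / (high - low)
    if t < 0.5:
        # Lower half: fade from green down to black.
        return (0, round(255 * (1 - 2 * t)), 0)
    # Upper half: fade from black up to red.
    return (round(255 * (2 * t - 1)), 0, 0)


# With the full range 0..100, scores of 90 and 100 get similar reds.
wide_90 = score_to_rgb(90, 0, 100)
wide_100 = score_to_rgb(100, 0, 100)

# Narrowing the range to 80..100 spreads the same scores across the scale.
narrow_90 = score_to_rgb(90, 80, 100)
narrow_100 = score_to_rgb(100, 80, 100)
```

With the wide range, both scores map to nearly the same red. With the narrowed range, a score of 90 sits at the black midpoint while 100 remains fully red, so differences among values clustered near one extreme become visible.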