Determining Trends in Seed Dispersal: A Pilot Study Using Data Mining Techniques
Selina A. Ruzi1
1Program in Ecology, Evolution, and Conservation Biology, UIUC, [email protected]

Objectives
To determine whether data mining is a viable technique for identifying trends in seed dispersal research, particularly areas of recent interest and knowledge gaps.

Methods
• The Scopus database was searched using the following criteria:
  • Search term = “seed dispersal”
  • Document type = Article
  • Year published = 2016 or 2000
  • Subject area = Life sciences
• 163 abstracts were exported for the 2016 publishing year (including one from late 2015) and 170 abstracts for 2000
• Each line of the csv file contained an abstract from a different article
• Python was used to read in the csv file line by line, extract each abstract, split it into words, and count how many times each search word (Table 1) appeared in it
• Word counts, author, title, year, and abstract were exported into a newly created csv file
• Word counts were processed in R
• Graphing and further analyses will be done in R

Results/Conclusions
Figure 1: The percentage of articles in which each category appears, based on the number of abstracts sampled for 2000 (170 total) or 2016 (163 total, including one from late 2015).
• The category that appears in the most abstracts in both publishing years is “Fate”, which appears in 29% of the 2016 abstracts and 26% of the 2000 abstracts (Figure 1).
• The category that increased the most in the percentage of abstracts it appeared in from 2000 to 2016 was “Tropics” (Figure 2).
• The category that decreased the most in the percentage of abstracts it appeared in from 2000 to 2016 was “Distances” (Figure 2).
• A larger sample size and more analyses are needed to establish whether data mining is a valuable tool for determining trends in seed dispersal research through time. However, initial broad trends can be identified using these methods.

Figure 2: The difference in the percentage of abstracts in which each category appears between 2000 and 2016. Blue indicates categories that have increased in abundance and red indicates categories that have decreased in abundance.

Category            Words Searched
Abiotic             Abiotic, Abiotic-factor, Abiotic-factors
Active              Active, Active-dispersal
Ant                 Ant, Ants, Formicidae
Aril                Aril, Arils
Beetle              Beetle, Beetles, Dung-beetle, Dung-beetles
Biotic              Biotic, Biotic-factor, Biotic-factors
Bird                Bird, Birds, Avian
Chem                Chemical, Chemistry, Chemicals
Distance            Distance, Distances
Effect              Effectiveness
Elaiosome           Elaiosome, Elaiosomes
Fate                Fate, Destiny, Predation, Germination, Eaten, Loss, Predated
Germination         Germination, Germinated
Insect              Insect, Insects, Ant, Ants, Formicidae, Beetle, Beetles, Dung-beetle, Dung-beetles
Mammal              Mammal, Mammals, Rodent, Rodents
Myrmecochorous      Myrmecochore, Myrmecochores, Myrmecochorous
Non-myrmecochorous  Non-myrmecochore, Non-myrmecochores, Nonmyrmecochorous
Passive             Passive, Passive-dispersal
Predation           Predation, Predated, Eaten
Primary             Primary, Primary-dispersal
Rodent              Rodent, Rodents
Secondary           Secondary, Secondary-dispersal
Temperate           Temperate
Tropics             Neotropics, Neotropic, Neotropical, Tropics, Tropical, Paleotropical, Paleotropics, Paleotropic

Table 1: Categories and the words searched within each to further subset the abstracts. Words in black were searched in the Python script. Words in red will be added to the Python script in future analyses along with other categories.
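The counting step from the Methods and the percentages behind Figures 1 and 2 could be sketched roughly as follows. This is a minimal illustration and not the script used for the poster: the column names `Year` and `Abstract`, the function names, the tokenizing rule, and the four sample abstracts are all assumptions made up for the example, and only three of the Table 1 categories are included.

```python
import csv
import io
import re
from collections import Counter

# Three categories copied from Table 1 (lowercased); the rest are omitted for brevity.
CATEGORIES = {
    "Fate": {"fate", "destiny", "predation", "germination", "eaten", "loss", "predated"},
    "Distance": {"distance", "distances"},
    "Tropics": {"neotropics", "neotropic", "neotropical", "tropics", "tropical",
                "paleotropical", "paleotropics", "paleotropic"},
}

# Assumed tokenizing rule: runs of letters, keeping hyphenated terms
# such as "dung-beetle" together as one word.
TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def count_categories(abstract):
    """Return {category: number of matching words} for one abstract."""
    words = Counter(TOKEN.findall(abstract.lower()))
    return {cat: sum(words[w] for w in terms) for cat, terms in CATEGORIES.items()}


def percent_by_year(rows):
    """Percentage of abstracts per year in which each category appears at least once."""
    totals = Counter()   # number of abstracts per year
    hits = {}            # year -> Counter of abstracts containing each category
    for row in rows:
        year = row["Year"]
        totals[year] += 1
        year_hits = hits.setdefault(year, Counter())
        for cat, n in count_categories(row["Abstract"]).items():
            year_hits[cat] += 1 if n > 0 else 0
    return {year: {cat: 100.0 * hits[year][cat] / totals[year] for cat in CATEGORIES}
            for year in totals}


# Hypothetical stand-in for the exported csv file (invented abstracts).
SAMPLE = """Year,Abstract
2000,"Seed dispersal distance shaped seedling fate."
2000,"Ants removed seeds over short distances."
2016,"Tropical birds altered seed fate; predation was high."
2016,"Secondary dispersal by rodents in a temperate forest."
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
percents = percent_by_year(rows)

# Difference in percentage points, 2016 minus 2000 (the quantity plotted in Figure 2).
change = {cat: percents["2016"][cat] - percents["2000"][cat] for cat in CATEGORIES}
print(change)  # {'Fate': 0.0, 'Distance': -100.0, 'Tropics': 50.0}
```

Matching whole words against a set, rather than searching for substrings, keeps a term such as “ant” from being counted inside “plant”. It also means that a word listed under more than one category in Table 1 (for example “Predation” under both Fate and Predation) counts toward each of them.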
Future Directions
• Include more articles from more years
• Determine a better way to conduct word counts
• Expand the search from abstracts alone to other sections of the papers
• Make the Python and R code more efficient
• Fix bugs in the current code

Acknowledgements
Thanks to the instructors of the Focal Point: Data Science Across Disciplines class. This work was part of a Focal Point grant funded by the Graduate College at the University of Illinois at Urbana-Champaign.