Scalable Data Mining
Gianmarco De Francisci Morales
Abstract
Research activity on scalable data mining and big data.
1 Research Statement
My research focuses on scalable data mining. In particular, my interest is
in mining the Web by using data-intensive scalable computing systems. As
such, my research develops along two main axes: (i) tapping into the Web as
a source of data and (ii) dealing with its immense scale. These two aspects
combined create what people call “big data.” I usually apply my research
to help people deal with information overload.
In the first area, I am interested in the insights that come from exploring
news and social networks. I enjoy finding creative ways to extract new
information from the data. In the second area, I am thrilled by the challenges
created by the sheer size of the data. I like developing new scalable data
mining algorithms that take advantage of parallel and streaming solutions.
Inspired by Ernest Hemingway, my research in six words:
Big data: because more is different.
References
[1] R. Baraglia, G. De Francisci Morales, and C. Lucchese. Document
Similarity Self-Join with MapReduce. In ICDM ’10: 10th IEEE International Conference on Data Mining, pages 731–736, 2010.
[2] R. Baraglia, C. Lucchese, and G. De Francisci Morales. Large-scale
Data Analysis on the Cloud. In XXIV Convegno Annuale del CMGItalia, [best paper award] 2010.
[3] A. Bifet and G. De Francisci Morales. Big Data Stream Learning with
SAMOA. In ICDM ’14: 14th IEEE International Conference on Data
Mining, pages 1199–1202, [demo] 2014.
[4] A. Bifet, G. De Francisci Morales, J. Read, G. Holmes, and
B. Pfahringer. Efficient Online Evaluation of Big Data Stream Classifiers. In KDD ’15: 21st ACM International Conference on Knowledge
Discovery and Data Mining, pages 59–68, 2015.
[5] R. Blanco, G. De Francisci Morales, and F. Silvestri. Towards Leveraging Closed Captions for News Retrieval. In WWW ’13: 22nd International World Wide Web Conference, pages 135–136, [poster] 2013.
[6] R. Blanco, G. De Francisci Morales, and F. Silvestri. IntoNews: Online
News Retrieval using Closed Captions. IP&M: Information Processing
& Management, 51(1):148–162, 2015.
[7] F. Bonchi, G. De Francisci Morales, A. Gionis, and A. Ukkonen. Activity Preserving Graph Simplification. DMKD: Data Mining and Knowledge Discovery, 27(3):321–343, 2013.
[8] F. Bonchi, G. De Francisci Morales, and M. Riondato. Centrality measures on big graphs: Exact, approximated, and distributed algorithms.
In WWW ’16: 25th International World Wide Web Conference, pages
1017–1020, [tutorial] 2016.
[9] I. Bordino, G. De Francisci Morales, I. Weber, and F. Bonchi. From
Machu Picchu to “rafting the urubamba river”: Anticipating information needs via the Entity-Query Graph. In WSDM ’13: 6th ACM International Conference on Web Search and Data Mining, pages 275–284,
2013.
[10] C. Castillo, G. De Francisci Morales, M. Mendoza, and N. Khan. Says
Who? Automatic Text-Based Content Analysis of Television News.
In MNLP ’13: 1st Workshop on Mining unstructured big data using
Natural Language Processing @CIKM, pages 53–60, 2013.
[11] C. Castillo, G. De Francisci Morales, and A. Shekhawat. Online Matching of Web Content to Closed Captions in IntoNow. In SIGIR ’13: 36th
ACM International Conference on Research and Development in Information Retrieval, pages 1115–1116, [demo] 2013.
[12] V. Catania, G. De Francisci Morales, A. G. D. Nuovo, M. Palesi, and
D. Patti. High Performance Computing for Embedded System Design:
A Case Study. In DSD ’08: 11th Conference on Digital System Design,
Architectures, Methods and Tools, pages 656–659, 2008.
[13] V. Catania, A. G. D. Nuovo, M. Palesi, D. Patti, and G. De Francisci Morales. An Effective Methodology to Multi-objective Design
of Application Domain-specific Embedded Architectures. In DSD ’09:
12th Conference on Digital System Design, Architectures, Methods and
Tools, pages 643–650, 2009.
[14] M. Das, G. De Francisci Morales, A. Gionis, and I. Weber. Learning to
Question: Leveraging User Preferences for Shopping Advice. In KDD
’13: 19th ACM Conference on Knowledge Discovery and Data Mining,
pages 203–211, 2013.
[15] G. De Francisci Morales. Cloud Computing for Large Scale Data Analysis. Technical report, IMT Institute for Advanced Studies, February
2010.
[16] G. De Francisci Morales. SAMOA: A Platform for Mining Big Data
Streams. In RAMSS’13: 2nd International Workshop on Real-Time
Analysis and Mining of Social Streams @WWW, pages 777–778, [extended abstract] 2013.
[17] G. De Francisci Morales and A. Bifet. SAMOA: Scalable Advanced
Massive Online Analysis. JMLR: Journal of Machine Learning Research, 16(Jan):149–153, 2015.
[18] G. De Francisci Morales, A. Bifet, L. Khan, J. Gama, and W. Fan.
IoT Big Data Stream Mining. In KDD ’16: 22nd ACM International
Conference on Knowledge Discovery and Data Mining, [tutorial] 2016.
[19] G. De Francisci Morales and A. Gionis. Streaming Similarity Self-Join.
PVLDB: Proceedings of the VLDB Endowment, 9(10):792–803, 2016.
[20] G. De Francisci Morales, A. Gionis, and C. Lucchese. From Chatter
to Headlines: Harnessing the Real-Time Web for Personalized News
Recommendation. In WSDM ’12: 5th ACM International Conference
on Web Search and Data Mining, pages 153–162, 2012.
[21] G. De Francisci Morales, A. Gionis, and M. Sozio. Social Content
Matching in MapReduce. PVLDB: Proceedings of the VLDB Endowment, 4(7):460–469, 2011.
[22] G. De Francisci Morales, C. Lucchese, and R. Baraglia. Scaling Out
All Pairs Similarity Search with MapReduce. In LSDS-IR ’10: 8th
Workshop on Large-Scale Distributed Systems for Information Retrieval
@SIGIR, pages 25–30, 2010.
[23] G. De Francisci Morales and A. Shekhawat. The Future of Second
Screen Experience. In Workshop on Exploring and Enhancing the User
Experience for Television @CHI, 2013.
[24] K. Garimella, G. De Francisci Morales, A. Gionis, and M. Mathioudakis. Exploring Controversy in Twitter. In CSCW ’16: 19th ACM
Conference on Computer-Supported Cooperative Work and Social Computing, pages 33–36, [demo] 2016.
[25] K. Garimella, G. De Francisci Morales, A. Gionis, and M. Mathioudakis. Quantifying Controversy in Social Media. In WSDM ’16:
9th ACM International Conference on Web Search and Data Mining,
pages 33–42, 2016.
[26] K. Garimella, G. De Francisci Morales, A. Gionis, and M. Sozio. Scalable Facility Location for Massive Graphs on Pregel-like Systems. In
CIKM ’15: 24th ACM International Conference on Information and
Knowledge Management, pages 273–282, 2015.
[27] S. Gonzalez-Bailon, G. De Francisci Morales, M. Mendoza, N. Khan,
and C. Castillo. Cable News Coverage and Online News Stories: A
Large-Scale Comparison of Media Bias. SSRN: Social Science Research
Network, 2389525(Feb), 2014.
[28] C. M. Iacono-Manno, M. Fargetta, R. Barbera, A. Falzone, G. Andronico, S. Monforte, A. Muoio, R. Bruno, P. D. Primo, S. Orlando,
E. Leggio, A. Lombardo, G. Passaro, G. De Francisci Morales, and
S. Blandino. The Sicilian Grid Infrastructure for High Performance
Computing. International Journal of Distributed Systems and Technologies, 1(1):40–54, 2010.
[29] N. Kourtellis, G. De Francisci Morales, and F. Bonchi. Scalable Online
Betweenness Centrality in Evolving Graphs. TKDE: IEEE Transactions
on Knowledge and Data Engineering, 27(9):2494–2506, 2015.
[30] N. Kourtellis, G. De Francisci Morales, and F. Bonchi. Scalable Online Betweenness Centrality in Evolving Graphs. In ICDE ’16: 32nd
IEEE International Conference on Data Engineering, pages 1580–1581,
[extended abstract] 2016.
[31] D. Marron, A. Bifet, and G. De Francisci Morales. Random Forests
of Very Fast Decision Trees on GPU for Mining Evolving Big Data
Streams. In ECAI ’14: 21st European Conference on Artificial Intelligence, pages 615–620, 2014.
[32] M. A. U. Nasir, G. De Francisci Morales, D. García-Soriano, N. Kourtellis, and M. Serafini. Partial Key Grouping: Load-Balanced Partitioning
of Distributed Streams. arXiv:1510.07623, 2015.
[33] M. A. U. Nasir, G. De Francisci Morales, D. García-Soriano, N. Kourtellis, and M. Serafini. The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines. In ICDE ’15: 31st
IEEE International Conference on Data Engineering, pages 137–148,
2015.
[34] M. A. U. Nasir, G. De Francisci Morales, N. Kourtellis, and M. Serafini.
When Two Choices Are not Enough: Balancing at Scale in Distributed
Stream Processing. In ICDE ’16: 32nd IEEE International Conference
on Data Engineering, pages 589–600, 2016.
[35] D. M. Shankaralingappa, G. De Francisci Morales, and A. Gionis. Extracting Skill Endorsements from Personal Communication Data. In
CIKM ’16: 25th ACM International Conference on Information and
Knowledge Management, 2016.
[36] B. Thomee and G. De Francisci Morales. Automatic Discovery of Global
and Local Equivalence Relationships in Labeled Geo-Spatial Data. In
HT ’14: 25th ACM Conference on Hypertext and Social Media, pages
158–168, 2014.
[37] A. T. Vu, G. De Francisci Morales, J. Gama, and A. Bifet. Distributed
Adaptive Model Rules for Mining Big Data Streams. In BigData ’14:
2nd IEEE International Conference on Big Data, pages 345–353, 2014.