Big Signal Processing for Multi-Aspect Data Mining
Evangelos E. Papalexakis, Carnegie Mellon University
http://www.cs.cmu.edu/~epapalex

What does a social graph between people who call each other look like? How does it differ from one where people instant-message or e-mail each other? Social interactions, along with many other real-world processes and phenomena, have different aspects, such as the means of communication. In the above example, the activity of people calling each other will likely differ from the activity of people instant-messaging each other. Nevertheless, each aspect of the interaction is a signature of the same underlying social phenomenon: the formation of social ties and communities. Taking into account all aspects of social interaction results in more accurate social models (e.g., communities). The main thesis of my work is that many real-world problems, such as the aforementioned, benefit from jointly modeling and analyzing the multi-aspect data associated with the underlying phenomenon we seek to uncover. My research bridges Signal Processing and Data Science through designing and developing scalable and interpretable algorithms for mining big multi-aspect data, addressing high-impact real-world applications. My methods place a specific emphasis on tensor analysis, so below I give a very brief, high-level overview of how tensors can be used as an exploratory analysis tool, using the analysis of a Knowledge Base as a motivating example. Tensor analysis is by no means a new problem; however, my on-going and proposed work is novel in the context of the applications I am interested in, as well as the new models and algorithms that I develop.

THESIS WORK
My thesis work is broken down into algorithms, with contributions mostly in tensor analysis as well as in other fields such as control system identification [10], and multi-aspect data mining applications.

A) Algorithms
The primary computational tool in my work is tensor decomposition. Tensors are multi-dimensional matrices, where each aspect of the data is mapped to one of the dimensions or modes: matrices record dyadic relationships, like "people recommending products", while tensors are their n-mode generalizations, capturing 3- and higher-way relationships; for example, "subject-verb-object" triplets naturally lead to a 3-mode tensor. In order to analyze a tensor, we compute a decomposition or factorization (henceforth I use the terms interchangeably), which gives a low-dimensional embedding of all the aspects. I focused on the Canonical or PARAFAC decomposition, which decomposes the tensor into a sum of outer products of latent factors (see also Figure 1):

X ≈ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r,

where ∘ denotes the outer product, i.e., [a ∘ b ∘ c](i, j, k) = a(i)b(j)c(k). This is reminiscent of the rank-R singular value decomposition of a matrix; each rank-one component corresponds to a latent concept, e.g., "leaders", "cars", "tools", and so on.

Figure 1: Canonical or PARAFAC decomposition of a three-way (subject, verb, object) tensor (here, from the NELL dataset) into a sum of R rank-one components.
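To make the decomposition concrete, here is a minimal NumPy sketch of a naive CP/PARAFAC fit via alternating least squares on a small dense 3-mode tensor. It is only an illustration of the model above, not the scalable algorithms discussed below, and the helper names (unfold, khatri_rao, cp_als) are my own:

```python
import numpy as np

def unfold(T, mode):
    # Matricize T along `mode`: rows index that mode, columns index the rest (C order).
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(U, V):
    # Column-wise Khatri-Rao product: (I x R), (J x R) -> (I*J x R).
    I, R = U.shape
    J, _ = V.shape
    return (U[:, None, :] * V[None, :, :]).reshape(I * J, R)

def cp_als(T, rank, n_iter=50, seed=0):
    # Naive CP/PARAFAC via alternating least squares for a 3-mode tensor:
    # T(i,j,k) ~= sum_r A(i,r) B(j,r) C(k,r).
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iter):
        # Solve for each factor in turn, keeping the other two fixed.
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Example: recover a planted rank-3 structure and report the relative error.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((30, 3)), rng.random((20, 3)), rng.random((10, 3))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(X, rank=3)
Xhat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(X - Xhat) / np.linalg.norm(X))
```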
Tensor decomposition as soft clustering: Informally, each latent factor is a co-cluster of the aspects of the tensor. For instance, given a "subject-verb-object" tensor, we can decompose it into a sum of a (usually) small number of triplets of vectors; intuitively, each triplet corresponds to a different concept, e.g., "politicians", "countries", or "tools". Each vector of a triplet may be viewed as a soft clustering indicator: suppose that a, b, c are the vectors of the "politicians" triplet that correspond to the "subject", "verb", and "object" dimensions (or modes), respectively; then a indicates the membership of all the subjects in the "politicians" cluster, and b and c do so for the verbs and objects. For example, in Figure 1, the triplet of vectors a1, b1, c1 corresponds to the first concept (e.g., "leaders-organizations"); subjects with a high score on a1 will be leaders, like "obama", "merkel", and "eric-schmidt", while entries with a high score on the corresponding object-mode vector will be organizations, like "usa" and "germany".

The advantages of this decomposition over other existing ones are interpretability (each factor corresponds to a co-cluster) and strong uniqueness guarantees for the latent factors. Tensor decompositions are very powerful tools with numerous applications (see details below), and there is increasing interest in their application to big data problems, both from academia and industry. However, algorithmic challenges remain that limit their applicability to real-world, big data problems, pertaining to scalability and quality assessment of the results. Below I outline how my work addresses those challenges, towards a broader adoption of tensor decompositions in big data science.

A1) Parallel and Scalable Tensor Decompositions
Consider a multi-aspect tensor dataset that is too big to fit in the main memory of a single machine. The data may have large "ambient" dimension (e.g., a social network can have billions of users); however, the observed interactions are very sparse, resulting in extremely sparse data. This data sparsity can be exploited for efficiency. In [9] we formalize the above statement by proposing the concept of a triple-sparse algorithm, where 1) the input data are sparse, 2) the intermediate data that the algorithm manipulates or creates are sparse, and 3) the output is sparse. Sparsity in the intermediate data is crucial for scalability: in [15] we show that the intermediate data created by a tensor decomposition algorithm designed for dense data can be many orders of magnitude larger than the original data, rendering the analysis prohibitive. Sparsity in the results is a great advantage, both in terms of storage and, most importantly, in terms of interpretability, since sparse models are easier for humans to inspect. None of the existing state-of-the-art algorithms, before [9], fulfilled all three requirements for sparsity. In [9] we propose ParCube, the first triple-sparse, parallel algorithm for tensor decomposition. Figure 2(a) depicts a high-level overview of ParCube: suppose that we have computed a weight expressing how important every row, column, and "fiber" (third-mode index) of the tensor is. Given those weights, ParCube takes biased samples of rows, columns, and fibers, extracting a small tensor from the full data. This is done repeatedly, with each sub-tensor effectively exploring a different part of the data. Subsequently, ParCube decomposes all those sub-tensors in parallel, generating partial results. Finally, ParCube merges the partial results, ensuring that partial results corresponding to the same latent component are merged together.
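The biased-sampling step can be sketched as follows for a small dense tensor (for illustration only; the actual ParCube operates on sparse data and also performs the parallel decomposition and merging steps, which are omitted here, and the function names are mine):

```python
import numpy as np

def parcube_style_sample(T, fractions, rng=None):
    # One biased sub-sample of a 3-mode tensor, in the spirit of ParCube:
    # the marginal (absolute) sums along each mode act as importance weights,
    # and rows/columns/fibers are kept with probability proportional to them.
    rng = rng or np.random.default_rng()
    X = np.abs(T)
    kept = []
    for mode, frac in enumerate(fractions):
        other_axes = tuple(ax for ax in range(T.ndim) if ax != mode)
        weights = X.sum(axis=other_axes)
        n_keep = max(1, int(round(frac * T.shape[mode])))
        n_keep = min(n_keep, int(np.count_nonzero(weights)))
        p = weights / weights.sum()
        idx = rng.choice(T.shape[mode], size=n_keep, replace=False, p=p)
        kept.append(np.sort(idx))
    sub_tensor = T[np.ix_(*kept)]
    return sub_tensor, kept

# Usage idea: extract several independent sub-tensors, decompose each (e.g. with
# cp_als above) in parallel, then merge the partial factors on shared indices.
# samples = [parcube_style_sample(T, fractions=(0.1, 0.1, 0.1)) for _ in range(4)]
```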
The power behind ParCube is that, even though the tensor itself might not fit in memory, we can choose the sub-tensors so that they do fit, and we can compensate by extracting many independent sub-tensors. ParCube converges to the same level of sparsity as [13] (the first tensor decomposition with latent sparsity), and its approximation error converges to that of the full decomposition (in cases where we are able to run the full decomposition). This demonstrates that ParCube's sparsity preserves the useful information in the data. In [11] we extend the idea of [9], introducing Turbo-SMT for the case of Coupled Matrix-Tensor Factorization (CMTF), where a tensor and a matrix share one of the aspects of the data, achieving up to 200 times faster execution with accuracy comparable to the baseline, on a single machine. Subsequently, in [17] we propose ParaComp, a novel parallel architecture for tensor decomposition in a similar spirit to ParCube. Instead of sampling, ParaComp uses random projections to compress the original tensor into multiple smaller tensors. Thanks to compression, in [17] we prove that ParaComp can guarantee the uniqueness of the results (cf. [17] for exact bounds and conditions), which is a very strong guarantee on the correctness and quality of the result. In addition to [9, 11, 17], which introduce a novel paradigm for parallelizing and scaling up tensor decomposition, in [15] we developed the first scalable algorithm for tensor decomposition on Hadoop, able to decompose problems at least two orders of magnitude larger than the state of the art. Subsequently, in [4] we developed a Distributed Stochastic Gradient Descent method for Hadoop that scales to billions of parameters.

Figure 2: (a) The main idea behind ParCube: using biased sampling, extract small representative sub-sampled tensors, decompose them in parallel, and carefully merge the final results into a set of sparse latent factors. (b) Computing the decomposition quality for tensors two orders of magnitude larger than the state of the art (I, J, K are the tensor dimensions).

A2) Unsupervised Quality Assessment of Tensor Decompositions
Real-world exploratory analysis of multi-aspect data is, to a great extent, unsupervised. Obtaining ground truth is a very expensive and slow process, or in the worst case impossible; for instance, in Neurosemantics, where we research how language is represented in the brain, most of the subject matter is uncharted territory and our analysis drives the exploration. Nevertheless, we would like to assess the quality of our results in the absence of ground truth. There is a very effective heuristic in the Signal Processing and Chemometrics literature, the "Core Consistency Diagnostic" (cf. Bro and Kiers, Journal of Chemometrics, 2003), which assigns a "quality" number to a given tensor decomposition and indicates whether the data are inappropriate for such analysis or the number of latent factors is incorrect. However, this diagnostic was specifically designed for fully dense and small datasets, and is not able to scale to large and sparse data.
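For intuition, the diagnostic itself can be sketched for a small dense tensor as below, assuming CP factors A, B, C have already been fitted (e.g., with the cp_als sketch above); the contribution of [8] is an equivalent computation that exploits sparsity and never forms such dense intermediates:

```python
import numpy as np

def core_consistency(T, A, B, C):
    # Core Consistency Diagnostic (CORCONDIA), dense sketch:
    # fit a least-squares Tucker core G with the CP factors held fixed,
    # then compare G to the ideal superdiagonal core of ones.
    # A score near 100 suggests the CP model (and chosen rank) is appropriate.
    R = A.shape[1]
    G = np.einsum('ijk,ri,sj,tk->rst', T,
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C))
    ideal = np.zeros((R, R, R))
    ideal[np.arange(R), np.arange(R), np.arange(R)] = 1.0
    return 100.0 * (1.0 - np.sum((G - ideal) ** 2) / R)
```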
In [8], exploiting sparsity, we introduce a provably exact algorithm that operates on data at least two orders of magnitude larger than the state of the art (as shown in Figure 2(b)), enabling quality assessment on large real datasets for the first time.

Impact - Algorithms
• ParCube [9] is the most cited paper of ECML-PKDD 2012, with 46 citations at the time of writing, where the median number of citations for ECML-PKDD 2012 is 5. Additionally, ParCube has already been downloaded more than 80 times by universities and organizations from 23 countries.
• Turbo-SMT [11] was selected as one of the best papers of SDM 2014, and will appear in a special issue of the Statistical Analysis and Data Mining journal.
• ParaComp [17] has appeared in the prestigious IEEE Signal Processing Magazine.

B) Applications
My work has focused on: 1) Multi-Aspect Graph Mining, 2) Neurosemantics, and 3) Knowledge on the Web.

B1) Multi-Aspect Graph Mining
In [5] we introduce GraphFuse, a tensor-based community detection algorithm which uses different aspects of social interaction and outperforms the state of the art in community extraction accuracy. Figure 3 shows GraphFuse at work, identifying communities in Reality Mining, a real dataset of multi-aspect interactions between students and faculty at MIT; the communities are consistent across aspects and agree with ground truth. Another aspect of a social network is time: in [3] we introduce Com2, a tensor-based temporal community detection algorithm, which identifies social communities and their behavior over time. In [7] we consider language as another aspect, identifying topical and temporal communities in a discussion forum of Turkish immigrants in the Netherlands, and in [12] we consider location (which has become extremely pervasive recently), analyzing data from Foursquare and identifying spatial and temporal patterns of users' activity. Not necessarily restricted to social networks, in my work I have also analyzed multi-aspect computer network graphs, detecting anomalies and network attacks [6, 16].

Figure 3: Results on the four views of the Reality Mining multi-graph: (a) calls, (b) proximity, (c) sms, (d) friends. Red dashed lines outline the clustering found by GraphFuse.
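To illustrate the modeling idea (not the actual GraphFuse algorithm of [5], which uses a different fitting scheme), one can stack the adjacency matrices of the different views into a node × node × view tensor and read soft community memberships off the node factors; the sketch below reuses the cp_als helper sketched earlier:

```python
import numpy as np

def communities_from_views(adjacency_views, n_communities):
    # adjacency_views: list of (N x N) adjacency matrices, one per aspect
    # (e.g., calls, proximity, sms, friendship). Stack them into an
    # N x N x V tensor, factorize, and assign each node to its strongest
    # latent component as a crude community label.
    T = np.stack(adjacency_views, axis=2).astype(float)
    A, B, C = cp_als(T, rank=n_communities)   # cp_als from the earlier sketch
    membership = np.abs(A)                    # soft node-to-community scores
    return membership.argmax(axis=1), C       # hard labels + per-view weights
```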
Ideall • C OM 2 [3] won the best student paper award at PAKDD 2014. successfully, Fig. 2 shows the real and predicted (using LS and would like to capture the dynamics of the internal state of the • G RAPH F USE [5]activity has been 80 times 21 countries. CCA) brain fordownloaded a particular more voxelthan (results by LSfrom and CCA son’s brain, which, in turn, cause the effect that we are meas • Our work in [16]toisthe deployed the 2Institute Information Industry in Taiwan, detecting real network intrusion attempts. are similar one in by Fig. for all for voxels). By minimizing with our MEG sensors. • Our work in [6] was selected, onealgorithms of the bestthat papers ASONAM appear in Springer’s Encyclopedia Social the sum of squared errors,as both solveoffor M ODEL0 2012, to Let us assume that there are n hiddenfor (hyper)regions of the b resortAnalysis to a simple line that increases very slowly over time, thus Network and Mining. which interact with each other, causing the activity that we ob having a minimal squared error, given linearity assumptions. in y. We denote the vector of the hidden brain activity as B2) Neurosemantics size n ⇥ 1. Then, by using the same idea as in M ODEL0 , we Real and predicted MEG brain activity How is knowledge represented in the human brain? Which regions have high activity and information flow, when of a concept such formulate the temporal evolution the hidden brain activity as “food” is shown to a human subject? Do all human subjects’ brains behave similarly in x(t this+context? Consider the + following 1) = A[n⇥n] ⇥ x(t) B[n⇥s] ⇥ s(t) experimental setting, where multiple human subjects are shown a set of concrete English nouns (e.g. “dog”, “tomato” etc), and we one step clo Having introduced the above equation, we are measure each person’s brain activity using various techniques (e.g, fMRI or MEG).modelling In this experiments, human subjects, semantic the underlying, hidden process whose outcome w stimuli (i.e., the nouns), and measurement methods are all different aspects of the same phenomenon: mechanisms serve.underlying However, an issue that wethe have yet to address is the fac x is not observed and we have no means of measuring it. We that the brain uses to process language. to resolve this issue by theTo measurement proc In [11], we seek to identify coherent regions of the brain that are activated forpose a semantically coherent setmodelling of stimuli. that itself, i.e. model the transformation of a hidden brain activity end we combine fMRI measurements with semantic features (in the form of simple questions, such as Can you pick it up?) for the tor to its observed counterpart. We assume that this transform same set of nouns, which provide useful information to the decomposition which might be missing from the fMRI data, as well is linear, thus we are able to write as constitute a human readable description of the semantic context of each latent group. A very exciting example of our results y(t) = C[m⇥n] x(t) can be seen in Figure 4(a), where all the nouns in the “cluster” are small objects, the corresponding questions reflect holding or Figure 2: up, Comparison true brain activity andregion brain activity Putting everything together, we end upwas withthe the following picking such objects and mostofimportantly, the brain that wasgenhighly active for this set of nouns and questions erated using the LS, and CCA solutions to M ODEL0 . 
Clearly, equations, which constitute our proposed model G E BM: premotor cortex, which is associated with holding or picking small items up. This result is entirely unsupervised and agrees with M ODEL0 is not able to capture the trends of the brain activity, x(t tasks + 1) (cf. = future A[n⇥n] ⇥ x(t)and + drive B[n⇥s] ⇥ s(t) Neuroscience. usminimizing confidence the thatsquared the same technique can used in more complex research) and This to thegives end of error, produces anbe almost y(t) = C ⇥ x(t) [m⇥n] neuroscientific discovery. straight line that dissects the real brain activity waveform. “apple”! “Is it edible?” (y/n)! Additionally, we require the hidden functional connectivity “knife”! “Can it hurt you?” (y/n)! Parietal lobe! trix Frontal lobe! A to be sparse because, intuitively, not all (hidden) regio (movement)! the brain interact directly with each other. Thus, given the a Nouns Questions Nouns QuestionsFL! PL! Nouns Questions Nouns Questions (attention)! beetle can ityou cause you pain? can itit cause you bed does use electricity? glass can pick itbeetle up? beetle can beetle can it cause you pain? approach: GeBM OL! TL!pain? beetle can itit cause cause you you pain? pain? bear 3.2 does it Proposed grow? formulation of G E BM, we seek to obtain a matrix A sparse en 4 3 house can you sit on it? tomato can you hold it in one hand? MEG! pants do you see it daily? cow is it alive? while obeying the dynamics dictated by model. Sparsity is k carODEL does it cast Formulating the problem asthan M nota shadow? able to meet the re0 is is it smaller a golfball?’ bee is it conscious? coat was it ever alive?bell voxel 1! Nouns Nouns Questions Nouns Questions Nouns Questions providing more 2 3 insightful and easy to interpret functional co …" not exQuestions Nouns Questions Nouns Questions Nouns Questions quirements forQuestions our desired solution. However, we have 24 41 bed does it use electricity? glass the canspace you pickof it up? can it cause you pain? bear does it grow? tivity matrices, since an exact zero on the connectivity matri Real and our predicted hausted possible formulations that live within setMEG brain activity house can you sit on it? hand? do you see it daily? cow is it alive? tomato can you hold it in one 0.35 0.35 0.35 0.4 Occipital lobe! plicitly of alive? simplifying assumptions. Indoes thisit cast section, we describe G E BM, 3 2 states that there is no direct interaction between neuron s it conscious? coat a shadow? bell is it smaller than a car golfball?’ was it ever voxel 2! 0.35 (vision)! 0.3 0.3 0.3 the contrary, a very small value in the matrix (if the matrix our proposed approach which, under the assumptions that we have 0.3 0.25 0.25 0.25 Temporal lobe! real sparse) is ambiguous and could imply either that the interact already made in Section 2, is able to meet our requirements remarkGeBM (language)! 0.25 0.2 0.2 0.2 0.2 negligible and thus could be ignored, or that there indeed is ably well. 0.15 0.15 0.15 0.15 with very small weight between the two neurons. In order to come up with 0.1 a more accurate0.1 model, it is useful to voxel 306! 
…" 0.1 0.1 The key ideas behind G E BM are: look more carefullyPremotor at the system that we are attempting to Cortexactual 0.05 0.05 0.05 0.05 0.4 real LS CCA 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 0 5 10 15 20 25 30 35 …" …" 1 Real and predicted MEG brain activity 0.3 0.3 0.25 0.25 0.2 0.2 0.15 0.15 0.1 0.1 0.05 0.3 0.3 0.25 0.25 0.2 0.2 0 0.15 0.035 0.05 50 50 0.05 0.04 0.04 100 0.05 Real and predicted MEG brain activity 0.03 150 150 0.3 0.3 0.25 0.25 0.2 0.2 0.03 150 0.4 0.4 0.3 0.3 0.02 200 0.01 250 0.15 0.3 0.2 0.2 20 40 0 0.15 150 Group1 200 250 100 150 200 Group 2 250 40 0 20 40 19 0 20 0.35 40 Real and predicted MEG brain activityMEG brain activity Real and predicted MEG brain activity Real and predicted 0.4 0.35 0.4 0.4 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.35 0.3 0.35 0.35 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.3 0.25 0.25 0.2 0.2 0.15 0.15 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.05 0.05 0.05 0.05 0.05 0.05 0.35 250 0.01 0.1 0.1 300 0 50 0 0.1 20 0.35 0.35 0.05 0.05 0.005 0.1 300 0 100 0 0.35 0.35 0.35 0.35 0.01 0.005 50 40 0.2 0 200 0.02 0.01 250 0.4 0.3 20 0.015 0.015 200 0.02 0.4 16 0.03 0 0.025 100 0.025 0.02 40 0.1 0.05 0 Real40and predicted MEG brain activity 0 20 40 0 20 …" 100 20 0.2 0.1 18 0.1 0.1 0 0.3 0.2 0.035 0.15 0.1 0.05 50 0.03 0.4 0.3 8 0.2 0.04 0.4 0.1 0 0 0 0 20 20 40 0 0 40 0 0 0 0 20 20 20 40 40 0 0 040 0 0 0 20 20 20 40 40 0.3 0.3 0.25 0.25 0.2 0.2 0.15 0.15 0.1 0.1 0.1 0.1 0.05 0.05 0.05 00 0040 0 real GeBM real GeBM real GeBM 0.05 0 20 20 204040 0 0 40 20 40 300 50 0.05 100 0 150 20 200 40 Group 3 0.05 0 250 0 50 0 20 0 (a) The premotor cortex, having high activation here, is associated with motions such as holding and picking small items up. 40 0 100 0 150 20 200 Group 4 20 40 250 40 0 0 0 0 20 40 20 40 0 0 20 40 0 0 20 40 (b) Given MEG measurements (time series of the magnetic activity of a particular region of the brain), G E BM learns a graph between physical brain regions. Furthermore, G E BM simulates the real MEG activity very accurately. Figure 4: Overview of results of the Neurosemantics application. In a similar experimental setting, where the human subjects are also asked to answer a simple yes/no question about the noun they are reading, in [10] we seek to discover the functional connectivity of the brain for the particular task. Functional connectivity is an information flow graph between different regions of the brain, indicating high degree of interaction between (not necessarily physically connected) regions while the person is reading the noun and answering the question. [10] we propose G E BM, a novel model for the functional connectivity which views the brain as a control system and we propose a sparse system identification algorithm which solves the model. Figure 4(b) shows an overview of G E BM: given MEG measurements (time series of the magnetic activity of a particular region of the brain), we learn a model that describes a graph between physical brain regions, and simulates real MEG activity very accurately. Impact - Neurosemantics • G E BM [10] is taught in class CptS 595 at Washington State University. • Our work in [11] was selected as one of the best papers of SDM 2014 B3) Knowledge on the Web Knowledge on the Web has multiple aspects: real-world entities such as Barack Obama and USA are usually linked in multiple ways, e.g., is president of, was born in, and lives in. 
In [9] we discover semantically similar noun-phrases in a Knowledge Base coming from the Read the Web project at CMU: http://rtw.ml.cmu.edu/rtw/. Language is another aspect: many web pages have parallel content in different languages, yet some languages have higher representation than others. How can we learn a high-quality joint latent representation of entities and words that combines information from all languages? In [14] we introduce the notion of translation-invariant word embeddings, computing multi-lingual embeddings while forcing translations to be "close" in the embedding space; our approach outperforms the state of the art. Yet another aspect of a real-world entity on the web is the set of search engine results for that entity, which reflects each search engine's biased view, a result of its crawling, indexing, ranking, and potential personalization algorithms for that query. In [1] we introduce TensorCompare, a tool which measures the overlap in the results of different search engines, and we conduct a case study on Google and Bing, finding high overlap. Given this high overlap, how can we use different signals, potentially coming from social media, in order to provide diverse and useful results? In [2] we follow up by designing a Twitter-based web search engine that uses tweet popularity as a ranking function and does exactly that. This result has huge potential for the future of web search, paving the way for the use of social signals in the determination and ranking of results.

Impact - Knowledge on the Web
• In [14] we are the first to propose the concept of translation-invariant word embeddings.
• Our work in [1] was selected to appear in a special issue of the Journal of Web Science, as one of the best papers in the Web Science Track of WWW 2015.

FUTURE RESEARCH DIRECTIONS
As more field sciences incorporate information technologies (with Computational Neuroscience being a prime example from a long list of disciplines), the need for scalable, efficient, and interpretable multi-aspect data mining algorithms will only increase.

1) Long-Term Vision: Big Signal Processing for Data Science
The process of extracting useful and novel knowledge from big data in order to drive scientific discovery is the holy grail of data science. Consider the case where we view the knowledge extraction process through a signal processing lens: suppose that a transmitter (a physical or social phenomenon) is generating signals which describe aspects of the underlying phenomenon. An example signal is a time-evolving graph (a time series of graphs), which can be represented as a tensor. The receiver (the data scientist, in our case) combines all those signals with the ultimate goal of reconstructing (and, more generally, understanding) their generative process. The "communication channel" over which the data are transmitted can play the role of the measurement process, where loss of data occurs, and thus we have to account for the channel in our analysis in order to reverse its effect. We may consider that the best way to transmit all the data to the receiver is to find the best compression or dimensionality reduction of those signals (e.g., Compressed Sensing).
There may be more than one transmitter (if we consider a setting where the data are distributed across data centers) and multiple receivers, in which case privacy considerations come into play. In my work so far I have established two connections between Signal Processing and Data Science (tensor decompositions and system identification), contributing new results to both communities; this work has already had significant impact, demonstrated, for instance, by the number of researchers using and extending it. These two connections are but instances of a vast landscape of opportunities for cross-pollination which will advance the state of the art of both fields and drive scientific discovery. I envision my future work as a three-way bridge between Signal Processing, Data Science, and high-impact real-world applications.

2) Mid-Term Research Plan
With respect to my shorter-term plan (the first 3-5 years), I provide a more detailed outline of the thrusts and challenges that I am planning to tackle, regarding both algorithms and multi-aspect data applications.

Algorithms: Triple-Sparse Multi-Aspect Data Exploration & Analysis with Quality Assessment
In addition to improving the state of the art for tensor decompositions, I plan to explore alternatives for multi-aspect data modeling and representation learning, which can be used with tensor decompositions. Some of the challenges are:
Algorithms, models, and problem formulation: What is the appropriate (factorization) model? Are there any noisy aspects that may hurt performance? As the ambient dimensions of the data grow and the number of aspects increases, the data become much sparser. Sparsity, as we saw, is a blessing when it comes to scalability; however, it can also be a curse when it comes to detecting meaningful, relatively dense patterns in subspaces of the data.
Scalability & Efficiency: Data sizes will continue to grow, and we need faster and more scalable algorithms to handle the size and complexity of the data. This will involve adjusting existing algorithms or proposing new algorithms that adhere to the triple-sparse paradigm. Furthermore, the particular choice of distributed system can make a huge difference depending on the application: for instance, Map/Reduce works well for batch tasks, whereas it has well-known weaknesses for iterative algorithms, and other systems, e.g., Spark, could be more appropriate for such tasks. In the future I plan to research the capabilities of current high-end distributed systems in relation to the algorithms I develop.
(Unsupervised) Quality Assessment: In addition to purely algorithmic approaches, it is instrumental to work in collaboration with field scientists and experts, and to incorporate into the quality assessment elements that experts indicate as important. Finally, there are many exciting routes of collaboration with Human-Computer Interaction and Crowdsourcing experts, harnessing the "wisdom of the crowds" in assessing the quality of our analysis, especially in applications such as knowledge on the web and multi-aspect social networks, where non-experts may be able to provide high-quality judgements.

Application: Neurosemantics
In the search for understanding how semantic information is processed by the brain, I am planning to broaden the scope of the Neurosemantics applications, considering aspects such as language: are the same concepts in different languages mapped in the same way in the brain?
Are cultural idiosyncrasies reflected in the way that speakers of different languages represent information? Furthermore, I will consider more complex forms of stimuli (such as phrases, images, and video) and richer sources of semantic information, e.g., from Knowledge on the Web. There is profound scientific interest in answering the above research questions, a fact also reflected in how well funded this research area is (e.g., see the BRAIN Initiative, http://braininitiative.nih.gov/).

Application: Urban & Social Computing
The social and physical interaction of people in an urban environment is an inherently multi-aspect process that ties together the physical and the on-line domains of human interaction. I plan to investigate human mobility patterns, e.g., through check-ins in on-line social networks, in combination with users' social interactions and the content they create on-line, with specific emphasis on multi-lingual content, which is becoming very prevalent due to population mobility. I also intend to develop anomaly detection techniques which can point to fraud (e.g., people buying fake followers for their account). Improving user experience through identifying normal and anomalous patterns in human geo-social activity is an extremely important problem, both in terms of funding and research interest, and in terms of its implications for modern societies.

Application: Knowledge on the Web
Knowledge bases are ubiquitous, providing taxonomies of human knowledge and facilitating web search. Many knowledge bases are extracted automatically, and as such they are noisy and incomplete. Furthermore, web content exists in multiple languages, with inherently imbalanced representation, which may result in imbalanced knowledge bases and, consequently, skewed views of web knowledge per language. I plan to continue my work on web knowledge, devising and developing techniques that combine multiple, multilingual, structured (e.g., knowledge bases) and unstructured (e.g., plain text) sources of information on the web, aiming for high-quality knowledge representation, as well as enrichment and curation of knowledge bases.

3) Funding, Collaborations, and Parting Thoughts
During my studies, I have partaken in grant proposal writing, contributing significantly to a successful NSF/NIH BIGDATA grant proposal (NSF IIS-1247489 & NIH 1R01GM108339-1), which resulted in $894,892 to CMU ($1.6M total) and has funded most of my PhD studies. In particular, I worked on outlining and describing the important algorithmic challenges as well as the applications we proposed to tackle. Being a major contributor to the proposal was a unique opportunity for me, giving me freedom to shape my research agenda. I have also been fortunate to have collaborated with a number of stellar researchers, both in academia and industry, many of whom have been my mentors in research. A research agenda is very often shaped in wonderful ways through such collaborations, as well as through mentoring and advising graduate students. To that end, I intend to keep nurturing and strengthening my on-going collaborations, and to pursue collaborations with scholars within and outside my field, always following my overarching theme: bridging Signal Processing and Big Data Science for high-impact, real-world applications.

Selected References
[1] Rakesh Agrawal, Behzad Golshan, and Evangelos E. Papalexakis. A study of distinctiveness in web results of two search engines. In WWW'15 Web Science Track (author order is alphabetical).
[2] Rakesh Agrawal, Behzad Golshan, and Evangelos E. Papalexakis. Whither social networks for web search? In ACM KDD'15 (author order is alphabetical).
[3] Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos E. Papalexakis, and Danai Koutra. Com2: Fast automatic discovery of temporal ('comet') communities. In Advances in Knowledge Discovery and Data Mining. Springer, 2014.
[4] Alex Beutel, Abhimanu Kumar, Evangelos E. Papalexakis, Partha Pratim Talukdar, Christos Faloutsos, and Eric P. Xing. FlexiFaCT: Scalable flexible factorization of coupled tensors on Hadoop. In SIAM SDM'14, 2014.
[5] Evangelos E. Papalexakis, Leman Akoglu, and Dino Ienco. Do more views of a graph help? Community detection and clustering in multi-graphs. In IEEE FUSION'13.
[6] Evangelos E. Papalexakis, Alex Beutel, and Peter Steenkiste. Network anomaly detection using co-clustering. In Encyclopedia of Social Network Analysis and Mining. Springer, 2014.
[7] Evangelos E. Papalexakis and A. Seza Doğruöz. Understanding multilingual social networks in online immigrant communities. In WWW'15 Companion.
[8] Evangelos E. Papalexakis and Christos Faloutsos. Fast efficient and scalable core consistency diagnostic for the PARAFAC decomposition for big sparse tensors. In IEEE ICASSP'15.
[9] Evangelos E. Papalexakis, Christos Faloutsos, and Nicholas D. Sidiropoulos. ParCube: Sparse parallelizable tensor decompositions. In ECML-PKDD'12.
[10] Evangelos E. Papalexakis, Alona Fyshe, Nicholas D. Sidiropoulos, Partha Pratim Talukdar, Tom M. Mitchell, and Christos Faloutsos. Good-enough brain model: Challenges, algorithms and discoveries in multi-subject experiments. In ACM KDD'14.
[11] Evangelos E. Papalexakis, Tom M. Mitchell, Nicholas D. Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, and Brian Murphy. Turbo-SMT: Accelerating coupled sparse matrix-tensor factorizations by 200x. In SIAM SDM'14.
[12] Evangelos E. Papalexakis, Konstantinos Pelechrinis, and Christos Faloutsos. Location based social network analysis using tensors and signal processing tools. In IEEE CAMSAP'15.
[13] Evangelos E. Papalexakis, Nicholas D. Sidiropoulos, and Rasmus Bro. From k-means to higher-way co-clustering: Multilinear decomposition with sparse latent factors. IEEE Transactions on Signal Processing, 2013.
[14] Matt Gardner, Kejun Huang, Evangelos E. Papalexakis, Xiao Fu, Partha Talukdar, Christos Faloutsos, Nicholas Sidiropoulos, and Tom Mitchell. Translation invariant word embeddings. In EMNLP'15.
[15] U Kang, Evangelos E. Papalexakis, Abhay Harpale, and Christos Faloutsos. GigaTensor: Scaling tensor analysis up by 100 times - algorithms and discoveries. In ACM KDD'12.
[16] Ching-Hao Mao, Chung-Jung Wu, Evangelos E. Papalexakis, Christos Faloutsos, and Tien-Cheu Kao. MalSpot: Multi2 malicious network behavior patterns analysis. In PAKDD'14.
[17] Nicholas D. Sidiropoulos, Evangelos E. Papalexakis, and Christos Faloutsos. Parallel randomly compressed cubes: A scalable distributed architecture for big tensor decomposition. IEEE Signal Processing Magazine, 2014.