Learning of Multimodal Representations With Random Walks on the Click Graph

ABSTRACT:
In multimedia information retrieval, most classic approaches tend to represent different modalities of media in the same feature space. With the click data collected from users’ searching behavior, existing approaches take either one-to-one paired data (text–image pairs) or ranking examples (text–query–image and/or image–query–text ranking lists) as training examples, which do not make full use of the click data, particularly the implicit connections among the data objects. In this paper, we treat the click data as a large click graph, in which vertices are images/text queries and edges indicate the clicks between an image and a query. We consider learning a multimodal representation from the perspective of encoding the explicit/implicit relevance relationship between the vertices in the click graph. By minimizing both the truncated random walk loss and the distance between the learned representation of vertices and their corresponding deep neural network output, the proposed model, named multimodal random walk neural network (MRW-NN), can not only learn a robust representation of the existing multimodal data in the click graph, but also handle unseen queries and images to support cross-modal retrieval. We evaluate the latent representation learned by MRW-NN on the public large-scale click log data set Clickture and further show that MRW-NN achieves much better cross-modal retrieval performance on unseen queries/images than other state-of-the-art methods.

EXISTING SYSTEM:
One of the fundamental problems in image search is to rank image documents according to a given textual query. Existing search engines depend heavily on surrounding texts for ranking images, or leverage query-image pairs annotated by human labelers to train a series of ranking functions.
However, there are two major limitations: 1) the surrounding texts are often noisy or too few to accurately describe the image content, and 2) human annotations are expensive to obtain and thus cannot be scaled up. We demonstrate in this paper that the above two fundamental challenges can be mitigated by jointly exploring cross-view learning and the use of click-through data. The former aims to create a latent subspace in which information from the originally incomparable views (i.e., textual and visual views) can be compared, while the latter explores the largely available and freely accessible click-through data (i.e., “crowd-sourced” human intelligence) for understanding queries. Specifically, we propose a novel cross-view learning method for image search, named Click-through-based Cross-view Learning (CCL), which jointly minimizes the distance between the mappings of query and image in the latent subspace and preserves the inherent structure in each original space. On a large-scale click-based image dataset, CCL improves over a Support Vector Machine based method by 4.0% in terms of relevance, while reducing the feature dimension by several orders of magnitude (e.g., from thousands to tens). Moreover, the experiments also demonstrate the superior performance of CCL over several state-of-the-art subspace learning techniques.

DISADVANTAGE:
The former aims to create a latent subspace in which information from the originally incomparable views (i.e., textual and visual views) can be compared, while the latter explores the largely available and freely accessible click-through data (i.e., “crowd-sourced” human intelligence) for understanding queries.

PROPOSED SYSTEM:
In this work, we have presented a new approach to learning a latent representation of multimodal data from a click graph.
By minimizing the random walk error and the regularization penalty from the output of the modality-specific neural networks, the learned model is able not only to represent the explicit and implicit connections of the vertices in the click graph with low-dimensional continuous vectors, but also to map unseen queries and images to the latent subspace to support cross-modal retrieval. We have demonstrated the effectiveness of the representation learned by the proposed method, MRW-NN, and shown its superiority over the comparative methods on cross-modal retrieval on a large-scale click log dataset.

(Wu et al., Learning of Multimodal Representations With Random Walks, p. 641)

ADVANTAGE:
The learned model is able not only to represent the explicit and implicit connections of the vertices in the click graph with low-dimensional continuous vectors, but also to map unseen queries and images to the latent subspace to support cross-modal retrieval.

SYSTEM REQUIREMENTS

HARDWARE REQUIREMENTS:
Processor - Pentium IV
Speed - 1.1 GHz
RAM - 1 GB
Hard Disk - 200 GB
Keyboard - Standard Windows Keyboard
Mouse - Two or Three Button Mouse
Monitor - SVGA

SOFTWARE REQUIREMENTS:
Operating System : Windows 7
Coding Language : Java
Database : MySQL
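
ILLUSTRATIVE SKETCH:
The click graph and the truncated random walks described in the abstract and the proposed system can be pictured with a small Java sketch. It is only an illustration under stated assumptions, not the MRW-NN implementation: the class name ClickGraph, the methods addClick and truncatedWalk, and the vertex ids such as "q:red car" and "img:1" are all hypothetical, and the random walk loss and the neural networks are not shown. Each click is stored as an undirected edge between a query and an image, and a walk stops after a fixed number of vertices.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Illustrative sketch only: a bipartite click graph with truncated random walks. */
class ClickGraph {
    // Adjacency lists keyed by vertex id (hypothetical ids such as "q:red car" or "img:1").
    private final Map<String, List<String>> neighbors = new HashMap<>();

    /** Records one click between a text query and an image as an undirected edge. */
    void addClick(String query, String image) {
        // Repeated clicks add repeated entries, so frequently clicked
        // edges are proportionally more likely to be followed by a walk.
        neighbors.computeIfAbsent(query, k -> new ArrayList<>()).add(image);
        neighbors.computeIfAbsent(image, k -> new ArrayList<>()).add(query);
    }

    /** Returns a walk of at most walkLength vertices, or an empty list for an unknown start. */
    List<String> truncatedWalk(String start, int walkLength, Random rng) {
        List<String> walk = new ArrayList<>();
        if (walkLength <= 0 || !neighbors.containsKey(start)) {
            return walk;
        }
        String current = start;
        walk.add(current);
        while (walk.size() < walkLength) {
            // Every stored vertex has at least one neighbor, because
            // vertices are only created by addClick.
            List<String> next = neighbors.get(current);
            current = next.get(rng.nextInt(next.size()));
            walk.add(current);
        }
        return walk;
    }

    public static void main(String[] args) {
        ClickGraph g = new ClickGraph();
        g.addClick("q:red car", "img:1");
        g.addClick("q:sports car", "img:1");
        g.addClick("q:sports car", "img:2");
        // "q:red car" and "img:2" share no click, but a walk can still visit
        // both through "img:1" and "q:sports car" (an implicit connection).
        System.out.println(g.truncatedWalk("q:red car", 5, new Random(42)));
    }
}
```

Because the graph is bipartite, every walk alternates between queries and images. Vertices that appear together in many such walks are the ones a learned representation would be expected to place close to each other in the latent subspace.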