Learning of Multimodal Representations With
Random Walks on the Click Graph
Abstract—In multimedia information retrieval, most classic approaches tend to represent
different modalities of media in the same feature space. With the click data collected from
users' search behavior, existing approaches take either one-to-one paired data (text–image
pairs) or ranking examples (text–query–image and/or image–query–text ranking lists) as training
examples, which does not make full use of the click data, particularly the implicit connections
among the data objects. In this paper, we treat the click data as a large click graph, in which
vertices are images/text queries and edges indicate the clicks between an image and a query. We
consider learning a multimodal representation from the perspective of encoding the
explicit/implicit relevance relationships between the vertices in the click graph. By minimizing
both the truncated random walk loss and the distance between the learned representations of
vertices and their corresponding deep neural network outputs, the proposed model, named
multimodal random walk neural network (MRW-NN), can not only learn robust
representations of the existing multimodal data in the click graph, but also handle unseen
queries and images to support cross-modal retrieval. We evaluate the latent representation
learned by MRW-NN on the public large-scale click log dataset Clickture and further show that
MRW-NN achieves much better cross-modal retrieval performance on unseen queries/images
than other state-of-the-art methods.
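To make the click-graph idea concrete, the following is a minimal sketch (not the paper's implementation) of a bipartite click graph whose vertices are text queries and images, with an edge added for every recorded click, together with truncated random walks over that graph. Class and method names such as ClickGraph, addClick, and walkLength are illustrative assumptions.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: vertices are query/image identifiers, edges are clicks.
public class ClickGraph {
    private final Map<String, List<String>> adjacency = new HashMap<>();
    private final Random random = new Random(42);

    // Record one click between a query vertex and an image vertex (undirected edge).
    public void addClick(String query, String image) {
        adjacency.computeIfAbsent(query, k -> new ArrayList<>()).add(image);
        adjacency.computeIfAbsent(image, k -> new ArrayList<>()).add(query);
    }

    // Generate one truncated random walk of fixed length starting from a vertex.
    // Because the graph is bipartite, walks alternate between queries and images,
    // so vertices co-occurring in a walk capture implicit query-image relevance.
    public List<String> randomWalk(String start, int walkLength) {
        List<String> walk = new ArrayList<>();
        walk.add(start);
        String current = start;
        for (int step = 1; step < walkLength; step++) {
            List<String> neighbors = adjacency.get(current);
            if (neighbors == null || neighbors.isEmpty()) {
                break; // isolated vertex: truncate the walk early
            }
            current = neighbors.get(random.nextInt(neighbors.size()));
            walk.add(current);
        }
        return walk;
    }

    public static void main(String[] args) {
        ClickGraph graph = new ClickGraph();
        graph.addClick("query:red car", "image:1001");
        graph.addClick("query:sports car", "image:1001");
        graph.addClick("query:sports car", "image:2002");
        System.out.println(graph.randomWalk("query:red car", 5));
    }
}

In this toy example, a walk starting at "query:red car" can reach "query:sports car" through the shared image, which is exactly the kind of implicit connection the walks are meant to expose.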
EXISTING SYSTEM:
One of the fundamental problems in image search is to rank image documents according to a
given textual query. Existing search engines depend heavily on surrounding texts for ranking images, or
leverage query-image pairs annotated by human labelers to train a series of ranking functions.
However, there are two major limitations: 1) the surrounding texts are often noisy or too sparse to accurately
describe the image content, and 2) human annotation is expensive and thus cannot be
scaled up. The existing work demonstrates that these two fundamental challenges can be mitigated by
jointly exploring cross-view learning and the use of click-through data. The former aims to create a
latent subspace with the ability to compare information from the originally incomparable views (i.e.,
textual and visual views), while the latter explores the largely available and freely accessible click-through data (i.e., "crowd-sourced" human intelligence) for query understanding. Specifically, it proposes
a novel cross-view learning method for image search, named Click-through-based Cross-view Learning
(CCL), which jointly minimizes the distance between the mappings of query and image in the latent
subspace and preserves the inherent structure in each original space. On a large-scale click-based image
dataset, CCL achieves a 4.0% improvement in relevance over a Support Vector Machine based method,
while reducing the feature dimension by several orders of magnitude (e.g., from thousands to
tens). Moreover, the experiments also demonstrate the superior performance of CCL over several state-of-the-art subspace learning techniques.
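As a rough illustration of the cross-view idea behind CCL (not the authors' implementation), the sketch below projects textual query features and visual image features into a shared low-dimensional subspace with two linear maps and scores relevance by their distance there. The projection matrices Wq and Wi, and the class name LatentSubspace, are assumptions; in CCL these mappings would be learned jointly from click-through pairs.

// Illustrative latent-subspace mapping for two incomparable views.
public class LatentSubspace {
    private final double[][] Wq; // textual view -> latent space (d x dq)
    private final double[][] Wi; // visual view  -> latent space (d x di)

    public LatentSubspace(double[][] Wq, double[][] Wi) {
        this.Wq = Wq;
        this.Wi = Wi;
    }

    // Multiply a feature vector by a projection matrix.
    private static double[] project(double[][] W, double[] x) {
        double[] y = new double[W.length];
        for (int r = 0; r < W.length; r++) {
            double sum = 0.0;
            for (int c = 0; c < x.length; c++) {
                sum += W[r][c] * x[c];
            }
            y[r] = sum;
        }
        return y;
    }

    // Squared Euclidean distance between the latent mappings of a query and an image;
    // smaller values indicate higher predicted relevance.
    public double distance(double[] queryFeatures, double[] imageFeatures) {
        double[] q = project(Wq, queryFeatures);
        double[] v = project(Wi, imageFeatures);
        double d = 0.0;
        for (int k = 0; k < q.length; k++) {
            double diff = q[k] - v[k];
            d += diff * diff;
        }
        return d;
    }
}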
DISADVANTAGE:
 The former aims to create a latent subspace with the ability to compare information from the
originally incomparable views (i.e., textual and visual views), while the latter explores the largely
available and freely accessible click-through data (i.e., "crowd-sourced" human intelligence) for
query understanding.
PROPOSED SYSTEM:
In this work, we present a new approach to learning latent representations of the
multimodal data from a click graph. By minimizing the random walk error and the
regularization penalty on the output of the modality-specific neural networks, the learned model
has the ability not only to represent the explicit connections and the implicit connections of the
vertices in the click graph with low-dimensional continuous vectors, but also to map unseen
queries and images to the latent subspace to support cross-modal retrieval. We
demonstrate the effectiveness of the representation learned by the proposed method MRW-NN
and show its superiority over the comparative methods on cross-modal retrieval on a large-scale
click log dataset.
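The following is a minimal, hypothetical sketch of the kind of combined objective described above, not the paper's exact formulation: for a window of vertices that co-occur in a truncated random walk, a skip-gram-style term pulls their embeddings together, while a penalty term keeps each vertex embedding close to the output of its modality-specific network (a text network for queries, an image network for images). All names and the weighting factor lambda are illustrative assumptions.

import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of a combined objective: random-walk term plus network-output penalty.
public class CombinedLoss {

    private static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Negative log-sigmoid of the similarity of two vertices that co-occur in a walk
    // (the "random walk error" for one positive pair, in skip-gram style).
    private static double walkLoss(double[] center, double[] context) {
        return -Math.log(1.0 / (1.0 + Math.exp(-dot(center, context))));
    }

    // Combined loss for one walk window: walk term over co-occurring pairs plus
    // lambda times the penalty tying each embedding to its modality network output.
    public static double loss(List<String> window,
                              Map<String, double[]> embeddings,
                              Function<String, double[]> modalityNetwork,
                              double lambda) {
        double total = 0.0;
        String center = window.get(0);
        for (int i = 1; i < window.size(); i++) {
            total += walkLoss(embeddings.get(center), embeddings.get(window.get(i)));
        }
        for (String vertex : window) {
            total += lambda * squaredDistance(embeddings.get(vertex),
                                              modalityNetwork.apply(vertex));
        }
        return total;
    }
}

Because the penalty forces the networks to reproduce the graph embeddings, a new query or image can later be embedded by its network alone, even though it never appeared as a vertex during training.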
ADVANTAGE:
 The learned model has the ability not only to represent the explicit connections and the
implicit connections of the vertices in the click graph with low-dimensional continuous
vectors, but also to map unseen queries and images to the latent subspace to support
cross-modal retrieval, as sketched below.
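The retrieval-time behavior for an unseen query could look like the hedged sketch below: the trained text-side network maps the query into the latent subspace, and images (embedded offline) are ranked by cosine similarity. The network is represented as an opaque Function, and names such as CrossModalRetrieval and retrieve are illustrative assumptions.

import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch of cross-modal retrieval for an unseen text query.
public class CrossModalRetrieval {

    private static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
    }

    // Rank image ids by similarity between the query's latent vector and each
    // image's precomputed latent vector (highest similarity first).
    public static List<String> retrieve(double[] queryFeatures,
                                        Function<double[], double[]> textNetwork,
                                        Map<String, double[]> imageEmbeddings,
                                        int topK) {
        double[] queryLatent = textNetwork.apply(queryFeatures);
        return imageEmbeddings.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(queryLatent, e.getValue())))
                .limit(topK)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}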
SYSTEM REQUIREMENTS
HARDWARE REQUIREMENTS:
 Processor : Pentium IV
 Speed : 1.1 GHz
 RAM : 1 GB
 Hard Disk : 200 GB
 Keyboard : Standard Windows keyboard
 Mouse : Two- or three-button mouse
 Monitor : SVGA
SOFTWARE REQUIREMENTS:
 Operating System : Windows 7
 Coding Language : Java
 Database : MySQL