Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
What is coming… Today: Probabilistic models Improving classical models Latent Semantic Indexing Relevance feedback (Chapter 5) Monday Feb 5 Chapter 5 continued Wednesday Feb 7 Web Search Engines Chapter 13 & Google paper Annoucement: Free Food Event Where & When: GWC 487, Tuesday Feb 6th, 12:15-1:30 What: Pizza, Softdrinks Catch: A pitch on going to graduate school at ASU CSE… Meet admissions committee, some graduate students, some faculty members Silver-lining: Did we mention Pizza? Also not as boring as time-sharing condo presentations. Problems with Vector Model No semantic basis! Keywords are plotted as axes But are they really independent? Or they othogonal? No support for boolean queries How do you ask for papers that don’t contain a keyword? Probabilistic Model Objective: to capture the IR problem using a probabilistic framework Given a user query, there is an ideal answer set Querying as specification of the properties of this ideal answer set (clustering) But, what are these properties? Guess at the beginning what they could be (i.e., guess initial description of ideal answer set) Improve by iteration Probabilistic Model An initial set of documents is retrieved somehow User inspects these docs looking for the relevant ones (in truth, only top 10-20 need to be inspected) IR system uses this information to refine description of ideal answer set By repeting this process, it is expected that the description of the ideal answer set will improve Have always in mind the need to guess at the very beginning the description of the ideal answer set Description of ideal answer set is modeled in probabilistic terms Probabilistic Ranking Principle Given a user query q and a document dj, estimate the probability that the user will find the document dj interesting (i.e., relevant). The model assumes that this probability of relevance depends on the query and the document representations only. Ideal answer set is referred to as R and should maximize the probability of relevance. Documents in the set R are predicted to be relevant. But, how to compute probabilities? what is the sample space? The Ranking Probabilistic ranking computed as: sim(q,dj) = P(dj relevant-to q) / P(dj non-relevant-to q) This is the odds of the document dj being relevant Taking the odds minimize the probability of an erroneous judgement Definition: wij {0,1} P(R | vec(dj)) : probability that given doc is relevant P(R | vec(dj)) : probability doc is not relevant The Ranking sim(dj,q) = P(R | vec(dj)) / P(R | vec(dj)) = [P(vec(dj) | R) * P(R)] [P(vec(dj) | R) * P(R)] ~ P(vec(dj) | R) P(vec(dj) | R) P(vec(dj) | R) : probability of randomly selecting the document dj from the set R of relevant documents The Ranking sim(dj,q) ~ P(vec(dj) | R) P(vec(dj) | R) Where vec(dj) is of the form (k1, k2,k3....kt) Using pair-wise independence assumption among keywords ~ [ P(ki | R)] * [ P(ki | R)] [ P(ki | R)] * [ P(ki | R)] P(ki | R) : probability that the index term ki is present in a document randomly selected from the set R of relevant documents The Ranking sim(dj,q) ~ log [ P(ki | R)] * [ P(kj | R)] [ P(ki | R)] * [ P(kj | R)] ~ K * [ log P(ki | R) + log P(ki | R) P(ki | R) ] P(ki | R) ~ wiq * wij * (log P(ki | R) + log P(ki | R) ) P(ki | R) P(ki | R) where P(ki | R) = 1 - P(ki | R) P(ki | R) = 1 - P(ki | R) The Initial Ranking sim(dj,q) ~ ~ wiq * wij * (log P(ki | R) + log P(ki | R) ) P(ki | R) P(ki | R) Probabilities P(ki | R) and P(ki | R) ? Estimates based on assumptions: P(ki | R) = 0.5 P(ki | R) = ni N where ni is the number of docs that contain ki Use this initial guess to retrieve an initial ranking Improve upon this initial ranking Pluses and Minuses Advantages: Docs ranked in decreasing order of probability of relevance Disadvantages: need to guess initial estimates for P(ki | R) method does not take into account tf and idf factors Alternative Probabilistic Models Probability Theory Semantically clear Computationally clumsy Why Bayesian Networks? Clear formalism to combine evidences Modularize the world (dependencies) Bayesian Network Models for IR Inference Network (Turtle & Croft, 1991) Belief Network (Ribeiro-Neto & Muntz, 1996) Bayesian Networks Definition: Bayesian networks are directed acyclic graphs (DAGS) in which the nodes represent random variables, the arcs portray causal relationships between these variables, and the strengths of these causal influences are expressed by conditional probabilities. Bayesian Networks yi : parent nodes (in this case, root nodes) x : child node y1 y2 yi cause x Y the set of parents of x The influence of Y on x x can be quantified by any function F(x,Y) such that x F(x,Y) = 1 0 < F(x,Y) < 1 For example, F(x,Y)=P(x|Y) yt Bayesian Networks x1 Given the dependencies declared in a Bayesian Network, the x3 x2 expression for the joint probability can be computed as a product of local conditional probabilities, for example, x4 P(x1, x2, x3, x4, x5)= P(x1 ) P(x2| x1 ) P(x3| x1 ) P(x4| x2, x3 ) P(x5| x3 ). P(x1 ) : prior probability of the root node x5 Bayesian Networks x1 In a Bayesian network each variable x is conditionally independent of all its non-descendants, given its parents. x3 x2 x4 For example: P(x4, x5| x2 , x3)= P(x4| x2 , x3) P( x5| x3) x5 An Example Bayes Net •Typically, networks written in causal direction wind up being most compact •need least number of probabilities to be specified Two Models dj k1 k2 … q ki … kt k1 and q ki k2 kt q2 or or q1 I “Inference Network model” d1 dj dn “Belief network model” ku Comparison Inference Network Model is the first and well known • Used in Inquery system Belief Network adopts a set-theoretic view • a clearly defined sample space • a separation between query and document portions • is able to reproduce any ranking produced by the Inference Network while the converse is not true (for example: the ranking of the standard vector model) Belief Network Model As the Inference Network Model Epistemological view of the IR problem Random variables associated with documents, index terms and queries Contrary to the Inference Network Model Clearly defined sample space Set-theoretic view Different network topology Belief Network Model The Probability Space Define: K={k1, k2, ...,kt} the sample space (a concept space) u K a subset of K (a concept) ki an index term (an elementary concept) k=(k1, k2, ...,kt) a vector associated to each u such that gi(k)=1 ki u ki a binary random variable associated with the index term ki , (ki = 1 gi(k)=1 ki u) Belief Network Model A Set-Theoretic View Define: a document dj and query q as concepts in K a generic concept c in K a probability distribution P over K, as P(c)=uP(c|u) P(u) P(u)=(1/2)t P(c) is the degree of coverage of the space K by c Belief Network Model Network topology q query side k1 ki k2 kt ku document side d1 dj dn Belief Network Model Assumption P(dj|q) is adopted as the rank of the document dj with respect to the query q. It reflects the degree of coverage provided to the concept dj by the concept q. Belief Network Model q The rank of dj k1 P(dj|q) = P(dj q) / P(q) ~ P(dj q) d ~ u P(dj q | u) P(u) ~ u P(dj | u) P(q | u) P(u) ~ k P(dj | k) P(q | k) P(k) 1 ki k2 dj kt dn ku Belief Network Model For the vector model Define Define a vector ki given by ki = k | ((gi(k)=1) (ji gj(k)=0)) in the state ki only the node ki is active and all the others are inactive Belief Network Model For the vector model Define (wi,q / |q|) if k = ki gi(q)=1 P(q | k) = 0 if k ki v gi(q)=0 P(¬q | k) = 1 - P(q | k) (wi,q / |q|) is a normalized version of weight of the index term ki in the query q Belief Network Model For the vector model Define (wi,j / |dj|) if k = ki gi(dj)=1 P(dj | k) = 0 if k ki v gi(dj)=0 P(¬ dj | k) = 1 - P(dj | k) (wi,j / |dj|) is a normalized version of the weight of the index term ki in the document d,j Bayesian Network Models Computational costs Inference Network Model one document node at a time then is linear on number of documents Belief Network only the states that activate each query term are considered The networks do not impose additional costs because the networks do not include cycles. Bayesian Network Models Impact The major strength is net combination of distinct evidential sources to support the rank of a given document.