A BAYESIAN APPROACH TO INFORMATION RETRIEVAL FROM SETS OF ITEMS

Katherine A. Heller (1), Zoubin Ghahramani (2,3)

(1) Gatsby Computational Neuroscience Unit, University College London, London WC1N 3AR, UK
(2) Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK, http://learning.eng.cam.ac.uk/zoubin
(3) Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
Abstract
We consider the problem of retrieving items from a concept or cluster, given a query consisting of a few items from that cluster. We formulate this as a Bayesian inference problem based on models of human categorization and generalization, and describe a simple algorithm for solving it. The algorithm produces a score that can be evaluated exactly using a single sparse matrix multiplication, which makes it possible to apply the method to retrieval from very large datasets (e.g., millions of items). We evaluate the algorithm on three problems: retrieving movies from a database of movie preferences, finding sets of similar authors based on their word usage in papers from a scientific conference, and finding completions of sets of words appearing in encyclopaedia articles. Compared with “Google Sets”, we show that our “Bayesian Sets” retrieval method gives reasonable set completions. Finally, we show how the Bayesian Sets algorithm can form the basis of a Content-Based Image Retrieval (CBIR) system. I will describe and demonstrate this Bayesian CBIR system and outline a range of other applications of our approach.
Key Words: Information Retrieval, Vision, Image Retrieval, Google
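To make the claim that the score reduces to one sparse matrix multiplication concrete, here is a minimal sketch. It is not the authors' implementation. It assumes binary item features, an independent Beta-Bernoulli model per feature, and Beta priors set from each feature's mean over the whole dataset with a hypothetical strength c = 2. Under these assumptions the log score log p(x | D_c) - log p(x) of an item x, given the query set D_c, is a constant plus sum_j q_j x_j. Scoring every item is therefore the product X q of the sparse binary data matrix X with a weight vector q. The function name bayesian_sets_scores, the representation of each item as a set of active feature indices, and the small floor on the prior are illustrative choices.

```python
import math


def bayesian_sets_scores(items, query_ids, num_features, c=2.0):
    """Log relevance score of every item for a query set (illustrative sketch).

    Assumes a Beta-Bernoulli model per binary feature (an assumption, not
    taken from the text above).
    items: list of sets; items[i] holds the indices of features equal to 1
           for item i, i.e. one sparse row of the binary data matrix X.
    query_ids: indices of the items forming the query set D_c.
    c: hypothetical prior strength; alpha_j = c * m_j, beta_j = c * (1 - m_j),
       where m_j is the mean of feature j over all items.
    """
    n_items = len(items)

    # Feature means over the whole dataset set the Beta prior.
    counts = [0] * num_features
    for row in items:
        for j in row:
            counts[j] += 1
    eps = 1e-9  # floor that keeps alpha_j and beta_j strictly positive
    alpha = [c * max(counts[j] / n_items, eps) for j in range(num_features)]
    beta = [c * max(1 - counts[j] / n_items, eps) for j in range(num_features)]

    # Posterior Beta parameters after observing the query items.
    n_query = len(query_ids)
    q_counts = [0] * num_features
    for i in query_ids:
        for j in items[i]:
            q_counts[j] += 1
    alpha_post = [alpha[j] + q_counts[j] for j in range(num_features)]
    beta_post = [beta[j] + n_query - q_counts[j] for j in range(num_features)]

    # Constant term shared by all items.
    const = sum(
        math.log(alpha[j] + beta[j]) - math.log(alpha[j] + beta[j] + n_query)
        + math.log(beta_post[j]) - math.log(beta[j])
        for j in range(num_features)
    )
    # Per-feature weights q_j: the log score is linear in the item's features.
    weights = [
        math.log(alpha_post[j]) - math.log(alpha[j])
        - math.log(beta_post[j]) + math.log(beta[j])
        for j in range(num_features)
    ]
    # Equivalent to the sparse matrix-vector product X @ weights, plus const.
    return [const + sum(weights[j] for j in row) for row in items]
```

For example, to rank all items for a query made of items 0 and 1, call `scores = bayesian_sets_scores(items, [0, 1], num_features)` and then `sorted(range(len(items)), key=lambda i: scores[i], reverse=True)`. With a sparse matrix library, the final list comprehension would become a single sparse matrix-vector multiplication. That single product is what makes retrieval over very large item collections cheap.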