The variational Bayesian approach
Typically, the likelihood function contains high-order interaction terms between subsets of the unknown model parameters ϑ (e.g., because of nonlinearities in the observation function g). This implies that the high-dimensional integrals required for Bayesian parameter estimation and model comparison cannot be evaluated analytically. Evaluating them by brute-force numerical integration or Monte Carlo sampling can also be computationally very costly. This motivates the use of variational approaches to approximate Bayesian inference (Beal, 2003). In brief, variational Bayes (VB) is an iterative scheme that indirectly optimizes an approximation to both the model evidence p(y|m) and the posterior density p(ϑ|y,m). The key trick is to decompose the log model evidence into:

$$\ln p(y|m) = F(q) + D_{KL}\left(q(\vartheta)\,\|\,p(\vartheta|y,m)\right)$$
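where q(ϑ) is any density over the model parameters, D_KL is the Kullback-Leibler divergence and the so-called free energy F(q) is defined as:

$$F(q) = \left\langle \ln p(y|\vartheta,m) + \ln p(\vartheta|m) - \ln q(\vartheta) \right\rangle_q$$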
where the expectation 〈.〉q is taken under q. Because the log model evidence does not depend on q, maximizing the functional F(q) with respect to q indirectly minimizes the Kullback-Leibler divergence between q(ϑ) and the exact posterior p(ϑ|y,m). The decomposition of the log evidence is complete in the sense that if q(ϑ)=p(ϑ|y,m), then F(q)=ln p(y|m). The iterative maximization of free energy is done under simplifying assumptions about the functional form of q, rendering q an approximate posterior density over model parameters and F(q) an approximate log model evidence (strictly speaking, a lower bound on it, since the Kullback-Leibler divergence is non-negative). Typically, one first partitions the model parameters ϑ into distinct subsets and then assumes that q factorizes into the product of the ensuing marginal densities. This assumption of “mean-field” separability effectively replaces stochastic dependencies between model variables by deterministic dependencies between the moments of their posterior distributions:

$$q(\vartheta_1) \propto \exp \left\langle \ln p(y|\vartheta,m) + \ln p(\vartheta|m) \right\rangle_{q(\vartheta_2)}$$
where I have used a bi-partition of the parameter space (ϑ={ϑ1,ϑ2}), and the right-hand side of the equation above can be broken down into a weighted sum of the moments of the distribution q(ϑ2). This equation generalizes to any mean-field partition and captures the essence of the variational Bayesian approach. The resulting VB algorithm is amenable to analytical treatment (the free energy is optimized with respect to the moments of the marginal densities), which makes it generic, quick and efficient.
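To make the mean-field update concrete, here is a minimal Python sketch. It rests on assumptions that are not part of the text above: the target density is taken to be a known bivariate Gaussian with mean `mu` and precision matrix `Lambda` (standing in for the exact posterior), and the function name `mean_field_gaussian` is purely illustrative. In this special case each marginal q(ϑ1), q(ϑ2) is Gaussian, and the update of one marginal depends on the other only through its mean, which illustrates the "deterministic dependencies between moments" mentioned above.

```python
import numpy as np

def mean_field_gaussian(mu, Lambda, n_iter=50):
    """Coordinate-ascent mean-field updates for a bivariate Gaussian target.

    Assumption: the target is N(mu, inv(Lambda)); q factorizes as q(t1) q(t2).
    """
    m = np.zeros(2)  # initial means of q(theta1) and q(theta2)
    for _ in range(n_iter):
        # q(theta1) is proportional to exp <ln p(theta1, theta2)>_{q(theta2)}.
        # For a Gaussian target this expectation involves q(theta2)
        # only through its mean m[1].
        m[0] = mu[0] - Lambda[0, 1] / Lambda[0, 0] * (m[1] - mu[1])
        # Symmetric update for q(theta2), using the fresh mean of q(theta1).
        m[1] = mu[1] - Lambda[1, 0] / Lambda[1, 1] * (m[0] - mu[0])
    # The variances of the factorized marginals are fixed by the precision diagonal.
    v = 1.0 / np.diag(Lambda)
    return m, v

mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])  # strongly correlated target
Lambda = np.linalg.inv(Sigma)

m, v = mean_field_gaussian(mu, Lambda)
print("mean-field means:    ", m)
print("mean-field variances:", v)
print("exact marginal variances:", np.diag(Sigma))
```

In this toy setting the iterated means converge to the exact means, whereas the mean-field variances are smaller than the exact marginal variances: the factorized q cannot represent the correlation between ϑ1 and ϑ2.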
In addition to the mean-field trick, VBA relies upon a further parametric approximation, which essentially consists of summarizing each marginal posterior density by its first two moments (mean and variance). This effectively means performing a local Gaussian approximation.
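The following Python sketch illustrates what a local Gaussian approximation can look like, and checks the evidence decomposition numerically. It is not the toolbox's actual update scheme. Everything specific here is an assumption made for illustration: a one-parameter toy model with observation function g(ϑ)=exp(ϑ), a standard normal prior, a noise standard deviation `sigma`, made-up data `y`, and a grid-based quadrature that is only feasible in one dimension. The Gaussian q is centered on the mode of the log joint density, with variance given by the inverse of the negative curvature at the mode.

```python
import numpy as np

def log_joint(theta, y, sigma=0.5):
    """ln p(y, theta | m) for a toy model (assumed for illustration):
    y_i = exp(theta) + Gaussian noise, with prior theta ~ N(0, 1)."""
    resid = y[:, None] - np.exp(theta)[None, :]          # shape (n_obs, n_grid)
    log_lik = (-0.5 * np.sum(resid**2, axis=0) / sigma**2
               - y.size * np.log(sigma * np.sqrt(2 * np.pi)))
    log_prior = -0.5 * theta**2 - 0.5 * np.log(2 * np.pi)
    return log_lik + log_prior

y = np.array([1.8, 2.3, 2.1])            # hypothetical data
theta = np.linspace(-4.0, 4.0, 20001)    # quadrature grid over the parameter
d = theta[1] - theta[0]
lj = log_joint(theta, y)

# Reference quantities obtained by brute-force quadrature (1-D only).
log_evidence = np.log(np.sum(np.exp(lj)) * d)
log_post = lj - log_evidence

# Local Gaussian approximation: mode and curvature of the log joint.
i = np.argmax(lj)
mode = theta[i]
curv = (lj[i + 1] - 2 * lj[i] + lj[i - 1]) / d**2   # finite-difference 2nd derivative
var = -1.0 / curv
log_q = -0.5 * (theta - mode)**2 / var - 0.5 * np.log(2 * np.pi * var)
q = np.exp(log_q)

# Free energy F(q) = <ln p(y, theta | m) - ln q>_q and KL(q || posterior).
F = np.sum(q * (lj - log_q)) * d
KL = np.sum(q * (log_q - log_post)) * d

print("posterior mean/variance under q:", mode, var)
print("ln p(y|m):", log_evidence)
print("F(q):     ", F)
print("KL:       ", KL)
```

Under these assumptions, F(q) plus the Kullback-Leibler divergence recovers the log evidence up to quadrature error, and F(q) sits below the log evidence because the exact posterior of this nonlinear model is not exactly Gaussian.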