Prior Knowledge
The computation of a posterior distribution over parameters requires the definition of a prior distribution on those parameters and hyper-parameters: p(ϑ|m).
Parameters
Here we make the simplifying assumption that all distributions are Gaussian. Such distributions are fully parameterized by their first two moments (mean and covariance).
The choice of priors is important. Priors vary in how informative they are: a very informative prior contributes strongly to the posterior (biasing it toward the prior), whereas an uninformative or ‘flat’ prior has little influence on the posterior. For a Gaussian prior, informativeness is governed by the covariance matrix (informative = low variance; uninformative = high variance).
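The effect of prior informativeness can be illustrated with a conjugate Gaussian update for a single mean parameter with known noise variance. The numbers below (prior mean 0, variances 0.1 and 100, toy data with mean 2) are illustrative assumptions, not values from the text:

```python
import numpy as np

def gaussian_posterior(y, sigma2, m0, s02):
    """Conjugate update for the mean of a Gaussian with known noise
    variance sigma2, under a Gaussian prior N(m0, s02)."""
    n = len(y)
    post_prec = 1.0 / s02 + n / sigma2            # precisions add
    post_mean = (m0 / s02 + np.sum(y) / sigma2) / post_prec
    return post_mean, 1.0 / post_prec

y = np.full(5, 2.0)                                # toy data with mean 2
# Informative prior: low variance, pulls the posterior toward m0 = 0.
m_inf, _ = gaussian_posterior(y, sigma2=1.0, m0=0.0, s02=0.1)
# Flat prior: high variance, posterior is dominated by the data.
m_flat, _ = gaussian_posterior(y, sigma2=1.0, m0=0.0, s02=100.0)
```

With the informative prior the posterior mean stays close to the prior mean (0), while with the flat prior it lands near the sample mean (2), as described above.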
Hyper-parameters
Prior knowledge about stochastic innovations and observation noise is embedded in Gamma priors on the hyper-parameters. To rule out stochastic innovations, we want E(α|a,b) = 0 and V(α|a,b) = 0. This can be achieved by setting b = ∞ and 0 ≤ a < ∞. The hyper-parameters for observation noise can be set to achieve any desired expected value E(σ|a,b) and variance V(σ|a,b).
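A small numerical check of this limit, assuming the shape/rate parameterization of the Gamma distribution (E = a/b, V = a/b²); with the shape a held fixed, driving the rate b toward infinity sends both moments to zero:

```python
def gamma_moments(a, b):
    """Mean and variance of Gamma(a, b) in the shape/rate
    parameterization (an assumption here): E = a/b, V = a/b**2."""
    return a / b, a / b ** 2

# Fixed shape a = 2; increasing the rate b drives both the expectation
# and the variance of the hyper-parameter toward zero, effectively
# switching off stochastic innovations.
mean_lo, var_lo = gamma_moments(2.0, 1.0)
mean_hi, var_hi = gamma_moments(2.0, 1e6)
```

Note that a itself can be any finite value; it is the rate b → ∞ that forces both moments to vanish.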
Whatever the priors, their contribution to the posterior vanishes as the amount of data tends to infinity.
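This asymptotic behaviour can be sketched with the same conjugate Gaussian update used above; the prior settings (mean 0, variance 0.1) and data (mean 2) are illustrative assumptions:

```python
import numpy as np

def posterior_mean(y, sigma2, m0, s02):
    """Posterior mean of a Gaussian mean under a conjugate N(m0, s02) prior."""
    n = len(y)
    return (m0 / s02 + np.sum(y) / sigma2) / (1.0 / s02 + n / sigma2)

# The same informative prior (mean 0, variance 0.1) is overwhelmed as
# the number of observations grows: the posterior mean approaches the
# sample mean (2.0 here), and the prior's contribution vanishes.
m_small = posterior_mean(np.full(10, 2.0), 1.0, 0.0, 0.1)
m_large = posterior_mean(np.full(10000, 2.0), 1.0, 0.0, 0.1)
```

With 10 observations the prior still pulls the estimate noticeably toward 0; with 10,000 it is essentially the sample mean.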