# Prior knowledge elicitation: The past, present and future

## Details

Title: Prior knowledge elicitation: The past, present and future

Author(s): Petrus Mikkola et al.

## What

## Why

## How

## TLDR

## And?

## Rough Notes

- The **fundamental goal** of the expert knowledge/prior elicitation is to help practitioners design models that better capture the essential properties of the modelled problem. (Right after Eq 2.1 the **goal** is to elicit the prior distribution from an expert).
- 3 primary reasons for limited use:
  - Technical.
  - Practical.
  - Societal.

- 7 key dimensions of the *prior elicitation hypercube*:
  - Properties of the prior distribution itself.
  - Model family and prior elicitation method's dependence on it.
  - Underlying elicitation space.
  - How the method interprets information provided by the expert.
  - Computation.
  - Form and quantity of interaction with the expert(s).
  - Assumed capability of the expert in terms of domain knowledge and statistical understanding of the model.

- Grounds for prior specification: Encoding subjective knowledge, and the use cases of non-informative priors, regularizing priors, reference priors.
- Authors assume the likelihood partially determines the scale and range of reasonable values for the prior.
- Authors assume an expert and an analyst. The former is the person from whom knowledge is to be elicited; the latter is the facilitator, an expert in the elicitation process who could also be a statistician.
- There are cases where it is better to query the expert about e.g. **model observables**. Here the underlying elicitation space is the **observable space**: the model observables (e.g. model outcomes) are the variables that can be observed and directly measured, in contrast to **latent variables** (e.g. model parameters), which only exist within the context of the model and are not directly observed.
- Since early prior elicitation research, the dominant approach has been "fitting a simple and convenient distribution to match the elicited summaries". This fitting approach does not assume any specific mechanism for how expert data is generated; e.g. inconsistencies are reconciled by least-squares optimization. In elicitation, **overfitting** means eliciting more summaries than needed to fit a parametric distribution, so inconsistencies may occur. This may be better, since the fitted compromise may in practice yield a more faithful representation of the expert's knowledge. An alternative to the fitting approach is to treat the elicitation process as another Bayesian inference problem, where the analyst's posterior belief about the expert's knowledge is updated in light of received expert data. This is similar to *supra-Bayesian pooling*, where knowledge from multiple experts is aggregated.
- Single expert elicitation can be divided into **one-shot elicitation** or **iterative elicitation**. Iterative elicitation is different from **interactive elicitation**, which may involve e.g. a visualization of a prior distribution based on a slider position controlled by the expert.
- Elicitation methods differ by the type of elicited summary: the **variable interval method (V-method)** asks for quantiles, the **fixed interval method (P method)** asks about probabilities, and a mix of both, asking for a quantile-probability tuple, is called the **PV method**.
- Variable interval method described in (Oakley 2010):
  - Analyst elicits median, lower and upper quartiles. This assessment task can be presented in the form of a gamble if the expert is not comfortable with directly providing quantiles.
  - Analyst asks expert to reflect on their choices and check for consistency.
  - Analyst fits a parametric distribution, minimizing squared errors between the elicited quartiles and the quartiles of the CDF of a parametric prior.
  - Analyst presents the distribution back to the expert, allowing them to modify their original judgements.
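
A minimal sketch of the fitting step in the variable interval method, assuming a Normal prior family (my choice for illustration; the notes do not fix a family). For a Normal, quantiles are linear in \((\mu, \sigma)\), so the least-squares fit reduces to a simple linear regression. The function name and the expert's answers are hypothetical.

```python
from statistics import NormalDist

def fit_normal_to_quantiles(probs, values):
    """Least-squares fit of Normal(mu, sigma) to elicited (probability, quantile) pairs.

    Normal quantiles are mu + sigma * z_p, so the squared-error fit is an
    ordinary linear regression of the elicited values on the z-scores.
    """
    z = [NormalDist().inv_cdf(p) for p in probs]
    n = len(z)
    z_bar = sum(z) / n
    v_bar = sum(values) / n
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (vi - v_bar) for zi, vi in zip(z, values))
    sigma = sxy / sxx
    mu = v_bar - sigma * z_bar
    return mu, sigma

# Hypothetical expert answers: lower quartile, median, upper quartile.
probs = [0.25, 0.5, 0.75]
elicited = [8.0, 10.0, 13.0]

mu, sigma = fit_normal_to_quantiles(probs, elicited)
fitted = NormalDist(mu, sigma)

# Quartiles implied by the fitted prior, to present back to the expert.
for p, v in zip(probs, elicited):
    print(f"p={p:.2f}  elicited={v:.2f}  fitted={fitted.inv_cdf(p):.2f}")
```

The elicited quartiles here are slightly asymmetric, so a symmetric Normal cannot match them exactly; the printed comparison is the kind of feedback the expert could use to revise their judgements in the last step.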

- Fixed interval method described in (Oakley 2010; O’Hagan 1998):
  - Analyst elicits lower and upper bounds, the mode, and 5 probabilities of \(\theta\), e.g. \(\mathbb{P}(\theta_{min} < \theta < \theta_{mode}), \mathbb{P}(\theta_{min}<\theta<\frac{\theta_{mode}+\theta_{min}}{2})\).
  - Analyst fits a parametric distribution minimizing the sum of squared differences between the elicited probabilities and the corresponding probabilities implied by the prior distribution. A default choice for the prior here is a Beta distribution with scaled support \((\theta_{min},\theta_{max})\).
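
A rough sketch of the fixed interval fit, under several assumptions of mine: every elicited probability is of the form \(\mathbb{P}(\theta_{min} < \theta < u)\), the Beta CDF is approximated with a midpoint rule (shape parameters restricted to \(\geq 1\)), and a coarse grid search stands in for a proper optimizer. All function names and the expert's numbers are hypothetical.

```python
def beta_cdf_grid(a, b, n=400):
    """Midpoint-rule approximation of the Beta(a, b) CDF on [0, 1]; assumes a, b >= 1."""
    h = 1.0 / n
    dens = [((i + 0.5) * h) ** (a - 1) * (1.0 - (i + 0.5) * h) ** (b - 1)
            for i in range(n)]
    total = sum(dens)
    cdf, acc = [0.0], 0.0
    for d in dens:
        acc += d
        cdf.append(acc / total)
    return cdf  # cdf[k] approximates P(X <= k / n)

def implied_prob(cdf, u, lo, hi):
    """P(lo < theta < u) under the Beta scaled to (lo, hi); assumes lo <= u <= hi."""
    n = len(cdf) - 1
    x = (u - lo) / (hi - lo) * n
    k = min(int(x), n - 1)
    return cdf[k] + (x - k) * (cdf[k + 1] - cdf[k])  # linear interpolation

def fit_scaled_beta(lo, hi, elicited, grid=None):
    """Grid search for (a, b) minimizing the sum of squared probability differences."""
    grid = grid or [1 + 0.25 * i for i in range(37)]  # shapes 1.0, 1.25, ..., 10.0
    best = None
    for a in grid:
        for b in grid:
            cdf = beta_cdf_grid(a, b)
            loss = sum((implied_prob(cdf, u, lo, hi) - p) ** 2 for u, p in elicited)
            if best is None or loss < best[0]:
                best = (loss, a, b)
    return best

# Hypothetical elicitation: bounds (0, 20) and five (upper endpoint, probability) pairs.
theta_min, theta_max = 0.0, 20.0
elicited = [(2.5, 0.10), (5.0, 0.35), (7.5, 0.60), (10.0, 0.80), (15.0, 0.97)]

loss, a, b = fit_scaled_beta(theta_min, theta_max, elicited)
print(f"Beta({a}, {b}) on ({theta_min}, {theta_max}), squared error {loss:.4f}")
```

Five probabilities for a two-parameter family is a case of overfitting in the elicitation sense: the answers will generally not be exactly consistent with any Beta, and the fit is the least-squares compromise.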

- Histogram method:
  - Analyst asks about the minimum and maximum of \(\theta\), splits the range into sub-intervals, then asks the expert to assign probabilities to each interval.
  - The elicited histogram can be fit with a nonparametric or parametric prior.

- Roulette method:
  - Similar to the histogram method, but the expert is asked to construct the histogram via gambling devices such as dice, coins etc.
  - Expert allocates chips to bins; the probability of a parameter lying in a particular bin is interpreted as the proportion of chips allocated to that bin.
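
A small sketch of how a roulette allocation maps to probabilities. The bin edges, chip counts, and helper names are hypothetical, and reading the histogram as piecewise-uniform within each bin (to get a median) is my assumption, not something the notes specify.

```python
def roulette_to_probs(chips):
    """Probability of each bin = proportion of chips the expert placed in it."""
    total = sum(chips)
    if total == 0:
        raise ValueError("the expert must allocate at least one chip")
    return [c / total for c in chips]

def histogram_quantile(bin_edges, probs, p):
    """Quantile of the piecewise-uniform distribution implied by the histogram."""
    cum = 0.0
    for left, right, mass in zip(bin_edges, bin_edges[1:], probs):
        if mass > 0 and cum + mass >= p:
            return left + (p - cum) / mass * (right - left)
        cum += mass
    return bin_edges[-1]

# Hypothetical session: 5 equal-width bins over [0, 50] and 20 chips.
bin_edges = [0, 10, 20, 30, 40, 50]
chips = [1, 4, 8, 5, 2]

probs = roulette_to_probs(chips)
median = histogram_quantile(bin_edges, probs, 0.5)

for left, right, pr in zip(bin_edges, bin_edges[1:], probs):
    print(f"[{left}, {right}): {pr:.2f}")
print(f"implied median: {median:.2f}")
```

The resulting bin probabilities can then be fit with a parametric or nonparametric prior, as in the histogram method.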

- For multivariate priors, a common strategy is to elicit \(\mathbf{\theta}\) through multiple independent univariate elicitations, where \(\mathbf{\theta}\) is transformed into \(\mathbf{\phi}\) whose coordinates are independent. This process is called **elaboration**. Independence refers to subjective independence, where information about one coordinate will not affect the expert's opinion on the others. E.g. if \(\theta_1,\theta_2\) are treatment effects of 2 drugs, then \(\mathbf{\phi} =(\theta_1, \theta_2 / \theta_1)\) may be considered since the relative effect \(\theta_2/\theta_1\) is likely to be subjectively independent of \(\theta_1\).
- The analyst could also elicit each coordinate \(\theta_i\) conditioned on all others, i.e. the prior \(\mathbb{P}(\theta_i|\mathbf{\theta}_{-i})\).
- Gaussian copula based elicitation (Clemen and Reilly 1999): Elicits multivariate \(\mathbb{P}(\mathbf{\theta})\) based on Sklar's theorem and the fact that this joint can be written as a product of the marginals and the copula density, given that the marginal densities and the copula density are differentiable.
- Direct elicitation of multivariate priors: One method is based on the so-called median deviation concordance (MDC) probability, where the MDC is defined as the probability that 2 variables will both fall either above or below their respective median. Another method is to use isoprobability contours of a joint CDF, which is the collection of parameters that have the same cumulative probability.
- Elicitation methods for the Dirichlet distribution often rely on eliciting Beta marginals. Here it is important to make sure the elicited Beta marginals are consistent with a Dirichlet distribution: for example, the 2 parameters of a Beta marginal must sum to the same value for all marginals, and the expectations of all marginals must sum to 1.
- Scoring rule based elicitation: Formulate assessment tasks so that they encourage the expert to provide careful assessment from which the subjective true probabilistic judgement can be recovered. These methods can be used for elicitation or evaluation of the performance of an elicitation method.
- Elicitation on the observable space falls mainly into **predictive elicitation**, involving a setting where the expert is asked about the median, quantiles, mode or expectation of the response variable \(y\) at various design points, and the underlying model comes from the family of generalized linear models.
- If we suppose \(\mathbb{P}(y)\) is to be elicited from the expert, and the likelihood \(\mathbb{P}(y|\theta)\) is specified by the analyst, with the analyst looking for the unknown prior \(\mathbb{P}(\theta)\), the equation \(\mathbb{P}(y) = \int \mathbb{P}(y|\theta)\mathbb{P}(\theta) d\theta\) is known as a Fredholm integral equation of the first kind (also mentioned in Vapnik's work). From here, additional regularity assumptions can be made to solve for the prior, e.g. Tikhonov regularization. The analyst could also assume a parametric prior \(\mathbb{P}(\theta|\lambda)\), in which case the problem reduces to finding the optimal \(\lambda\).
- One method that avoids an arbitrary choice of prior among all possible ones is choosing the prior based on the maximum entropy distribution, leading to **maximal entropy priors**.
- When active learning is applied to prior elicitation, the authors call it active prior elicitation, or just **active elicitation**.
- Convergence guarantees in prior elicitation tasks are unknown to the authors.
- The authors propose viewing the prior elicitation event as an interplay of the expert and the analyst with the following characteristics:
  - Analyst poses queries to the expert and gathers the expert's input into a dataset \(D\). The analyst's goal is to infer the distribution of the parameters \(\theta\), conditional on the expert's input data, \(\mathbb{P}(\theta|D)\).
  - The expert answers the analyst's queries based on their domain knowledge. The expert's input is modelled through the user model \(\mathbb{P}(z|q)\), that is, the conditional probability of the expert's input \(z\) given the analyst's query \(q\). Thus \(D\) consists of \((z_i,q_i)_{i=1}^N\) where all \(q_i\) are treated as fixed. (? Does this mean the analyst cannot pose questions based on the expert's response ?)
  - Expert data could be elicited from the observable space or parameter space, giving \(D_y, D_{\theta}\) respectively. We assume the analyst updates their knowledge according to Bayes' rule, treating elicitation as a posterior inference problem: \[ \mathbb{P}(\theta|D_y,D_{\theta}) = \frac{\mathbb{P}(D_y|\theta)\mathbb{P}(D_\theta|\theta)\mathbb{P}(\theta)}{\mathbb{P}(D_y,D_\theta)}\] The conditional independence of the data states that, given a fixed parameter \(\theta\) which the expert thinks to be true, the mechanism by which the expert reveals their knowledge about \(\theta\) is independent between the 2 elicitation spaces.
  - This framework also requires specifying \(\mathbb{P}(z|q,\theta)\), which describes at the individual query \(q\) level how the expert responds if they think \(\theta\) is true. Since \(q\) is fixed, it is also the likelihood for a single datapoint \((z,q)\). The user model is \(\mathbb{P}(z|q)=\int \mathbb{P}(z|q,\theta)\mathbb{P}(\theta)d\theta\).
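
A toy sketch of elicitation as posterior inference on a discrete grid, to make \(\mathbb{P}(z|q,\theta)\), the user model \(\mathbb{P}(z|q)\), and \(\mathbb{P}(\theta|D)\) concrete. Everything specific here is an assumption of mine rather than the paper's: queries are yes/no questions "is \(\theta > q\)?", the expert's response follows a logistic noise model, and the analyst starts from a uniform prior on \([0, 1]\).

```python
import math

def response_prob(z, q, theta, noise=0.05):
    """Hypothetical P(z | q, theta): chance the expert answers z to 'is theta > q?'."""
    p_yes = 1.0 / (1.0 + math.exp(-(theta - q) / noise))
    return p_yes if z == 1 else 1.0 - p_yes

def user_model(z, q, thetas, weights):
    """P(z | q) = sum over theta of P(z | q, theta) P(theta), on the grid."""
    return sum(response_prob(z, q, t) * w for t, w in zip(thetas, weights))

def posterior(thetas, prior, data):
    """P(theta | D) on the grid, with D a list of (z, q) pairs and each q treated as fixed."""
    unnorm = [w * math.prod(response_prob(z, q, t) for z, q in data)
              for t, w in zip(thetas, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

thetas = [i / 100 for i in range(101)]       # grid over theta
prior = [1.0 / len(thetas)] * len(thetas)    # analyst's initial belief

# Hypothetical expert data: (answer z, query q); 1 = "yes, theta > q".
data = [(1, 0.2), (1, 0.3), (0, 0.6), (0, 0.5)]

post = posterior(thetas, prior, data)
post_mean = sum(t * w for t, w in zip(thetas, post))
print(f"P(yes | q=0.4) before any data: {user_model(1, 0.4, thetas, prior):.3f}")
print(f"posterior mean of theta: {post_mean:.3f}")
```

Because the likelihood factorizes over datapoints, data from the observable and parameter spaces could be combined the same way, one factor per elicitation space.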

- Active elicitation: What is the optimal strategy to select a sequence of queries \((q_i)_{i=1}^N\)? When the analyst poses a query \(q\), they anticipate that the expert's input \(z\) is distributed according to \(\mathbb{P}(z|q)\). The analyst can then apply the user model to choose the most informative queries. E.g. if the analyst aims to maximize the expected information gain of \(\mathbb{P}(\theta|D)\) with respect to a new query \(q\), the user model is needed for anticipating the corresponding yet unseen response \(z\), which involves taking an expectation over \(\mathbb{P}(z|q)\).
- AI-assisted elicitation: A user model combined with an active learning criterion for selecting the next queries. In theory, this makes it possible to account for the expert's biases and limitations.
- MCMC with people (MCMCP): participants take the place of the acceptance function, and their prior beliefs are recovered as the stationary distribution of the Markov chain that their judgements eventually converge to.
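
A toy sketch of the active elicitation criterion above: pick the query with the largest expected information gain, taking the expectation over the user model \(\mathbb{P}(z|q)\). The yes/no threshold queries, the logistic response model, the grid, and all function names are assumptions for illustration only.

```python
import math

def p_yes(q, theta, noise=0.05):
    """Hypothetical P(z = 1 | q, theta) for the query 'is theta > q?'."""
    return 1.0 / (1.0 + math.exp(-(theta - q) / noise))

def entropy(weights):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def expected_information_gain(q, thetas, belief):
    """Entropy of the current belief minus the expected entropy after seeing z."""
    gain = entropy(belief)
    for z in (0, 1):
        lik = [p_yes(q, t) if z == 1 else 1.0 - p_yes(q, t) for t in thetas]
        p_z = sum(l * w for l, w in zip(lik, belief))  # user model P(z | q)
        if p_z == 0:
            continue
        updated = [l * w / p_z for l, w in zip(lik, belief)]  # P(theta | z, q)
        gain -= p_z * entropy(updated)
    return gain

thetas = [i / 100 for i in range(101)]
belief = [1.0 / len(thetas)] * len(thetas)   # current P(theta | D), here uniform

candidates = [i / 10 for i in range(1, 10)]  # candidate thresholds 0.1, ..., 0.9
best_q = max(candidates, key=lambda q: expected_information_gain(q, thetas, belief))
print(f"most informative next query: is theta > {best_q}?")
```

With a uniform belief the criterion picks the threshold that splits the probability mass evenly; after each answer the belief would be updated and the selection repeated.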

## Thoughts

- Is there any relationship between prior elicitation and bootstrapping in compilers?