A Simple Constraint-Based Algorithm for Efficiently Mining Observational Databases for Causal Relationships

Details

Title: A Simple Constraint-Based Algorithm for Efficiently Mining Observational Databases for Causal Relationships
Author(s): Gregory F. Cooper

What

Why

How

TLDR

Take the setting of observational data. In general, correlation does not imply causation, but observational data can still constrain the causal relationships among variables. The most important general idea of this paper is that information about statistical independence and dependence among a set of variables can be used to constrain the possible causal relationships among a subset of those variables. Here, the author calls \(X\) a cause of \(Y\) if a hypothetical idealized RCT would conclude that there exists some manipulation of \(X\) that leads to a change in \(\mathbb{P}(Y)\). By constraint-based, the author refers to a two-step procedure in which statistical tests are first used to establish conditional independence (CI) relations among variables, and those relations are then used to constrain the types of causal relationships that can exist among the model variables. The author makes the following assumptions:

  • No missing data.
  • Discrete variables (not strictly required, but convenient).
  • Underlying causal processes can be modelled by some Bayesian Network (BN) \(G\), which may contain hidden variables not in the model variables \(\textbf{V}\).
  • Causal faithfulness: CI relations in the distribution imply d-separation in the graphical model.
  • No selection bias: The graph structure to be discovered does not affect the sampling of observed data.
  • Valid CI testing.
  • A known uncaused entity: There is a known \(W \in \textbf{V}\) that is not caused by another variable in \(\mathbf{V}\).

Here, \(X \rightarrow Y\) denotes direct causation of \(Y\) by \(X\), relative to the other variables. Suppose there is a variable \(U\) which is a mediator, i.e., \(X \rightarrow U \rightarrow Y\); here, \(X\) is an indirect cause of \(Y\).

The author then lists all possible DAGs for the pairs \(W,X\), \(W,Y\), and \(X,Y\), including cases with a hidden confounder (\(W\) is the known uncaused variable from the last assumption above). This yields 4, 4, and 6 DAGs respectively. From these, the author constructs all \(4 \times 4 \times 6 = 96\) possible DAGs over the three variables and lists them alongside 3 conditions:

  • \(D(W,X)\) - \(W,X\) are not d-separated, i.e. they are dependent.
  • \(D(X,Y)\) - \(X,Y\) are not d-separated, i.e. they are dependent.
  • \(I(W,Y|X)\) - \(W,Y\) are d-separated by \(X\), i.e. they are conditionally independent given \(X\).

From the 96 DAGs, the author singles out those for which the above 3 conditions hold, and calls such a DAG a positive pattern. In each of those DAGs, \(X\) causes \(Y\). Note that the algorithm only needs to condition on at most one variable at a time.
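As a concrete (hypothetical) illustration of how the three conditions might be tested on discrete data, here is a minimal Python sketch; the helper names `dependent` and `cond_independent`, the pandas/scipy chi-squared tests, and the significance level `alpha` are my own choices, not from the paper.

```python
# A minimal sketch (not from the paper) of the two statistical tests behind
# the positive pattern, assuming discrete data in a pandas DataFrame.
import pandas as pd
from scipy.stats import chi2, chi2_contingency


def dependent(df: pd.DataFrame, a: str, b: str, alpha: float = 0.05) -> bool:
    """D(a, b): reject marginal independence of a and b via a chi-squared test."""
    table = pd.crosstab(df[a], df[b])
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha


def cond_independent(df: pd.DataFrame, a: str, b: str, c: str,
                     alpha: float = 0.05) -> bool:
    """I(a, b | c): fail to reject independence of a and b given c.

    Pools the chi-squared statistic and degrees of freedom across strata of c.
    """
    stat, dof = 0.0, 0
    for _, stratum in df.groupby(c):
        table = pd.crosstab(stratum[a], stratum[b])
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue  # a degenerate stratum gives no evidence either way
        s, _, d, _ = chi2_contingency(table)
        stat += s
        dof += d
    if dof == 0:
        return True
    return chi2.sf(stat, dof) >= alpha
```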

The Local Causal Discovery (LCD) algorithm thus relies on searching for positive patterns that exist among all triplets of variables in \(\textbf{V}\). Note that a positive pattern implies \(X\rightarrow Y\) but the converse does not hold in general.
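Given such tests, the triplet search itself is short. The sketch below reuses the hypothetical `dependent` and `cond_independent` helpers from above; the function name `lcd` and the assumption that the data arrive as a pandas DataFrame with the uncaused variable \(W\) named explicitly are illustrative, not the paper's notation.

```python
# A rough sketch of the LCD triplet search, reusing the hypothetical
# `dependent` / `cond_independent` helpers sketched earlier. `w` is the
# known uncaused variable W from the assumptions above.
from itertools import permutations


def lcd(df, w: str, alpha: float = 0.05):
    """Return ordered pairs (x, y) read as 'x causes y'.

    A pair is reported when the triplet (w, x, y) matches the positive
    pattern: D(w, x), D(x, y), and I(w, y | x).
    """
    causes = []
    others = [v for v in df.columns if v != w]
    for x, y in permutations(others, 2):  # all ordered pairs with x != y
        if (dependent(df, w, x, alpha)
                and dependent(df, x, y, alpha)
                and cond_independent(df, w, y, x, alpha)):
            causes.append((x, y))
    return causes
```

Because every test conditions on at most one variable, the contingency tables stay small, which is part of what keeps this search cheap on databases with many variables.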

Of the 96 DAGs, 32 have \(X \rightarrow Y\), yet only those matching the positive pattern are detected, which is a limitation. Also, the local nature of the algorithm means that for a chain \(W \rightarrow X \rightarrow Y \rightarrow Z \rightarrow A\), the LCD algorithm will output all 6 pairwise causal relations among \(X, Y, Z, A\) (direct and indirect, e.g. both \(X \rightarrow Y\) and \(X \rightarrow A\)) in the direction of causation.

And?

Thoughts

Reread Section 5.2 on Bayesian methods; it is a treasure trove of cited papers.

Author: Nazaal

Created: 2022-03-13 Sun 21:45
