Foundations and new horizons for causal inference¶
This post is a translation of the workshop report.
The workshop "Foundations and new horizons for causal inference", run by Bernhard and his European colleagues, was excellent! The talks and discussions at the workshop will help to shape the field in the coming years.
Quick start¶
While causal inference is established in some disciplines such as econometrics and biostatistics, it is only starting to emerge as a valuable tool in areas such as machine learning and artificial intelligence. The mathematical foundations of causal inference are fragmented at present.
The aim of the workshop Foundations and new horizons for causal inference was to unify existing approaches and mathematical foundations, and to exchange ideas between different fields.
We regard this workshop as successful in that it brought together researchers from different disciplines who were able to learn from each other not only about different formulations of related problems, but also about solutions and methods that exist in the different fields.
One of the aims of the workshop was to bring together researchers from different fields to facilitate communication and cross-pollination. In this respect, the workshop was certainly very successful. It attracted researchers from artificial intelligence, biostatistics, computer science, economics, epidemiology, machine learning, mathematics and statistics. New collaborations were initiated between researchers who probably would not have crossed paths were it not for this workshop. A large part of this success is due to the workshop being held at the mathematics institute in Oberwolfach. Four broad areas of causal inference were discussed at the workshop.
Mathematical foundations. Purely statistical models aim to describe the distribution underlying the data-generating process. Causal models go beyond this goal and attempt to model how the system behaves under perturbations. Formulating such models, including the notion of interventions, thus lies at the core of causality research. Even though several frameworks exist, this is still a topic of current research, in particular when considering dynamical models, extreme-valued processes, or the question of which variables to include in the model, say. Talks covering this topic include the ones from Niels Hansen, Dominik Janzing, Steffen Lauritzen, Karthika Mohan, Emilija Perkovic, Rajen Shah, Ilya Shpitser, Jin Tian, and Sebastian Weichwald.
Causal discovery. While many causal inference methods assume that the causal structure (usually a directed acyclic graph) is known, there is also great interest in causal discovery, i.e., estimating this structure from complex data such as time series in biological applications. Research goals include developing methods that are robust with respect to model misspecification, scale to large data sets, deal with the existence of hidden variables, or incorporate information from interventional experiments. Talks covering this topic include the ones from Mathias Drton, Aapo Hyvarinen, Nicola Gnecco, Marloes Maathuis, Linbo Wang, and Kun Zhang.
Machine Learning and causality. There is growing interest in moving machine learning methods from a purely association-based learning approach towards causal inference. The hope is to obtain methods for classical machine learning problems, such as prediction or semi-supervised learning, that generalize better to test data (which may come from the same or from a related distribution as the training data) or are more sample-efficient. Moreover, causality can provide means to better understand classical machine learning paradigms and their applicability.
Applications. Numerous applications were discussed, including personalised medicine, biological causal network discovery and climate science. Talks covering applications include the ones from Gregory Cooper, Sara Geneletti, Jakob Runge, and Sach Mukherjee.
Machine learning methods are currently successfully applied to a wide range of applications. Impressive empirical results have been obtained in areas such as image classification and speech recognition. Many scientific problems, however, go beyond the task of iid prediction. In some domains, such as public health, biology or Earth system science, we are usually interested in finding policies that yield a better outcome. In other areas, we expect that the test data will differ significantly from the training data. Causal concepts have the potential to play a role in solving many of these problems. We therefore expect to see more research on causality. While many of the goals connected to research on causality are ambitious, any advance in this area will potentially have a large impact not only in mathematics but in the natural sciences in general.
The workshop brought together experts in causal inference working on foundations and on applications in econometrics, machine learning, statistics and the natural sciences. The talks and discussions at the workshop will help to shape the field in the coming years.
Bin Yu, Hypothesis generation through three principles of data science: predictability, computability and stability (PCS)
Gregory F. Cooper, Instance-Specific Causal Bayesian Network Structure Learning
Sach Mukherjee, Towards scalable causal learning
Linbo Wang, Causal Inference with Unmeasured Confounding: A New Look at Instrumental Variable
Kun Zhang, Towards more reliable causal discovery and prediction
Léon Bottou, Questions about ML and AI
David Blei, The Blessing of Multiple Causes: Extended Abstract
Ilya Shpitser, Identification And Estimation Via A Modified Factorization Of A Graphical Model
Christina Heinze-Deml, Conditional variance penalties and domain shift robustness
Dominik Janzing, Causal Regularization
Vanessa Didelez, Causal mediation with longitudinal mediator and survival outcome
Jin Tian (joint with Juan D. Correa, Elias Bareinboim), Identification of Causal Effects in the Presence of Selection Bias
Emilija Perković (joint with Leonard Henckel, Marloes H. Maathuis), Graphical criteria for efficient total effect estimation in causal linear models
Karthika Mohan, Graphical Models for Missing Data
Julius von Kügelgen, Semi-Supervised Learning, Causality and the Conditional Cluster Assumption
Sebastian Weichwald, Causal Consistency of SEMs & Causal Models as Posets of Distributions
Michele Sebag, Structural agnostic modeling: An information theoretic approach to causal learning
Three principles of data science: predictability, computability and stability (PCS)¶
We propose a framework in [4] that draws from three principles of data science: predictability, computability, and stability (PCS) to extract reliable, reproducible information from data and guide scientific hypothesis generation. The PCS framework builds on key ideas in machine learning, using predictability as a reality check and evaluating computational considerations in data collection, data storage, and algorithm design. It augments predictability and computability with an overarching stability principle, which expands statistical uncertainty considerations to assess how results vary with respect to choices (or perturbations) made across the data science life cycle.
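A minimal sketch of the stability idea, assuming a generic supervised-learning setup (illustrative only, not the authors' PCS implementation): refit a predictive model under simple bootstrap perturbations of the data and check whether both the predictive accuracy (the reality check) and the downstream conclusions (here, feature importances) stay stable.

```python
# Illustrative sketch of a PCS-style stability check (not the authors' implementation).
# Refit a model on bootstrap perturbations of the training data and inspect how
# much the predictive accuracy and the feature importances vary.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
accuracies, importances = [], []
for _ in range(20):  # 20 bootstrap perturbations of the training data
    idx = rng.integers(0, len(X_train), size=len(X_train))
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train[idx], y_train[idx])
    accuracies.append(model.score(X_test, y_test))   # predictability: reality check
    importances.append(model.feature_importances_)   # conclusion whose stability we assess

print("accuracy across perturbations:",
      round(np.mean(accuracies), 3), "+/-", round(np.std(accuracies), 3))
print("spread of feature importances:", np.std(importances, axis=0).round(3))
```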
As a case study of PCS, we propose in [2] iterative Random Forests (iRF). Genomics has revolutionized biology, enabling the interrogation of whole transcriptomes, genome-wide binding sites for proteins, and many other molecular processes.
Questions about ML and AI¶
Léon Bottou, Questions about ML and AI
The purpose of this talk is to explain the relevance of causation to research in artificial intelligence.
Despite the promises of pundits, there is indeed a large gap between the technological capabilities of machine learning (ML) and the vague and elusive goals of artificial intelligence (AI). The first part of the talk reviews some of the common issues with ML methods and shows how they display many of the characteristic issues one encounters in causal inference research. The second part of the talk is an attempt to name many of the nuances of causation in the hope of providing a roadmap to approach artificial intelligence.
Success and shortcomings of ML (the strengths and the weaknesses are laid out very clearly!)
In conclusion, although they can precisely replicate the observed training distribution, ML systems lack common sense because they cannot easily infer what could have been observed under closely related circumstances.
The many faces of causation
On the one hand, the above description of the ML shortcomings emphasizes their similarity with fundamental issues in causation. On the other hand, none of these problems come with a causal graph or with well defined interventions. This means that we may not be able to understand them using solely the manipulative definition of causation that is common in statistics. Fortunately, an abundant literature in epistemology, metaphysics, and psychology offers alternative ways to understand causation, a catalogue of ideas for future research. The following is an attempt to name some of them.
Manipulative causation focuses on predicting the outcome of well defined interventions on a causal system.
Causal invariance investigates which properties of a system are conserved when affected by explicit or implicit interventions.
Causal reasoning focuses on causal statements as elements of reasoning chains. Statements that cannot be verified experimentally acquire value when they take part in chains that make verifiable predictions.
Causal explanation provides causal commentaries that help in understanding an observed phenomenon but may not be complete enough to make sensible predictions [11].
Dispositional causation and affordances associate objects with the causal relationships they enable [8, 4].
Causal intuition takes advantage of observed data to suggest short lists of plausible causal models whose validity can later be investigated using more direct experiments.
Identifiability¶
Ilya Shpitser, Identification And Estimation Via A Modified Factorization Of A Graphical Model
It is well known that, in the absence of hidden common causes, causal effects can be identified via a truncated factorization known as the g-formula. Our work shows that much of modern non-parametric identification theory may be rephrased as a more complex truncated factorization derived from the factorization of the observed marginal of a hidden variable graphical model defining the nested Markov model.
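For reference, the g-formula mentioned above (a standard result, restated here for context): in a DAG over variables \(V\) with no hidden variables, intervening on \(X \subseteq V\) simply removes the factors of the intervened variables from the usual factorization,

\[
P(v \mid do(x)) \;=\; \prod_{V_i \in V \setminus X} P(v_i \mid pa_i)\,\Big|_{X = x},
\]

i.e., the product of the conditionals of the non-intervened variables given their parents, evaluated at the intervened value \(x\).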
Further, viewing identified functionals as a modified factorization directly leads to maximum likelihood inference for causal parameters in hidden variable models.
The nested Markov model is defined on an acyclic directed mixed graph (ADMG) obtained from a hidden variable DAG by the latent projection operation [5].
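For intuition, a standard example of the latent projection (not taken from the abstract itself): the hidden variable DAG \(X \leftarrow U \rightarrow Y\), with \(U\) unobserved, projects onto the ADMG \(X \leftrightarrow Y\), where the bidirected edge records the unobserved common cause; if additionally \(X \rightarrow Y\), the projection contains both \(X \rightarrow Y\) and \(X \leftrightarrow Y\).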
Mediation analysis¶
In causal mediation analysis, we are interested in understanding the different mechanisms (causal pathways) through which a treatment or exposure affects an outcome. Often this is formalised in terms of (in)direct causal effects; popular notions of these are based on so-called 'nested counterfactuals', \(Y(a, M(a'))\). Identification relies crucially on a cross-world independence \(Y(a,m) \perp M(a')\) (the resulting identification formula is recalled after the list below). Because of this, the concepts of natural (in)direct effects run into difficulties of interpretation in the particular context of survival analysis, where \(Y\) is a survival time and the mediator is a whole process \(\{M_t\}\). These problems are:
Problem 1: If survival is shorter, say, under \(A = a'\) than under \(A = a\), then the second index of \(Y(a, M_t(a'))\) is 'incomplete'; the nested counterfactual is not well-defined.
Problem 2: Later survival as well as later measurements of the mediator process depend on prior survival. Hence, prior survival acts like a post-treatment confounder and, so, identifiability fails.
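For context, the standard identification result being referred to (stated here in the simplest setting without baseline covariates, and not part of the abstract itself): under the cross-world independence \(Y(a,m) \perp M(a')\) and the usual no-unmeasured-confounding conditions, the natural (in)direct effects are identified via the mediation formula

\[
\mathbb{E}\bigl[Y(a, M(a'))\bigr] \;=\; \sum_m \mathbb{E}\bigl[Y \mid A=a, M=m\bigr]\, P\bigl(M=m \mid A=a'\bigr).
\]

It is exactly this construction that breaks down when \(Y\) is a survival time and \(M\) is a whole process, as described in Problems 1 and 2 above.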
In this work, I propose an alternative approach that does not suffer from such shortcomings [1]: this novel approach follows Robins and Richardson [2], where mechanisms need to be specified allowing a separation into the different treatment paths, formalized using an augmented directed acyclic graph (DAG).
The proposed new approach solves a crucial conceptual problem of mediation analysis with a survival outcome and can be extended to yield much needed clarification in competing risks settings [4]. It is founded in decision theory, avoids genuine counterfactual (cross-world) assumptions and, even in non-survival contexts, constitutes an interesting alternative to the prevailing structural equation modelling.
Selection bias¶
Cause-and-effect relations are one of the most valuable types of knowledge sought after throughout the data-driven sciences, since they translate into stable and generalizable explanations as well as efficient and robust decision-making capabilities. Inferring these relations from observational data, however, is a challenging task. Two of the most common barriers to this goal are known as confounding and selection biases. The former stems from the systematic bias introduced during the treatment assignment, while the latter comes from the systematic bias during the collection of units into the sample. We consider the problem of identifying causal effects when both confounding and selection biases are simultaneously present. Specifically, given qualitative causal assumptions in the form of a causal graph \(G\) and an observational distribution \(P\) (possibly collected under confounding bias and selection bias), we study whether a causal effect \(P(y \mid do(x))\) is computable from \(P\).
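To make the question concrete, a simple illustrative special case (not taken from the abstract): if the sample-selection indicator \(S\) satisfies \(Y \perp S \mid X\) and there is no confounding between \(X\) and \(Y\), then

\[
P(y \mid do(x)) = P(y \mid x) = P(y \mid x, S = 1),
\]

so the causal effect is recoverable from data collected under selection. In general, however, the two biases interact and identifiability has to be decided from the structure of \(G\).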
Handling missing data¶
Graphical Models for Missing Data
Missing data (also known as incomplete data) are data in which values of one or more variables in a dataset are observed for some samples and missing for the rest. Missingness, which is a rather common phenomenon in practice, can occur for several reasons, such as an ill-designed questionnaire or the reluctance of subjects to answer questions on sensitive topics (e.g. income, religion, sexual orientation, etc.). Table 1 exemplifies a dataset over two variables in the ideal scenario of no missingness, whereas Table 2 exemplifies a dataset with missing values that one would find in the real world; m in Table 2 denotes a missing value.
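Since the tables themselves are not reproduced here, the following is a hypothetical illustration in the spirit of Tables 1 and 2 (invented values; the variable names and the missingness indicator R_income are assumptions, the latter following the convention of making missingness explicit as extra variables):

```python
# Hypothetical illustration in the spirit of Tables 1 and 2 (invented values,
# not the report's actual tables).
import numpy as np
import pandas as pd

# "Table 1": two variables, fully observed
complete = pd.DataFrame({"income": [30.0, 45.0, 60.0, 25.0], "smoker": [1, 0, 0, 1]})

# "Table 2": the same variables, but some income values are missing ("m")
observed = complete.copy()
observed.loc[[1, 3], "income"] = np.nan  # e.g. subjects declined to report income

# Missingness indicator, as made explicit in graphical models for missing data:
# R_income = 1 if income is observed for that sample, 0 if it is missing.
observed["R_income"] = observed["income"].notna().astype(int)
print(observed)
```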
Causal consistency and representation learning¶
Causal Consistency of SEMs & Causal Models as Posets of Distributions
We can often describe the same system with reference to different terminology, levels of detail, and concepts. We can, for example, reason about individual neurons' firing rates, about average blood oxygen levels in different brain regions, or about the electromagnetic activity of so-called cortical dipoles, and about how any of those maintain faster reaction times or certain movements. We discuss the following conceptual challenge that is fundamental to causal modelling of real-world systems such as, for example, the brain: how can we formally characterise the relata, aggregate features, and representations that are suitable for a pragmatically useful causal model, and how do different description levels relate to one another? The variables we can and do measure do not necessarily lend themselves as is to a causal description.
This take on the interplay between causal reasoning and variable transformations enables one, in principle, to consider and identify transformations that exhibit desired properties, e.g. that allow for 'simpler' (in terms of complexity), more 'interpretable' (in terms one would need to define precisely), or more 'robust' (against interventional regime changes) causal models as compared to using the plain observed variables. For example, [2] considers the consistent abstraction of causal models via appropriate variable transformations. Robustness to domain shifts resulting from interventions is considered in [3]: the authors argue in favour of a representation that is consistent with the underlying causal structure in order for a learner to adapt faster to new environments and thus obtain good transfer (this refers to Bengio's work on causal representation learning). Future research may discuss how to soften the restrictive requirements for a transformation to be exact, and how to sensibly arrive at a notion of approximate transformations and a meaningful causal interpretation thereof.
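A toy example of a consistent variable transformation (an illustration under simplifying assumptions, not from the abstract): suppose two micro-variables enter the outcome only through their sum,

\[
Y = a\,(X_1 + X_2) + N_Y .
\]

The macro-variable \(Z = X_1 + X_2\) then admits the macro-level model \(Y = aZ + N_Y\), and every micro-level intervention that sets \(X_1 + X_2 = z\) induces the same distribution over \(Y\), so interventions on \(Z\) are well-defined. A transformation such as \(Z' = X_1 - X_2\), by contrast, does not determine the effect on \(Y\) and hence does not support a causal description at the aggregate level.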