{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"# Ch16 工具变量法"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"## 内容摘要"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"到目前为止,本书中描述的因果推论方法都基于一个不可检验的关键假设:all variables needed to adjust for confounding and selection bias have been identified and correctly measured. 如果这个假设是错误的,并且在一定程度上一直是这样,那么我们的因果估计中就会有剩余偏差。\n",
"\n",
"It turns out that there exist other methods Instrumental variable estimation is one of those methods. Economists and other social scientists reading this book can breathe now. We are finally going to describe a very common method in their fields, a method that is unlike any other we have discussed so far.\n",
"\n",
"事实证明,存在其他方法可以that can validly estimate causal effects under an alternative set of assumptions that do not require measuring all adjustment factors. 工具变量估计是这些方法之一。读这本书的经济学家和其他社会科学家 can breathe now. 我们最终将在他们的领域中描述一种非常通用的方法,该方法与我们迄今为止讨论的任何其他方法都不相同。"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### 16.1 模型假设\n",
"\n",
"The three intrumental conditions.\n",
"\n",
"`\n",
"Estimand expression:\n",
"Expectation(Derivative(y, [Z0, Z1])*Derivative([v0], [Z0, Z1])**(-1))\n",
"Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0,Z1})\n",
"Estimand assumption 2, Exclusion: If we remove {Z0,Z1}→{v0}, then ¬({Z0,Z1}→y)\n",
"`"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:dowhy.causal_model:Model to find the causal effect of treatment ['v0'] on outcome ['y']\n",
"INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['Unobserved Confounders', 'W3', 'W0', 'W2', 'W1', 'W4']\n",
"WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.\n",
"INFO:dowhy.causal_identifier:Continuing by ignoring these unobserved confounders because proceed_when_unidentifiable flag is True.\n",
"INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:['Z0', 'Z1']\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Estimand type: nonparametric-ate\n",
"### Estimand : 1\n",
"Estimand name: backdoor\n",
"Estimand expression:\n",
" d \n",
"─────(Expectation(y|W3,W0,W2,W1,W4))\n",
"d[v₀] \n",
"Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W0,W2,W1,W4,U) = P(y|v0,W3,W0,W2,W1,W4)\n",
"### Estimand : 2\n",
"Estimand name: iv\n",
"Estimand expression:\n",
"Expectation(Derivative(y, [Z0, Z1])*Derivative([v0], [Z0, Z1])**(-1))\n",
"Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0,Z1})\n",
"Estimand assumption 2, Exclusion: If we remove {Z0,Z1}→{v0}, then ¬({Z0,Z1}→y)\n",
"\n"
]
}
],
"source": [
"from dowhy import CausalModel\n",
"import dowhy.datasets\n",
"\n",
"# Load some sample data\n",
"data = dowhy.datasets.linear_dataset(\n",
" beta=10,\n",
" num_common_causes=5,\n",
" num_instruments=2,\n",
" num_samples=1000,\n",
" treatment_is_binary=True)\n",
"\n",
"# Create a causal model from the data and given graph.\n",
"model = CausalModel(\n",
" data=data[\"df\"],\n",
" treatment=data[\"treatment_name\"],\n",
" outcome=data[\"outcome_name\"],\n",
" graph=data[\"gml_graph\"])\n",
"\n",
"# Identify causal effect and return target estimands\n",
"identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)\n",
"print(identified_estimand)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from graphviz import Source\n",
"Source(data['dot_graph'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### 16.2 The usal IV estimand\n",
"\n",
"The usual IV estimand for a dichotomous instrument. \n",
"\n",
"\n",
"$$\n",
"\\frac{E[Y|Z=1] - E[Y|Z=0]}{E[A|Z=1] - E[A|Z=0]}\n",
"$$\n",
"\n",
"在我们的例子中:\n",
"\n",
"`Expectation(Derivative(y, [Z0, Z1])*Derivative([v0], [Z0, Z1])**(-1))`\n"
]
},
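{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"As a sketch (assuming the `model` and `identified_estimand` objects from the dowhy example above are still in scope), we can ask dowhy to compute this estimand directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"# Estimate the IV estimand identified above with dowhy's built-in\n",
"# instrumental-variable estimator; the value varies with the simulated data.\n",
"iv_estimate = model.estimate_effect(\n",
"    identified_estimand,\n",
"    method_name=\"iv.instrumental_variable\")\n",
"print(iv_estimate.value)"
]
},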
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### 16.3 A fourth identifying condition: homogeneity"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"上述的三个条件 (i)-(iii) 并不能保证 that the IV estimand is the average causal effect of treatment $A$ on $Y$. A fourth condition, effect homogeneity (iv), is needed. 在这里,我们按照历史出现的顺序阐述 four possible homogeneity conditions (iv)."
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"- The most extreme, and oldest, version of homogeneity condition (iv) is constant effect of treatment $A$ on outcome $Y$ across individuals.\n",
"- A second, less extreme homogeneity condition (iv) for dichotomous $Z$ and $A$ is equality of the average causal effect of $A$ on $Y$ across levels of $Z$ in both the treated and in the untreated.\n",
"- Additive structural mean models and IV estimation. This third homogeneity condition (iv) is often implausible because some unmeasured confounders may also be effect modifiers. \n",
"- Another type of homogeneity condition (iv) is that the $Z-A$ association on the additive scale is constant across levels of the confounders $U$."
]
},
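{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"A minimal simulation sketch (not from the book; all parameter values are made up) of why condition (iv) matters: when the effect of $A$ on $Y$ and the $Z$-$A$ association both vary with an unmeasured confounder $U$, the usual IV estimand can differ markedly from the average causal effect."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"n = 500000\n",
"U = rng.binomial(1, 0.5, size=n)  # unmeasured confounder and effect modifier\n",
"Z = rng.binomial(1, 0.5, size=n)  # instrument\n",
"\n",
"# Treatment uptake: the additive Z-A association is 0.5 when U=0 but only\n",
"# 0.05 when U=1, violating the fourth homogeneity condition above.\n",
"A = rng.binomial(1, 0.1 + 0.4 * U + np.where(U == 1, 0.05, 0.5) * Z)\n",
"\n",
"effect = np.where(U == 1, 4.0, 1.0)  # heterogeneous individual causal effect\n",
"Y = effect * A + 2.0 * U + rng.normal(size=n)\n",
"\n",
"true_ate = effect.mean()  # approximately 2.5 by construction\n",
"wald = ((Y[Z == 1].mean() - Y[Z == 0].mean())\n",
"        / (A[Z == 1].mean() - A[Z == 0].mean()))  # about 1.3 here\n",
"print('true ATE: {:.2f}   usual IV estimate: {:.2f}'.format(true_ate, wald))"
]
},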
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### 16.4 An alternative fourth condition: monotonicity"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### 16.5 The three instrumental conditions revisited"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"前面的部分讨论了相对优缺点 of choosing monotonicity or homogeneity as the condition (iv). Our discussion implicitly assumed that the proposed instrument $Z$ was in fact an instrument."
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### 16.6 Instrumental variable estimation versus other methods\n",
"\n",
"IV估计在至少三个方面与所有先前讨论的方法不同:\n",
"\n",
"- First, IV estimation requires modeling assumptions even if infinite data were available.\n",
"- Second, relatively minor violations of conditions (i)-(iv) for IV estimation may result in large biases of unpredictable or counterintuitive direction.\n",
"- Third, the ideal setting for the applicability of standard IV estimation is more restrictive than that for other methods.\n"
]
},
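{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"A back-of-the-envelope sketch of the second point, with hypothetical numbers (not from the book): in a linear setting, a direct effect of $Z$ on $Y$ that violates condition (ii) gets divided by the $Z$-$A$ association, so a weak instrument amplifies even a small violation into a large bias."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"# Hypothetical numbers, loosely echoing this chapter's example.\n",
"true_effect = 2.4        # assumed true effect of A on Y (kg)\n",
"denom = 0.06             # weak instrument: Pr[A=1|Z=1] - Pr[A=1|Z=0]\n",
"direct_z_effect = 0.3    # small direct effect of Z on Y, violating (ii)\n",
"\n",
"# In a linear model, E[Y|Z=1] - E[Y|Z=0] = true_effect * denom + direct_z_effect,\n",
"# so the IV estimand becomes true_effect + direct_z_effect / denom.\n",
"iv_estimand = true_effect + direct_z_effect / denom\n",
"print('IV estimand: {:.1f}kg (bias: {:+.1f}kg)'.format(\n",
"    iv_estimand, iv_estimand - true_effect))"
]
},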
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "true",
"toc-hr-collapsed": false
},
"source": [
"## Programs"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"## Setup and imports\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import statsmodels.api as sm\n",
"import matplotlib.pyplot as plt\n",
"import scipy.optimize\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero\n"
]
}
],
"source": [
"nhefs_all = pd.read_excel('NHEFS.xls')"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"Create the instrument, $Z$"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"nhefs_all['highprice'] = (nhefs_all.price82 >= 1.5).astype('int')"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"We'll use a different subset of the data than used previously"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"restriction_cols = ['wt82', 'education', 'price82']\n",
"missing = nhefs_all[restriction_cols].isnull().any(axis=1)\n",
"nhefs = nhefs_all.loc[~missing]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"data": {
"text/plain": [
"(1476, 65)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nhefs.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"We'll check whether $Z$ (`highprice`) and $A$ (`qsmk`) are associated, that $\\Pr[A=1|Z=1] - \\Pr[A=1|Z=0] \\not = 0$"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Pr[A=1|Z=1] = 25.8%\n",
" Pr[A=1|Z=0] = 19.5%\n",
"Pr[A=1|Z=1] − Pr[A=1|Z=0] = 6.3%\n"
]
}
],
"source": [
"a_given_z1 = nhefs.qsmk.loc[nhefs.highprice == 1]\n",
"a_given_z0 = nhefs.qsmk.loc[nhefs.highprice == 0]\n",
"\n",
"pr_a1_z1 = (a_given_z1 == 1).sum() * 1.0 / a_given_z1.shape[0]\n",
"pr_a1_z0 = (a_given_z0 == 1).sum() * 1.0 / a_given_z0.shape[0]\n",
"\n",
"print(\" Pr[A=1|Z=1] = {:>4.1f}%\".format(pr_a1_z1 * 100))\n",
"print(\" Pr[A=1|Z=0] = {:>4.1f}%\".format(pr_a1_z0 * 100))\n",
"print(\"Pr[A=1|Z=1] − Pr[A=1|Z=0] = {:>4.1f}%\".format((pr_a1_z1 - pr_a1_z0) * 100))"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "true"
},
"source": [
"### Program 16.1"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"Pg 196: For a dichotomous instrument $Z$ that also meets condition (iv) from section 16.3,\n",
"\n",
"$$\n",
" \\text{E}[Y^{a=1}] - \\text{E}[Y^{a=0}] = \\frac{\\text{E}[Y|Z=1] - \\text{E}[Y|Z=0]}{\\text{E}[A|Z=1] - \\text{E}[A|Z=0]}\n",
"$$\n",
"\n",
"\"We estimated the numerator and denominator of the IV estimand by simply calculating the four sample averages ...\", pg 197"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"est_y_given_z1 = nhefs.wt82_71[nhefs.highprice == 1].mean()\n",
"est_y_given_z0 = nhefs.wt82_71[nhefs.highprice == 0].mean()\n",
"est_a_given_z1 = nhefs.qsmk[nhefs.highprice == 1].mean()\n",
"est_a_given_z0 = nhefs.qsmk[nhefs.highprice == 0].mean()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"estimated E[Y|Z=1] = 2.686\n",
"estimated E[Y|Z=0] = 2.536\n",
"estimated E[A|Z=1] = 0.258\n",
"estimated E[A|Z=0] = 0.195\n"
]
}
],
"source": [
"print('estimated E[Y|Z=1] = {:>0.3f}'.format(est_y_given_z1))\n",
"print('estimated E[Y|Z=0] = {:>0.3f}'.format(est_y_given_z0))\n",
"print('estimated E[A|Z=1] = {:>0.3f}'.format(est_a_given_z1))\n",
"print('estimated E[A|Z=0] = {:>0.3f}'.format(est_a_given_z0))"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"the usual IV estimate: 2.40kg\n"
]
}
],
"source": [
"estimate = (est_y_given_z1 - est_y_given_z0) / (est_a_given_z1 - est_a_given_z0)\n",
"print('the usual IV estimate: {:>0.2f}kg'.format(estimate))"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"\"Equivalently, we could have fit two (saturated) linear models to estimate the differences in the denominator and the numerator...\""
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"model = sm.OLS.from_formula('wt82_71 ~ highprice', data=nhefs)\n",
"numer = model.fit()\n",
"\n",
"model = sm.OLS.from_formula('qsmk ~ highprice', data=nhefs)\n",
"denom = model.fit()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"the usual IV estimate: 2.40kg\n"
]
}
],
"source": [
"estimate = numer.params['highprice'] / denom.params['highprice']\n",
"print('the usual IV estimate: {:>0.2f}kg'.format(estimate))"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "true"
},
"source": [
"### Program 16.2"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"Two stage least squares estimator"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"First, we'll manually use two models, but this will give the wrong confidence interval"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"model = sm.OLS.from_formula('qsmk ~ highprice', data=nhefs)\n",
"stg_1 = model.fit()\n",
"\n",
"nhefs['A_pred'] = stg_1.predict(nhefs)\n",
"\n",
"model = sm.OLS.from_formula('wt82_71 ~ A_pred', data=nhefs)\n",
"stg_2 = model.fit()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
" | coef | std err | t | P>|t| | [0.025 | 0.975] | \n",
"
\n",
"\n",
" Intercept | 2.0682 | 5.140 | 0.402 | 0.687 | -8.014 | 12.151 | \n",
"
\n",
"\n",
" A_pred | 2.3963 | 20.055 | 0.119 | 0.905 | -36.942 | 41.735 | \n",
"
\n",
"
"
],
"text/plain": [
""
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stg_2.summary().tables[1]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"the 2-stage estimate: 2.40kg\n"
]
}
],
"source": [
"estimate = stg_2.params['A_pred']\n",
"print('the 2-stage estimate: {:>0.2f}kg'.format(estimate))"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"\"A commonly used rule of thumb is to declare an instrument as weak if the F-statistic from the first-stage model is less than 10\""
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1st-stage F-statistic: 0.82\n"
]
}
],
"source": [
"print('1st-stage F-statistic: {:>0.2f}'.format(stg_1.fvalue))"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"The confidence intervals in the second-stage model aren't quite right"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"We'll do the two-stage model again, using `IV2SLS` from the `linearmodels` package"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"from linearmodels.iv import IV2SLS as lm_IV2SLS"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"model = lm_IV2SLS.from_formula(\n",
" 'wt82_71 ~ 1 + [qsmk ~ highprice]',\n",
" data=nhefs\n",
")\n",
"\n",
"two_stage = model.fit(cov_type='homoskedastic')"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Parameter Estimates\n",
"\n",
" | Parameter | Std. Err. | T-stat | P-value | Lower CI | Upper CI | \n",
"
\n",
"\n",
" Intercept | 2.0682 | 5.0817 | 0.4070 | 0.6840 | -7.8917 | 12.028 | \n",
"
\n",
"\n",
" qsmk | 2.3963 | 19.827 | 0.1209 | 0.9038 | -36.463 | 41.256 | \n",
"
\n",
"
"
],
"text/plain": [
""
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"two_stage.summary.tables[1]"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" estimate 95% C.I.\n",
"beta_1 2.40 (-36.5, 41.3)\n"
]
}
],
"source": [
"# `conf_ints` is slightly different between linearmodels and statsmodels\n",
"\n",
"est = two_stage.params.qsmk\n",
"conf_ints = two_stage.conf_int(level=0.95)\n",
"lo, hi = conf_ints['lower']['qsmk'], conf_ints['upper']['qsmk']\n",
"\n",
"print(' estimate 95% C.I.')\n",
"print('beta_1 {:>6.2f} ({:>0.1f}, {:>0.1f})'.format(est, lo, hi))"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### Program 16.3"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"\"The parameters of structural mean models can be estimated via g-estimation\""
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"We'll solve this using the same methods as in Program 14.2. Recall, in that program we searched for a $\\psi^\\dagger$ that would minimize the coefficient on $H(\\psi^\\dagger)$. The $\\psi^\\dagger$ that achieves the minimum is our estimate of the causal effect.\n",
"\n",
"In Program 14.2 we ended with a call to `scipy.optimize.minimize` to do the fine-grained search for $\\psi^\\dagger$. Here, we'll just go straight to that."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"def logit_abs_alpha(psi):\n",
" nhefs['H_of_psi'] = nhefs.wt82_71 - psi * nhefs.qsmk\n",
" gee = sm.GEE.from_formula(\n",
" 'highprice ~ H_of_psi',\n",
" data=nhefs,\n",
" groups=nhefs.seqn,\n",
" family=sm.families.Binomial()\n",
" )\n",
" res = gee.fit()\n",
" alpha = res.params.H_of_psi\n",
" return abs(alpha)"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"This can take a while to run"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"data": {
"text/plain": [
" fun: 2.9857141760250987e-12\n",
" hess_inv: array([[4213.9017246]])\n",
" jac: array([0.00061682])\n",
" message: 'Desired error not necessarily achieved due to precision loss.'\n",
" nfev: 312\n",
" nit: 1\n",
" njev: 100\n",
" status: 2\n",
" success: False\n",
" x: array([2.3962701])"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"opt_results = scipy.optimize.minimize(\n",
" fun=logit_abs_alpha,\n",
" x0=4.0,\n",
" tol=1e-4\n",
")\n",
"\n",
"opt_results"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"best estimate: 2.396\n"
]
}
],
"source": [
"print('best estimate: {:>0.3f}'.format(opt_results.x[0]))"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### Program 16.4"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"We'll calculate the IV estimate using a few different cutoffs for `highprice`"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"First, we'll calculate the \"usual\" IV estimate using data means"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"def wald_estimate(Y, A, Z):\n",
" numer = Y.loc[Z == 1].mean() - Y.loc[Z == 0].mean()\n",
" denom = A.loc[Z == 1].mean() - A.loc[Z == 0].mean()\n",
" return numer / denom"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cutoff price: $1.60 estimate: 41.28kg\n",
"cutoff price: $1.70 estimate: -40.91kg\n",
"cutoff price: $1.80 estimate: -21.10kg\n",
"cutoff price: $1.90 estimate: -12.81kg\n"
]
}
],
"source": [
"for cutoff in (1.6, 1.7, 1.8, 1.9):\n",
" estimate = wald_estimate(\n",
" nhefs.wt82_71,\n",
" nhefs.qsmk,\n",
" (nhefs.price82 >= cutoff).astype('int')\n",
" )\n",
" print('cutoff price: ${:>0.2f} estimate: {:>6.2f}kg'.format(cutoff, estimate))"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"Next we'll re-calculate using 2-stage models, to get confidence intervals"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" estimate 95% C.I.\n",
"$1.60 41.28 (-281.8, 364.4)\n",
"$1.70 -40.91 (-408.6, 326.8)\n",
"$1.80 -21.10 ( -76.8, 34.6)\n",
"$1.90 -12.81 ( -59.2, 33.5)\n"
]
}
],
"source": [
"print(' estimate 95% C.I.')\n",
"for cutoff in (1.6, 1.7, 1.8, 1.9):\n",
" nhefs['highprice'] = (nhefs.price82 >= cutoff).astype('int')\n",
" \n",
" model = lm_IV2SLS.from_formula(\n",
" 'wt82_71 ~ 1 + [qsmk ~ highprice]',\n",
" data=nhefs\n",
" )\n",
" res = model.fit(cov_type='homoskedastic')\n",
" \n",
" est = res.params.qsmk\n",
" conf_ints = res.conf_int(level=0.95)\n",
" lo, hi = conf_ints['lower']['qsmk'], conf_ints['upper']['qsmk']\n",
"\n",
" print('${:>0.2f} {:>6.2f} ({:>6.1f}, {:>6.1f})'.format(cutoff, est, lo, hi))\n",
"\n",
" \n",
"# restore `highprice` to its original meaning\n",
"nhefs['highprice'] = (nhefs.price82 >= 1.50).astype('int')"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### Program 16.5"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"model = lm_IV2SLS.from_formula(\n",
" 'wt82_71 ~ 1 + sex + race + age + smokeintensity + smokeyrs' \n",
" ' + C(exercise) + C(active) + wt71 + [qsmk ~ highprice]',\n",
" data=nhefs\n",
")\n",
"res = model.fit(cov_type='homoskedastic')"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"Collapsed": "false"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Parameter Estimates\n",
"\n",
" | Parameter | Std. Err. | T-stat | P-value | Lower CI | Upper CI | \n",
"
\n",
"\n",
" Intercept | 17.280 | 2.3259 | 7.4296 | 0.0000 | 12.722 | 21.839 | \n",
"
\n",
"\n",
" C(exercise)[T.1] | 0.4987 | 2.1624 | 0.2306 | 0.8176 | -3.7395 | 4.7370 | \n",
"
\n",
"\n",
" C(exercise)[T.2] | 0.5818 | 2.1743 | 0.2676 | 0.7890 | -3.6796 | 4.8433 | \n",
"
\n",
"\n",
" C(active)[T.1] | -1.1701 | 0.6050 | -1.9341 | 0.0531 | -2.3559 | 0.0156 | \n",
"
\n",
"\n",
" C(active)[T.2] | -0.5123 | 1.3031 | -0.3931 | 0.6942 | -3.0664 | 2.0418 | \n",
"
\n",
"\n",
" sex | -1.6444 | 2.6201 | -0.6276 | 0.5303 | -6.7797 | 3.4909 | \n",
"
\n",
"\n",
" race | -0.1833 | 4.6314 | -0.0396 | 0.9684 | -9.2607 | 8.8942 | \n",
"
\n",
"\n",
" age | -0.1636 | 0.2396 | -0.6831 | 0.4946 | -0.6332 | 0.3059 | \n",
"
\n",
"\n",
" smokeintensity | 0.0058 | 0.1449 | 0.0398 | 0.9683 | -0.2783 | 0.2898 | \n",
"
\n",
"\n",
" smokeyrs | 0.0258 | 0.1608 | 0.1607 | 0.8723 | -0.2893 | 0.3409 | \n",
"
\n",
"\n",
" wt71 | -0.0979 | 0.0361 | -2.7116 | 0.0067 | -0.1687 | -0.0271 | \n",
"
\n",
"\n",
" qsmk | -1.0423 | 29.865 | -0.0349 | 0.9722 | -59.577 | 57.492 | \n",
"
\n",
"
"
],
"text/plain": [
""
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res.summary.tables[1]"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": true
},
"toc-autonumbering": false
},
"nbformat": 4,
"nbformat_minor": 4
}