知识点: Stochastic function 从数学上来说是随机泛函。Pyro 样本相对于 Pytorch 样本最大的不同之处是它们有 names, Pyro 的后段使用这些 names 来唯一的识别抽样语句 and change their behavior at runtime depending on how the enclosing stochastic function is being used.

Pyro 模型介绍

概率编程的基本单元是随机函数(stochastic function), 它是包含下面两种成分的任意 Python 可调用对象:

  • 确定性 Python 代码

  • 初等随机函数 that call a random number generator

具体来说,随机函数可以是任何具有方法 __call__() 的对象, like a function, a method, or a PyTorch nn.Module.

在整个教程和文档中,由于随机函数可用于表示数据生成过程的简化或者抽象描述,因此我们通常将其称为 stochastic functions models。将 models 表示为随机函数意味着像常规的 Python 可调用对象一样, models can be composed, reused, imported, and serialized.

Table of Contents

[1]:
import torch
import pyro

pyro.set_rng_seed(101)

初等随机函数

初等随机函数(Primitive stochastic functions, or distributions) 是一类重要的随机函数,我们可以为其显式计算某个输入下得到某个输出的概率. As of PyTorch 0.4 and Pyro 0.2, Pyro uses PyTorch’s distribution library. You can also create custom distributions using transforms.

使用初等随机函数很简单. 例如, 我们可以使用如下代码从标准正态分布 \(\mathcal{N}(0,1)\) 抽取一个样本:

[2]:
loc = 0.   # mean zero
scale = 1. # unit variance
normal = torch.distributions.Normal(loc, scale) # create a normal distribution object
x = normal.rsample() # draw a sample from N(0,1)
print("sample", x)
print("log prob", normal.log_prob(x)) # score the sample from N(0,1)
sample tensor(-1.3905)
log prob tensor(-1.8857)

这里, torch.distributions.NormalDistribution class 的一个实例 that takes parameters and provides sample and score methods. 因为 Pyro’s 的分布库 pyro.distributions 是 Pytorch 分布库 torch.distributions 的一个简单打包,所以我们可以在推断过程中利用 PyTorch’s 快速张量计算和自动微分的能力.

一个简单模型

所有的概率程序(probablistic programs)是通过初等随机函数和确定性计算组合得到的。我们最终的目的是要是用概率编程(probablistic programming)来模拟真实世界,我们现在从一个具体的例子出发。

现在我们有一堆关于每天平均气温和是否阴天(cloud cover)的数据。我们想 reason about how temperature interacts with whether it was sunny or cloudy. 如下的简单随机函数描述了数据的生成过程:

[3]:
from graphviz import Source
Source('digraph{rankdir=LR; cloudy -> temperature}')
[3]:
_images/intro_part_i_9_0.svg
[4]:
def weather():
    cloudy = torch.distributions.Bernoulli(0.3).sample()
    cloudy = 'cloudy' if cloudy.item() == 1.0 else 'sunny'
    mean_temp = {'cloudy': 55.0, 'sunny': 75.0}[cloudy]
    scale_temp = {'cloudy': 10.0, 'sunny': 15.0}[cloudy]
    temp = torch.distributions.Normal(mean_temp, scale_temp).rsample()
    return cloudy, temp.item()

g = weather()
print(g)
('cloudy', 46.847618103027344)

让我们一行行的来看这段代码。 首先, 在第2行 we define a binary random variable ‘cloudy’, which is given by a draw from the Bernoulli distribution with a parameter of 0.3. Since the Bernoulli distributions return 0s or 1s, 在第3行 we convert the value cloudy to a string so that return values of weather are easier to parse. So according to this model 30% of the time it’s cloudy and 70% of the time it’s sunny.

在第4-5行 we define the parameters we’re going to use to sample the temperature in lines 6. These parameters depend on the particular value of cloudy we sampled in line 2. For example, the mean temperature is 55 degrees (Fahrenheit) on cloudy days and 75 degrees on sunny days. Finally we return the two values cloudy and temp in line 7.

但是,这个函数 weather 完全与 Pyro 无关 - it only calls PyTorch. 我们需要将其转换为 Pyro 程序 if we want to use this model for anything other than sampling fake data. (那么,这个模型除生成假数据还应该能干嘛呢?我们希望 can be composed, reused, imported, and serialized.)

[5]:
# Hint
def weather():
    cloudy = pyro.sample('cloudy', pyro.distributions.Bernoulli(0.3))
    cloudy = 'cloudy' if cloudy.item() == 1.0 else 'sunny'
    mean_temp = {'cloudy': 55.0, 'sunny': 75.0}[cloudy]
    scale_temp = {'cloudy': 10.0, 'sunny': 15.0}[cloudy]
    temp = pyro.sample('temp', pyro.distributions.Normal(mean_temp, scale_temp))
    return cloudy, temp.item()

weather()
[5]:
('cloudy', 53.20433807373047)
[8]:
# 打印每个样本点
import pyro.distributions as dist
from pyro.poutine.trace_messenger import TraceMessenger
mu = 8.5
def scale_obs(mu):
    weight = pyro.sample("weight", dist.Normal(mu, 1.))
    return pyro.sample("measurement", dist.Normal(weight, 0.75), obs=9.5)

with TraceMessenger() as tracer:
    scale_obs(mu)

trace = tracer.trace
logp = 0.
for name, node in trace.nodes.items():
    print(name, node['fn'], node['value'], node['is_observed'])
    if node["type"] == "sample":
        logp = logp + node["fn"].log_prob(node["value"]).sum()
weight Normal(loc: 8.5, scale: 1.0) tensor(9.1378) False
measurement Normal(loc: 9.137797355651855, scale: 0.75) 9.5 True

pyro.sample 原语

为了将 weather 转换为 Pyro 程序, 我们将会用 pyro.distribution 替代torch.distributions, 而且用 calls to pyro.sample 来替代 calls to .sample() and .rsample(). 使用 pyro.sample 就像调用初等随机函数一样简单,但是有一个重要的区别:

[20]:
# `pyro.sample` 是 Pyro 的核心原语(language primitive)之一
x = pyro.sample("my_sample", pyro.distributions.Normal(loc, scale))
print(x)
tensor(1.0920)

就像 torch.distributions.Normal().rsample() 一样, 它返回一个标准正态分布的样本。但是最关键的区别是该样本是有 names 的. Pyro的后端使用这些 names

  • to uniquely identify sample statements and

  • change their behavior at runtime depending on how the enclosing stochastic function is being used.

正如我们将看到的,这就是 Pyro 如何实现 various manipulations that underlie inference algorithms.

我们已经介绍了 pyro.samplepyro.distributions,现在我们可以将我们的简单模型重写为Pyro程序:

[29]:
def weather():
    cloudy = pyro.sample('cloudy', pyro.distributions.Bernoulli(0.3))
    cloudy = 'cloudy' if cloudy.item() == 1.0 else 'sunny'
    mean_temp = {'cloudy': 55.0, 'sunny': 75.0}[cloudy]
    scale_temp = {'cloudy': 10.0, 'sunny': 15.0}[cloudy]
    temp = pyro.sample('temp', pyro.distributions.Normal(mean_temp, scale_temp))
    return cloudy, temp.item()

for _ in range(3):
    print(weather())
('sunny', 88.48491668701172)
('sunny', 66.36395263671875)
('cloudy', 61.69258117675781)

Procedurally,weather() 仍然是一个non-deterministic Python callable that returns two random samples. Because the randomness is now invoked with pyro.sample, however, 它远不止于此。具体来说,weather() 定义了两个有名字的随机变量的联合分布: cloudytemp. 也就是说我们定义了一个可以用于推理的概率模型 using the techniques of probability theory. 例如,我们可能会问:如果我观察到70度的温度,多云的可能性有多大?下一个教程将讨论如何提出和回答这类问题。

一个重要的问题是:Pyro 后段如何通过名字查看对应的随机变量和样本呢?

[30]:
import pyro.poutine as poutine
import pyro.distributions as dist
from pyro.poutine.trace_messenger import TraceMessenger

cond_data = {"temp": torch.tensor(52)}

with TraceMessenger() as tracer:
    weather()

trace = tracer.trace
logp = 0.
for name, node in trace.nodes.items():
    print(name, node['fn'], node['value'])
    if node["type"] == "sample":
        logp = logp + node["fn"].log_prob(node["value"]).sum()
cloudy Bernoulli(probs: 0.30000001192092896) tensor(1.)
temp Normal(loc: 55.0, scale: 10.0) tensor(74.8488)

思考:同样的问题是,如何查看和使用模型分布和指导分布的参数呢?如自定义目标函数呢?如何化解成最原始的梯度下降法?

[31]:
node
[31]:
{'type': 'sample',
 'name': 'temp',
 'fn': Normal(loc: 55.0, scale: 10.0),
 'is_observed': False,
 'args': (),
 'kwargs': {},
 'value': tensor(74.8488),
 'infer': {},
 'scale': 1.0,
 'mask': None,
 'cond_indep_stack': (),
 'done': True,
 'stop': False,
 'continuation': None}

通用概率编程

  • Universality: Stochastic Recursion, Higher-order Stochastic Functions, and Random Control Flow

现在我们已经看到了如何定义一个简单的随机函数。建立起来很容易,例如:

[7]:
# 建立在 weather 基础上的一个概率程序
# weather{cloudy --> temp} --> ice_cream
def ice_cream_sales():
    cloudy, temp = weather()
    expected_sales = 200. if cloudy == 'sunny' and temp > 80.0 else 50.
    ice_cream = pyro.sample('ice_cream', pyro.distributions.Normal(expected_sales, 10.0))
    return ice_cream

这种模块化显然是非常强大的,但是它强大到足以包含我们想要表达的所有不同类型的随机函数吗?

It turns out that because Pyro is embedded in Python, stochastic functions can contain arbitrarily complex deterministic Python and randomness can freely affect control flow. 例如,我们可以构造递归函数 that terminate their recursion nondeterministically, provided we take care to pass pyro.sample unique sample names whenever it’s called. 例如我们可以定义一个几何分布,来计算直到第一次成功之前的失败次数,如下所示:

[37]:
def geometric(p, t=None):
    t = 0 if t is None else t
    # 该函数表示第 t 次失败之后, 继续尝试直到成功,返回总共失败次数。
    x = pyro.sample("x_{}".format(t), pyro.distributions.Bernoulli(p))
    if x.item() == 1:
        return 0
    else:
        t_update = t+1
        return 1+geometric(p, t_update)

print(geometric(0.1))
12

Note that the names x_0, x_1, etc., in geometric() 是动态生成的,并且不同的执行可以具有不同数量的 named 随机变量。

[535]:
cond_data = {"last": torch.tensor(52)}

with TraceMessenger() as tracer:
    cond_data = {"x_{}".format(geometric(0.5)):v for k, v in cond_data.items()}

print(cond_data)
trace = tracer.trace
logp = 0.
for name, node in trace.nodes.items():
    print(name, node)
    if node["type"] == "sample":
        if node["is_observed"]:
            assert node["value"] is cond_data[name]
            print('观测变量', node)
        logp = logp + node["fn"].log_prob(node["value"]).sum()

logp
{'x_2': tensor(52)}
x_0 {'type': 'sample', 'name': 'x_0', 'fn': Bernoulli(probs: 0.5), 'is_observed': False, 'args': (), 'kwargs': {}, 'value': tensor(0.), 'infer': {}, 'scale': 1.0, 'mask': None, 'cond_indep_stack': (), 'done': True, 'stop': False, 'continuation': None}
x_1 {'type': 'sample', 'name': 'x_1', 'fn': Bernoulli(probs: 0.5), 'is_observed': False, 'args': (), 'kwargs': {}, 'value': tensor(0.), 'infer': {}, 'scale': 1.0, 'mask': None, 'cond_indep_stack': (), 'done': True, 'stop': False, 'continuation': None}
x_2 {'type': 'sample', 'name': 'x_2', 'fn': Bernoulli(probs: 0.5), 'is_observed': False, 'args': (), 'kwargs': {}, 'value': tensor(1.), 'infer': {}, 'scale': 1.0, 'mask': None, 'cond_indep_stack': (), 'done': True, 'stop': False, 'continuation': None}
[535]:
tensor(-2.0794)

我们也可以定义把其他随机函数当成输入或者输出的随机函数:

[39]:
from graphviz import Source
Source('digraph{rankdir=LR; loc, scale -> z1, z2 -> y}')
[39]:
_images/intro_part_i_31_0.svg
[43]:
from graphviz import Source
Source('digraph{rankdir=LR; mu_latent -> loc -> fn; scale, fn -> y}')
[43]:
_images/intro_part_i_32_0.svg
[44]:
def normal_product(loc, scale):
    z1 = pyro.sample("z1", pyro.distributions.Normal(loc, scale))
    z2 = pyro.sample("z2", pyro.distributions.Normal(loc, scale))
    y = z1 * z2
    return y

# Input: None
# Output: 带参数随机函数 p(y;scale)
def make_normal_normal():
    mu_latent = pyro.sample("mu_latent", pyro.distributions.Normal(0, 1))
    fn = lambda scale: normal_product(mu_latent, scale)
    return fn

print(make_normal_normal()(1.))
tensor(0.7166)

这里 make_normal_normal() 是一个参数为 ‘scale’ 的随机函数,在其执行过程中,产生了三个有名字的随机变量。

Pyro 支持任意的 Python code— 包括iteration, recursion, higher-order functions, etc,结合随机控制流程,意味着 Pyro 随机函数是通用概率编程语言 (Universal Probabilistic Programming) ,即它们可用于表示任何可计算的概率分布。 正如我们将在后续教程中看到的那样,它难以置信的强大。

下一步?

我们已经展示了如何使用随机函数和初等随机函数(primitive distributions)来表示 Pyro中的模型。为了从数据中学习模型并对其进行推理,我们需要能够进行推断, 这是 下一个 tutorial 的主体。