Write your first stochastic function in Python

· 5 min read

Let’s estimate how many ice creams Insomnia Cookies in New Haven will sell this fall. Several variables affect our forecast, including weather and temperature. This blog post reviews some essential functionality and tutorials that Pyro (pyro.ai) provides and then walks through writing our first stochastic function for this problem.

Let’s get back to our example. A simple stochastic function that describes the weather could be $\text{Bernoulli}(\alpha)$. Our prior belief that a given day is cloudy is $\frac{3}{10}$, and we can generate a sample using torch’s built-in distributions library as follows:

import torch

cloudy = torch.distributions.Bernoulli(0.3).sample()
cloudy = 'cloudy' if cloudy.item() == 1.0 else 'sunny'

The sampled value is either $1.0$ or $0.0$, which we map to the strings 'cloudy' and 'sunny'. To sample the temperature, let’s define other variables that depend on it. From our experience during college at Yale, New Haven is around $55^{\circ}$ Fahrenheit on cloudy days and around $75^{\circ}$ Fahrenheit on sunny days. We model the spread around these means with standard deviations of $10$ and $15$ degrees, respectively:

mean_temp = {'cloudy': 55.0, 'sunny': 75.0}[cloudy]
scale_temp = {'cloudy': 10.0, 'sunny': 15.0}[cloudy]

Now, we can define our stochastic function to forecast temperature with the following:

temp = torch.distributions.Normal(mean_temp, scale_temp).rsample()

Pyro builds on PyTorch’s large pool of libraries, including its distributions, and enables us to infer hidden (latent) variables. Let’s wrap up all the code we wrote so far in Pyro:

import pyro
import pyro.distributions as dist

def weather():
  cloudy = pyro.sample('cloudy', dist.Bernoulli(0.3))
  cloudy = 'cloudy' if cloudy.item() == 1.0 else 'sunny'
  mean_temp = {'cloudy': 55.0, 'sunny': 75.0}[cloudy]
  scale_temp = {'cloudy': 10.0, 'sunny': 15.0}[cloudy]
  temp = pyro.sample('temp', dist.Normal(mean_temp, scale_temp))
  return cloudy, temp.item()

As you can see, it’s similar to what we wrote in PyTorch. Sample outputs could be ('cloudy', 64.544), ('sunny', 94.375), and ('sunny', 72.518). Building on this model is straightforward:

def ice_cream_sales():
  cloudy, temp = weather()
  if cloudy == 'sunny' and temp > 80.0:
    exp_sales = 200.0
  else:
    exp_sales = 50.0
  return pyro.sample('ice_cream', dist.Normal(exp_sales, 10.0))

We expect to sell more ice cream when the weather is sunny and warm.
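
For example, a quick sanity check of this behavior is to draw a handful of forecasts from the model; the short loop below is only an illustrative sketch, and sunny, hot days should push the samples toward 200:

# Draw a few forecasts from the full generative model.
for _ in range(5):
  print(round(ice_cream_sales().item(), 1))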

Introduction to Inference

Here, we will give a simple example of statistical inference and introduce the Pyro basics needed to carry it out. Then, we will introduce the model and guide functions and use them to shed light on variational inference.

Example 1: Weight Measurement

We have a remarkable ability to guess how much an object weighs just by looking at it. This ability comes from our knowledge of the object’s shape and materials rather than any supernatural power. Our scale, however, is not perfectly reliable, and we get slightly different values every time. We want to measure again and again to compensate for this error:

$$ \text{weight} \mid \text{guess} \sim \text{Normal}(\text{guess}, 1) $$

$$ \text{measurement} \mid \text{guess}, \text{weight} \sim \text{Normal}(\text{weight}, 0.75) $$

We can define a simple stochastic function for this phenomenon by sampling from normal distributions with appropriate means (the weight, which in turn depends on our guess) and standard deviations:

def scale(guess):
  weight = pyro.sample("weight", dist.Normal(guess, 1.0))
  return pyro.sample("measurement", dist.Normal(weight, 0.75))

And Pyro will help us infer the latent variable weight:

$$ p(\text{weight} \mid \text{guess}, \text{measurement}=9.5) \propto p(\text{weight} \mid \text{guess}) \, p(\text{measurement}=9.5 \mid \text{weight}) $$

Pyro also provides the obs= keyword argument for conditioning on observations:

def scale_obs(guess):  # equivalent to conditioned_scale below
  weight = pyro.sample("weight", dist.Normal(guess, 1.))
  # here we condition on measurement == 9.5
  obs_ = torch.tensor(9.5)
  return pyro.sample("measurement", dist.Normal(weight, 0.75), obs=obs_)

It behaves exactly like:

conditioned_scale = pyro.condition(scale, data={"measurement": torch.tensor(9.5)})

Calling conditioned_scale(guess) samples from the model conditioned on our initial guess and on measurement = $9.5$. However, inferring the hidden variable is not always this straightforward. Often the normalizing integral over the latent variable is intractable (i.e., $p(z|x) = \frac{p(x,z)}{\int p(x,z)\, dz}$). In that case, we approximate the posterior with a function similar to our model, called the guide. The model and the guide always take the same input arguments; the model contains our data and observations, while the guide does not: the guide is the distribution we ultimately want to learn.

guess = 8.5

def scale_parametrized_guide(guess):
  a = pyro.param("a", torch.tensor(guess))
  b = pyro.param("b", torch.tensor(1.))
  return pyro.sample("weight", dist.Normal(a, torch.abs(b)))

pyro.clear_param_store()
svi = pyro.infer.SVI(model=conditioned_scale,
                     guide=scale_parametrized_guide,
                     optim=pyro.optim.Adam({"lr": 0.003}),
                     loss=pyro.infer.Trace_ELBO())

losses, a, b = [], [], []
num_steps = 2500
for t in range(num_steps):
  losses.append(svi.step(guess))
  a.append(pyro.param("a").item())
  b.append(pyro.param("b").item())

After running this code for enough iterations, the learned parameters $a$ and $b$ give a good approximation of the posterior. A sample output could be (a=9.206, b=0.605).
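
Because both the prior and the likelihood are normal, this model is conjugate and the exact posterior is available in closed form, so we can sanity-check the learned values. The sketch below applies the standard Normal-Normal conjugacy formulas (the variable names are only illustrative):

# Exact posterior of weight given guess = 8.5 and measurement = 9.5.
prior_var, noise_var, measurement = 1.0 ** 2, 0.75 ** 2, 9.5
post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)                  # = 0.36
post_mean = post_var * (guess / prior_var + measurement / noise_var)  # ~ 9.14
print(post_mean, post_var ** 0.5)                           # exact mean and std
print(pyro.param("a").item(), abs(pyro.param("b").item()))  # values learned by SVI

The exact posterior is approximately Normal(9.14, 0.6), so the learned values above are close.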

Example 2: Fair Two-sided Coin

To be completed ..

Example 3: Topic Modeling

Topic modeling is a powerful technique for assigning probability distributions to words and documents. David Blei (Columbia University, JMLR 2003) proposed a Dirichlet prior for generating the words within a document and used variational inference to obtain the hyperparameters. In practice, however, people often used Markov Chain Monte Carlo (MCMC), specifically Gibbs sampling, to approximate the parameters. While MCMC converges to the exact solution, it is slow. Akash Srivastava et al. (University of Edinburgh, ICLR 2017) introduced variational autoencoders as a novel approach to topic models. An autoencoder captures non-linearity where methods like PCA don’t, but it lacks regularity and may overfit; in other words, it may fail to capture the structure of your data. A variational autoencoder does capture that structure by guiding your model to generate reliable samples, and it can be trained quickly on a graphics processing unit (GPU) over millions of documents.
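
To make the generative story concrete, here is a minimal sketch of an LDA-style generative process; the sizes (num_topics, vocab_size, doc_length) and the generate_document helper are made up for illustration and are not part of any library:

import torch
import pyro.distributions as dist

num_topics, vocab_size, doc_length = 3, 50, 20

def generate_document():
  # Each topic is a distribution over the vocabulary (Dirichlet prior).
  topic_words = dist.Dirichlet(torch.ones(num_topics, vocab_size)).sample()
  # Each document is a mixture over topics (Dirichlet prior).
  doc_topics = dist.Dirichlet(torch.ones(num_topics)).sample()
  words = []
  for _ in range(doc_length):
    z = dist.Categorical(doc_topics).sample()      # pick a topic for this word
    w = dist.Categorical(topic_words[z]).sample()  # pick a word from that topic
    words.append(w.item())
  return words

In the variational setting discussed above, these Dirichlet and Categorical draws become pyro.sample sites in a model, and a guide parametrizes their approximate posterior.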

Example

Using the same principles we learned, we can write advanced posterior inference models like topic modeling with a 'model' and a 'guide'. An implementation of a topic modeling algorithm using a model and a guide is here: code.