Contextualized Regression#

Contextual models allow to identify heterogeneous effects which change between different contexts. The contextualized package provides easy access to these heterogeneous models through the contextualized.easy module interfaces. This notebook shows a simple example of using contextualized.easy regression.

1. Generate Simulation Data#

In this simulated regression, we’ll study the simple varying-coefficients case where \(Y =X\beta(C) + \epsilon\), with \(\beta(C) \sim \text{N}(C\phi, \text{I})\), and \(\epsilon \sim \text{N}(0, 0.01 \text{I})\).

  • \(Y\): The dependent or output variable.

  • \(X\): The independent or predictor variable.

  • \(\beta(C)\): The coefficients of the regression model, which vary depending on the context \(C\).

  • \(C\): The context variable that influences the coefficients \(\beta\).

  • \(\phi\): A parameter vector that influences how the context \(C\) impacts the coefficients \(\beta(C)\).

  • \(I\): The identity matrix, used to define the variance of the normal distributions.

  • \(\epsilon\): The error term, representing random noise in the data. Can be ignored when predicting \(Y\) given \(X\).

  • \(\sim \text{N}(\mu, \sigma^2)\): Indicates a normal distribution with mean \(\mu\) and variance \(\sigma^2\).

import numpy as np
n_samples = 1000
n_outcomes = 3
n_context = 1
n_observed = 1
C = np.random.uniform(-1, 1, size=(n_samples, n_context))
X = np.random.uniform(0, 0.5, size=(n_samples, n_observed))
phi = np.random.uniform(-1, 1, size=(n_context, n_observed, n_outcomes))
beta = np.tensordot(C, phi, axes=1) + np.random.normal(0, 0.01, size=(n_samples, n_observed, n_outcomes))
Y = np.array([np.tensordot(X[i], beta[i], axes=1) for i in range(n_samples)])

2. Build and fit model#

The contextualized.easy models use an sklearn-type wrapper interface. More details about the specific implementation can be found in the documentation.

from contextualized.easy import ContextualizedRegressor
model = ContextualizedRegressor(), X, Y, max_epochs=20, learning_rate=1e-3, n_bootstraps=3)

3. Inspect the model predictions.#

import matplotlib.pyplot as plt
%matplotlib inline
preds = model.predict(C, X)[:, 0]
plt.rcParams.update({'font.size': 18})
plt.scatter(Y[:, 0], preds)
plt.xlabel("True Value")
plt.ylabel("Predicted Value")

4. Check what the individual bootstrap models learned.#

Bootstrapping is a resampling method that involves creating multiple samples from the original dataset by sampling with replacement. Each bootstrap sample is used to train different instances of a model, resulting in a collection of models that learned from varying data. This approach improves model robustness and provides insight into the variability of predictions.

For more information, refer to this resource on bootstrapping.

model_preds = model.predict(C, X, individual_preds=True)
for i, pred in enumerate(model_preds):
    plt.scatter(Y[:, 0], pred[:, 0], label='Bootstrap {}'.format(i))
plt.xlabel("True Value")
plt.ylabel("Predicted Value")

5. Check what parameters the models learned.#

beta_preds, mu_preds = model.predict_params(C, individual_preds=True)
order = np.argsort(C.squeeze())  # put C in order for plotting
C = C[order].squeeze()
beta_preds = beta_preds[:, order]
mu_preds = mu_preds[:, order]
beta = beta[order]

    C, np.mean(beta_preds[:, :, 0], axis=0),
            label='$\\beta$ Predicted', color='blue')
    np.percentile(beta_preds[:, :, 0], 2.5, axis=0).squeeze(),
    np.percentile(beta_preds[:, :, 0], 97.5, axis=0).squeeze(),
    color='blue', alpha=0.1
plt.scatter(C, beta[:, :, 0], label='$\\beta$ Actual', marker='+')

    C, np.mean(mu_preds[:, :, 0], axis=0),
            label='$\\mu$ Predicted', color='orange')
    np.percentile(mu_preds[:, :, 0], 2.5, axis=0).squeeze(),
    np.percentile(mu_preds[:, :, 0], 97.5, axis=0).squeeze(),
    color='orange', alpha=0.1
plt.scatter(C, np.zeros_like(C), label='$\\mu$ Actual', marker='+')
plt.legend(loc='center right', bbox_to_anchor=(1.6, 0.5), fontsize=16)
plt.xlabel("C value")
plt.ylabel("Parameter value")

6. Save/load the trained model.#

from contextualized.utils import save, load

save_path = './'
save(model, path=save_path)
model = load(save_path)

7. Further Reading#

For a deeper look#

Contextualized regression with real-world data#

API Reference#