Analysis

`contextualized.analysis` contains functions to analyze and plot the results of contextualized models. All functions can be imported directly from the module, e.g. `from contextualized.analysis import plot_heterogeneous_predictor_effects`.

`pvals.calc_homogeneous_context_effects_pvals`	Calculate p-values for the effects of context directly on the outcome.
`pvals.calc_homogeneous_predictor_effects_pvals`	Calculate p-values for the context-invariant effects of predictors.
`pvals.calc_heterogeneous_predictor_effects_pvals`	Calculate p-values for the heterogeneous (context-dependent) effects of predictors.
`pvals.test_each_context`	Test heterogeneous predictor effects attributed to every individual context feature.
`pvals.get_possible_pvals`	Get the range of possible p-values based on the number of bootstraps.
`accuracy_split.print_acc_by_covars`	Prints AUROC for each class for different covariate splits.
`bootstraps.select_good_bootstraps`	Prune divergent or otherwise bad bootstraps, keeping only those with mean training errors below tol * min(training errors).
`embeddings.plot_lowdim_rep`	Plot a low-dimensional representation of a dataset.
`embeddings.plot_embedding_for_all_covars`	Plot embeddings of representations for all covariates in a Pandas dataframe.
`effects.plot_homogeneous_context_effects`	Plot the direct effect of context on outcomes, disregarding other features.
`effects.plot_homogeneous_predictor_effects`	Plot the effect of predictors on outcomes that do not change with context (homogeneous).
`effects.plot_heterogeneous_predictor_effects`	Plot how the effect of predictors on outcomes changes with context (heterogeneous).

calc_homogeneous_context_effects_pvals(model: SKLearnWrapper, C: ndarray, verbose: bool = True, **kwargs) → ndarray

Calculate p-values for the effects of context directly on the outcome.

Parameters:

model (SKLearnWrapper) – Model to analyze.
C (np.ndarray) – Contexts to analyze.
verbose (bool) – Whether to print the range of possible p-values.

Returns:

P-values of shape (n_contexts, n_outcomes) testing whether the sign of the direct effect of context on outcomes is consistent across bootstraps.

Return type:

np.ndarray

Raises:

ValueError – If the model’s n_bootstraps is less than 2.
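
The sign-consistency test described above can be sketched without any dependencies. This is only an illustration under stated assumptions: the helper name `sign_consistency_pval` is hypothetical, and the add-one smoothing is an assumed convention rather than a confirmed detail of the library.

```python
def sign_consistency_pval(estimates):
    """P-value for whether bootstrap estimates agree in sign with their mean.

    Add-one smoothing (an assumption in this sketch) keeps the result
    strictly between 0 and 1 for any finite number of bootstraps.
    """
    n = len(estimates)
    if n < 2:
        raise ValueError("Need at least 2 bootstraps.")
    mean = sum(estimates) / n
    sign = 1.0 if mean >= 0 else -1.0
    # Count bootstraps that fall on the opposite side of zero from the mean.
    disagree = sum(1 for e in estimates if e * sign < 0)
    return (1 + disagree) / (n + 1)


# Hypothetical bootstrap estimates for one (context, outcome) pair.
boot_effects = [0.8, 1.1, 0.9, -0.1, 1.3]
pval = sign_consistency_pval(boot_effects)  # one of five estimates disagrees
```

Under this convention, more bootstraps give finer p-value resolution, which is why at least 2 are required.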

calc_homogeneous_predictor_effects_pvals(model: SKLearnWrapper, C: ndarray, verbose: bool = True, **kwargs) → ndarray

Calculate p-values for the context-invariant effects of predictors.

Parameters:

model (SKLearnWrapper) – Model to analyze.
C (np.ndarray) – Contexts to analyze.
verbose (bool) – Whether to print the range of possible p-values.

Returns:

P-values of shape (n_predictors, n_outcomes) testing whether the sign of each context-invariant predictor effect is consistent across bootstraps.

Return type:

np.ndarray

Raises:

ValueError – If the model’s n_bootstraps is less than 2.

calc_heterogeneous_predictor_effects_pvals(model, C: ndarray, verbose: bool = True, **kwargs) → ndarray

Calculate p-values for the heterogeneous (context-dependent) effects of predictors.

Parameters:

model (SKLearnWrapper) – Model to analyze.
C (np.ndarray) – Contexts to analyze.
verbose (bool) – Whether to print the range of possible p-values.

Returns:

P-values of shape (n_contexts, n_predictors, n_outcomes) testing whether the context-varying parameter range is consistent across bootstraps.

Return type:

np.ndarray

Raises:

ValueError – If the model’s n_bootstraps is less than 2.

test_each_context(model_constructor: Type[SKLearnWrapper], C: DataFrame, X: DataFrame, Y: DataFrame, verbose: bool = True, model_kwargs: Dict = {'encoder_type': 'linear'}, fit_kwargs: Dict = {'learning_rate': 0.01, 'max_epochs': 3, 'n_bootstraps': 20}) → DataFrame

Test heterogeneous predictor effects attributed to every individual context feature. Applies test_heterogeneous_predictor_effects to a model learned for a single context feature in C, and does this sequentially for every context feature.

Parameters:

model_constructor (Type[SKLearnWrapper]) – The constructor of the model to be tested, currently either ContextualizedRegressor or ContextualizedClassifier.
C (pd.DataFrame) – The context dataframe (n_samples, n_contexts).
X (pd.DataFrame) – The predictor dataframe (n_samples, n_predictors).
Y (pd.DataFrame) – The outcome, target, or label dataframe (n_samples, n_outcomes).
verbose (bool) – Whether to print the range of possible p-values.
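model_kwargs (dict) – Additional arguments for the model constructor. Defaults to {'encoder_type': 'linear'}.
fit_kwargs (dict) – Additional arguments for model fitting. Defaults to {'learning_rate': 0.01, 'max_epochs': 3, 'n_bootstraps': 20}.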

Returns:

A DataFrame of p-values for each (context, predictor, outcome) combination, describing how much the predictor’s effect on the outcome varies across the context.

Return type:

pd.DataFrame

Raises:

ValueError – If the model’s n_bootstraps is less than 2.

get_possible_pvals(num_bootstraps: int) → list

Get the range of possible p-values based on the number of bootstraps.

Parameters:

num_bootstraps (int) – The number of bootstraps.

Returns:

The minimum and maximum possible p-values.

Return type:

list
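
To see why the number of bootstraps bounds the attainable p-values, here is a minimal sketch. It assumes an add-one-smoothed bootstrap estimate, so the formula and the helper name `possible_pval_range` are illustrative assumptions rather than the library's confirmed implementation.

```python
def possible_pval_range(num_bootstraps):
    """Smallest and largest p-values attainable under add-one smoothing.

    Assumes p = (1 + k) / (num_bootstraps + 1), where k is the number of
    bootstraps that contradict the tested effect; k ranges from 0 up to
    num_bootstraps - 1 (at least one bootstrap always agrees).
    """
    min_pval = 1 / (num_bootstraps + 1)
    max_pval = num_bootstraps / (num_bootstraps + 1)
    return [min_pval, max_pval]


# With 20 bootstraps the smallest reachable p-value is 1/21.
smallest, largest = possible_pval_range(20)
```

Under this assumption, reaching a smaller significance threshold requires fitting more bootstraps, since the minimum p-value shrinks only as 1 / (num_bootstraps + 1).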

print_acc_by_covars(Y_true: ndarray, Y_pred: ndarray, covar_df: DataFrame, **kwargs) → None

Prints AUROC for each class for different covariate splits. Should only be used with ContextualizedClassifier.

Parameters:

Y_true (np.ndarray) – True labels.
Y_pred (np.ndarray) – Predicted labels.
covar_df (pd.DataFrame) – DataFrame of covariates.
max_classes (int, optional) – Maximum number of classes to print. Defaults to 20.
covar_stds (np.ndarray, optional) – Standard deviations of covariates. Defaults to None.
covar_means (np.ndarray, optional) – Means of covariates. Defaults to None.
covar_encoders (List[LabelEncoder], optional) – Encoders for covariates. Defaults to None.
train_idx (np.ndarray, optional) – Boolean array indicating training data. Defaults to None.
test_idx (np.ndarray, optional) – Boolean array indicating testing data. Defaults to None.

Returns:

None
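
The idea of reporting AUROC separately for subsets of a covariate can be sketched in plain Python. Everything here is an illustrative assumption: the helper names `auroc` and `auroc_by_median_split` are hypothetical, and the median split is just one possible way to define covariate splits, not necessarily the one the library uses.

```python
from statistics import median


def auroc(y_true, y_score):
    """AUROC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when a split contains only one class
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_median_split(y_true, y_score, covar):
    """AUROC for samples at or below the covariate median and for samples above it."""
    cut = median(covar)
    low_idx = [i for i, c in enumerate(covar) if c <= cut]
    high_idx = [i for i, c in enumerate(covar) if c > cut]
    return {
        name: auroc([y_true[i] for i in idx], [y_score[i] for i in idx])
        for name, idx in (("low", low_idx), ("high", high_idx))
    }


# Hypothetical binary labels, predicted scores, and one continuous covariate.
y_true = [0, 1, 0, 1, 0, 1, 1, 0]
y_score = [0.2, 0.9, 0.4, 0.7, 0.6, 0.5, 0.8, 0.1]
age = [25, 30, 35, 40, 55, 60, 65, 70]
result = auroc_by_median_split(y_true, y_score, age)
```

A gap between the two values suggests that predictive accuracy differs across the covariate, which is the kind of pattern this function is meant to surface.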

select_good_bootstraps(sklearn_wrapper: SKLearnWrapper, train_errs: ndarray, tol: float = 2) → SKLearnWrapper

Prune divergent or otherwise bad bootstraps, keeping only those with mean training errors below tol * min(training errors).

Parameters:

sklearn_wrapper (contextualized.easy.wrappers.SKLearnWrapper) – Wrapper for the sklearn model.
train_errs (np.ndarray) – Training errors for each bootstrap (n_bootstraps, n_samples, n_outcomes).
tol (float) – Only bootstraps with mean train_errs below tol * min(train_errs) are kept.

Returns:

The input model with only selected bootstraps.

Return type:

contextualized.easy.wrappers.SKLearnWrapper
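
The selection rule can be illustrated with a small dependency-free sketch. The helper name `good_bootstrap_indices` is hypothetical, each bootstrap's errors are flattened to a single list for simplicity, and the strict `<` comparison is an assumption based on the wording "below".

```python
from statistics import mean


def good_bootstrap_indices(train_errs, tol=2.0):
    """Indices of bootstraps whose mean error is below tol * the best mean error.

    train_errs holds one flat list of errors per bootstrap, a simplification
    of the (n_bootstraps, n_samples, n_outcomes) array described above.
    """
    mean_errs = [mean(errs) for errs in train_errs]
    threshold = tol * min(mean_errs)
    return [i for i, err in enumerate(mean_errs) if err < threshold]


# Hypothetical training errors for three bootstraps.
train_errs = [
    [0.10, 0.12, 0.08],  # mean 0.10, the best bootstrap
    [0.15, 0.13, 0.14],  # mean 0.14, within tolerance
    [0.90, 1.10, 1.00],  # mean 1.00, divergent
]
kept = good_bootstrap_indices(train_errs, tol=2.0)
```

With `tol=2`, a bootstrap survives only if its mean error is less than twice that of the best bootstrap, so the third bootstrap is dropped here.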

plot_lowdim_rep(low_dim: ndarray, labels: ndarray, **kwargs)

Plot a low-dimensional representation of a dataset.

Parameters:

low_dim (np.ndarray) – Low-dimensional representation of shape (n_samples, 2).
labels (np.ndarray) – Labels of shape (n_samples,).
kwargs – Keyword arguments for plotting.

Returns:

None

plot_embedding_for_all_covars(reps: ndarray, covars_df: DataFrame, covars_stds: Optional[ndarray] = None, covars_means: Optional[ndarray] = None, covars_encoders: Optional[List[Callable]] = None, **kwargs) → None

Plot embeddings of representations for all covariates in a Pandas dataframe.

Parameters:

reps (np.ndarray) – Embeddings of shape (n_samples, n_dims).
covars_df (pd.DataFrame) – DataFrame of covariates.
covars_stds (np.ndarray, optional) – Standard deviations of covariates. Defaults to None.
covars_means (np.ndarray, optional) – Means of covariates. Defaults to None.
covars_encoders (List[LabelEncoder], optional) – Encoders for covariates. Defaults to None.
kwargs – Keyword arguments for plotting.

Returns:

None

plot_homogeneous_context_effects(model: SKLearnWrapper, C: ndarray, **kwargs) → None

Plot the direct effect of context on outcomes, disregarding other features. This context effect is homogeneous in that it is a static function of context (context-invariant).

Parameters:

model (SKLearnWrapper) – a fitted contextualized.easy model
C – the context values to use to estimate the effects
verbose (bool, optional) – print progress. Default True.
individual_preds (bool, optional) – whether to plot each bootstrap. Default True.
C_vis (np.ndarray, optional) – Context bins used to visualize context (n_vis, n_contexts). Default None to construct anew.
n_vis (int, optional) – Number of bins to use to visualize context. Default 1000.
lower_pct (float, optional) – Lower percentile for bootstraps. Default 2.5.
upper_pct (float, optional) – Upper percentile for bootstraps. Default 97.5.
classification (bool, optional) – Whether to exponentiate the effects. Default True.
C_encoders (List[sklearn.preprocessing.LabelEncoder], optional) – encoders for each context. Default None.
C_means (np.ndarray, optional) – means for each context. Default None.
C_stds (np.ndarray, optional) – standard deviations for each context. Default None.
xlabel_prefix (str, optional) – prefix for x label. Default “”.
figname (str, optional) – name of figure to save. Default None.

Returns:

None
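
The roles of `lower_pct`, `upper_pct`, and `classification` can be sketched as follows. This is an illustration under assumptions, not the library's plotting code: the helper names `percentile` and `effect_band` are hypothetical, percentiles use linear interpolation, and treating exponentiated effects as odds ratios is an assumed interpretation.

```python
import math


def percentile(values, pct):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def effect_band(boot_effects, lower_pct=2.5, upper_pct=97.5, classification=True):
    """Mean effect and percentile band across bootstraps at one context value."""
    center = sum(boot_effects) / len(boot_effects)
    band = (
        center,
        percentile(boot_effects, lower_pct),
        percentile(boot_effects, upper_pct),
    )
    if classification:
        # Assumed interpretation: effects are log-odds, so exponentiating
        # expresses them as odds ratios.
        band = tuple(math.exp(v) for v in band)
    return band


# Hypothetical effects of one context value on one outcome across 5 bootstraps.
boot_effects = [0.0, 0.1, 0.2, 0.3, 0.4]
center, lower, upper = effect_band(boot_effects, classification=False)
odds_center, odds_lower, odds_upper = effect_band(boot_effects)
```

Repeating this at each of the `n_vis` context values would yield the center line and the shaded interval of an effect plot.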

plot_homogeneous_predictor_effects(model: SKLearnWrapper, C: ndarray, X: ndarray, **kwargs) → None

Plot the effect of predictors on outcomes that do not change with context (homogeneous).

Parameters:

model (SKLearnWrapper) – a fitted contextualized.easy model
C – the context values to use to estimate the effects
X – the predictor values to use to estimate the effects
max_classes_for_discrete (int, optional) – maximum number of classes to treat as discrete. Default 10.
min_effect_size (float, optional) – minimum effect size to plot. Default 0.003.
ylabel (str, optional) – y label for plot. Default “Influence of “.
xlabel_prefix (str, optional) – prefix for x label. Default “”.
X_names (List[str], optional) – names of predictors. Default None.
X_encoders (List[sklearn.preprocessing.LabelEncoder], optional) – encoders for each predictor. Default None.
X_means (np.ndarray, optional) – means for each predictor. Default None.
X_stds (np.ndarray, optional) – standard deviations for each predictor. Default None.
verbose (bool, optional) – print progress. Default True.
lower_pct (float, optional) – Lower percentile for bootstraps. Default 2.5.
upper_pct (float, optional) – Upper percentile for bootstraps. Default 97.5.
classification (bool, optional) – Whether to exponentiate the effects. Default True.
figname (str, optional) – name of figure to save. Default None.

Returns:

None

plot_heterogeneous_predictor_effects(model, C, X, **kwargs)

Plot how the effect of predictors on outcomes changes with context (heterogeneous).

Parameters:

model (SKLearnWrapper) – a fitted contextualized.easy model
C – the context values to use to estimate the effects
X – the predictor values to use to estimate the effects
max_classes_for_discrete (int, optional) – maximum number of classes to treat as discrete. Default 10.
min_effect_size (float, optional) – minimum effect size to plot. Default 0.003.
y_prefix (str, optional) – y prefix for plot. Default “Influence of “.
X_names (List[str], optional) – names of predictors. Default None.
verbose (bool, optional) – print progress. Default True.
individual_preds (bool, optional) – whether to plot each bootstrap. Default True.
C_vis (np.ndarray, optional) – Context bins used to visualize context (n_vis, n_contexts). Default None to construct anew.
n_vis (int, optional) – Number of bins to use to visualize context. Default 1000.
lower_pct (float, optional) – Lower percentile for bootstraps. Default 2.5.
upper_pct (float, optional) – Upper percentile for bootstraps. Default 97.5.
classification (bool, optional) – Whether to exponentiate the effects. Default True.
C_encoders (List[sklearn.preprocessing.LabelEncoder], optional) – encoders for each context. Default None.
C_means (np.ndarray, optional) – means for each context. Default None.
C_stds (np.ndarray, optional) – standard deviations for each context. Default None.
xlabel_prefix (str, optional) – prefix for x label. Default “”.
figname (str, optional) – name of figure to save. Default None.

Returns:

None