Analysis

contextualized.analysis contains functions to analyze and plot the results of contextualized models. All functions can be loaded directly from the module, e.g. from contextualized.analysis import plot_heterogeneous_predictor_effects.

pvals.calc_homogeneous_context_effects_pvals

Calculate p-values for the effects of context directly on the outcome.

pvals.calc_homogeneous_predictor_effects_pvals

Calculate p-values for the context-invariant effects of predictors.

pvals.calc_heterogeneous_predictor_effects_pvals

Calculate p-values for the heterogeneous (context-dependent) effects of predictors.

pvals.test_each_context

Test heterogeneous predictor effects attributed to every individual context feature.

pvals.get_possible_pvals

Get the range of possible p-values based on the number of bootstraps.

accuracy_split.print_acc_by_covars

Prints AUROC for each class for different covariate splits.

bootstraps.select_good_bootstraps

Prune divergent or bad bootstraps, keeping only those with mean training errors below tol * min(training errors).

embeddings.plot_lowdim_rep

Plot a low-dimensional representation of a dataset.

embeddings.plot_embedding_for_all_covars

Plot embeddings of representations for all covariates in a Pandas dataframe.

effects.plot_homogeneous_context_effects

Plot the direct effect of context on outcomes, disregarding other features.

effects.plot_homogeneous_predictor_effects

Plot the effect of predictors on outcomes that do not change with context (homogeneous).

effects.plot_heterogeneous_predictor_effects

Plot how the effect of predictors on outcomes changes with context (heterogeneous).

calc_homogeneous_context_effects_pvals(model: SKLearnWrapper, C: ndarray, verbose: bool = True, **kwargs) -> ndarray

Calculate p-values for the effects of context directly on the outcome.

Parameters:
  • model (SKLearnWrapper) – Model to analyze.

  • C (np.ndarray) – Contexts to analyze.

  • verbose (bool) – Whether to print the range of possible p-values.

Returns:

P-values of shape (n_contexts, n_outcomes) testing whether the sign of the direct effect of context on outcomes is consistent across bootstraps.

Return type:

np.ndarray

Raises:

ValueError – If the model’s n_bootstraps is less than 2.
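
To make "sign is consistent across bootstraps" concrete, here is a minimal sketch of one way such a p-value could be computed. It is not taken from the library: the helper name sign_consistency_pvals, the input layout (n_bootstraps, n_contexts, n_outcomes), and the add-one smoothing are all assumptions made for illustration.

```python
import numpy as np

def sign_consistency_pvals(effects: np.ndarray) -> np.ndarray:
    """Hypothetical helper, not part of contextualized.analysis.

    effects: bootstrap estimates, shape (n_bootstraps, n_contexts, n_outcomes).
    Returns p-values of shape (n_contexts, n_outcomes).
    """
    n_bootstraps = effects.shape[0]
    if n_bootstraps < 2:
        raise ValueError("At least 2 bootstraps are needed to compute p-values.")
    # Sign of the mean effect across bootstraps for each (context, outcome).
    mean_sign = np.sign(effects.mean(axis=0))
    # Count bootstraps whose estimate does not share the sign of the mean.
    disagree = (effects * mean_sign <= 0).sum(axis=0)
    # Add-one smoothing (an assumption) keeps every p-value strictly above zero.
    return (1 + disagree) / (n_bootstraps + 1)

rng = np.random.default_rng(0)
effects = rng.normal(size=(19, 3, 2))
# Force one (context, outcome) pair to be positive in every bootstrap.
effects[:, 0, 0] = np.abs(effects[:, 0, 0]) + 0.1
pvals = sign_consistency_pvals(effects)
print(pvals.shape)  # (3, 2)
print(pvals[0, 0])  # 0.05, i.e. 1 / (19 + 1)
```

Under this smoothing, an effect whose sign agrees in all 19 bootstraps gets the smallest attainable value, 1/20.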

calc_homogeneous_predictor_effects_pvals(model: SKLearnWrapper, C: ndarray, verbose: bool = True, **kwargs) -> ndarray

Calculate p-values for the context-invariant effects of predictors.

Parameters:
  • model (SKLearnWrapper) – Model to analyze.

  • C (np.ndarray) – Contexts to analyze.

  • verbose (bool) – Whether to print the range of possible p-values.

Returns:

P-values of shape (n_predictors, n_outcomes) testing whether the sign of each context-invariant predictor effect is consistent across bootstraps.

Return type:

np.ndarray

Raises:

ValueError – If the model’s n_bootstraps is less than 2.

calc_heterogeneous_predictor_effects_pvals(model, C: ndarray, verbose: bool = True, **kwargs) -> ndarray

Calculate p-values for the heterogeneous (context-dependent) effects of predictors.

Parameters:
  • model (SKLearnWrapper) – Model to analyze.

  • C (np.ndarray) – Contexts to analyze.

  • verbose (bool) – Whether to print the range of possible p-values.

Returns:

P-values of shape (n_contexts, n_predictors, n_outcomes) testing whether the context-varying parameter range is consistent across bootstraps.

Return type:

np.ndarray

Raises:

ValueError – If the model’s n_bootstraps is less than 2.

test_each_context(model_constructor: Type[SKLearnWrapper], C: DataFrame, X: DataFrame, Y: DataFrame, verbose: bool = True, model_kwargs: Dict = {'encoder_type': 'linear'}, fit_kwargs: Dict = {'learning_rate': 0.01, 'max_epochs': 3, 'n_bootstraps': 20}) -> DataFrame

Test heterogeneous predictor effects attributed to every individual context feature. Fits a model on a single context feature in C, applies calc_heterogeneous_predictor_effects_pvals to that model, and repeats this sequentially for every context feature.

Parameters:
  • model_constructor (SKLearnWrapper) – The constructor of the model to be tested, currently either ContextualizedRegressor or ContextualizedClassifier.

  • C (pd.DataFrame) – The context dataframe (n_samples, n_contexts).

  • X (pd.DataFrame) – The predictor dataframe (n_samples, n_predictors).

  • Y (pd.DataFrame) – The outcome, target, or label dataframe (n_samples, n_outcomes).

  • verbose (bool) – Whether to print the range of possible p-values.

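  • model_kwargs (Dict) – Keyword arguments for the model constructor. Defaults to {'encoder_type': 'linear'}.

  • fit_kwargs (Dict) – Keyword arguments for fitting the model. Defaults to {'learning_rate': 0.01, 'max_epochs': 3, 'n_bootstraps': 20}.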

Returns:

A DataFrame of p-values for each (context, predictor, outcome) combination, describing how much the predictor’s effect on the outcome varies across the context.

Return type:

pd.DataFrame

Raises:

ValueError – If the model’s n_bootstraps is less than 2.

get_possible_pvals(num_bootstraps: int) -> list

Get the range of possible p-values based on the number of bootstraps.

Parameters:

num_bootstraps (int) – The number of bootstraps.

Returns:

The minimum and maximum possible p-values.

Return type:

list
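
The number of bootstraps limits how small a p-value can be. The sketch below illustrates this under the assumption that a p-value is an add-one-smoothed fraction of bootstraps, which this page does not confirm. The helper name possible_pval_range is hypothetical and is not the library function.

```python
def possible_pval_range(num_bootstraps: int) -> list:
    """Hypothetical sketch: attainable p-value range under add-one smoothing."""
    if num_bootstraps < 1:
        raise ValueError("num_bootstraps must be at least 1.")
    # Smallest value: no bootstrap disagrees with the overall sign.
    min_pval = 1 / (num_bootstraps + 1)
    # Largest value: all but one bootstrap disagree.
    max_pval = num_bootstraps / (num_bootstraps + 1)
    return [min_pval, max_pval]

low, high = possible_pval_range(19)
print(low, high)  # 0.05 0.95
```

Under this assumption, 19 bootstraps cannot produce a p-value below 0.05, so at least 20 are needed to cross that threshold.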

print_acc_by_covars(Y_true: ndarray, Y_pred: ndarray, covar_df: DataFrame, **kwargs) -> None

Prints AUROC for each class for different covariate splits. Should only be used with ContextualizedClassifier.

Parameters:
  • Y_true (np.ndarray) – True labels.

  • Y_pred (np.ndarray) – Predicted labels.

  • covar_df (pd.DataFrame) – DataFrame of covariates.

  • max_classes (int, optional) – Maximum number of classes to print. Defaults to 20.

  • covar_stds (np.ndarray, optional) – Standard deviations of covariates. Defaults to None.

  • covar_means (np.ndarray, optional) – Means of covariates. Defaults to None.

  • covar_encoders (List[LabelEncoder], optional) – Encoders for covariates. Defaults to None.

  • train_idx (np.ndarray, optional) – Boolean array indicating training data. Defaults to None.

  • test_idx (np.ndarray, optional) – Boolean array indicating testing data. Defaults to None.

Returns:

None
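
As an illustration of what an AUROC per covariate split means, the following sketch computes AUROC separately for samples below and at or above the median of one covariate. The median split, the binary labels, and the helper names auroc and auroc_by_median_split are assumptions for this example. The splitting rule actually used by print_acc_by_covars is not described on this page.

```python
import numpy as np

def auroc(y_true, y_score) -> float:
    """Probability that a random positive is scored above a random negative."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # AUROC is undefined when only one class is present
    # Compare every positive score with every negative score; ties count half.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

def auroc_by_median_split(y_true, y_score, covariate) -> dict:
    """Hypothetical helper: AUROC below vs. at or above the covariate median."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    covariate = np.asarray(covariate, dtype=float)
    high = covariate >= np.median(covariate)
    return {
        "low": auroc(y_true[~high], y_score[~high]),
        "high": auroc(y_true[high], y_score[high]),
    }

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_score = [0.1, 0.9, 0.2, 0.8, 0.6, 0.4, 0.7, 0.3]
age = [20, 25, 30, 35, 60, 65, 70, 75]
split_auroc = auroc_by_median_split(y_true, y_score, age)
print(split_auroc)  # {'low': 1.0, 'high': 0.0}
```

In this toy data the classifier ranks the younger half perfectly and the older half exactly backwards, a difference that a single pooled AUROC would hide.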

select_good_bootstraps(sklearn_wrapper: SKLearnWrapper, train_errs: ndarray, tol: float = 2) -> SKLearnWrapper

Prune divergent or bad bootstraps, keeping only those with mean training errors below tol * min(training errors).

Parameters:
  • sklearn_wrapper (contextualized.easy.wrappers.SKLearnWrapper) – Wrapper for the sklearn model.

  • train_errs (np.ndarray) – Training errors for each bootstrap (n_bootstraps, n_samples, n_outcomes).

  • tol (float) – Only bootstraps with mean train_errs below tol * min(train_errs) are kept.

Returns:

The input model with only selected bootstraps.

Return type:

contextualized.easy.wrappers.SKLearnWrapper
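
The selection rule described above can be sketched with plain NumPy. This is an illustration of the stated criterion only: the helper keep_good_bootstraps and the use of a plain list in place of the wrapper's bootstrap models are assumptions, not the library's code.

```python
import numpy as np

def keep_good_bootstraps(models, train_errs, tol=2.0):
    """Hypothetical sketch: keep models whose mean error is below tol * min."""
    train_errs = np.asarray(train_errs, dtype=float)
    # Average each bootstrap's errors over all remaining axes (samples, outcomes).
    mean_errs = train_errs.reshape(train_errs.shape[0], -1).mean(axis=1)
    threshold = tol * mean_errs.min()
    return [m for m, err in zip(models, mean_errs) if err < threshold]

# Three bootstraps, four samples, one outcome; the third bootstrap diverged.
train_errs = np.stack([
    np.full((4, 1), 1.0),
    np.full((4, 1), 1.5),
    np.full((4, 1), 10.0),
])
kept = keep_good_bootstraps(["boot0", "boot1", "boot2"], train_errs, tol=2)
print(kept)  # ['boot0', 'boot1']
```

Because the threshold is relative to the best bootstrap, the rule adapts to the scale of the loss rather than requiring an absolute cutoff.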

plot_lowdim_rep(low_dim: ndarray, labels: ndarray, **kwargs)

Plot a low-dimensional representation of a dataset.

Parameters:
  • low_dim (np.ndarray) – Low-dimensional representation of shape (n_samples, 2).

  • labels (np.ndarray) – Labels of shape (n_samples,).

  • kwargs – Keyword arguments for plotting.

Returns:

None

plot_embedding_for_all_covars(reps: ndarray, covars_df: DataFrame, covars_stds: Optional[ndarray] = None, covars_means: Optional[ndarray] = None, covars_encoders: Optional[List[Callable]] = None, **kwargs) -> None

Plot embeddings of representations for all covariates in a Pandas dataframe.

Parameters:
  • reps (np.ndarray) – Embeddings of shape (n_samples, n_dims).

  • covars_df (pd.DataFrame) – DataFrame of covariates.

  • covars_stds (np.ndarray, optional) – Standard deviations of covariates. Defaults to None.

  • covars_means (np.ndarray, optional) – Means of covariates. Defaults to None.

  • covars_encoders (List[LabelEncoder], optional) – Encoders for covariates. Defaults to None.

  • kwargs – Keyword arguments for plotting.

Returns:

None

plot_homogeneous_context_effects(model: SKLearnWrapper, C: ndarray, **kwargs) -> None

Plot the direct effect of context on outcomes, disregarding other features. This context effect is homogeneous in that it is a static function of context (context-invariant).

Parameters:
  • model (SKLearnWrapper) – a fitted contextualized.easy model

  • C – the context values to use to estimate the effects

  • verbose (bool, optional) – print progress. Default True.

  • individual_preds (bool, optional) – whether to plot each bootstrap. Default True.

  • C_vis (np.ndarray, optional) – Context bins used to visualize context (n_vis, n_contexts). Default None to construct anew.

  • n_vis (int, optional) – Number of bins to use to visualize context. Default 1000.

  • lower_pct (float, optional) – Lower percentile for bootstraps. Default 2.5.

  • upper_pct (float, optional) – Upper percentile for bootstraps. Default 97.5.

  • classification (bool, optional) – Whether to exponentiate the effects. Default True.

  • C_encoders (List[sklearn.preprocessing.LabelEncoder], optional) – encoders for each context. Default None.

  • C_means (np.ndarray, optional) – means for each context. Default None.

  • C_stds (np.ndarray, optional) – standard deviations for each context. Default None.

  • xlabel_prefix (str, optional) – prefix for x label. Default “”.

  • figname (str, optional) – name of figure to save. Default None.

Returns:

None
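
The lower_pct, upper_pct, and classification options suggest a percentile band taken across bootstraps. A minimal sketch of that idea follows. The helper bootstrap_band, the input layout (n_bootstraps, n_vis), and the choice to exponentiate before taking percentiles are assumptions for illustration, not the library's implementation.

```python
import numpy as np

def bootstrap_band(effects, lower_pct=2.5, upper_pct=97.5, classification=True):
    """Hypothetical sketch: percentile band across the bootstrap axis (axis 0)."""
    effects = np.asarray(effects, dtype=float)
    if classification:
        # Assumption: effects are on a log scale, so exponentiate them first.
        effects = np.exp(effects)
    lower, upper = np.percentile(effects, [lower_pct, upper_pct], axis=0)
    center = effects.mean(axis=0)
    return lower, center, upper

# 41 bootstraps of an effect evaluated at 5 context values.
effects = np.linspace(-1, 1, 41)[:, None] * np.ones((1, 5))
lower, center, upper = bootstrap_band(effects, classification=False)
print(lower.shape)  # (5,)
```

With the default percentiles, the band covers the central 95% of bootstrap estimates at each context value.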

plot_homogeneous_predictor_effects(model: SKLearnWrapper, C: ndarray, X: ndarray, **kwargs) -> None

Plot the effect of predictors on outcomes that do not change with context (homogeneous).

Parameters:
  • model (SKLearnWrapper) – a fitted contextualized.easy model

  • C – the context values to use to estimate the effects

  • X – the predictor values to use to estimate the effects

  • max_classes_for_discrete (int, optional) – maximum number of classes to treat as discrete. Default 10.

  • min_effect_size (float, optional) – minimum effect size to plot. Default 0.003.

  • ylabel (str, optional) – y label for plot. Default “Influence of “.

  • xlabel_prefix (str, optional) – prefix for x label. Default “”.

  • X_names (List[str], optional) – names of predictors. Default None.

  • X_encoders (List[sklearn.preprocessing.LabelEncoder], optional) – encoders for each predictor. Default None.

  • X_means (np.ndarray, optional) – means for each predictor. Default None.

  • X_stds (np.ndarray, optional) – standard deviations for each predictor. Default None.

  • verbose (bool, optional) – print progress. Default True.

  • lower_pct (float, optional) – Lower percentile for bootstraps. Default 2.5.

  • upper_pct (float, optional) – Upper percentile for bootstraps. Default 97.5.

  • classification (bool, optional) – Whether to exponentiate the effects. Default True.

  • figname (str, optional) – name of figure to save. Default None.

Returns:

None

plot_heterogeneous_predictor_effects(model, C, X, **kwargs)

Plot how the effect of predictors on outcomes changes with context (heterogeneous).

Parameters:
  • model (SKLearnWrapper) – a fitted contextualized.easy model

  • C – the context values to use to estimate the effects

  • X – the predictor values to use to estimate the effects

  • max_classes_for_discrete (int, optional) – maximum number of classes to treat as discrete. Default 10.

  • min_effect_size (float, optional) – minimum effect size to plot. Default 0.003.

  • y_prefix (str, optional) – y prefix for plot. Default “Influence of “.

  • X_names (List[str], optional) – names of predictors. Default None.

  • verbose (bool, optional) – print progress. Default True.

  • individual_preds (bool, optional) – whether to plot each bootstrap. Default True.

  • C_vis (np.ndarray, optional) – Context bins used to visualize context (n_vis, n_contexts). Default None to construct anew.

  • n_vis (int, optional) – Number of bins to use to visualize context. Default 1000.

  • lower_pct (float, optional) – Lower percentile for bootstraps. Default 2.5.

  • upper_pct (float, optional) – Upper percentile for bootstraps. Default 97.5.

  • classification (bool, optional) – Whether to exponentiate the effects. Default True.

  • C_encoders (List[sklearn.preprocessing.LabelEncoder], optional) – encoders for each context. Default None.

  • C_means (np.ndarray, optional) – means for each context. Default None.

  • C_stds (np.ndarray, optional) – standard deviations for each context. Default None.

  • xlabel_prefix (str, optional) – prefix for x label. Default “”.

  • figname (str, optional) – name of figure to save. Default None.

Returns:

None