Model Analysis

contextualized.analysis contains functions for analyzing and plotting the results of contextualized models. Every function can be imported directly from the module, for example: from contextualized.analysis import plot_heterogeneous_predictor_effects

accuracy_split.print_acc_by_covars

Print accuracy for data splits defined by covariates.

embeddings.plot_lowdim_rep

Plot a low-dimensional representation of a dataset.

embeddings.plot_embedding_for_all_covars

Plot embeddings of representations for all covariates in a Pandas dataframe.

effects.plot_homogeneous_context_effects

Plot the direct effect of context on outcomes, disregarding other features.

effects.plot_homogeneous_predictor_effects

Plot the effect of predictors on outcomes that do not change with context (homogeneous).

effects.plot_heterogeneous_predictor_effects

Plot how the effect of predictors on outcomes changes with context (heterogeneous).

pvals.calc_homogeneous_context_effects_pvals

Calculate p-values for the effects of context.

pvals.calc_homogeneous_predictor_effects_pvals

Calculate p-values for the context-invariant effects of predictors.

pvals.calc_heterogeneous_predictor_effects_pvals

Calculate p-values for the heterogeneous effects of predictors.

print_acc_by_covars(Y_true: ndarray, Y_pred: ndarray, covar_df: DataFrame, **kwargs) -> None

Print accuracy for data splits defined by covariates.

Parameters:
  • Y_true (np.ndarray) – True labels.

  • Y_pred (np.ndarray) – Predicted labels.

  • covar_df (pd.DataFrame) – DataFrame of covariates.

  • max_classes (int, optional) – Maximum number of classes to print. Defaults to 20.

  • covar_stds (np.ndarray, optional) – Standard deviations of covariates. Defaults to None.

  • covar_means (np.ndarray, optional) – Means of covariates. Defaults to None.

  • covar_encoders (List[LabelEncoder], optional) – Encoders for covariates. Defaults to None.

  • train_idx (np.ndarray, optional) – Boolean array indicating training data. Defaults to None.

  • test_idx (np.ndarray, optional) – Boolean array indicating testing data. Defaults to None.

Returns:

None
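As a rough illustration of what an accuracy-by-covariate breakdown involves, the sketch below computes accuracy separately for each value of a discrete covariate. It is not the library's implementation: the helper name `accuracy_by_group` and the toy arrays are hypothetical, and it uses only NumPy.

```python
import numpy as np

def accuracy_by_group(y_true, y_pred, groups):
    """Return {group value: accuracy} for each distinct value in `groups`.

    Hypothetical helper for illustration; not part of contextualized.analysis.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    correct = y_true == y_pred  # boolean array, True where the prediction matches
    return {
        g.item(): float(correct[groups == g].mean())
        for g in np.unique(groups)
    }

# Toy data: six samples and one binary covariate.
Y_true = np.array([1, 0, 1, 1, 0, 0])
Y_pred = np.array([1, 0, 0, 1, 0, 0])
covar = np.array([0, 0, 0, 1, 1, 1])

acc = accuracy_by_group(Y_true, Y_pred, covar)
for value, a in acc.items():
    print(f"covariate == {value}: accuracy = {a:.3f}")
```

A large gap in accuracy between groups suggests the model fits some regions of covariate space better than others.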

plot_lowdim_rep(low_dim: ndarray, labels: ndarray, **kwargs)

Plot a low-dimensional representation of a dataset.

Parameters:
  • low_dim (np.ndarray) – Low-dimensional representation of shape (n_samples, 2).

  • labels (np.ndarray) – Labels of shape (n_samples,).

  • kwargs – Keyword arguments for plotting.

Returns:

None
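The function expects a representation that has already been reduced to two dimensions. One way to produce such an input is a PCA projection, sketched below with NumPy's SVD. The helper `pca_2d` and the random data are assumptions made for illustration; any other 2-D embedding method would serve equally well.

```python
import numpy as np

def pca_2d(reps):
    """Project the rows of `reps` onto their first two principal components.

    Hypothetical helper for illustration; not part of contextualized.analysis.
    """
    reps = np.asarray(reps, dtype=float)
    centered = reps - reps.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
reps = rng.normal(size=(50, 5))        # 50 samples, 5-dimensional representation
labels = rng.integers(0, 3, size=50)   # one label per sample

low_dim = pca_2d(reps)
print(low_dim.shape, labels.shape)
```

The resulting `low_dim` and `labels` arrays have the shapes documented above, (n_samples, 2) and (n_samples,).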

plot_embedding_for_all_covars(reps: ndarray, covars_df: DataFrame, covars_stds: Optional[ndarray] = None, covars_means: Optional[ndarray] = None, covars_encoders: Optional[List[Callable]] = None, **kwargs) -> None

Plot embeddings of representations for all covariates in a Pandas dataframe.

Parameters:
  • reps (np.ndarray) – Embeddings of shape (n_samples, n_dims).

  • covars_df (pd.DataFrame) – DataFrame of covariates.

  • covars_stds (np.ndarray, optional) – Standard deviations of covariates. Defaults to None.

  • covars_means (np.ndarray, optional) – Means of covariates. Defaults to None.

  • covars_encoders (List[LabelEncoder], optional) – Encoders for covariates. Defaults to None.

  • kwargs – Keyword arguments for plotting.

Returns:

None

plot_homogeneous_context_effects(model: SKLearnWrapper, C: ndarray, **kwargs) -> None

Plot the direct effect of context on outcomes, disregarding other features. This context effect is homogeneous in that it is a static function of context (context-invariant).

Parameters:
  • model (SKLearnWrapper) – a fitted contextualized.easy model

  • C – the context values to use to estimate the effects

  • verbose (bool, optional) – print progress. Default True.

  • individual_preds (bool, optional) – whether to plot each bootstrap. Default True.

  • C_vis (np.ndarray, optional) – Context bins used to visualize context (n_vis, n_contexts). Default None to construct anew.

  • n_vis (int, optional) – Number of bins to use to visualize context. Default 1000.

  • lower_pct (float, optional) – Lower percentile for bootstraps. Default 2.5.

  • upper_pct (float, optional) – Upper percentile for bootstraps. Default 97.5.

  • classification (bool, optional) – Whether to exponentiate the effects. Default True.

  • C_encoders (List[sklearn.preprocessing.LabelEncoder], optional) – encoders for each context. Default None.

  • C_means (np.ndarray, optional) – means for each context. Default None.

  • C_stds (np.ndarray, optional) – standard deviations for each context. Default None.

  • xlabel_prefix (str, optional) – prefix for x label. Default “”.

  • figname (str, optional) – name of figure to save. Default None.

Returns:

None
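The lower_pct and upper_pct parameters describe a percentile interval over bootstrap estimates. The sketch below shows how such an interval can be computed with NumPy from a hypothetical array of bootstrapped effects. The array `boot_effects` is simulated, and reading the classification flag as converting log-odds effects to odds ratios is an assumption, not something this page states.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical effects: 200 bootstraps, each evaluated at 5 context values.
boot_effects = rng.normal(loc=0.5, scale=0.1, size=(200, 5))

lower_pct, upper_pct = 2.5, 97.5
# Percentiles are taken across bootstraps (axis 0), one interval per context value.
lower = np.percentile(boot_effects, lower_pct, axis=0)
upper = np.percentile(boot_effects, upper_pct, axis=0)
mean_effect = boot_effects.mean(axis=0)

# Assumption: for a classifier, exponentiating a log-odds effect gives an odds ratio.
odds_ratio = np.exp(mean_effect)

for lo, mid, hi in zip(lower, mean_effect, upper):
    print(f"effect = {mid:.3f}, interval = [{lo:.3f}, {hi:.3f}]")
```

With the defaults of 2.5 and 97.5, the band covers the central 95% of the bootstrap estimates.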

plot_homogeneous_predictor_effects(model: SKLearnWrapper, C: ndarray, X: ndarray, **kwargs) -> None

Plot the effect of predictors on outcomes that do not change with context (homogeneous).

Parameters:
  • model (SKLearnWrapper) – a fitted contextualized.easy model

  • C – the context values to use to estimate the effects

  • X – the predictor values to use to estimate the effects

  • max_classes_for_discrete (int, optional) – maximum number of classes to treat as discrete. Default 10.

  • min_effect_size (float, optional) – minimum effect size to plot. Default 0.003.

  • ylabel (str, optional) – y label for plot. Default “Influence of ”.

  • xlabel_prefix (str, optional) – prefix for x label. Default “”.

  • X_names (List[str], optional) – names of predictors. Default None.

  • X_encoders (List[sklearn.preprocessing.LabelEncoder], optional) – encoders for each predictor. Default None.

  • X_means (np.ndarray, optional) – means for each predictor. Default None.

  • X_stds (np.ndarray, optional) – standard deviations for each predictor. Default None.

  • verbose (bool, optional) – print progress. Default True.

  • lower_pct (float, optional) – Lower percentile for bootstraps. Default 2.5.

  • upper_pct (float, optional) – Upper percentile for bootstraps. Default 97.5.

  • classification (bool, optional) – Whether to exponentiate the effects. Default True.

  • figname (str, optional) – name of figure to save. Default None.

Returns:

None

plot_heterogeneous_predictor_effects(model, C, X, **kwargs)

Plot how the effect of predictors on outcomes changes with context (heterogeneous).

Parameters:
  • model (SKLearnWrapper) – a fitted contextualized.easy model

  • C – the context values to use to estimate the effects

  • X – the predictor values to use to estimate the effects

  • max_classes_for_discrete (int, optional) – maximum number of classes to treat as discrete. Default 10.

  • min_effect_size (float, optional) – minimum effect size to plot. Default 0.003.

  • y_prefix (str, optional) – y prefix for plot. Default “Influence of ”.

  • X_names (List[str], optional) – names of predictors. Default None.

  • verbose (bool, optional) – print progress. Default True.

  • individual_preds (bool, optional) – whether to plot each bootstrap. Default True.

  • C_vis (np.ndarray, optional) – Context bins used to visualize context (n_vis, n_contexts). Default None to construct anew.

  • n_vis (int, optional) – Number of bins to use to visualize context. Default 1000.

  • lower_pct (float, optional) – Lower percentile for bootstraps. Default 2.5.

  • upper_pct (float, optional) – Upper percentile for bootstraps. Default 97.5.

  • classification (bool, optional) – Whether to exponentiate the effects. Default True.

  • C_encoders (List[sklearn.preprocessing.LabelEncoder], optional) – encoders for each context. Default None.

  • C_means (np.ndarray, optional) – means for each context. Default None.

  • C_stds (np.ndarray, optional) – standard deviations for each context. Default None.

  • xlabel_prefix (str, optional) – prefix for x label. Default “”.

  • figname (str, optional) – name of figure to save. Default None.

Returns:

None

calc_homogeneous_context_effects_pvals(model: SKLearnWrapper, C: ndarray, **kwargs) -> ndarray

Calculate p-values for the effects of context.

Parameters:
  • model (SKLearnWrapper) – Model to analyze.

  • C (np.ndarray) – Contexts to analyze.

Returns:

P-values of shape (n_contexts, n_outcomes) testing whether the sign of the direct effect of context on outcomes is consistent across bootstraps.

Return type:

np.ndarray
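To make the idea of a sign-consistency test across bootstraps concrete, the sketch below computes a simple smoothed p-value from an array of bootstrap effect estimates. This is one plausible formulation, not necessarily the formula the library uses. The helper `sign_consistency_pvals`, the add-one smoothing, and the toy array are all assumptions.

```python
import numpy as np

def sign_consistency_pvals(boot_effects):
    """Smoothed fraction of bootstraps whose sign disagrees with the majority sign.

    boot_effects has shape (n_bootstraps, ...); the result drops the first axis.
    Hypothetical helper for illustration; not part of contextualized.analysis.
    """
    boot_effects = np.asarray(boot_effects, dtype=float)
    n_boot = boot_effects.shape[0]
    n_pos = (boot_effects > 0).sum(axis=0)
    n_neg = (boot_effects < 0).sum(axis=0)
    # Bootstraps that are zero or carry the minority sign count as disagreeing.
    n_disagree = n_boot - np.maximum(n_pos, n_neg)
    # Add-one smoothing keeps the p-value above zero for a finite number of bootstraps.
    return (n_disagree + 1) / (n_boot + 1)

# Toy data: 4 bootstraps, 2 effects.
boot = np.array([
    [1.0, -1.0],
    [2.0,  1.0],
    [0.5, -2.0],
    [1.5,  3.0],
])
pvals = sign_consistency_pvals(boot)
print(pvals)
```

Under this formulation the smallest attainable p-value is 1 / (n_bootstraps + 1), so resolving small p-values requires a correspondingly large number of bootstraps.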

calc_homogeneous_predictor_effects_pvals(model: SKLearnWrapper, C: ndarray, **kwargs) -> ndarray

Calculate p-values for the context-invariant effects of predictors.

Parameters:
  • model (SKLearnWrapper) – Model to analyze.

  • C (np.ndarray) – Contexts to analyze.

Returns:

P-values of shape (n_predictors, n_outcomes) testing whether the sign of the context-invariant predictor effects is consistent across bootstraps.

Return type:

np.ndarray

calc_heterogeneous_predictor_effects_pvals(model, C, **kwargs)

Calculate p-values for the heterogeneous effects of predictors.

Parameters:
  • model (SKLearnWrapper) – Model to analyze.

  • C (np.ndarray) – Contexts to analyze.

Returns:

P-values of shape (n_contexts, n_predictors, n_outcomes) testing whether the context-varying parameter range is consistent across bootstraps.

Return type:

np.ndarray