Analysis
contextualized.analysis contains functions to analyze and plot the results of contextualized models.
All functions can be imported directly from the module, e.g. from contextualized.analysis import plot_heterogeneous_predictor_effects.
| calc_homogeneous_context_effects_pvals | Calculate p-values for the effects of context directly on the outcome. |
| calc_homogeneous_predictor_effects_pvals | Calculate p-values for the context-invariant effects of predictors. |
| calc_heterogeneous_predictor_effects_pvals | Calculate p-values for the heterogeneous (context-dependent) effects of predictors. |
| test_each_context | Test heterogeneous predictor effects attributed to every individual context feature. |
| get_possible_pvals | Get the range of possible p-values based on the number of bootstraps. |
| print_acc_by_covars | Prints AUROC for each class for different covariate splits. |
| select_good_bootstraps | Prune divergent or otherwise bad bootstraps, keeping only those with mean training error below tol * min(training errors). |
| plot_lowdim_rep | Plot a low-dimensional representation of a dataset. |
| plot_embedding_for_all_covars | Plot embeddings of representations for all covariates in a Pandas dataframe. |
| plot_homogeneous_context_effects | Plot the direct effect of context on outcomes, disregarding other features. |
| plot_homogeneous_predictor_effects | Plot the context-invariant (homogeneous) effects of predictors on outcomes. |
| plot_heterogeneous_predictor_effects | Plot how the effect of predictors on outcomes changes with context (heterogeneous). |
- calc_homogeneous_context_effects_pvals(model: SKLearnWrapper, C: ndarray, verbose: bool = True, **kwargs) → ndarray
Calculate p-values for the effects of context directly on the outcome.
- Parameters:
model (SKLearnWrapper) – Model to analyze.
C (np.ndarray) – Contexts to analyze.
verbose (bool) – Whether to print the range of possible p-values.
- Returns:
P-values of shape (n_contexts, n_outcomes) testing whether the sign of the direct effect of context on outcomes is consistent across bootstraps.
- Return type:
np.ndarray
- Raises:
ValueError – If the model’s n_bootstraps is less than 2.
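The sign-consistency test described above can be illustrated with a small, library-independent sketch. It assumes an add-one smoothed empirical p-value computed from the bootstrap estimates of a single coefficient. The helper name sign_consistency_pval and the smoothing choice are assumptions made for illustration, not the library's confirmed implementation.

```python
from statistics import mean


def sign_consistency_pval(estimates, smoothing=1):
    """Hypothetical helper: smoothed fraction of bootstraps whose sign
    disagrees with the sign of the mean estimate."""
    n = len(estimates)
    if n < 2:
        raise ValueError("At least 2 bootstrap estimates are required.")
    if mean(estimates) >= 0:
        # Mean effect is non-negative: count bootstraps with a negative estimate.
        disagree = sum(1 for e in estimates if e < 0)
    else:
        # Mean effect is negative: count bootstraps with a positive estimate.
        disagree = sum(1 for e in estimates if e > 0)
    return (smoothing + disagree) / (n + smoothing)


# Four bootstrap estimates of one coefficient; one of them disagrees in sign.
pval = sign_consistency_pval([0.4, 0.6, -0.1, 0.3])
```

Under this assumption, a coefficient whose sign agrees in every bootstrap gets the smallest attainable p-value, which shrinks as the number of bootstraps grows.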
- calc_homogeneous_predictor_effects_pvals(model: SKLearnWrapper, C: ndarray, verbose: bool = True, **kwargs) → ndarray
Calculate p-values for the context-invariant effects of predictors.
- Parameters:
model (SKLearnWrapper) – Model to analyze.
C (np.ndarray) – Contexts to analyze.
verbose (bool) – Whether to print the range of possible p-values.
- Returns:
P-values of shape (n_predictors, n_outcomes) testing whether the sign of the context-invariant predictor effects is consistent across bootstraps.
- Return type:
np.ndarray
- Raises:
ValueError – If the model’s n_bootstraps is less than 2.
- calc_heterogeneous_predictor_effects_pvals(model, C: ndarray, verbose: bool = True, **kwargs) → ndarray
Calculate p-values for the heterogeneous (context-dependent) effects of predictors.
- Parameters:
model (SKLearnWrapper) – Model to analyze.
C (np.ndarray) – Contexts to analyze.
verbose (bool) – Whether to print the range of possible p-values.
- Returns:
P-values of shape (n_contexts, n_predictors, n_outcomes) testing whether the context-varying parameter range is consistent across bootstraps.
- Return type:
np.ndarray
- Raises:
ValueError – If the model’s n_bootstraps is less than 2.
- test_each_context(model_constructor: Type[SKLearnWrapper], C: DataFrame, X: DataFrame, Y: DataFrame, verbose: bool = True, model_kwargs: Dict = {'encoder_type': 'linear'}, fit_kwargs: Dict = {'learning_rate': 0.01, 'max_epochs': 3, 'n_bootstraps': 20}) → DataFrame
Test heterogeneous predictor effects attributed to every individual context feature. Applies calc_heterogeneous_predictor_effects_pvals to a model learned for a single context feature in C, and repeats this sequentially for every context feature.
- Parameters:
model_constructor (Type[SKLearnWrapper]) – The constructor of the model to be tested, currently either ContextualizedRegressor or ContextualizedClassifier.
C (pd.DataFrame) – The context dataframe (n_samples, n_contexts).
X (pd.DataFrame) – The predictor dataframe (n_samples, n_predictors).
Y (pd.DataFrame) – The outcome, target, or label dataframe (n_samples, n_outcomes).
verbose (bool) – Whether to print the range of possible p-values.
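model_kwargs (Dict) – Additional arguments for the model constructor. Defaults to {'encoder_type': 'linear'}.
fit_kwargs (Dict) – Additional arguments for fitting the model. Defaults to {'learning_rate': 0.01, 'max_epochs': 3, 'n_bootstraps': 20}.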
- Returns:
A DataFrame of p-values for each (context, predictor, outcome) combination, describing how much the predictor’s effect on the outcome varies across the context.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the model’s n_bootstraps is less than 2.
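To make the "one p-value per (context, predictor, outcome) combination" layout concrete, the sketch below flattens a nested array of p-values into long-format rows, one row per combination. The helper name pvals_to_rows and the column names are hypothetical. The columns of the DataFrame actually returned by test_each_context are not specified here and may differ.

```python
def pvals_to_rows(pvals, contexts, predictors, outcomes):
    """Hypothetical helper: flatten p-values indexed as
    pvals[context][predictor][outcome] into one record per combination."""
    rows = []
    for i, context in enumerate(contexts):
        for j, predictor in enumerate(predictors):
            for k, outcome in enumerate(outcomes):
                rows.append(
                    {
                        "context": context,
                        "predictor": predictor,
                        "outcome": outcome,
                        "pval": pvals[i][j][k],
                    }
                )
    return rows


# Two context features, two predictors, one outcome (illustrative values only).
example_pvals = [
    [[0.05], [0.50]],
    [[0.20], [0.90]],
]
rows = pvals_to_rows(example_pvals, ["age", "sex"], ["x0", "x1"], ["y"])
```

A list of records like this can be passed to pandas.DataFrame to obtain a table that is easy to sort or filter by p-value.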
- get_possible_pvals(num_bootstraps: int) → list
Get the range of possible p-values based on the number of bootstraps.
- Parameters:
num_bootstraps (int) – The number of bootstraps.
- Returns:
The minimum and maximum possible p-values.
- Return type:
list
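As a rough illustration of why the attainable p-values depend on the number of bootstraps, the sketch below assumes an add-one smoothed empirical p-value of the form (1 + k) / (n + 1), where k of the n bootstraps contradict the tested effect and k ranges from 0 to n - 1. Both the formula and the helper name possible_pval_range are assumptions, not the library's confirmed implementation.

```python
def possible_pval_range(num_bootstraps):
    """Hypothetical helper: smallest and largest p-values attainable under
    an assumed add-one smoothed empirical p-value."""
    if num_bootstraps < 1:
        raise ValueError("num_bootstraps must be at least 1.")
    min_pval = 1 / (num_bootstraps + 1)               # no bootstrap disagrees
    max_pval = num_bootstraps / (num_bootstraps + 1)  # all but one disagree
    return [min_pval, max_pval]


# With 20 bootstraps the smallest attainable p-value is 1/21 (about 0.048).
pval_range = possible_pval_range(20)
```

Under this assumption, reaching a significance threshold of 0.05 requires at least 20 bootstraps, and smaller thresholds require proportionally more.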
- print_acc_by_covars(Y_true: ndarray, Y_pred: ndarray, covar_df: DataFrame, **kwargs) → None
Prints AUROC for each class for different covariate splits. Should only be used with ContextualizedClassifier.
- Parameters:
Y_true (np.ndarray) – True labels.
Y_pred (np.ndarray) – Predicted labels.
covar_df (pd.DataFrame) – DataFrame of covariates.
max_classes (int, optional) – Maximum number of classes to print. Defaults to 20.
covar_stds (np.ndarray, optional) – Standard deviations of covariates. Defaults to None.
covar_means (np.ndarray, optional) – Means of covariates. Defaults to None.
covar_encoders (List[LabelEncoder], optional) – Encoders for covariates. Defaults to None.
train_idx (np.ndarray, optional) – Boolean array indicating training data. Defaults to None.
test_idx (np.ndarray, optional) – Boolean array indicating testing data. Defaults to None.
- Returns:
None
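The idea of reporting AUROC separately for different covariate splits can be sketched without the library. The example below splits samples at the median of one covariate and computes a binary AUROC in each half. The median split, the helper names, and the pairwise AUROC computation are all assumptions chosen for illustration. How print_acc_by_covars actually forms its splits is not specified here.

```python
from statistics import median


def auroc(y_true, y_score):
    """Binary AUROC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half. Returns None if a class is missing."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_median_split(y_true, y_score, covar):
    """Hypothetical helper: AUROC for samples at or below vs. above the
    covariate median."""
    cutoff = median(covar)
    low = [i for i, v in enumerate(covar) if v <= cutoff]
    high = [i for i, v in enumerate(covar) if v > cutoff]
    return {
        "low": auroc([y_true[i] for i in low], [y_score[i] for i in low]),
        "high": auroc([y_true[i] for i in high], [y_score[i] for i in high]),
    }


y_true = [0, 1, 0, 1, 0, 1, 1, 0]
y_score = [0.1, 0.9, 0.2, 0.8, 0.7, 0.6, 0.4, 0.5]
covar = [1, 2, 3, 4, 5, 6, 7, 8]
split_scores = auroc_by_median_split(y_true, y_score, covar)
```

A large gap between the two halves suggests that predictive accuracy depends on that covariate.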
- select_good_bootstraps(sklearn_wrapper: SKLearnWrapper, train_errs: ndarray, tol: float = 2) → SKLearnWrapper
Prune divergent or otherwise bad bootstraps, keeping only those with mean training error below tol * min(training errors).
- Parameters:
sklearn_wrapper (contextualized.easy.wrappers.SKLearnWrapper) – Wrapper for the sklearn model.
train_errs (np.ndarray) – Training errors for each bootstrap (n_bootstraps, n_samples, n_outcomes).
tol (float) – Only bootstraps with mean train_errs below tol * min(train_errs) are kept.
- Returns:
The input model with only selected bootstraps.
- Return type:
contextualized.easy.wrappers.SKLearnWrapper
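The selection rule can be sketched on plain nested lists. The example averages each bootstrap's training errors, then keeps the indices of bootstraps whose mean error is below tol times the smallest mean error. The helper name good_bootstrap_indices and the strict comparison are assumptions. The library function operates on the wrapper's models rather than returning indices.

```python
from statistics import mean


def good_bootstrap_indices(train_errs, tol=2.0):
    """Hypothetical helper: train_errs is a nested list of shape
    (n_bootstraps, n_samples, n_outcomes)."""
    mean_errs = [
        mean([err for sample in boot for err in sample]) for boot in train_errs
    ]
    threshold = tol * min(mean_errs)
    # Keep bootstraps whose mean error is strictly below the threshold.
    return [i for i, err in enumerate(mean_errs) if err < threshold]


# Three bootstraps with mean errors 1.0, 1.5 and 5.0; the third has diverged.
train_errs = [
    [[1.0], [1.0]],
    [[1.5], [1.5]],
    [[5.0], [5.0]],
]
kept = good_bootstrap_indices(train_errs, tol=2.0)
```

With the strict comparison assumed here, tol must exceed 1 for the best bootstrap itself to be retained.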
- plot_lowdim_rep(low_dim: ndarray, labels: ndarray, **kwargs)
Plot a low-dimensional representation of a dataset.
- Parameters:
low_dim (np.ndarray) – Low-dimensional representation of shape (n_samples, 2).
labels (np.ndarray) – Labels of shape (n_samples,).
kwargs – Keyword arguments for plotting.
- Returns:
None
- plot_embedding_for_all_covars(reps: ndarray, covars_df: DataFrame, covars_stds: Optional[ndarray] = None, covars_means: Optional[ndarray] = None, covars_encoders: Optional[List[Callable]] = None, **kwargs) → None
Plot embeddings of representations for all covariates in a Pandas dataframe.
- Parameters:
reps (np.ndarray) – Embeddings of shape (n_samples, n_dims).
covars_df (pd.DataFrame) – DataFrame of covariates.
covars_stds (np.ndarray, optional) – Standard deviations of covariates. Defaults to None.
covars_means (np.ndarray, optional) – Means of covariates. Defaults to None.
covars_encoders (List[LabelEncoder], optional) – Encoders for covariates. Defaults to None.
kwargs – Keyword arguments for plotting.
- Returns:
None
- plot_homogeneous_context_effects(model: SKLearnWrapper, C: ndarray, **kwargs) → None
Plot the direct effect of context on outcomes, disregarding other features. This context effect is homogeneous in that it is a static function of context (context-invariant).
- Parameters:
model (SKLearnWrapper) – a fitted contextualized.easy model
C – the context values to use to estimate the effects
verbose (bool, optional) – print progress. Default True.
individual_preds (bool, optional) – whether to plot each bootstrap. Default True.
C_vis (np.ndarray, optional) – Context bins used to visualize context (n_vis, n_contexts). Default None to construct anew.
n_vis (int, optional) – Number of bins to use to visualize context. Default 1000.
lower_pct (float, optional) – Lower percentile for bootstraps. Default 2.5.
upper_pct (float, optional) – Upper percentile for bootstraps. Default 97.5.
classification (bool, optional) – Whether to exponentiate the effects. Default True.
C_encoders (List[sklearn.preprocessing.LabelEncoder], optional) – encoders for each context. Default None.
C_means (np.ndarray, optional) – means for each context. Default None.
C_stds (np.ndarray, optional) – standard deviations for each context. Default None.
xlabel_prefix (str, optional) – prefix for x label. Default “”.
figname (str, optional) – name of figure to save. Default None.
- Returns:
None
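The lower_pct and upper_pct parameters suggest a percentile band computed across bootstraps at each visualized context value. The sketch below shows one way such a band could be computed, using linear interpolation between order statistics. The helper names and the interpolation rule are assumptions, and the plotting functions may compute their intervals differently.

```python
import math


def percentile(values, pct):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    rank = pct * (len(ordered) - 1) / 100
    lo = math.floor(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])


def bootstrap_band(effects_by_bootstrap, lower_pct=2.5, upper_pct=97.5):
    """Hypothetical helper: effects_by_bootstrap has shape
    (n_bootstraps, n_vis). Returns per-column lower and upper bounds."""
    n_vis = len(effects_by_bootstrap[0])
    lower, upper = [], []
    for j in range(n_vis):
        column = [boot[j] for boot in effects_by_bootstrap]
        lower.append(percentile(column, lower_pct))
        upper.append(percentile(column, upper_pct))
    return lower, upper


# 101 bootstraps evaluated at 2 context values (synthetic numbers).
effects = [[float(b), 2.0 * b] for b in range(101)]
lower, upper = bootstrap_band(effects)
```

With the defaults of 2.5 and 97.5, the band covers the central 95% of bootstrap estimates at each context value.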
- plot_homogeneous_predictor_effects(model: SKLearnWrapper, C: ndarray, X: ndarray, **kwargs) → None
Plot the context-invariant (homogeneous) effects of predictors on outcomes.
- Parameters:
model (SKLearnWrapper) – a fitted contextualized.easy model
C – the context values to use to estimate the effects
X – the predictor values to use to estimate the effects
max_classes_for_discrete (int, optional) – maximum number of classes to treat as discrete. Default 10.
min_effect_size (float, optional) – minimum effect size to plot. Default 0.003.
ylabel (str, optional) – y label for plot. Default “Influence of “.
xlabel_prefix (str, optional) – prefix for x label. Default “”.
X_names (List[str], optional) – names of predictors. Default None.
X_encoders (List[sklearn.preprocessing.LabelEncoder], optional) – encoders for each predictor. Default None.
X_means (np.ndarray, optional) – means for each predictor. Default None.
X_stds (np.ndarray, optional) – standard deviations for each predictor. Default None.
verbose (bool, optional) – print progress. Default True.
lower_pct (float, optional) – Lower percentile for bootstraps. Default 2.5.
upper_pct (float, optional) – Upper percentile for bootstraps. Default 97.5.
classification (bool, optional) – Whether to exponentiate the effects. Default True.
figname (str, optional) – name of figure to save. Default None.
- Returns:
None
- plot_heterogeneous_predictor_effects(model, C, X, **kwargs)
Plot how the effect of predictors on outcomes changes with context (heterogeneous).
- Parameters:
model (SKLearnWrapper) – a fitted contextualized.easy model
C – the context values to use to estimate the effects
X – the predictor values to use to estimate the effects
max_classes_for_discrete (int, optional) – maximum number of classes to treat as discrete. Default 10.
min_effect_size (float, optional) – minimum effect size to plot. Default 0.003.
y_prefix (str, optional) – y prefix for plot. Default “Influence of “.
X_names (List[str], optional) – names of predictors. Default None.
verbose (bool, optional) – print progress. Default True.
individual_preds (bool, optional) – whether to plot each bootstrap. Default True.
C_vis (np.ndarray, optional) – Context bins used to visualize context (n_vis, n_contexts). Default None to construct anew.
n_vis (int, optional) – Number of bins to use to visualize context. Default 1000.
lower_pct (float, optional) – Lower percentile for bootstraps. Default 2.5.
upper_pct (float, optional) – Upper percentile for bootstraps. Default 97.5.
classification (bool, optional) – Whether to exponentiate the effects. Default True.
C_encoders (List[sklearn.preprocessing.LabelEncoder], optional) – encoders for each context. Default None.
C_means (np.ndarray, optional) – means for each context. Default None.
C_stds (np.ndarray, optional) – standard deviations for each context. Default None.
xlabel_prefix (str, optional) – prefix for x label. Default “”.
figname (str, optional) – name of figure to save. Default None.
- Returns:
None
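The classification option is described only as "whether to exponentiate the effects". One common reading, assumed here, is that effects estimated on the log-odds scale are exponentiated so they can be read as odds ratios. The helper name exponentiate_effects is hypothetical and the sketch is independent of the library.

```python
import math


def exponentiate_effects(effects):
    """Hypothetical helper: map log-odds effects to odds ratios."""
    return [math.exp(effect) for effect in effects]


# An effect of 0 on the log-odds scale corresponds to an odds ratio of 1,
# meaning the predictor does not change the odds of the outcome.
odds_ratios = exponentiate_effects([0.0, math.log(2.0), -math.log(2.0)])
```

On this scale, values above 1 indicate increased odds and values below 1 indicate decreased odds. For regression outcomes the raw effects are usually easier to interpret, which may be why the option can be turned off.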