dfds_ds_toolbox.analysis.plotting module
- dfds_ds_toolbox.analysis.plotting.get_trend_stats(data, target_col, features_list=None, bins=10, data_test=None)
Calculates trend changes and the train/test trend correlation for a list of features.
- Parameters
data (DataFrame) – dataframe containing features and target columns
target_col (str) – target column name
features_list (Optional[List[str]]) – by default computes statistics for all features. If a list is passed, only those features are included.
bins (int) – number of bins to create from each continuous feature
data_test (Optional[DataFrame]) – test data to compare with the input data for correlation
- Return type
DataFrame
- Returns
dataframe with trend changes and trend correlation (if test data passed)
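As a rough illustration of the two statistics named above, the sketch below counts direction reversals in a sequence of per-bin target means and computes the Pearson correlation between train and test means. It uses only the standard library. The helper names `count_trend_changes` and `pearson` are hypothetical, and the definitions are assumptions about what "trend changes" and "trend correlation" mean here, not the library's implementation.

```python
from statistics import mean


def count_trend_changes(bin_means):
    """Count how often a sequence switches between rising and falling.

    Assumption: a "trend change" is a reversal of direction between
    consecutive bins; flat steps are ignored.
    """
    changes = 0
    previous_sign = 0
    for left, right in zip(bin_means, bin_means[1:]):
        diff = right - left
        sign = (diff > 0) - (diff < 0)
        if sign == 0:
            continue
        if previous_sign != 0 and sign != previous_sign:
            changes += 1
        previous_sign = sign
    return changes


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-bin mean target values for one feature.
train_means = [0.10, 0.20, 0.30, 0.25, 0.40]
test_means = [0.12, 0.18, 0.33, 0.27, 0.38]

print(count_trend_changes(train_means))  # 2: rise, fall, rise
print(round(pearson(train_means, test_means), 3))
```

Under these assumptions, a feature with many trend changes or a low train/test correlation would be a candidate for closer inspection.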
- dfds_ds_toolbox.analysis.plotting.plot_classification_proba_histogram(y_true, y_pred, ax=None)
Plot histogram of predictions for binary classifiers.
- Parameters
y_true (Sequence[int]) – 1D array of binary target values, 0 or 1.
y_pred (Sequence[float]) – 1D array of predicted target values, probability of class 1.
ax (Optional[Axes]) – Optional pre-existing axis to plot on
- Return type
Figure
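To make the inputs concrete, the following standard-library sketch bins predicted probabilities separately for each true class, which is the kind of data such a histogram would display. Splitting by class is an assumption (the description above does not say how `y_true` is used), and `proba_histogram_counts` is a hypothetical helper, not part of the library.

```python
def proba_histogram_counts(y_true, y_pred, n_bins=10):
    """Count predictions per equal-width probability bin, split by true class."""
    counts = {0: [0] * n_bins, 1: [0] * n_bins}
    for label, proba in zip(y_true, y_pred):
        # A probability of exactly 1.0 is placed in the last bin.
        index = min(int(proba * n_bins), n_bins - 1)
        counts[label][index] += 1
    return counts


y_true = [0, 0, 1, 1, 1]
y_pred = [0.05, 0.15, 0.95, 0.85, 1.0]
counts = proba_histogram_counts(y_true, y_pred, n_bins=5)
print(counts[0])  # [2, 0, 0, 0, 0]
print(counts[1])  # [0, 0, 0, 0, 3]
```

A well-separated classifier concentrates class 0 near probability 0 and class 1 near probability 1, as in this toy data.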
- dfds_ds_toolbox.analysis.plotting.plot_gain_chart(y_true, y_pred, n_bins=10, ax=None)
Plot the cumulative gains chart, which shows the percentage of all cases in a given category “gained” by targeting a percentage of the total number of cases.
- Parameters
y_true (Sequence[int]) – array with observed values, either 0 or 1.
y_pred (Sequence[float]) – array with predicted probabilities, float between 0 and 1.
n_bins (int) – number of bins to use
ax (Optional[Axes]) – Optional pre-existing axis to plot on
- Return type
Figure
- Returns
matplotlib Figure
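The quantity behind a cumulative gains chart can be sketched with the standard library alone: rank cases by predicted probability, then record the share of all positives captured in the top fraction of cases. The helper `cumulative_gains` and its equal-size binning are assumptions made for illustration, not the library's implementation.

```python
def cumulative_gains(y_true, y_pred, n_bins=10):
    """Return (fraction of cases targeted, fraction of positives captured) points."""
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("y_true contains no positive cases")
    # Rank observed labels from highest to lowest predicted probability.
    ranked = [t for _, t in sorted(zip(y_pred, y_true), key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    points = [(0.0, 0.0)]
    for b in range(1, n_bins + 1):
        cutoff = (b * n) // n_bins
        captured = sum(ranked[:cutoff])
        points.append((cutoff / n, captured / total_positives))
    return points


y_true = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
y_pred = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
for targeted, gained in cumulative_gains(y_true, y_pred, n_bins=5):
    print(f"{targeted:.0%} of cases -> {gained:.0%} of positives")
```

In this toy data, targeting the top 20% of cases captures half of the positives; a model with no ranking power would follow the diagonal instead.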
- dfds_ds_toolbox.analysis.plotting.plot_lift_curve(y_true, y_pred, n_bins=10, ax=None)
Plot the lift curve, i.e. how much better than the base rate the model performs at different thresholds.
A lift of 1 corresponds to predicting the base rate for the whole sample.
- Parameters
y_true (Sequence[int]) – array with observed values, either 0 or 1.
y_pred (Sequence[float]) – array with predicted probabilities, float between 0 and 1.
n_bins (int) – number of bins to use
ax (Optional[Axes]) – Optional pre-existing axis to plot on
- Return type
Figure
- Returns
matplotlib Figure
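A minimal sketch of the lift calculation, assuming cumulative lift: the positive rate among the top-ranked cases divided by the overall base rate. The library may define its bins or thresholds differently, and `cumulative_lift` is a hypothetical helper.

```python
def cumulative_lift(y_true, y_pred, n_bins=10):
    """Return (fraction of cases targeted, lift over the base rate) points."""
    n = len(y_true)
    base_rate = sum(y_true) / n
    if base_rate == 0:
        raise ValueError("y_true contains no positive cases")
    # Rank observed labels from highest to lowest predicted probability.
    ranked = [t for _, t in sorted(zip(y_pred, y_true), key=lambda pair: pair[0], reverse=True)]
    lifts = []
    for b in range(1, n_bins + 1):
        cutoff = (b * n) // n_bins
        if cutoff == 0:
            continue  # more bins than cases: nothing to evaluate yet
        positive_rate = sum(ranked[:cutoff]) / cutoff
        lifts.append((cutoff / n, positive_rate / base_rate))
    return lifts


y_true = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
y_pred = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
for targeted, lift in cumulative_lift(y_true, y_pred, n_bins=5):
    print(f"top {targeted:.0%}: lift {lift:.2f}")
```

The last point always has a lift of 1, because targeting the whole sample reproduces the base rate.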
- dfds_ds_toolbox.analysis.plotting.plot_regression_predicted_vs_actual(y_true, y_pred, alpha=0.2, ax=None)
Scatter plot of the predicted vs true targets for regression problems.
- Parameters
y_true (Sequence[float]) – array with observed values
y_pred (Sequence[float]) – array with predicted values
alpha (float) – transparency of the dots on the scatter plot
ax (Optional[Axes]) – Optional pre-existing axis to plot on
- Return type
Figure
- Returns
matplotlib Figure
- dfds_ds_toolbox.analysis.plotting.plot_roc_curve(y_true, y_pred, label='Train', ax=None)
Plot the ROC curve for the given data, e.g. the train or the test set (see label).
- Parameters
y_true (Sequence[int]) – array with observed classes
y_pred (Sequence[float]) – array with predicted probabilities
label (str) – extra text to add, e.g. “Train” or “Test”
ax (Optional[Axes]) – Optional pre-existing axis to plot on
- Return type
Figure
- Returns
matplotlib Figure
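For readers who want to see what a ROC curve is built from, here is a standard-library sketch that sweeps the decision threshold over the predicted probabilities and collects (false positive rate, true positive rate) points, then integrates them with the trapezoidal rule. The helpers `roc_points` and `auc` are hypothetical and are not the library's implementation.

```python
def roc_points(y_true, y_pred):
    """Return ROC points as (false positive rate, true positive rate) tuples."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("y_true must contain both classes")
    ranked = sorted(zip(y_pred, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Cases with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under a curve given as ordered (x, y) points, by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


points = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(points)
print(auc(points))  # 0.75
```

Computing the points separately for train and test data and comparing the two curves is a common way to spot overfitting.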
- dfds_ds_toolbox.analysis.plotting.plot_univariate_dependencies(data, target_col, features_list=None, bins=10, data_test=None)
Creates univariate dependence plots for the features in the dataset.
- Parameters
data (DataFrame) – dataframe containing features and target columns
target_col (str) – target column name
features_list (Optional[List[str]]) – by default creates plots for all features. If list passed, creates plots of only those features.
bins (int) – number of bins to be created from continuous feature
data_test (Optional[DataFrame]) – test data which has to be compared with input data for correlation
- Returns
Draws univariate plots for all columns in data
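The data behind a univariate dependence plot can be approximated as follows: sort the cases by one continuous feature, split them into bins, and compute the mean target per bin. This standard-library sketch assumes equal-population bins, which may differ from the binning the library uses; `binned_target_mean` is a hypothetical helper.

```python
def binned_target_mean(feature, target, bins=10):
    """Return (bin minimum, bin maximum, case count, mean target) per bin."""
    pairs = sorted(zip(feature, target))
    n = len(pairs)
    rows = []
    for b in range(bins):
        chunk = pairs[b * n // bins:(b + 1) * n // bins]
        if not chunk:
            continue  # more bins than cases
        values = [value for value, _ in chunk]
        outcomes = [outcome for _, outcome in chunk]
        rows.append((min(values), max(values), len(chunk), sum(outcomes) / len(outcomes)))
    return rows


feature = [1, 2, 3, 4, 5, 6, 7, 8]
target = [0, 0, 0, 1, 0, 1, 1, 1]
for low, high, count, mean_target in binned_target_mean(feature, target, bins=4):
    print(f"[{low}, {high}] n={count} mean target={mean_target:.2f}")
```

Plotting the mean target against the bins, once for `data` and once for `data_test`, gives the kind of trend comparison described above.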