This report is the output of the Amazon SageMaker Clarify analysis. The report is split into the following parts:
1. Analysis configuration
2. Pretraining bias metrics
Bias analysis requires you to configure the outcome label column, the facet and optionally a group variable. Generating explanations requires you to configure the outcome label. You configured the analysis with the following variables. The complete analysis configuration is appended at the end.
Outcome label: You chose the column sentiment in the input data as the outcome label. Bias metric computation requires designating the positive outcome. You chose sentiment=1 as the positive outcome. sentiment consisted of values [-1, 0, 1].
The figure below shows the distribution of values of sentiment.
Facet: You chose the column product_category in the input data as the facet. product_category consisted of values ['Blouses', 'Dresses', 'Fine gauge', 'Intimates', 'Jackets', 'Jeans', 'Knits', 'Layering', 'Legwear', 'Lounge', 'Outerwear', 'Pants', 'Shorts', 'Skirts', 'Sleep', 'Sweaters', 'Swim', 'Trend']. Bias metrics were computed one facet value at a time, by comparing the inputs with product_category equal to that value with all other inputs, in this order: Blouses, Dresses, Pants, Knits, Intimates, Outerwear, Lounge, Sweaters, Skirts, Fine gauge, Sleep, Jackets, Swim, Trend, Jeans, Legwear, Shorts, Layering.
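The one-against-all-others comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not Clarify's implementation: the function name `one_vs_rest_metrics` and the toy rows are hypothetical, and the formulas assume the commonly documented definitions CI = (n_a - n_d) / (n_a + n_d) and DPL = q_a - q_d, where d is the selected facet value and a is all other inputs.

```python
from collections import Counter

def one_vs_rest_metrics(rows, positive_label=1):
    """Return {facet_value: (CI, DPL)}, comparing each facet value with all other rows.

    rows is a list of (label, facet_value) pairs. Assumed definitions:
    CI = (n_a - n_d) / (n_a + n_d) and DPL = q_a - q_d.
    """
    n_total = len(rows)
    n_by_facet = Counter(facet for _, facet in rows)
    pos_by_facet = Counter(facet for label, facet in rows if label == positive_label)
    pos_total = sum(pos_by_facet.values())

    results = {}
    for facet, n_d in n_by_facet.items():
        n_a = n_total - n_d                       # size of "all other inputs"
        q_d = pos_by_facet[facet] / n_d           # positive rate within this facet value
        q_a = (pos_total - pos_by_facet[facet]) / n_a if n_a else 0.0
        ci = (n_a - n_d) / (n_a + n_d)
        dpl = q_a - q_d
        results[facet] = (ci, dpl)
    return results

# Hypothetical toy data: (sentiment, product_category) pairs, not taken from the report.
toy_rows = [
    (1, "Dresses"), (1, "Dresses"), (0, "Dresses"), (-1, "Dresses"),
    (1, "Jeans"), (1, "Jeans"),
    (1, "Swim"), (-1, "Swim"),
]
toy_metrics = one_vs_rest_metrics(toy_rows)
for facet, (ci, dpl) in sorted(toy_metrics.items()):
    print(f"{facet}: CI={ci:.3f} DPL={dpl:.3f}")
```

Under these assumptions, a CI close to 1 means the selected facet value is a small share of the data, and a positive DPL means the selected facet value has a lower positive rate than the remaining inputs.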
The figure below shows the distribution of values of product_category.
Pretraining bias metrics measure imbalances in facet value representation in the training data. Imbalances can be measured across different dimensions. For instance, you could focus on imbalances among only the inputs with a positive observed label. The figure below shows how different pretraining bias metrics focus on different dimensions. For a detailed description of these dimensions, see Learn How Amazon SageMaker Clarify Helps Detect Bias.
The metric values along with an informal description of what they mean are shown below. For mathematical formulas and examples, see the Measure Pretraining Bias section of the AWS documentation.
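As an informal companion to the metric descriptions, the sketch below computes the label-distribution metrics for two facets. It is a hedged illustration under stated assumptions, not the AWS implementation: the helper names are hypothetical, the label is treated as binary (positive versus not positive), LP is taken as the L2 norm, and the formulas follow the standard textbook definitions of each divergence. The input probabilities are made up.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def label_distribution_metrics(p_a, p_d):
    """Compare label distributions of facet a (all other inputs) and facet d.

    p_a and p_d are probability lists whose first entry is the positive-label rate.
    """
    diffs = [a - d for a, d in zip(p_a, p_d)]
    mixture = [(a + d) / 2 for a, d in zip(p_a, p_d)]
    return {
        "DPL": p_a[0] - p_d[0],                            # difference in positive rates
        "KL": kl_divergence(p_a, p_d),
        "JS": 0.5 * (kl_divergence(p_a, mixture) + kl_divergence(p_d, mixture)),
        "LP": math.sqrt(sum(x * x for x in diffs)),        # L2 norm (assumed p = 2)
        "TVD": 0.5 * sum(abs(x) for x in diffs),           # half the L1 distance
        "KS": max(abs(x) for x in diffs),                  # largest per-label gap
    }

# Hypothetical positive rates: 78% for facet a, 76% for facet d.
toy = label_distribution_metrics([0.78, 0.22], [0.76, 0.24])
for name, value in toy.items():
    print(f"{name}: {value:.6f}")
```

With only two label outcomes, the two entries of the difference vector have equal magnitude, so TVD and KS both equal |DPL| and LP equals sqrt(2) times |DPL| in this sketch.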
We computed the bias metrics for the label sentiment using label value(s)/threshold 1
product_category
The groups are represented in the dataset with the following proportions.
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.736321 |
DPL | Difference in Positive Proportions in Labels (DPL) | 0.016356 |
JS | Jensen-Shannon Divergence (JS) | 0.000186 |
KL | Kullback-Leibler Divergence (KL) | 0.000737 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.016356 |
LP | L-p Norm (LP) | 0.023131 |
TVD | Total Variation Distance (TVD) | 0.016356 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.45682 |
DPL | Difference in Positive Proportions in Labels (DPL) | 0.022482 |
JS | Jensen-Shannon Divergence (JS) | 0.000352 |
KL | Kullback-Leibler Divergence (KL) | 0.001392 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.022482 |
LP | L-p Norm (LP) | 0.031795 |
TVD | Total Variation Distance (TVD) | 0.022482 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.880668 |
DPL | Difference in Positive Proportions in Labels (DPL) | -0.026661 |
JS | Jensen-Shannon Divergence (JS) | 0.000522 |
KL | Kullback-Leibler Divergence (KL) | 0.002119 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.026661 |
LP | L-p Norm (LP) | 0.037704 |
TVD | Total Variation Distance (TVD) | 0.026661 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.59109 |
DPL | Difference in Positive Proportions in Labels (DPL) | 0.011213 |
JS | Jensen-Shannon Divergence (JS) | 0.000088 |
KL | Kullback-Leibler Divergence (KL) | 0.00035 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.011213 |
LP | L-p Norm (LP) | 0.015857 |
TVD | Total Variation Distance (TVD) | 0.011213 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.987006 |
DPL | Difference in Positive Proportions in Labels (DPL) | -0.025599 |
JS | Jensen-Shannon Divergence (JS) | 0.000483 |
KL | Kullback-Leibler Divergence (KL) | 0.001959 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.025599 |
LP | L-p Norm (LP) | 0.036203 |
TVD | Total Variation Distance (TVD) | 0.025599 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.971802 |
DPL | Difference in Positive Proportions in Labels (DPL) | -0.026121 |
JS | Jensen-Shannon Divergence (JS) | 0.000503 |
KL | Kullback-Leibler Divergence (KL) | 0.00204 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.026121 |
LP | L-p Norm (LP) | 0.036941 |
TVD | Total Variation Distance (TVD) | 0.026121 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.940864 |
DPL | Difference in Positive Proportions in Labels (DPL) | -0.045509 |
JS | Jensen-Shannon Divergence (JS) | 0.001573 |
KL | Kullback-Leibler Divergence (KL) | 0.006474 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.045509 |
LP | L-p Norm (LP) | 0.06436 |
TVD | Total Variation Distance (TVD) | 0.045509 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.878016 |
DPL | Difference in Positive Proportions in Labels (DPL) | 0.021044 |
JS | Jensen-Shannon Divergence (JS) | 0.000305 |
KL | Kullback-Leibler Divergence (KL) | 0.001207 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.021044 |
LP | L-p Norm (LP) | 0.029761 |
TVD | Total Variation Distance (TVD) | 0.021044 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.92018 |
DPL | Difference in Positive Proportions in Labels (DPL) | -0.021053 |
JS | Jensen-Shannon Divergence (JS) | 0.000323 |
KL | Kullback-Leibler Divergence (KL) | 0.001308 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.021053 |
LP | L-p Norm (LP) | 0.029773 |
TVD | Total Variation Distance (TVD) | 0.021053 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.906391 |
DPL | Difference in Positive Proportions in Labels (DPL) | -0.020859 |
JS | Jensen-Shannon Divergence (JS) | 0.000317 |
KL | Kullback-Leibler Divergence (KL) | 0.001283 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.020859 |
LP | L-p Norm (LP) | 0.0295 |
TVD | Total Variation Distance (TVD) | 0.020859 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.981084 |
DPL | Difference in Positive Proportions in Labels (DPL) | -0.047723 |
JS | Jensen-Shannon Divergence (JS) | 0.001743 |
KL | Kullback-Leibler Divergence (KL) | 0.007185 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.047723 |
LP | L-p Norm (LP) | 0.067491 |
TVD | Total Variation Distance (TVD) | 0.047723 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.939627 |
DPL | Difference in Positive Proportions in Labels (DPL) | -0.035868 |
JS | Jensen-Shannon Divergence (JS) | 0.000961 |
KL | Kullback-Leibler Divergence (KL) | 0.003928 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.035868 |
LP | L-p Norm (LP) | 0.050725 |
TVD | Total Variation Distance (TVD) | 0.035868 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.970653 |
DPL | Difference in Positive Proportions in Labels (DPL) | 0.01162 |
JS | Jensen-Shannon Divergence (JS) | 0.000094 |
KL | Kullback-Leibler Divergence (KL) | 0.000373 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.01162 |
LP | L-p Norm (LP) | 0.016433 |
TVD | Total Variation Distance (TVD) | 0.01162 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.98957 |
DPL | Difference in Positive Proportions in Labels (DPL) | 0.110042 |
JS | Jensen-Shannon Divergence (JS) | 0.00748 |
KL | Kullback-Leibler Divergence (KL) | 0.028876 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.110042 |
LP | L-p Norm (LP) | 0.155623 |
TVD | Total Variation Distance (TVD) | 0.110042 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.902413 |
DPL | Difference in Positive Proportions in Labels (DPL) | -0.055597 |
JS | Jensen-Shannon Divergence (JS) | 0.002382 |
KL | Kullback-Leibler Divergence (KL) | 0.009875 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.055597 |
LP | L-p Norm (LP) | 0.078626 |
TVD | Total Variation Distance (TVD) | 0.055597 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.986034 |
DPL | Difference in Positive Proportions in Labels (DPL) | -0.027173 |
JS | Jensen-Shannon Divergence (JS) | 0.000545 |
KL | Kullback-Leibler Divergence (KL) | 0.002215 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.027173 |
LP | L-p Norm (LP) | 0.038428 |
TVD | Total Variation Distance (TVD) | 0.027173 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.973128 |
DPL | Difference in Positive Proportions in Labels (DPL) | -0.019247 |
JS | Jensen-Shannon Divergence (JS) | 0.00027 |
KL | Kullback-Leibler Divergence (KL) | 0.001091 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.019247 |
LP | L-p Norm (LP) | 0.027219 |
TVD | Total Variation Distance (TVD) | 0.019247 |
name | description | value |
---|---|---|
CI | Class Imbalance (CI) | 0.988332 |
DPL | Difference in Positive Proportions in Labels (DPL) | -0.086077 |
JS | Jensen-Shannon Divergence (JS) | 0.006138 |
KL | Kullback-Leibler Divergence (KL) | 0.026226 |
KS | Kolmogorov-Smirnov Distance (KS) | 0.086077 |
LP | L-p Norm (LP) | 0.121732 |
TVD | Total Variation Distance (TVD) | 0.086077 |
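The tabulated values can be cross-checked against each other. The sketch below is a rough sanity check rather than part of the Clarify output: it copies three rows of values from the tables above and assumes that CI = (n_a - n_d) / (n_a + n_d), that LP is an L2 norm, and that the label was reduced to two outcomes (positive versus not positive). The helper names `facet_share` and `is_consistent` are hypothetical.

```python
import math

# (CI, DPL, KS, LP, TVD) copied from the first, third and fourteenth tables above.
report_rows = [
    (0.736321, 0.016356, 0.016356, 0.023131, 0.016356),
    (0.880668, -0.026661, 0.026661, 0.037704, 0.026661),
    (0.98957, 0.110042, 0.110042, 0.155623, 0.110042),
]

def facet_share(ci):
    """Share of inputs in the selected facet value, assuming CI = (n_a - n_d) / (n_a + n_d)."""
    return (1 - ci) / 2

def is_consistent(dpl, ks, lp, tvd, tol=1e-5):
    """Check |DPL| == KS == TVD and LP == sqrt(2) * |DPL| up to rounding in the report."""
    gap = abs(dpl)
    return (
        abs(gap - ks) <= tol
        and abs(gap - tvd) <= tol
        and abs(math.sqrt(2) * gap - lp) <= tol
    )

for ci, dpl, ks, lp, tvd in report_rows:
    print(f"share={facet_share(ci):.4f} consistent={is_consistent(dpl, ks, lp, tvd)}")
```

Under these assumptions, KS, LP and TVD carry no information beyond the magnitude of DPL for this binary label, while KL and JS weight the same gap by the underlying positive rate.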
{
  "dataset_type": "text/csv",
  "headers": [
    "sentiment",
    "review_body",
    "product_category"
  ],
  "label": "sentiment",
  "label_values_or_threshold": [
    1
  ],
  "facet": [
    {
      "name_or_index": "product_category"
    }
  ],
  "methods": {
    "pre_training_bias": {
      "methods": [
        "CI",
        "DPL",
        "KL",
        "JS",
        "LP",
        "TVD",
        "KS"
      ]
    },
    "report": {
      "name": "report",
      "title": "Analysis Report"
    }
  }
}
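The analysis configuration above is plain JSON, so it can be inspected with the Python standard library. The sketch below is illustrative only: `summarize_config` is a hypothetical helper, and the embedded text is an abbreviated copy of the configuration shown above (the report section is omitted).

```python
import json

# Abbreviated copy of the analysis configuration shown above.
CONFIG_TEXT = """
{
  "dataset_type": "text/csv",
  "headers": ["sentiment", "review_body", "product_category"],
  "label": "sentiment",
  "label_values_or_threshold": [1],
  "facet": [{"name_or_index": "product_category"}],
  "methods": {
    "pre_training_bias": {
      "methods": ["CI", "DPL", "KL", "JS", "LP", "TVD", "KS"]
    }
  }
}
"""

def summarize_config(config):
    """Pull out the fields that drive the bias analysis: label, positive values, facets, metrics."""
    return {
        "label": config["label"],
        "positive_values": config["label_values_or_threshold"],
        "facets": [facet["name_or_index"] for facet in config["facet"]],
        "pre_training_methods": config["methods"]["pre_training_bias"]["methods"],
    }

summary = summarize_config(json.loads(CONFIG_TEXT))
for key, value in summary.items():
    print(f"{key}: {value}")
```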