This report is the output of the Amazon SageMaker Clarify analysis. The report is split into the following parts:
1. Analysis configuration
2. Pretraining bias metrics
Bias analysis requires you to configure the outcome label column, the facet, and optionally a group variable. Generating explanations requires you to configure the outcome label. You configured the analysis with the following variables. The complete analysis configuration is appended at the end.
Outcome label: You chose the column sentiment in the input data as the outcome label. Bias metric computation requires designating the positive outcome. You chose sentiment=1 as the positive outcome. sentiment consisted of values [-1, 0, 1].
The figure below shows the distribution of values of sentiment.
Facet: You chose the column product_category in the input data as the facet. product_category consisted of values ['Blouses', 'Dresses', 'Fine gauge', 'Intimates', 'Jackets', 'Jeans', 'Knits', 'Layering', 'Legwear', 'Lounge', 'Outerwear', 'Pants', 'Shorts', 'Skirts', 'Sleep', 'Sweaters', 'Swim', 'Trend']. Bias metrics were computed one facet value at a time, each time comparing the inputs with that product_category value against all other inputs, in the following order: Blouses, Dresses, Pants, Knits, Intimates, Outerwear, Lounge, Sweaters, Skirts, Fine gauge, Sleep, Jackets, Swim, Trend, Jeans, Legwear, Shorts, Layering.
The figure below shows the distribution of values of product_category.
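The one-versus-rest comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not Clarify's own implementation: the function name one_vs_rest_counts and the toy rows are hypothetical, and the sketch assumes the label is binarized as sentiment=1 (positive) versus every other value.

```python
def one_vs_rest_counts(facets, labels, facet_value, positive_label=1):
    """Return (n_d, pos_d, n_a, pos_a) for one facet value versus all other rows.

    n_d / pos_d: row count and positive-label count for rows with facet_value.
    n_a / pos_a: the same counts for every remaining row.
    """
    n_d = pos_d = n_a = pos_a = 0
    for facet, label in zip(facets, labels):
        is_positive = int(label == positive_label)
        if facet == facet_value:
            n_d += 1
            pos_d += is_positive
        else:
            n_a += 1
            pos_a += is_positive
    return n_d, pos_d, n_a, pos_a


# Hypothetical toy rows, not taken from the analyzed dataset.
facets = ["Dresses", "Dresses", "Pants", "Knits", "Pants", "Dresses"]
labels = [1, -1, 1, 0, 1, 1]

for value in sorted(set(facets)):
    print(value, one_vs_rest_counts(facets, labels, value))
```

Each facet value yields its own set of counts, which is why the report contains one metric table per product_category value.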
Pretraining bias metrics measure imbalances in facet value representation in the training data. Imbalances can be measured across different dimensions. For instance, you could focus on imbalances within the inputs with positive observed label only. The figure below shows how different pretraining bias metrics focus on different dimensions. For a detailed description of these dimensions, see Learn How Amazon SageMaker Clarify Helps Detect Bias.
The metric values along with an informal description of what they mean are shown below. For mathematical formulas and examples, see the Measure Pretraining Bias section of the AWS documentation.
We computed the bias metrics for the label sentiment using label value(s)/threshold 1.
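As a rough guide to how the seven metrics relate to the underlying counts, the sketch below computes them for a single facet value against all other rows. It assumes a binarized label (positive versus not positive) and follows the definitions in the Measure Pretraining Bias documentation as understood here; the sign of DPL and the direction of the KL divergence are assumptions to check against that documentation. The function name and the example counts are hypothetical.

```python
import math


def pretraining_bias_metrics(n_d, pos_d, n_a, pos_a):
    """Seven pretraining bias metrics for a binarized label.

    n_d, pos_d: row count and positive count for the facet value under test.
    n_a, pos_a: the same counts for all other rows.
    Both groups must be non-empty.
    """
    q_d, q_a = pos_d / n_d, pos_a / n_a
    p_d = [q_d, 1.0 - q_d]  # label distribution for the facet value
    p_a = [q_a, 1.0 - q_a]  # label distribution for the remaining rows
    mix = [(x + y) / 2 for x, y in zip(p_a, p_d)]

    def kl(p, q):
        # Terms with p == 0 contribute nothing; q == 0 with p > 0 is undefined.
        return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

    diffs = [x - y for x, y in zip(p_a, p_d)]
    return {
        "CI": (n_a - n_d) / (n_a + n_d),
        "DPL": q_a - q_d,
        "KL": kl(p_a, p_d),
        "JS": 0.5 * (kl(p_a, mix) + kl(p_d, mix)),
        "LP": math.sqrt(sum(d * d for d in diffs)),  # p = 2
        "TVD": 0.5 * sum(abs(d) for d in diffs),
        "KS": max(abs(d) for d in diffs),
    }


# Hypothetical counts, not taken from the analyzed dataset.
metrics = pretraining_bias_metrics(n_d=200, pos_d=150, n_a=800, pos_a=640)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

With only two label outcomes, TVD and KS both reduce to the absolute value of DPL, and LP is that value multiplied by the square root of 2.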
product_category
The groups are represented in the dataset with the following proportions.
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.736321 |
| DPL | Difference in Positive Proportions in Labels (DPL) | 0.016356 |
| JS | Jensen-Shannon Divergence (JS) | 0.000186 |
| KL | Kullback-Leibler Divergence (KL) | 0.000737 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.016356 |
| LP | L-p Norm (LP) | 0.023131 |
| TVD | Total Variation Distance (TVD) | 0.016356 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.45682 |
| DPL | Difference in Positive Proportions in Labels (DPL) | 0.022482 |
| JS | Jensen-Shannon Divergence (JS) | 0.000352 |
| KL | Kullback-Leibler Divergence (KL) | 0.001392 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.022482 |
| LP | L-p Norm (LP) | 0.031795 |
| TVD | Total Variation Distance (TVD) | 0.022482 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.880668 |
| DPL | Difference in Positive Proportions in Labels (DPL) | -0.026661 |
| JS | Jensen-Shannon Divergence (JS) | 0.000522 |
| KL | Kullback-Leibler Divergence (KL) | 0.002119 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.026661 |
| LP | L-p Norm (LP) | 0.037704 |
| TVD | Total Variation Distance (TVD) | 0.026661 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.59109 |
| DPL | Difference in Positive Proportions in Labels (DPL) | 0.011213 |
| JS | Jensen-Shannon Divergence (JS) | 0.000088 |
| KL | Kullback-Leibler Divergence (KL) | 0.00035 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.011213 |
| LP | L-p Norm (LP) | 0.015857 |
| TVD | Total Variation Distance (TVD) | 0.011213 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.987006 |
| DPL | Difference in Positive Proportions in Labels (DPL) | -0.025599 |
| JS | Jensen-Shannon Divergence (JS) | 0.000483 |
| KL | Kullback-Leibler Divergence (KL) | 0.001959 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.025599 |
| LP | L-p Norm (LP) | 0.036203 |
| TVD | Total Variation Distance (TVD) | 0.025599 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.971802 |
| DPL | Difference in Positive Proportions in Labels (DPL) | -0.026121 |
| JS | Jensen-Shannon Divergence (JS) | 0.000503 |
| KL | Kullback-Leibler Divergence (KL) | 0.00204 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.026121 |
| LP | L-p Norm (LP) | 0.036941 |
| TVD | Total Variation Distance (TVD) | 0.026121 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.940864 |
| DPL | Difference in Positive Proportions in Labels (DPL) | -0.045509 |
| JS | Jensen-Shannon Divergence (JS) | 0.001573 |
| KL | Kullback-Leibler Divergence (KL) | 0.006474 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.045509 |
| LP | L-p Norm (LP) | 0.06436 |
| TVD | Total Variation Distance (TVD) | 0.045509 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.878016 |
| DPL | Difference in Positive Proportions in Labels (DPL) | 0.021044 |
| JS | Jensen-Shannon Divergence (JS) | 0.000305 |
| KL | Kullback-Leibler Divergence (KL) | 0.001207 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.021044 |
| LP | L-p Norm (LP) | 0.029761 |
| TVD | Total Variation Distance (TVD) | 0.021044 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.92018 |
| DPL | Difference in Positive Proportions in Labels (DPL) | -0.021053 |
| JS | Jensen-Shannon Divergence (JS) | 0.000323 |
| KL | Kullback-Leibler Divergence (KL) | 0.001308 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.021053 |
| LP | L-p Norm (LP) | 0.029773 |
| TVD | Total Variation Distance (TVD) | 0.021053 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.906391 |
| DPL | Difference in Positive Proportions in Labels (DPL) | -0.020859 |
| JS | Jensen-Shannon Divergence (JS) | 0.000317 |
| KL | Kullback-Leibler Divergence (KL) | 0.001283 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.020859 |
| LP | L-p Norm (LP) | 0.0295 |
| TVD | Total Variation Distance (TVD) | 0.020859 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.981084 |
| DPL | Difference in Positive Proportions in Labels (DPL) | -0.047723 |
| JS | Jensen-Shannon Divergence (JS) | 0.001743 |
| KL | Kullback-Leibler Divergence (KL) | 0.007185 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.047723 |
| LP | L-p Norm (LP) | 0.067491 |
| TVD | Total Variation Distance (TVD) | 0.047723 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.939627 |
| DPL | Difference in Positive Proportions in Labels (DPL) | -0.035868 |
| JS | Jensen-Shannon Divergence (JS) | 0.000961 |
| KL | Kullback-Leibler Divergence (KL) | 0.003928 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.035868 |
| LP | L-p Norm (LP) | 0.050725 |
| TVD | Total Variation Distance (TVD) | 0.035868 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.970653 |
| DPL | Difference in Positive Proportions in Labels (DPL) | 0.01162 |
| JS | Jensen-Shannon Divergence (JS) | 0.000094 |
| KL | Kullback-Leibler Divergence (KL) | 0.000373 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.01162 |
| LP | L-p Norm (LP) | 0.016433 |
| TVD | Total Variation Distance (TVD) | 0.01162 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.98957 |
| DPL | Difference in Positive Proportions in Labels (DPL) | 0.110042 |
| JS | Jensen-Shannon Divergence (JS) | 0.00748 |
| KL | Kullback-Leibler Divergence (KL) | 0.028876 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.110042 |
| LP | L-p Norm (LP) | 0.155623 |
| TVD | Total Variation Distance (TVD) | 0.110042 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.902413 |
| DPL | Difference in Positive Proportions in Labels (DPL) | -0.055597 |
| JS | Jensen-Shannon Divergence (JS) | 0.002382 |
| KL | Kullback-Leibler Divergence (KL) | 0.009875 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.055597 |
| LP | L-p Norm (LP) | 0.078626 |
| TVD | Total Variation Distance (TVD) | 0.055597 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.986034 |
| DPL | Difference in Positive Proportions in Labels (DPL) | -0.027173 |
| JS | Jensen-Shannon Divergence (JS) | 0.000545 |
| KL | Kullback-Leibler Divergence (KL) | 0.002215 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.027173 |
| LP | L-p Norm (LP) | 0.038428 |
| TVD | Total Variation Distance (TVD) | 0.027173 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.973128 |
| DPL | Difference in Positive Proportions in Labels (DPL) | -0.019247 |
| JS | Jensen-Shannon Divergence (JS) | 0.00027 |
| KL | Kullback-Leibler Divergence (KL) | 0.001091 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.019247 |
| LP | L-p Norm (LP) | 0.027219 |
| TVD | Total Variation Distance (TVD) | 0.019247 |
| name | description | value |
|---|---|---|
| CI | Class Imbalance (CI) | 0.988332 |
| DPL | Difference in Positive Proportions in Labels (DPL) | -0.086077 |
| JS | Jensen-Shannon Divergence (JS) | 0.006138 |
| KL | Kullback-Leibler Divergence (KL) | 0.026226 |
| KS | Kolmogorov-Smirnov Distance (KS) | 0.086077 |
| LP | L-p Norm (LP) | 0.121732 |
| TVD | Total Variation Distance (TVD) | 0.086077 |
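The tables above can be cross-checked against each other. The sketch below, which is an illustration and not part of the Clarify output, tests whether the reported values behave as expected for a two-outcome label (KS and TVD equal to the absolute value of DPL, and LP equal to that value times the square root of 2) and converts a CI value into the implied share of rows held by the facet value, assuming CI is defined as (n_a - n_d) / (n_a + n_d). The helper names and the tolerance are hypothetical choices.

```python
import math

# (DPL, KS, LP, TVD) copied from the first, fourteenth and last tables above.
reported = [
    (0.016356, 0.016356, 0.023131, 0.016356),
    (0.110042, 0.110042, 0.155623, 0.110042),
    (-0.086077, 0.086077, 0.121732, 0.086077),
]


def is_binary_consistent(dpl, ks, lp, tvd, tol=2e-6):
    """True if KS = TVD = |DPL| and LP = sqrt(2) * |DPL| within rounding error."""
    magnitude = abs(dpl)
    return (
        abs(ks - magnitude) <= tol
        and abs(tvd - magnitude) <= tol
        and abs(lp - math.sqrt(2) * magnitude) <= tol
    )


def facet_share(ci):
    """Share of rows in the facet value, from CI = (n_a - n_d) / (n_a + n_d)."""
    return (1.0 - ci) / 2.0


for row in reported:
    print(row, is_binary_consistent(*row))

# CI values copied from the first and second tables above.
for ci in (0.736321, 0.45682):
    print(f"CI={ci}: facet value holds about {facet_share(ci):.1%} of the rows")
```

A CI close to 1 therefore indicates a facet value that covers only a small fraction of the dataset, which is the case for most of the tables in this report.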
{
  "dataset_type": "text/csv",
  "headers": [
    "sentiment",
    "review_body",
    "product_category"
  ],
  "label": "sentiment",
  "label_values_or_threshold": [
    1
  ],
  "facet": [
    {
      "name_or_index": "product_category"
    }
  ],
  "methods": {
    "pre_training_bias": {
      "methods": [
        "CI",
        "DPL",
        "KL",
        "JS",
        "LP",
        "TVD",
        "KS"
      ]
    },
    "report": {
      "name": "report",
      "title": "Analysis Report"
    }
  }
}
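For readers who want to inspect the configuration above programmatically, the following sketch parses it with the Python standard library and checks that the label and facet columns appear in the headers. It is only an illustration: the summarize helper and the keys of its result are hypothetical, and the configuration text is copied from the JSON above.

```python
import json

# Analysis configuration copied from the report above.
CONFIG_TEXT = """
{
  "dataset_type": "text/csv",
  "headers": ["sentiment", "review_body", "product_category"],
  "label": "sentiment",
  "label_values_or_threshold": [1],
  "facet": [{"name_or_index": "product_category"}],
  "methods": {
    "pre_training_bias": {
      "methods": ["CI", "DPL", "KL", "JS", "LP", "TVD", "KS"]
    },
    "report": {"name": "report", "title": "Analysis Report"}
  }
}
"""


def summarize(config):
    """Collect the fields that determine which bias metrics are computed."""
    headers = config["headers"]
    facets = [facet["name_or_index"] for facet in config["facet"]]
    return {
        "label": config["label"],
        "positive_values": config["label_values_or_threshold"],
        "facets": facets,
        "metrics": sorted(config["methods"]["pre_training_bias"]["methods"]),
        # Every referenced column should be one of the declared headers.
        "columns_known": all(name in headers for name in [config["label"], *facets]),
    }


summary = summarize(json.loads(CONFIG_TEXT))
print(summary)
```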