import boto3
import sagemaker
import pandas as pd
import numpy as np
import botocore
import time
import json
config = botocore.config.Config(user_agent_extra='dlai-pds/c1/w3')

# low-level service client of the boto3 session
sm = boto3.client(service_name='sagemaker',
                  config=config)

sm_runtime = boto3.client('sagemaker-runtime',
                          config=config)

sess = sagemaker.Session(sagemaker_client=sm,
                         sagemaker_runtime_client=sm_runtime)

bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = sess.boto_region_name
1 Introduction
In an earlier article we introduced AWS cloud services for data science, and how they can help with different stages of the data science and machine learning workflow.
In this article, we will use Amazon SageMaker Autopilot to train a natural language processing (NLP) model. The model will analyze customer feedback and classify the messages into positive (1), neutral (0) and negative (-1) sentiment.
Amazon SageMaker Autopilot automatically trains and tunes the best machine learning models for classification or regression based on your data, while allowing you to maintain full control and visibility.
SageMaker Autopilot is an example of AutoML, much like PyCaret which I have written about previously. In comparison, Autopilot is not only more automated than PyCaret, it is also designed to work at the large scale that cloud data science solutions make possible.
SageMaker Autopilot will inspect the raw dataset, apply feature processors, pick the best set of algorithms, train and tune multiple models, and then rank the models based on performance - all with just a few clicks. Autopilot transparently generates a set of Python scripts and notebooks for a complete end-to-end pipeline including data analysis, candidate generation, feature engineering, and model training/tuning.
A SageMaker Autopilot job consists of the following high-level steps:
- Data analysis, where the data is summarized and analyzed to determine which feature engineering techniques, hyper-parameters, and models to explore.
- Feature engineering, where the data is scrubbed, balanced, combined, and split into train and validation sets.
- Model training and tuning, where the top performing features, hyper-parameters, and models are selected and trained.
These re-usable scripts and notebooks give us full visibility into how the model candidates were created. Since Autopilot integrates natively with SageMaker Studio, we can visually explore the different models generated by SageMaker Autopilot.
SageMaker Autopilot can be used by people without machine learning experience to automatically train a model from a dataset. Additionally, experienced developers can use Autopilot to train a baseline model from which they can iterate and manually improve.
Autopilot is available through the SageMaker Studio UI and AWS Python SDK. In this project, we will use the AWS Python SDK to train a series of text-classification models and deploy the model with the highest accuracy.
For more details on Autopilot, please refer to this Amazon Science Publication.
2 Use case: Analyze Customer Sentiment
Customer feedback appears across many channels including social media and partner websites. As a company, you want to capture this valuable product feedback to spot negative trends and improve the situation, if needed. Here we will train a model to classify the feedback messages into positive (1), neutral (0) and negative (-1) sentiment.
First, let’s install and import required modules.
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'
3 Review transformed dataset
Let’s transform the dataset into a format that Autopilot recognizes. Specifically, a comma-separated file of label,features as shown here:
sentiment,review_body
-1,"this is bad"
0,"this is ok"
1,"this is great"
...
Sentiment is one of three classes: negative (-1), neutral (0), or positive (1). Autopilot requires that the target variable, sentiment, is first and the set of features, just review_body in this case, come next.
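The layout described above can be checked before anything is uploaded. The following is a minimal sketch using only the standard library; the helper name `check_autopilot_csv` and the `ALLOWED_LABELS` set are made up for illustration and are not part of any AWS SDK.

```python
import csv
import io

ALLOWED_LABELS = {"-1", "0", "1"}  # the three sentiment classes used in this article


def check_autopilot_csv(csv_text, target="sentiment"):
    """Return a list of problems found in the CSV text (empty list means OK).

    Hypothetical helper: it checks only the layout described above, namely
    that the target column comes first and every label is -1, 0 or 1.
    """
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if not header or header[0] != target:
        problems.append("first column must be '{}'".format(target))
        return problems
    # Row 1 is the header, so data rows are numbered from 2.
    for row_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append("row {}: expected {} fields".format(row_no, len(header)))
        elif row[0] not in ALLOWED_LABELS:
            problems.append("row {}: unexpected label {!r}".format(row_no, row[0]))
    return problems


sample = 'sentiment,review_body\n-1,"this is bad"\n0,"this is ok"\n1,"this is great"\n'
print(check_autopilot_csv(sample))
```

An empty list means the sample matches the expected layout. For the real file, the text could be read from disk and passed in the same way.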
!aws s3 cp 's3://dlai-practical-data-science/data/balanced/womens_clothing_ecommerce_reviews_balanced.csv' ./
download: s3://dlai-practical-data-science/data/balanced/womens_clothing_ecommerce_reviews_balanced.csv to ./womens_clothing_ecommerce_reviews_balanced.csv
path = './womens_clothing_ecommerce_reviews_balanced.csv'

df = pd.read_csv(path, delimiter=',')
df.head()
| | sentiment | review_body | product_category |
|---|---|---|---|
| 0 | -1 | This suit did nothing for me. the top has zero... | Swim |
| 1 | -1 | Like other reviewers i saw this dress on the ... | Dresses |
| 2 | -1 | I wish i had read the reviews before purchasin... | Knits |
| 3 | -1 | I ordered these pants in my usual size (xl) an... | Legwear |
| 4 | -1 | I noticed this top on one of the sales associa... | Knits |
path_autopilot = './womens_clothing_ecommerce_reviews_balanced_for_autopilot.csv'

df[['sentiment', 'review_body']].to_csv(path_autopilot,
                                        sep=',',
                                        index=False)
4 Configure the Autopilot job
4.1 Upload data to S3 bucket
autopilot_train_s3_uri = sess.upload_data(bucket=bucket, key_prefix='autopilot/data', path=path_autopilot)
autopilot_train_s3_uri
's3://sagemaker-us-east-1-491783890788/autopilot/data/womens_clothing_ecommerce_reviews_balanced_for_autopilot.csv'
Check the existence of the dataset in this S3 bucket folder:
!aws s3 ls $autopilot_train_s3_uri
2023-02-05 14:47:43 2253749 womens_clothing_ecommerce_reviews_balanced_for_autopilot.csv
4.2 S3 output for generated assets
Set the S3 output path for the Autopilot outputs. This includes Jupyter notebooks (analysis), Python scripts (feature engineering), and trained models.
model_output_s3_uri = 's3://{}/autopilot'.format(bucket)

print(model_output_s3_uri)
s3://sagemaker-us-east-1-491783890788/autopilot
4.3 Configure the Autopilot job
Let’s now create the Autopilot job name.
import time

timestamp = int(time.time())

auto_ml_job_name = 'automl-dm-{}'.format(timestamp)
When configuring our Autopilot job, we need to specify the maximum number of candidates to explore, max_candidates, as well as the input/output S3 locations and the target column to predict. In this case, we want to predict sentiment from the review text.
We will create an instance of the sagemaker.automl.automl.AutoML estimator class, passing the required configuration parameters. The target attribute for predictions here is sentiment.
max_candidates = 3

automl = sagemaker.automl.automl.AutoML(
    target_attribute_name='sentiment',
    base_job_name=auto_ml_job_name,
    output_path=model_output_s3_uri,
    max_candidates=max_candidates,
    sagemaker_session=sess,
    role=role,
    max_runtime_per_training_job_in_seconds=1200,
    total_job_runtime_in_seconds=7200
)
5 Launch the Autopilot job
Now we call the fit function of the configured estimator, passing the S3 bucket input data path and the Autopilot job name.
automl.fit(
    autopilot_train_s3_uri,
    job_name=auto_ml_job_name,
    wait=False,
    logs=False
)
6 Track Autopilot job progress
Once the Autopilot job has been launched, we can track the job progress directly from the notebook using the SDK capabilities.
6.1 Autopilot job description
The describe_auto_ml_job function of the Amazon SageMaker service returns the information about the AutoML job in dictionary format. We can review the response syntax and response elements in the documentation.
job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)
6.2 Autopilot job status
To track the job progress we can use two response elements: AutoMLJobStatus and AutoMLJobSecondaryStatus, which correspond to the primary (Completed | InProgress | Failed | Stopped | Stopping) and secondary (AnalyzingData | FeatureEngineering | ModelTuning etc.) job states respectively. To see if the AutoML job has started, we can check the existence of the AutoMLJobStatus and AutoMLJobSecondaryStatus elements in the job description response.
We will use the following scheme to track the job progress:
# check if the job is still at certain stage
while [check 'AutoMLJobStatus' and 'AutoMLJobSecondaryStatus'] in job_description_response:
    # update the job description response
    job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)
    # print the message the Autopilot job is in the stage ...
    print([message])
    # get a time step to check the status again
    time.sleep(15)
print("Autopilot job complete...")
while 'AutoMLJobStatus' not in job_description_response.keys() and 'AutoMLJobSecondaryStatus' not in job_description_response.keys():
    job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)
    print('[INFO] Autopilot job has not yet started. Please wait. ')
    # function `json.dumps` encodes JSON string for printing.
    print(json.dumps(job_description_response, indent=4, sort_keys=True, default=str))
    print('[INFO] Waiting for Autopilot job to start...')
    # only the `time` module is imported, so call sleep through it
    time.sleep(15)

print('[OK] AutoML job started.')
[OK] AutoML job started.
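The same status-check scheme is repeated for every stage in this article, so it can be factored into one small helper. The sketch below is only an illustration: `wait_while`, `TERMINAL_STATUSES`, `max_polls` and the canned responses are hypothetical names and data, and the describe call is passed in as a plain callable so the loop can be tried without a running Autopilot job.

```python
import time

TERMINAL_STATUSES = ("Completed", "Failed", "Stopped")


def wait_while(describe, in_stage, interval=15, sleep=time.sleep, max_polls=None):
    """Poll `describe()` until `in_stage(response)` is false or the job ends.

    `describe` is any zero-argument callable returning the job description
    dictionary; `sleep` and `max_polls` are injectable so the loop can be
    exercised without waiting. All names here are hypothetical.
    """
    polls = 0
    response = describe()
    while in_stage(response) and response.get("AutoMLJobStatus") not in TERMINAL_STATUSES:
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("stage did not finish after {} polls".format(polls))
        sleep(interval)
        response = describe()
        polls += 1
    return response


# Dry run with canned responses instead of a live Autopilot job.
canned = iter([
    {"AutoMLJobStatus": "InProgress", "AutoMLJobSecondaryStatus": "Starting"},
    {"AutoMLJobStatus": "InProgress", "AutoMLJobSecondaryStatus": "AnalyzingData"},
    {"AutoMLJobStatus": "InProgress", "AutoMLJobSecondaryStatus": "FeatureEngineering"},
])
final = wait_while(
    describe=lambda: next(canned),
    in_stage=lambda r: r["AutoMLJobSecondaryStatus"] in ("Starting", "AnalyzingData"),
    sleep=lambda seconds: None,  # skip real waiting in the dry run
)
print(final["AutoMLJobSecondaryStatus"])
```

With a live job, `describe` could be `lambda: automl.describe_auto_ml_job(job_name=auto_ml_job_name)`. Stopping on a terminal status is a design choice here, so that a failed job does not keep the loop spinning.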
6.3 Review the SageMaker processing jobs
Autopilot creates the required SageMaker processing jobs during the run:
- The first processing job (data splitter) checks the data sanity, performs stratified shuffling and splits the data into training and validation sets.
- The second processing job (candidate generator) first streams through the data to compute statistics for the dataset. It then uses these statistics to identify the problem type and the possible type of every column-predictor: numeric, categorical, natural language, etc.
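To make "stratified shuffling and splitting" concrete, here is a toy sketch in plain Python. It is not Autopilot's implementation, which is not shown in this article; the function name, the 20% validation fraction and the toy rows are all assumptions chosen for illustration. The idea is simply that each class is shuffled and split separately, so both parts keep the same class proportions.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, validation_fraction=0.2, seed=0):
    """Shuffle rows within each class, then split each class by the same fraction."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validation = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        cut = int(round(len(group) * validation_fraction))
        validation.extend(group[:cut])
        train.extend(group[cut:])
    return train, validation


# Toy data: 10 rows per sentiment class.
rows = [(label, "review {}".format(i)) for label in (-1, 0, 1) for i in range(10)]
train, validation = stratified_split(rows, label_of=lambda row: row[0])
print(len(train), len(validation))  # 24 6
```

Each of the three classes contributes 2 of its 10 rows to validation, so the balance of the original data carries over to both parts.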
6.4 Wait for the data analysis step to finish
Here we will use the same scheme as above to check the completion of the data analysis step. This step can be identified with the (primary) job status value InProgress and secondary job status values Starting and then AnalyzingData.
%%time

job_status = job_description_response['AutoMLJobStatus']
job_sec_status = job_description_response['AutoMLJobSecondaryStatus']

if job_status not in ('Stopped', 'Failed'):
    # compare the primary status with ==; ('InProgress') is a plain string, not a tuple
    while job_status == 'InProgress' and job_sec_status in ('Starting', 'AnalyzingData'):
        job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)
        job_status = job_description_response['AutoMLJobStatus']
        job_sec_status = job_description_response['AutoMLJobSecondaryStatus']
        print(job_status, job_sec_status)
        time.sleep(15)
    print('[OK] Data analysis phase completed.\n')

print(json.dumps(job_description_response, indent=4, sort_keys=True, default=str))
InProgress FeatureEngineering
[OK] Data analysis phase completed.
{
"AutoMLJobArn": "arn:aws:sagemaker:us-east-1:491783890788:automl-job/automl-dm-1675608463",
"AutoMLJobArtifacts": {
"CandidateDefinitionNotebookLocation": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/sagemaker-automl-candidates/automl-dm-1675608463-pr-1-210c7900f5854fdc89ce01c59579c034fb883/notebooks/SageMakerAutopilotCandidateDefinitionNotebook.ipynb",
"DataExplorationNotebookLocation": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/sagemaker-automl-candidates/automl-dm-1675608463-pr-1-210c7900f5854fdc89ce01c59579c034fb883/notebooks/SageMakerAutopilotDataExplorationNotebook.ipynb"
},
"AutoMLJobConfig": {
"CompletionCriteria": {
"MaxAutoMLJobRuntimeInSeconds": 7200,
"MaxCandidates": 3,
"MaxRuntimePerTrainingJobInSeconds": 1200
},
"SecurityConfig": {
"EnableInterContainerTrafficEncryption": false
}
},
"AutoMLJobName": "automl-dm-1675608463",
"AutoMLJobSecondaryStatus": "FeatureEngineering",
"AutoMLJobStatus": "InProgress",
"CreationTime": "2023-02-05 14:47:43.853000+00:00",
"GenerateCandidateDefinitionsOnly": false,
"InputDataConfig": [
{
"ChannelType": "training",
"ContentType": "text/csv;header=present",
"DataSource": {
"S3DataSource": {
"S3DataType": "S3Prefix",
"S3Uri": "s3://sagemaker-us-east-1-491783890788/auto-ml-input-data/womens_clothing_ecommerce_reviews_balanced_for_autopilot.csv"
}
},
"TargetAttributeName": "sentiment"
}
],
"LastModifiedTime": "2023-02-05 14:56:15.134000+00:00",
"OutputDataConfig": {
"S3OutputPath": "s3://sagemaker-us-east-1-491783890788/autopilot"
},
"ResolvedAttributes": {
"AutoMLJobObjective": {
"MetricName": "Accuracy"
},
"CompletionCriteria": {
"MaxAutoMLJobRuntimeInSeconds": 7200,
"MaxCandidates": 3,
"MaxRuntimePerTrainingJobInSeconds": 1200
},
"ProblemType": "MulticlassClassification"
},
"ResponseMetadata": {
"HTTPHeaders": {
"content-length": "1815",
"content-type": "application/x-amz-json-1.1",
"date": "Sun, 05 Feb 2023 14:56:16 GMT",
"x-amzn-requestid": "0faeba6e-7645-46d4-a41d-658ebc1167e8"
},
"HTTPStatusCode": 200,
"RequestId": "0faeba6e-7645-46d4-a41d-658ebc1167e8",
"RetryAttempts": 0
},
"RoleArn": "arn:aws:iam::491783890788:role/sagemaker-studio-vpc-firewall-us-east-1-sagemaker-execution-role"
}
CPU times: user 26.6 ms, sys: 43 µs, total: 26.7 ms
Wall time: 15.2 s
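A Python detail that matters for single-value status checks in loops like this one: a string in parentheses, such as ('InProgress'), is still a string and not a one-element tuple, so `in` performs a substring test rather than a membership test. The short sketch below shows the difference with a deliberately fake status value.

```python
status = "Progress"  # deliberately not a real status value

# ('InProgress') is just the string 'InProgress', so `in` tests for a substring.
substring_match = status in ('InProgress')
# With a trailing comma it becomes a one-element tuple and `in` tests membership.
tuple_match = status in ('InProgress',)
# For a single value, plain equality is the clearest check.
equality_match = status == 'InProgress'

print(substring_match, tuple_match, equality_match)  # True False False
```

With the real status values the substring form happens to give the right answer, but equality or a proper tuple states the intent without relying on that.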
6.5 View generated notebooks
Once data analysis is complete, SageMaker Autopilot generates two notebooks:
- Data exploration
- Candidate definition
Notebooks are included in the AutoML job artifacts generated during the run. Before checking the existence of the notebooks, we can check if the artifacts have been generated.
We will use the status check scheme described above. The generation of artifacts can be identified by the existence of the AutoMLJobArtifacts element in the keys of the job description response.
# get the information about the running Autopilot job
job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)

# keep in the while loop until the Autopilot job artifacts will be generated
while 'AutoMLJobArtifacts' not in job_description_response.keys():
    # update the information about the running Autopilot job
    job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)
    print('[INFO] Autopilot job has not yet generated the artifacts. Please wait. ')
    print(json.dumps(job_description_response, indent=4, sort_keys=True, default=str))
    print('[INFO] Waiting for AutoMLJobArtifacts...')
    time.sleep(15)

print('[OK] AutoMLJobArtifacts generated.')
[OK] AutoMLJobArtifacts generated.
We need to wait for Autopilot to make the notebooks available.
We will again use the status check scheme described above. Notebook creation can be identified by the existence of the DataExplorationNotebookLocation element in the keys of the job_description_response['AutoMLJobArtifacts'] dictionary.
# get the information about the running Autopilot job
job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)

# keep in the while loop until the notebooks will be created
while 'DataExplorationNotebookLocation' not in job_description_response['AutoMLJobArtifacts'].keys():
    # update the information about the running Autopilot job
    job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)
    print('[INFO] Autopilot job has not yet generated the notebooks. Please wait. ')
    print(json.dumps(job_description_response, indent=4, sort_keys=True, default=str))
    print('[INFO] Waiting for DataExplorationNotebookLocation...')
    time.sleep(15)

print('[OK] DataExplorationNotebookLocation found.')
[OK] DataExplorationNotebookLocation found.
We could review the generated resources in S3 directly. We can find the notebooks in the folder notebooks and download them by clicking on object Actions/Object actions -> Download as/Download.
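As an alternative to the console, the notebook locations in the job description are ordinary S3 URIs, so they can be split into a bucket and a key for a programmatic download. The sketch below assumes nothing beyond the standard library; `split_s3_uri` and the example URI are made up for illustration.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {!r}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")


# Made-up URI for illustration; a real one would come from
# job_description_response['AutoMLJobArtifacts']['DataExplorationNotebookLocation'].
example_uri = "s3://example-bucket/autopilot/notebooks/Example.ipynb"
bucket_name, key = split_s3_uri(example_uri)
print(bucket_name)
print(key)

# With credentials in place, the object could then be fetched with boto3:
#   boto3.client("s3").download_file(bucket_name, key, "Example.ipynb")
```

The variable is called `bucket_name` rather than `bucket` so that it does not overwrite the session's default bucket defined earlier.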
7 Feature engineering
We will use the status check scheme described above. The feature engineering step can be identified with the (primary) job status value InProgress and secondary job status value FeatureEngineering.
%%time

job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)
job_status = job_description_response['AutoMLJobStatus']
job_sec_status = job_description_response['AutoMLJobSecondaryStatus']
print(job_status)
print(job_sec_status)
if job_status not in ('Stopped', 'Failed'):
    # compare single values with ==; ('InProgress') is a plain string, not a tuple
    while job_status == 'InProgress' and job_sec_status == 'FeatureEngineering':
        job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)
        job_status = job_description_response['AutoMLJobStatus']
        job_sec_status = job_description_response['AutoMLJobSecondaryStatus']
        print(job_status, job_sec_status)
        time.sleep(5)
    print('[OK] Feature engineering phase completed.\n')

print(json.dumps(job_description_response, indent=4, sort_keys=True, default=str))
InProgress
FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress FeatureEngineering
InProgress ModelTuning
[OK] Feature engineering phase completed.
{
"AutoMLJobArn": "arn:aws:sagemaker:us-east-1:491783890788:automl-job/automl-dm-1675608463",
"AutoMLJobArtifacts": {
"CandidateDefinitionNotebookLocation": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/sagemaker-automl-candidates/automl-dm-1675608463-pr-1-210c7900f5854fdc89ce01c59579c034fb883/notebooks/SageMakerAutopilotCandidateDefinitionNotebook.ipynb",
"DataExplorationNotebookLocation": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/sagemaker-automl-candidates/automl-dm-1675608463-pr-1-210c7900f5854fdc89ce01c59579c034fb883/notebooks/SageMakerAutopilotDataExplorationNotebook.ipynb"
},
"AutoMLJobConfig": {
"CompletionCriteria": {
"MaxAutoMLJobRuntimeInSeconds": 7200,
"MaxCandidates": 3,
"MaxRuntimePerTrainingJobInSeconds": 1200
},
"SecurityConfig": {
"EnableInterContainerTrafficEncryption": false
}
},
"AutoMLJobName": "automl-dm-1675608463",
"AutoMLJobSecondaryStatus": "ModelTuning",
"AutoMLJobStatus": "InProgress",
"CreationTime": "2023-02-05 14:47:43.853000+00:00",
"GenerateCandidateDefinitionsOnly": false,
"InputDataConfig": [
{
"ChannelType": "training",
"ContentType": "text/csv;header=present",
"DataSource": {
"S3DataSource": {
"S3DataType": "S3Prefix",
"S3Uri": "s3://sagemaker-us-east-1-491783890788/auto-ml-input-data/womens_clothing_ecommerce_reviews_balanced_for_autopilot.csv"
}
},
"TargetAttributeName": "sentiment"
}
],
"LastModifiedTime": "2023-02-05 15:04:28.632000+00:00",
"OutputDataConfig": {
"S3OutputPath": "s3://sagemaker-us-east-1-491783890788/autopilot"
},
"ResolvedAttributes": {
"AutoMLJobObjective": {
"MetricName": "Accuracy"
},
"CompletionCriteria": {
"MaxAutoMLJobRuntimeInSeconds": 7200,
"MaxCandidates": 3,
"MaxRuntimePerTrainingJobInSeconds": 1200
},
"ProblemType": "MulticlassClassification"
},
"ResponseMetadata": {
"HTTPHeaders": {
"content-length": "1808",
"content-type": "application/x-amz-json-1.1",
"date": "Sun, 05 Feb 2023 15:04:28 GMT",
"x-amzn-requestid": "eecffe9b-ef5e-4e69-b4ca-d0b0b3a95be7"
},
"HTTPStatusCode": 200,
"RequestId": "eecffe9b-ef5e-4e69-b4ca-d0b0b3a95be7",
"RetryAttempts": 0
},
"RoleArn": "arn:aws:iam::491783890788:role/sagemaker-studio-vpc-firewall-us-east-1-sagemaker-execution-role"
}
CPU times: user 378 ms, sys: 49.3 ms, total: 427 ms
Wall time: 7min 7s
8 Model training and tuning
We can use the status check scheme described above. The model tuning step can be identified with the (primary) job status value InProgress and secondary job status value ModelTuning.
%%time

job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)
job_status = job_description_response['AutoMLJobStatus']
job_sec_status = job_description_response['AutoMLJobSecondaryStatus']
print(job_status)
print(job_sec_status)
if job_status not in ('Stopped', 'Failed'):
    # compare single values with ==; ('InProgress') is a plain string, not a tuple
    while job_status == 'InProgress' and job_sec_status == 'ModelTuning':
        job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)
        job_status = job_description_response['AutoMLJobStatus']
        job_sec_status = job_description_response['AutoMLJobSecondaryStatus']
        print(job_status, job_sec_status)
        time.sleep(5)
    print('[OK] Model tuning phase completed.\n')

print(json.dumps(job_description_response, indent=4, sort_keys=True, default=str))
InProgress
ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress ModelTuning
InProgress MaxCandidatesReached
[OK] Model tuning phase completed.
{
"AutoMLJobArn": "arn:aws:sagemaker:us-east-1:491783890788:automl-job/automl-dm-1675608463",
"AutoMLJobArtifacts": {
"CandidateDefinitionNotebookLocation": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/sagemaker-automl-candidates/automl-dm-1675608463-pr-1-210c7900f5854fdc89ce01c59579c034fb883/notebooks/SageMakerAutopilotCandidateDefinitionNotebook.ipynb",
"DataExplorationNotebookLocation": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/sagemaker-automl-candidates/automl-dm-1675608463-pr-1-210c7900f5854fdc89ce01c59579c034fb883/notebooks/SageMakerAutopilotDataExplorationNotebook.ipynb"
},
"AutoMLJobConfig": {
"CompletionCriteria": {
"MaxAutoMLJobRuntimeInSeconds": 7200,
"MaxCandidates": 3,
"MaxRuntimePerTrainingJobInSeconds": 1200
},
"SecurityConfig": {
"EnableInterContainerTrafficEncryption": false
}
},
"AutoMLJobName": "automl-dm-1675608463",
"AutoMLJobSecondaryStatus": "MaxCandidatesReached",
"AutoMLJobStatus": "InProgress",
"BestCandidate": {
"CandidateName": "automl-dm-1675608463sujxUg8wYQX0-002-657fba80",
"CandidateProperties": {
"CandidateMetrics": [
{
"MetricName": "F1macro",
"Set": "Validation",
"StandardMetricName": "F1macro",
"Value": 0.6152600049972534
},
{
"MetricName": "PrecisionMacro",
"Set": "Validation",
"StandardMetricName": "PrecisionMacro",
"Value": 0.6158699989318848
},
{
"MetricName": "Accuracy",
"Set": "Validation",
"StandardMetricName": "Accuracy",
"Value": 0.6150500178337097
},
{
"MetricName": "BalancedAccuracy",
"Set": "Validation",
"StandardMetricName": "BalancedAccuracy",
"Value": 0.6150500178337097
},
{
"MetricName": "LogLoss",
"Set": "Validation",
"StandardMetricName": "LogLoss",
"Value": 0.843940019607544
},
{
"MetricName": "RecallMacro",
"Set": "Validation",
"StandardMetricName": "RecallMacro",
"Value": 0.6150500178337097
}
]
},
"CandidateStatus": "Completed",
"CandidateSteps": [
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:processing-job/automl-dm-1675608463-db-1-ec0fb37f4b964d1a9485854c252aa8f0683f5",
"CandidateStepName": "automl-dm-1675608463-db-1-ec0fb37f4b964d1a9485854c252aa8f0683f5",
"CandidateStepType": "AWS::SageMaker::ProcessingJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:training-job/automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a",
"CandidateStepName": "automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a",
"CandidateStepType": "AWS::SageMaker::TrainingJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:transform-job/automl-dm-1675608463-dpp2-rpb-1-fd31c8d697b34a02a3472ce6d7557cd",
"CandidateStepName": "automl-dm-1675608463-dpp2-rpb-1-fd31c8d697b34a02a3472ce6d7557cd",
"CandidateStepType": "AWS::SageMaker::TransformJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:training-job/automl-dm-1675608463sujxUg8wYQX0-002-657fba80",
"CandidateStepName": "automl-dm-1675608463sujxUg8wYQX0-002-657fba80",
"CandidateStepType": "AWS::SageMaker::TrainingJob"
}
],
"CreationTime": "2023-02-05 15:06:01+00:00",
"EndTime": "2023-02-05 15:07:54+00:00",
"FinalAutoMLJobObjectiveMetric": {
"MetricName": "validation:accuracy",
"Value": 0.6150500178337097
},
"InferenceContainers": [
{
"Environment": {
"AUTOML_SPARSE_ENCODE_RECORDIO_PROTOBUF": "1",
"AUTOML_TRANSFORM_MODE": "feature-transform",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "application/x-recordio-protobuf",
"SAGEMAKER_PROGRAM": "sagemaker_serve",
"SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/data-processor-models/automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a/output/model.tar.gz"
},
{
"Environment": {
"MAX_CONTENT_LENGTH": "20971520",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv",
"SAGEMAKER_INFERENCE_OUTPUT": "predicted_label",
"SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,probabilities"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.3-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/tuning/automl-dm--dpp2-xgb/automl-dm-1675608463sujxUg8wYQX0-002-657fba80/output/model.tar.gz"
},
{
"Environment": {
"AUTOML_TRANSFORM_MODE": "inverse-label-transform",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv",
"SAGEMAKER_INFERENCE_INPUT": "predicted_label",
"SAGEMAKER_INFERENCE_OUTPUT": "predicted_label",
"SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,labels,probabilities",
"SAGEMAKER_PROGRAM": "sagemaker_serve",
"SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/data-processor-models/automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a/output/model.tar.gz"
}
],
"LastModifiedTime": "2023-02-05 15:09:06.585000+00:00",
"ObjectiveStatus": "Succeeded"
},
"CreationTime": "2023-02-05 14:47:43.853000+00:00",
"GenerateCandidateDefinitionsOnly": false,
"InputDataConfig": [
{
"ChannelType": "training",
"ContentType": "text/csv;header=present",
"DataSource": {
"S3DataSource": {
"S3DataType": "S3Prefix",
"S3Uri": "s3://sagemaker-us-east-1-491783890788/auto-ml-input-data/womens_clothing_ecommerce_reviews_balanced_for_autopilot.csv"
}
},
"TargetAttributeName": "sentiment"
}
],
"LastModifiedTime": "2023-02-05 15:09:06.661000+00:00",
"OutputDataConfig": {
"S3OutputPath": "s3://sagemaker-us-east-1-491783890788/autopilot"
},
"ResolvedAttributes": {
"AutoMLJobObjective": {
"MetricName": "Accuracy"
},
"CompletionCriteria": {
"MaxAutoMLJobRuntimeInSeconds": 7200,
"MaxCandidates": 3,
"MaxRuntimePerTrainingJobInSeconds": 1200
},
"ProblemType": "MulticlassClassification"
},
"ResponseMetadata": {
"HTTPHeaders": {
"content-length": "5731",
"content-type": "application/x-amz-json-1.1",
"date": "Sun, 05 Feb 2023 15:09:06 GMT",
"x-amzn-requestid": "d6af6156-cd79-4bf4-8025-52c85f36afa3"
},
"HTTPStatusCode": 200,
"RequestId": "d6af6156-cd79-4bf4-8025-52c85f36afa3",
"RetryAttempts": 0
},
"RoleArn": "arn:aws:iam::491783890788:role/sagemaker-studio-vpc-firewall-us-east-1-sagemaker-execution-role"
}
CPU times: user 241 ms, sys: 24.9 ms, total: 266 ms
Wall time: 4min 12s
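The BestCandidate element in a response like the one above nests the validation metrics fairly deeply. A small helper can flatten them into a name-to-value mapping for easier reading. This is a sketch: `candidate_metrics` is a hypothetical helper, and the sample dictionary is a trimmed-down copy of the structure above with a placeholder candidate name and rounded values.

```python
def candidate_metrics(job_description):
    """Map metric name -> value for the best candidate, or {} if none exists yet."""
    best = job_description.get("BestCandidate", {})
    metrics = best.get("CandidateProperties", {}).get("CandidateMetrics", [])
    return {m["MetricName"]: m["Value"] for m in metrics}


# Trimmed-down copy of the response structure shown above (values rounded).
sample_response = {
    "BestCandidate": {
        "CandidateName": "example-candidate",  # placeholder name
        "CandidateProperties": {
            "CandidateMetrics": [
                {"MetricName": "Accuracy", "Set": "Validation", "Value": 0.615},
                {"MetricName": "F1macro", "Set": "Validation", "Value": 0.615},
                {"MetricName": "LogLoss", "Set": "Validation", "Value": 0.844},
            ]
        },
    }
}
for name, value in sorted(candidate_metrics(sample_response).items()):
    print("{:<10} {:.3f}".format(name, value))
```

Using `.get` with defaults means the helper returns an empty mapping for earlier responses, where no BestCandidate element is present yet.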
Finally, we can check the completion of the Autopilot job by looking for the Completed job status.
%%time

from pprint import pprint

job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)
pprint(job_description_response)

job_status = job_description_response['AutoMLJobStatus']
job_sec_status = job_description_response['AutoMLJobSecondaryStatus']
print('Job status: {}'.format(job_status))
print('Secondary job status: {}'.format(job_sec_status))
if job_status not in ('Stopped', 'Failed'):
    # compare with !=; ('Completed') is a plain string, not a tuple
    while job_status != 'Completed':
        job_description_response = automl.describe_auto_ml_job(job_name=auto_ml_job_name)
        job_status = job_description_response['AutoMLJobStatus']
        job_sec_status = job_description_response['AutoMLJobSecondaryStatus']
        print('Job status: {}'.format(job_status))
        print('Secondary job status: {}'.format(job_sec_status))
        time.sleep(10)
    print('[OK] Autopilot job completed.\n')
else:
    print('Job status: {}'.format(job_status))
    print('Secondary job status: {}'.format(job_sec_status))
{'AutoMLJobArn': 'arn:aws:sagemaker:us-east-1:491783890788:automl-job/automl-dm-1675608463',
'AutoMLJobArtifacts': {'CandidateDefinitionNotebookLocation': 's3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/sagemaker-automl-candidates/automl-dm-1675608463-pr-1-210c7900f5854fdc89ce01c59579c034fb883/notebooks/SageMakerAutopilotCandidateDefinitionNotebook.ipynb',
'DataExplorationNotebookLocation': 's3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/sagemaker-automl-candidates/automl-dm-1675608463-pr-1-210c7900f5854fdc89ce01c59579c034fb883/notebooks/SageMakerAutopilotDataExplorationNotebook.ipynb'},
'AutoMLJobConfig': {'CompletionCriteria': {'MaxAutoMLJobRuntimeInSeconds': 7200,
'MaxCandidates': 3,
'MaxRuntimePerTrainingJobInSeconds': 1200},
'SecurityConfig': {'EnableInterContainerTrafficEncryption': False}},
'AutoMLJobName': 'automl-dm-1675608463',
'AutoMLJobSecondaryStatus': 'MergingAutoMLTaskReports',
'AutoMLJobStatus': 'InProgress',
'BestCandidate': {'CandidateName': 'automl-dm-1675608463sujxUg8wYQX0-002-657fba80',
'CandidateProperties': {'CandidateMetrics': [{'MetricName': 'F1macro',
'Set': 'Validation',
'StandardMetricName': 'F1macro',
'Value': 0.6152600049972534},
{'MetricName': 'PrecisionMacro',
'Set': 'Validation',
'StandardMetricName': 'PrecisionMacro',
'Value': 0.6158699989318848},
{'MetricName': 'Accuracy',
'Set': 'Validation',
'StandardMetricName': 'Accuracy',
'Value': 0.6150500178337097},
{'MetricName': 'BalancedAccuracy',
'Set': 'Validation',
'StandardMetricName': 'BalancedAccuracy',
'Value': 0.6150500178337097},
{'MetricName': 'LogLoss',
'Set': 'Validation',
'StandardMetricName': 'LogLoss',
'Value': 0.843940019607544},
{'MetricName': 'RecallMacro',
'Set': 'Validation',
'StandardMetricName': 'RecallMacro',
'Value': 0.6150500178337097}]},
'CandidateStatus': 'Completed',
'CandidateSteps': [{'CandidateStepArn': 'arn:aws:sagemaker:us-east-1:491783890788:processing-job/automl-dm-1675608463-db-1-ec0fb37f4b964d1a9485854c252aa8f0683f5',
'CandidateStepName': 'automl-dm-1675608463-db-1-ec0fb37f4b964d1a9485854c252aa8f0683f5',
'CandidateStepType': 'AWS::SageMaker::ProcessingJob'},
{'CandidateStepArn': 'arn:aws:sagemaker:us-east-1:491783890788:training-job/automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a',
'CandidateStepName': 'automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a',
'CandidateStepType': 'AWS::SageMaker::TrainingJob'},
{'CandidateStepArn': 'arn:aws:sagemaker:us-east-1:491783890788:transform-job/automl-dm-1675608463-dpp2-rpb-1-fd31c8d697b34a02a3472ce6d7557cd',
'CandidateStepName': 'automl-dm-1675608463-dpp2-rpb-1-fd31c8d697b34a02a3472ce6d7557cd',
'CandidateStepType': 'AWS::SageMaker::TransformJob'},
{'CandidateStepArn': 'arn:aws:sagemaker:us-east-1:491783890788:training-job/automl-dm-1675608463sujxUg8wYQX0-002-657fba80',
'CandidateStepName': 'automl-dm-1675608463sujxUg8wYQX0-002-657fba80',
'CandidateStepType': 'AWS::SageMaker::TrainingJob'}],
'CreationTime': datetime.datetime(2023, 2, 5, 15, 6, 1, tzinfo=tzlocal()),
'EndTime': datetime.datetime(2023, 2, 5, 15, 7, 54, tzinfo=tzlocal()),
'FinalAutoMLJobObjectiveMetric': {'MetricName': 'validation:accuracy',
'Value': 0.6150500178337097},
'InferenceContainers': [{'Environment': {'AUTOML_SPARSE_ENCODE_RECORDIO_PROTOBUF': '1',
'AUTOML_TRANSFORM_MODE': 'feature-transform',
'SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT': 'application/x-recordio-protobuf',
'SAGEMAKER_PROGRAM': 'sagemaker_serve',
'SAGEMAKER_SUBMIT_DIRECTORY': '/opt/ml/model/code'},
'Image': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3',
'ModelDataUrl': 's3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/data-processor-models/automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a/output/model.tar.gz'},
{'Environment': {'MAX_CONTENT_LENGTH': '20971520',
'SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT': 'text/csv',
'SAGEMAKER_INFERENCE_OUTPUT': 'predicted_label',
'SAGEMAKER_INFERENCE_SUPPORTED': 'predicted_label,probability,probabilities'},
'Image': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.3-1-cpu-py3',
'ModelDataUrl': 's3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/tuning/automl-dm--dpp2-xgb/automl-dm-1675608463sujxUg8wYQX0-002-657fba80/output/model.tar.gz'},
{'Environment': {'AUTOML_TRANSFORM_MODE': 'inverse-label-transform',
'SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT': 'text/csv',
'SAGEMAKER_INFERENCE_INPUT': 'predicted_label',
'SAGEMAKER_INFERENCE_OUTPUT': 'predicted_label',
'SAGEMAKER_INFERENCE_SUPPORTED': 'predicted_label,probability,labels,probabilities',
'SAGEMAKER_PROGRAM': 'sagemaker_serve',
'SAGEMAKER_SUBMIT_DIRECTORY': '/opt/ml/model/code'},
'Image': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3',
'ModelDataUrl': 's3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/data-processor-models/automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a/output/model.tar.gz'}],
'LastModifiedTime': datetime.datetime(2023, 2, 5, 15, 9, 6, 585000, tzinfo=tzlocal()),
'ObjectiveStatus': 'Succeeded'},
'CreationTime': datetime.datetime(2023, 2, 5, 14, 47, 43, 853000, tzinfo=tzlocal()),
'GenerateCandidateDefinitionsOnly': False,
'InputDataConfig': [{'ChannelType': 'training',
'ContentType': 'text/csv;header=present',
'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix',
'S3Uri': 's3://sagemaker-us-east-1-491783890788/auto-ml-input-data/womens_clothing_ecommerce_reviews_balanced_for_autopilot.csv'}},
'TargetAttributeName': 'sentiment'}],
'LastModifiedTime': datetime.datetime(2023, 2, 5, 15, 9, 7, 862000, tzinfo=tzlocal()),
'OutputDataConfig': {'S3OutputPath': 's3://sagemaker-us-east-1-491783890788/autopilot'},
'ResolvedAttributes': {'AutoMLJobObjective': {'MetricName': 'Accuracy'},
'CompletionCriteria': {'MaxAutoMLJobRuntimeInSeconds': 7200,
'MaxCandidates': 3,
'MaxRuntimePerTrainingJobInSeconds': 1200},
'ProblemType': 'MulticlassClassification'},
'ResponseMetadata': {'HTTPHeaders': {'content-length': '5735',
'content-type': 'application/x-amz-json-1.1',
'date': 'Sun, 05 Feb 2023 15:09:27 GMT',
'x-amzn-requestid': '5577738e-56f0-40ea-8ae0-9f4f512ecae8'},
'HTTPStatusCode': 200,
'RequestId': '5577738e-56f0-40ea-8ae0-9f4f512ecae8',
'RetryAttempts': 0},
'RoleArn': 'arn:aws:iam::491783890788:role/sagemaker-studio-vpc-firewall-us-east-1-sagemaker-execution-role'}
Job status: InProgress
Secondary job status: MergingAutoMLTaskReports
Job status: Completed
Secondary job status: Completed
[OK] Autopilot job completed.
CPU times: user 719 ms, sys: 63.7 ms, total: 783 ms
Wall time: 7min 59s
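One limitation of the polling loop above is that it only stops on Completed, so it would keep waiting if the job moved to Failed or Stopped after the first check. A minimal sketch of a more defensive version follows. The helper name wait_for_job and its parameters are my own and not part of the SageMaker SDK, and a stand-in replaces describe_auto_ml_job so the example runs without AWS access.

```python
import time


def wait_for_job(describe, poll_seconds=10, timeout_seconds=7200, sleep=time.sleep):
    """Poll `describe()` until the job reaches a terminal status.

    `describe` is any zero-argument callable returning a dict with the keys
    'AutoMLJobStatus' and 'AutoMLJobSecondaryStatus' (the keys used above).
    """
    terminal = {'Completed', 'Failed', 'Stopped'}
    waited = 0
    while True:
        response = describe()
        status = response['AutoMLJobStatus']
        print('Job status: {} ({})'.format(status, response['AutoMLJobSecondaryStatus']))
        if status in terminal:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError('Job still {} after {} seconds'.format(status, waited))
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for automl.describe_auto_ml_job (hypothetical responses, no AWS call).
_fake_responses = iter([
    {'AutoMLJobStatus': 'InProgress', 'AutoMLJobSecondaryStatus': 'MergingAutoMLTaskReports'},
    {'AutoMLJobStatus': 'Completed', 'AutoMLJobSecondaryStatus': 'Completed'},
])
final_status = wait_for_job(lambda: next(_fake_responses), sleep=lambda seconds: None)
print(final_status)
```

In the notebook this could be called as wait_for_job(lambda: automl.describe_auto_ml_job(job_name=auto_ml_job_name)). The default timeout of 7200 seconds simply mirrors the MaxAutoMLJobRuntimeInSeconds value shown in the job description.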
8.1 Compare model candidates
Once model tuning is complete, we can view all the candidates (pipeline evaluations with different hyperparameter combinations) that were explored by AutoML and sort them by their final performance metric.
We will list the candidates generated by Autopilot, sorted by accuracy from highest to lowest. To do this, we use the list_candidates function, passing the Autopilot job name auto_ml_job_name and sorting by the field FinalObjectiveMetricValue. It returns a list of candidates along with information about each one.
candidates = automl.list_candidates(
    job_name=auto_ml_job_name,  # Autopilot job name
    sort_by='FinalObjectiveMetricValue'  # accuracy field name
)
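If we ever want to double-check the ordering on our side, the same ranking can be reproduced locally from the returned dictionaries. The sketch below is only an illustration: rank_candidates is a hypothetical helper, and the sample list is trimmed to the two fields it needs, with names and values borrowed from the run shown later in this article.

```python
def rank_candidates(candidates, higher_is_better=True):
    """Sort candidate dicts by their objective metric value, best first."""
    return sorted(
        candidates,
        key=lambda c: c['FinalAutoMLJobObjectiveMetric']['Value'],
        reverse=higher_is_better,
    )


# Deliberately shuffled sample, trimmed to the fields used here.
sample = [
    {'CandidateName': 'automl-dm-1675608463sujxUg8wYQX0-003-a2d5723e',
     'FinalAutoMLJobObjectiveMetric': {'MetricName': 'validation:accuracy',
                                       'Value': 0.3990600109100342}},
    {'CandidateName': 'automl-dm-1675608463sujxUg8wYQX0-002-657fba80',
     'FinalAutoMLJobObjectiveMetric': {'MetricName': 'validation:accuracy',
                                       'Value': 0.6150500178337097}},
    {'CandidateName': 'automl-dm-1675608463sujxUg8wYQX0-001-5d775b4b',
     'FinalAutoMLJobObjectiveMetric': {'MetricName': 'validation:accuracy',
                                       'Value': 0.6149100065231323}},
]

ranked = rank_candidates(sample)
for position, candidate in enumerate(ranked):
    print(position, candidate['CandidateName'],
          candidate['FinalAutoMLJobObjectiveMetric']['Value'])
```

For an objective where lower is better (a loss, for example), passing higher_is_better=False flips the order.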
We can review the response syntax and response elements of the function list_candidates in the documentation. Now let’s put the candidate existence check into the loop:
while candidates == []:
    candidates = automl.list_candidates(job_name=auto_ml_job_name)
    print('[INFO] Autopilot job is generating the candidates. Please wait.')
    time.sleep(10)

print('[OK] Candidates generated.')
[OK] Candidates generated.
The information about each of the candidates is in the dictionary with the following keys:
print(candidates[0].keys())
dict_keys(['CandidateName', 'FinalAutoMLJobObjectiveMetric', 'ObjectiveStatus', 'CandidateSteps', 'CandidateStatus', 'InferenceContainers', 'CreationTime', 'EndTime', 'LastModifiedTime', 'CandidateProperties'])
CandidateName contains the candidate name, and the FinalAutoMLJobObjectiveMetric element contains the metric information, which can be used to identify the best candidate later. Let’s check that they were generated.
while 'CandidateName' not in candidates[0]:
    candidates = automl.list_candidates(job_name=auto_ml_job_name)
    print('[INFO] Autopilot job is generating CandidateName. Please wait. ')
    time.sleep(10)

print('[OK] CandidateName generated.')
[OK] CandidateName generated.
while 'FinalAutoMLJobObjectiveMetric' not in candidates[0]:
    candidates = automl.list_candidates(job_name=auto_ml_job_name)
    print('[INFO] Autopilot job is generating FinalAutoMLJobObjectiveMetric. Please wait. ')
    time.sleep(10)

print('[OK] FinalAutoMLJobObjectiveMetric generated.')
[OK] FinalAutoMLJobObjectiveMetric generated.
print(json.dumps(candidates, indent=4, sort_keys=True, default=str))
[
{
"CandidateName": "automl-dm-1675608463sujxUg8wYQX0-002-657fba80",
"CandidateProperties": {
"CandidateArtifactLocations": {
"Explainability": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/documentation/explainability/output",
"ModelInsights": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/documentation/model_monitor/output"
},
"CandidateMetrics": [
{
"MetricName": "F1macro",
"Set": "Validation",
"StandardMetricName": "F1macro",
"Value": 0.6152600049972534
},
{
"MetricName": "PrecisionMacro",
"Set": "Validation",
"StandardMetricName": "PrecisionMacro",
"Value": 0.6158699989318848
},
{
"MetricName": "Accuracy",
"Set": "Validation",
"StandardMetricName": "Accuracy",
"Value": 0.6150500178337097
},
{
"MetricName": "BalancedAccuracy",
"Set": "Validation",
"StandardMetricName": "BalancedAccuracy",
"Value": 0.6150500178337097
},
{
"MetricName": "LogLoss",
"Set": "Validation",
"StandardMetricName": "LogLoss",
"Value": 0.843940019607544
},
{
"MetricName": "RecallMacro",
"Set": "Validation",
"StandardMetricName": "RecallMacro",
"Value": 0.6150500178337097
}
]
},
"CandidateStatus": "Completed",
"CandidateSteps": [
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:processing-job/automl-dm-1675608463-db-1-ec0fb37f4b964d1a9485854c252aa8f0683f5",
"CandidateStepName": "automl-dm-1675608463-db-1-ec0fb37f4b964d1a9485854c252aa8f0683f5",
"CandidateStepType": "AWS::SageMaker::ProcessingJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:training-job/automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a",
"CandidateStepName": "automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a",
"CandidateStepType": "AWS::SageMaker::TrainingJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:transform-job/automl-dm-1675608463-dpp2-rpb-1-fd31c8d697b34a02a3472ce6d7557cd",
"CandidateStepName": "automl-dm-1675608463-dpp2-rpb-1-fd31c8d697b34a02a3472ce6d7557cd",
"CandidateStepType": "AWS::SageMaker::TransformJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:training-job/automl-dm-1675608463sujxUg8wYQX0-002-657fba80",
"CandidateStepName": "automl-dm-1675608463sujxUg8wYQX0-002-657fba80",
"CandidateStepType": "AWS::SageMaker::TrainingJob"
}
],
"CreationTime": "2023-02-05 15:06:01+00:00",
"EndTime": "2023-02-05 15:07:54+00:00",
"FinalAutoMLJobObjectiveMetric": {
"MetricName": "validation:accuracy",
"Value": 0.6150500178337097
},
"InferenceContainers": [
{
"Environment": {
"AUTOML_SPARSE_ENCODE_RECORDIO_PROTOBUF": "1",
"AUTOML_TRANSFORM_MODE": "feature-transform",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "application/x-recordio-protobuf",
"SAGEMAKER_PROGRAM": "sagemaker_serve",
"SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/data-processor-models/automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a/output/model.tar.gz"
},
{
"Environment": {
"MAX_CONTENT_LENGTH": "20971520",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv",
"SAGEMAKER_INFERENCE_OUTPUT": "predicted_label",
"SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,probabilities"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.3-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/tuning/automl-dm--dpp2-xgb/automl-dm-1675608463sujxUg8wYQX0-002-657fba80/output/model.tar.gz"
},
{
"Environment": {
"AUTOML_TRANSFORM_MODE": "inverse-label-transform",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv",
"SAGEMAKER_INFERENCE_INPUT": "predicted_label",
"SAGEMAKER_INFERENCE_OUTPUT": "predicted_label",
"SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,labels,probabilities",
"SAGEMAKER_PROGRAM": "sagemaker_serve",
"SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/data-processor-models/automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a/output/model.tar.gz"
}
],
"LastModifiedTime": "2023-02-05 15:09:06.585000+00:00",
"ObjectiveStatus": "Succeeded"
},
{
"CandidateName": "automl-dm-1675608463sujxUg8wYQX0-001-5d775b4b",
"CandidateProperties": {
"CandidateMetrics": [
{
"MetricName": "F1macro",
"Set": "Validation",
"StandardMetricName": "F1macro",
"Value": 0.6157000064849854
},
{
"MetricName": "PrecisionMacro",
"Set": "Validation",
"StandardMetricName": "PrecisionMacro",
"Value": 0.6168199777603149
},
{
"MetricName": "Accuracy",
"Set": "Validation",
"StandardMetricName": "Accuracy",
"Value": 0.6149100065231323
},
{
"MetricName": "BalancedAccuracy",
"Set": "Validation",
"StandardMetricName": "BalancedAccuracy",
"Value": 0.6149100065231323
},
{
"MetricName": "LogLoss",
"Set": "Validation",
"StandardMetricName": "LogLoss",
"Value": 0.8395400047302246
},
{
"MetricName": "RecallMacro",
"Set": "Validation",
"StandardMetricName": "RecallMacro",
"Value": 0.6149100065231323
}
]
},
"CandidateStatus": "Completed",
"CandidateSteps": [
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:processing-job/automl-dm-1675608463-db-1-ec0fb37f4b964d1a9485854c252aa8f0683f5",
"CandidateStepName": "automl-dm-1675608463-db-1-ec0fb37f4b964d1a9485854c252aa8f0683f5",
"CandidateStepType": "AWS::SageMaker::ProcessingJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:training-job/automl-dm-1675608463-dpp0-1-b325f697683a4300957f609440a1906660e",
"CandidateStepName": "automl-dm-1675608463-dpp0-1-b325f697683a4300957f609440a1906660e",
"CandidateStepType": "AWS::SageMaker::TrainingJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:transform-job/automl-dm-1675608463-dpp0-rpb-1-57a73878e9f24b9dbe23bf82b200317",
"CandidateStepName": "automl-dm-1675608463-dpp0-rpb-1-57a73878e9f24b9dbe23bf82b200317",
"CandidateStepType": "AWS::SageMaker::TransformJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:training-job/automl-dm-1675608463sujxUg8wYQX0-001-5d775b4b",
"CandidateStepName": "automl-dm-1675608463sujxUg8wYQX0-001-5d775b4b",
"CandidateStepType": "AWS::SageMaker::TrainingJob"
}
],
"CreationTime": "2023-02-05 15:05:53+00:00",
"EndTime": "2023-02-05 15:07:46+00:00",
"FinalAutoMLJobObjectiveMetric": {
"MetricName": "validation:accuracy",
"Value": 0.6149100065231323
},
"InferenceContainers": [
{
"Environment": {
"AUTOML_SPARSE_ENCODE_RECORDIO_PROTOBUF": "1",
"AUTOML_TRANSFORM_MODE": "feature-transform",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "application/x-recordio-protobuf",
"SAGEMAKER_PROGRAM": "sagemaker_serve",
"SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/data-processor-models/automl-dm-1675608463-dpp0-1-b325f697683a4300957f609440a1906660e/output/model.tar.gz"
},
{
"Environment": {
"MAX_CONTENT_LENGTH": "20971520",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv",
"SAGEMAKER_INFERENCE_OUTPUT": "predicted_label",
"SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,probabilities"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.3-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/tuning/automl-dm--dpp0-xgb/automl-dm-1675608463sujxUg8wYQX0-001-5d775b4b/output/model.tar.gz"
},
{
"Environment": {
"AUTOML_TRANSFORM_MODE": "inverse-label-transform",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv",
"SAGEMAKER_INFERENCE_INPUT": "predicted_label",
"SAGEMAKER_INFERENCE_OUTPUT": "predicted_label",
"SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,labels,probabilities",
"SAGEMAKER_PROGRAM": "sagemaker_serve",
"SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/data-processor-models/automl-dm-1675608463-dpp0-1-b325f697683a4300957f609440a1906660e/output/model.tar.gz"
}
],
"LastModifiedTime": "2023-02-05 15:09:06.515000+00:00",
"ObjectiveStatus": "Succeeded"
},
{
"CandidateName": "automl-dm-1675608463sujxUg8wYQX0-003-a2d5723e",
"CandidateProperties": {
"CandidateMetrics": [
{
"MetricName": "F1macro",
"Set": "Validation",
"StandardMetricName": "F1macro",
"Value": 0.39879000186920166
},
{
"MetricName": "PrecisionMacro",
"Set": "Validation",
"StandardMetricName": "PrecisionMacro",
"Value": 0.39879998564720154
},
{
"MetricName": "Accuracy",
"Set": "Validation",
"StandardMetricName": "Accuracy",
"Value": 0.3990600109100342
},
{
"MetricName": "BalancedAccuracy",
"Set": "Validation",
"StandardMetricName": "BalancedAccuracy",
"Value": 0.3990600109100342
},
{
"MetricName": "LogLoss",
"Set": "Validation",
"StandardMetricName": "LogLoss",
"Value": 1.2047499418258667
},
{
"MetricName": "RecallMacro",
"Set": "Validation",
"StandardMetricName": "RecallMacro",
"Value": 0.3990600109100342
}
]
},
"CandidateStatus": "Completed",
"CandidateSteps": [
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:processing-job/automl-dm-1675608463-db-1-ec0fb37f4b964d1a9485854c252aa8f0683f5",
"CandidateStepName": "automl-dm-1675608463-db-1-ec0fb37f4b964d1a9485854c252aa8f0683f5",
"CandidateStepType": "AWS::SageMaker::ProcessingJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:training-job/automl-dm-1675608463-dpp1-1-8b1885df2d2546b0abb07a329d1fb466b29",
"CandidateStepName": "automl-dm-1675608463-dpp1-1-8b1885df2d2546b0abb07a329d1fb466b29",
"CandidateStepType": "AWS::SageMaker::TrainingJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:transform-job/automl-dm-1675608463-dpp1-csv-1-24672b27ae4440179a3b7b3070f05ec",
"CandidateStepName": "automl-dm-1675608463-dpp1-csv-1-24672b27ae4440179a3b7b3070f05ec",
"CandidateStepType": "AWS::SageMaker::TransformJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:training-job/automl-dm-1675608463sujxUg8wYQX0-003-a2d5723e",
"CandidateStepName": "automl-dm-1675608463sujxUg8wYQX0-003-a2d5723e",
"CandidateStepType": "AWS::SageMaker::TrainingJob"
}
],
"CreationTime": "2023-02-05 15:06:13+00:00",
"EndTime": "2023-02-05 15:08:50+00:00",
"FinalAutoMLJobObjectiveMetric": {
"MetricName": "validation:accuracy",
"Value": 0.3990600109100342
},
"InferenceContainers": [
{
"Environment": {
"AUTOML_TRANSFORM_MODE": "feature-transform",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "application/x-recordio-protobuf",
"SAGEMAKER_PROGRAM": "sagemaker_serve",
"SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/data-processor-models/automl-dm-1675608463-dpp1-1-8b1885df2d2546b0abb07a329d1fb466b29/output/model.tar.gz"
},
{
"Environment": {
"MAX_CONTENT_LENGTH": "20971520",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv",
"SAGEMAKER_INFERENCE_OUTPUT": "predicted_label",
"SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,probabilities"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.3-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/tuning/automl-dm--dpp1-xgb/automl-dm-1675608463sujxUg8wYQX0-003-a2d5723e/output/model.tar.gz"
},
{
"Environment": {
"AUTOML_TRANSFORM_MODE": "inverse-label-transform",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv",
"SAGEMAKER_INFERENCE_INPUT": "predicted_label",
"SAGEMAKER_INFERENCE_OUTPUT": "predicted_label",
"SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,labels,probabilities",
"SAGEMAKER_PROGRAM": "sagemaker_serve",
"SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/data-processor-models/automl-dm-1675608463-dpp1-1-8b1885df2d2546b0abb07a329d1fb466b29/output/model.tar.gz"
}
],
"LastModifiedTime": "2023-02-05 15:09:06.513000+00:00",
"ObjectiveStatus": "Succeeded"
}
]
You can print the names of the candidates with their metric values:
print("metric " + str(candidates[0]['FinalAutoMLJobObjectiveMetric']['MetricName']))
for index, candidate in enumerate(candidates):
    print(str(index) + " "
          + candidate['CandidateName'] + " "
          + str(candidate['FinalAutoMLJobObjectiveMetric']['Value']))
metric validation:accuracy
0 automl-dm-1675608463sujxUg8wYQX0-002-657fba80 0.6150500178337097
1 automl-dm-1675608463sujxUg8wYQX0-001-5d775b4b 0.6149100065231323
2 automl-dm-1675608463sujxUg8wYQX0-003-a2d5723e 0.3990600109100342
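The objective metric is only one of several validation metrics stored under CandidateProperties. As a rough sketch, we could line up a few of them side by side. The helper metrics_table is hypothetical, and the sample data is trimmed from the JSON output above to three metrics per candidate.

```python
def metrics_table(candidates, metric_names):
    """Return (candidate name, [metric values]) rows in the order given."""
    rows = []
    for candidate in candidates:
        values = {m['MetricName']: m['Value']
                  for m in candidate['CandidateProperties']['CandidateMetrics']}
        rows.append((candidate['CandidateName'], [values[name] for name in metric_names]))
    return rows


def _candidate(name, accuracy, f1_macro, log_loss):
    # Builds a trimmed dict with the same nesting as the list_candidates output.
    metrics = [{'MetricName': 'Accuracy', 'Value': accuracy},
               {'MetricName': 'F1macro', 'Value': f1_macro},
               {'MetricName': 'LogLoss', 'Value': log_loss}]
    return {'CandidateName': name, 'CandidateProperties': {'CandidateMetrics': metrics}}


candidates_sample = [
    _candidate('automl-dm-1675608463sujxUg8wYQX0-002-657fba80',
               0.6150500178337097, 0.6152600049972534, 0.843940019607544),
    _candidate('automl-dm-1675608463sujxUg8wYQX0-001-5d775b4b',
               0.6149100065231323, 0.6157000064849854, 0.8395400047302246),
    _candidate('automl-dm-1675608463sujxUg8wYQX0-003-a2d5723e',
               0.3990600109100342, 0.39879000186920166, 1.2047499418258667),
]

metric_names = ['Accuracy', 'F1macro', 'LogLoss']
table = metrics_table(candidates_sample, metric_names)
print('{:<48}'.format('candidate') + ''.join('{:>10}'.format(n) for n in metric_names))
for name, values in table:
    print('{:<48}'.format(name) + ''.join('{:>10.4f}'.format(v) for v in values))
```

Looking at the numbers this way shows that the top two candidates are nearly tied: candidate 002 wins on accuracy, while candidate 001 has a slightly higher macro F1 and a slightly lower log loss.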
8.2 Review best candidate
Now that we have successfully completed the Autopilot job on the dataset and visualized the trials, we can get the information about the best candidate model and review it.
We can use the best_candidate function, passing the Autopilot job name. Note: This function will give an error if candidates have not been generated.
candidates = automl.list_candidates(job_name=auto_ml_job_name)

if candidates != []:
    best_candidate = automl.best_candidate(
        job_name=auto_ml_job_name
    )
    print(json.dumps(best_candidate, indent=4, sort_keys=True, default=str))
{
"CandidateName": "automl-dm-1675608463sujxUg8wYQX0-002-657fba80",
"CandidateProperties": {
"CandidateArtifactLocations": {
"Explainability": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/documentation/explainability/output",
"ModelInsights": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/documentation/model_monitor/output"
},
"CandidateMetrics": [
{
"MetricName": "F1macro",
"Set": "Validation",
"StandardMetricName": "F1macro",
"Value": 0.6152600049972534
},
{
"MetricName": "PrecisionMacro",
"Set": "Validation",
"StandardMetricName": "PrecisionMacro",
"Value": 0.6158699989318848
},
{
"MetricName": "Accuracy",
"Set": "Validation",
"StandardMetricName": "Accuracy",
"Value": 0.6150500178337097
},
{
"MetricName": "BalancedAccuracy",
"Set": "Validation",
"StandardMetricName": "BalancedAccuracy",
"Value": 0.6150500178337097
},
{
"MetricName": "LogLoss",
"Set": "Validation",
"StandardMetricName": "LogLoss",
"Value": 0.843940019607544
},
{
"MetricName": "RecallMacro",
"Set": "Validation",
"StandardMetricName": "RecallMacro",
"Value": 0.6150500178337097
}
]
},
"CandidateStatus": "Completed",
"CandidateSteps": [
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:processing-job/automl-dm-1675608463-db-1-ec0fb37f4b964d1a9485854c252aa8f0683f5",
"CandidateStepName": "automl-dm-1675608463-db-1-ec0fb37f4b964d1a9485854c252aa8f0683f5",
"CandidateStepType": "AWS::SageMaker::ProcessingJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:training-job/automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a",
"CandidateStepName": "automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a",
"CandidateStepType": "AWS::SageMaker::TrainingJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:transform-job/automl-dm-1675608463-dpp2-rpb-1-fd31c8d697b34a02a3472ce6d7557cd",
"CandidateStepName": "automl-dm-1675608463-dpp2-rpb-1-fd31c8d697b34a02a3472ce6d7557cd",
"CandidateStepType": "AWS::SageMaker::TransformJob"
},
{
"CandidateStepArn": "arn:aws:sagemaker:us-east-1:491783890788:training-job/automl-dm-1675608463sujxUg8wYQX0-002-657fba80",
"CandidateStepName": "automl-dm-1675608463sujxUg8wYQX0-002-657fba80",
"CandidateStepType": "AWS::SageMaker::TrainingJob"
}
],
"CreationTime": "2023-02-05 15:06:01+00:00",
"EndTime": "2023-02-05 15:07:54+00:00",
"FinalAutoMLJobObjectiveMetric": {
"MetricName": "validation:accuracy",
"Value": 0.6150500178337097
},
"InferenceContainers": [
{
"Environment": {
"AUTOML_SPARSE_ENCODE_RECORDIO_PROTOBUF": "1",
"AUTOML_TRANSFORM_MODE": "feature-transform",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "application/x-recordio-protobuf",
"SAGEMAKER_PROGRAM": "sagemaker_serve",
"SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/data-processor-models/automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a/output/model.tar.gz"
},
{
"Environment": {
"MAX_CONTENT_LENGTH": "20971520",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv",
"SAGEMAKER_INFERENCE_OUTPUT": "predicted_label",
"SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,probabilities"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.3-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/tuning/automl-dm--dpp2-xgb/automl-dm-1675608463sujxUg8wYQX0-002-657fba80/output/model.tar.gz"
},
{
"Environment": {
"AUTOML_TRANSFORM_MODE": "inverse-label-transform",
"SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv",
"SAGEMAKER_INFERENCE_INPUT": "predicted_label",
"SAGEMAKER_INFERENCE_OUTPUT": "predicted_label",
"SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,labels,probabilities",
"SAGEMAKER_PROGRAM": "sagemaker_serve",
"SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code"
},
"Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3",
"ModelDataUrl": "s3://sagemaker-us-east-1-491783890788/autopilot/automl-dm-1675608463/data-processor-models/automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a/output/model.tar.gz"
}
],
"LastModifiedTime": "2023-02-05 15:09:06.585000+00:00",
"ObjectiveStatus": "Succeeded"
}
Check the existence of the candidate name for the best candidate.
while 'CandidateName' not in best_candidate:
    best_candidate = automl.best_candidate(job_name=auto_ml_job_name)
    print('[INFO] Autopilot Job is generating BestCandidate CandidateName. Please wait. ')
    print(json.dumps(best_candidate, indent=4, sort_keys=True, default=str))
    time.sleep(10)

print('[OK] BestCandidate CandidateName generated.')
[OK] BestCandidate CandidateName generated.
Check the existence of the metric value for the best candidate.
while 'FinalAutoMLJobObjectiveMetric' not in best_candidate:
    best_candidate = automl.best_candidate(job_name=auto_ml_job_name)
    print('[INFO] Autopilot Job is generating BestCandidate FinalAutoMLJobObjectiveMetric. Please wait. ')
    print(json.dumps(best_candidate, indent=4, sort_keys=True, default=str))
    time.sleep(10)

print('[OK] BestCandidate FinalAutoMLJobObjectiveMetric generated.')
[OK] BestCandidate FinalAutoMLJobObjectiveMetric generated.
Print the information about the best candidate:
best_candidate_identifier = best_candidate['CandidateName']
print("Candidate name: " + best_candidate_identifier)
print("Metric name: " + best_candidate['FinalAutoMLJobObjectiveMetric']['MetricName'])
print("Metric value: " + str(best_candidate['FinalAutoMLJobObjectiveMetric']['Value']))
Candidate name: automl-dm-1675608463sujxUg8wYQX0-002-657fba80
Metric name: validation:accuracy
Metric value: 0.6150500178337097
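The best_candidate output above also lists three InferenceContainers. Judging by their environment variables, they appear to form a serial pipeline: a feature transform, the XGBoost model itself, and an inverse label transform that maps predictions back to the original labels. A small sketch to summarise this is shown below. The helper describe_pipeline is hypothetical, the 'model' label for a container without AUTOML_TRANSFORM_MODE is my own naming, and the sample is trimmed from the output above.

```python
def describe_pipeline(candidate):
    """Return (role, image tag) for each inference container, in order."""
    steps = []
    for container in candidate['InferenceContainers']:
        # Keep only the repository:tag part of the ECR image URI.
        image = container['Image'].split('/')[-1]
        # Containers without a transform mode are labelled 'model' (an assumption).
        role = container['Environment'].get('AUTOML_TRANSFORM_MODE', 'model')
        steps.append((role, image))
    return steps


ecr = '683313688378.dkr.ecr.us-east-1.amazonaws.com/'
best_candidate_sample = {
    'InferenceContainers': [
        {'Image': ecr + 'sagemaker-sklearn-automl:2.5-1-cpu-py3',
         'Environment': {'AUTOML_TRANSFORM_MODE': 'feature-transform'}},
        {'Image': ecr + 'sagemaker-xgboost:1.3-1-cpu-py3',
         'Environment': {'SAGEMAKER_INFERENCE_OUTPUT': 'predicted_label'}},
        {'Image': ecr + 'sagemaker-sklearn-automl:2.5-1-cpu-py3',
         'Environment': {'AUTOML_TRANSFORM_MODE': 'inverse-label-transform'}},
    ]
}

pipeline = describe_pipeline(best_candidate_sample)
for step_number, (role, image) in enumerate(pipeline, start=1):
    print(step_number, role, image)
```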
9 Review all output in S3 bucket
We can see the artifacts generated by Autopilot including the following:
data-processor-models/ # "models" learned to transform raw data into features
documentation/ # explainability and other documentation about your model
preprocessed-data/ # data for train and validation
sagemaker-automl-candidates/ # candidate models which autopilot compares
transformed-data/ # candidate-specific data for train and validation
tuning/ # candidate-specific tuning results
validations/ # validation results
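To see which of these folders actually exist for a given job, one option is to list the object keys under the job's output prefix and group them by their first path component. The following is a sketch under assumptions: top_level_folders and list_keys are hypothetical helper names, and the sample keys are taken from the S3 paths printed earlier in this article. Only list_keys talks to AWS, and it is not called here.

```python
def top_level_folders(keys, base_prefix):
    """Count objects per first-level folder below base_prefix."""
    counts = {}
    for key in keys:
        if not key.startswith(base_prefix):
            continue
        remainder = key[len(base_prefix):]
        if '/' not in remainder:
            continue  # object sits directly under the prefix, not in a folder
        folder = remainder.split('/', 1)[0] + '/'
        counts[folder] = counts.get(folder, 0) + 1
    return counts


def list_keys(bucket, prefix):
    """Yield every object key under s3://bucket/prefix (needs AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs offline
    paginator = boto3.client('s3').get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            yield obj['Key']


job_prefix = 'autopilot/automl-dm-1675608463/'
sample_keys = [
    job_prefix + 'data-processor-models/automl-dm-1675608463-dpp2-1-f8c17915c5bd4efbb7862d503ce9d50304a/output/model.tar.gz',
    job_prefix + 'sagemaker-automl-candidates/automl-dm-1675608463-pr-1-210c7900f5854fdc89ce01c59579c034fb883/notebooks/SageMakerAutopilotCandidateDefinitionNotebook.ipynb',
    job_prefix + 'sagemaker-automl-candidates/automl-dm-1675608463-pr-1-210c7900f5854fdc89ce01c59579c034fb883/notebooks/SageMakerAutopilotDataExplorationNotebook.ipynb',
    job_prefix + 'tuning/automl-dm--dpp2-xgb/automl-dm-1675608463sujxUg8wYQX0-002-657fba80/output/model.tar.gz',
]

folder_counts = top_level_folders(sample_keys, job_prefix)
for folder in sorted(folder_counts):
    print(folder, folder_counts[folder])
```

With credentials in place, the sample list could be swapped for list_keys(bucket, job_prefix), where bucket is the default session bucket defined at the start of the article.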
10 Deploy and test best candidate model
10.1 Deploy best candidate model
While batch transformations are supported, we will deploy our model as a REST Endpoint in this example.
First, we need to customize the inference response. The inference containers generated by SageMaker Autopilot allow you to select the response content for predictions. By default, the inference containers are configured to return only the predicted_label, but we can add probability to the list of inference response keys.
inference_response_keys = ['predicted_label', 'probability']
Now we will create a SageMaker endpoint from the best candidate generated by Autopilot. Wait for SageMaker to deploy the endpoint.
autopilot_model = automl.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    candidate=best_candidate,
    inference_response_keys=inference_response_keys,
    predictor_cls=sagemaker.predictor.Predictor,
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer()
)

print('\nEndpoint name: {}'.format(autopilot_model.endpoint_name))
-------!
Endpoint name: sagemaker-sklearn-automl-2023-02-05-15-18-52-694
10.2 Test the model
Let’s invoke a few predictions for the actual reviews using the deployed endpoint to test our model.
#sm_runtime = boto3.client('sagemaker-runtime')

review_list = ['This product is great!',
               'OK, but not great.',
               'This is not the right product.']

for review in review_list:

    # remove commas from the review since we're passing the inputs as a CSV
    review = review.replace(",", "")

    response = sm_runtime.invoke_endpoint(
        EndpointName=autopilot_model.endpoint_name,  # endpoint name
        ContentType='text/csv',  # type of input data
        Accept='text/csv',  # type of the inference in the response
        Body=review  # review text
    )

    response_body = response['Body'].read().decode('utf-8').strip().split(',')

    print('Review: ', review, ' Predicted class: {}'.format(response_body[0]))

print("(-1 = Negative, 0=Neutral, 1=Positive)")
Review: This product is great! Predicted class: 1
Review: OK but not great. Predicted class: 0
Review: This is not the right product. Predicted class: -1
(-1 = Negative, 0=Neutral, 1=Positive)
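Two details of the test loop are worth a closer look. Removing commas changes the review text, and only the first field of the response is read even though we asked for probability as well. The sketch below shows one possible alternative using Python's csv module. It assumes the endpoint parses standard quoted CSV, which is worth verifying against your own endpoint before relying on it. The helper names and the sample response string '1,0.85' are hypothetical and not taken from a real invocation.

```python
import csv
import io

SENTIMENT_NAMES = {-1: 'negative', 0: 'neutral', 1: 'positive'}


def to_csv_body(text):
    """Encode one review as a single-field CSV row, quoting it if needed."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([text])
    return buffer.getvalue().rstrip('\r\n')


def parse_prediction(body):
    """Split a 'label,probability' CSV response into typed values."""
    row = next(csv.reader(io.StringIO(body.strip())))
    label = int(row[0])
    probability = float(row[1]) if len(row) > 1 else None
    return label, probability, SENTIMENT_NAMES[label]


request_body = to_csv_body('OK, but not great.')
print(request_body)

# Hypothetical response body, shaped like predicted_label,probability.
label, probability, sentiment = parse_prediction('1,0.85\n')
print(label, probability, sentiment)
```

In the loop above, to_csv_body(review) would replace the comma removal, and parse_prediction would be applied to the decoded response body.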
In this article, we used Amazon SageMaker Autopilot to automatically find the best model, hyper-parameters, and feature-engineering scripts for our dataset. Autopilot takes a transparent approach to AutoML by generating reusable Python scripts and notebooks.
11 Acknowledgements
I’d like to express my thanks to the great Deep Learning AI Practical Data Science on AWS Specialisation Course, which I completed, and acknowledge the use of some images and other materials from the training course in this article.