AWS SageMaker: Train your machine learning models the easy way!

AWS SageMaker

  • AWS SageMaker is a fully managed, modular machine learning service to build, train, and deploy machine learning (ML) models quickly.
  • SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.
  • AWS SageMaker is designed for high availability with no maintenance windows or scheduled downtime.
  • SageMaker APIs run in Amazon’s proven, high-availability data centers, with service stack replication configured across three facilities in each AWS Region to provide fault tolerance in the event of a server failure or Availability Zone outage.
  • AWS SageMaker provides a full end-to-end workflow, but users can continue to use their existing tools with SageMaker.
  • SageMaker supports Jupyter notebooks.
  • AWS SageMaker allows users to select the number and type of instances used for the hosted notebook, training, and model hosting, as in the sketch below.
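
As a minimal sketch of that instance selection (assuming boto3 credentials are already configured, and using a hypothetical notebook name and execution-role ARN), a hosted Jupyter notebook instance can be created like this:

```python
import boto3

sm = boto3.client("sagemaker")

# Hypothetical names; the role must grant SageMaker the permissions it needs.
sm.create_notebook_instance(
    NotebookInstanceName="demo-notebook",
    InstanceType="ml.t3.medium",  # choose the instance type for the hosted notebook
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    VolumeSizeInGB=10,
)
```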

AWS SageMaker Machine Learning Working Principle

(Embedded video: AWS SageMaker guided video published by AWS)

  • The workflow begins by generating example data: exploring and pre-processing, or “wrangling,” it before using it for model training (a minimal sketch follows the steps below).
  • To pre-process the data, you typically do the following:
    • Fetch the data
    • Clean the data
    • Prepare or transform the data
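
A minimal pre-processing sketch of those three steps, assuming a hypothetical CSV dataset in S3 with churned, plan, and monthly_spend columns, plus pandas and s3fs installed:

```python
import pandas as pd

raw_uri = "s3://my-bucket/raw/customers.csv"              # hypothetical input location
processed_uri = "s3://my-bucket/processed/customers.csv"  # hypothetical output location

# Fetch the data
df = pd.read_csv(raw_uri)

# Clean the data: drop duplicates and rows missing the label
df = df.drop_duplicates().dropna(subset=["churned"])

# Prepare/transform the data: encode a categorical column and standardize a numeric one
df["plan"] = df["plan"].astype("category").cat.codes
df["monthly_spend"] = (df["monthly_spend"] - df["monthly_spend"].mean()) / df["monthly_spend"].std()

df.to_csv(processed_uri, index=False)
```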

Train the Model

  • Model training includes both training and evaluating the model, as follows:
    • Training the model
      • Needs an algorithm; the choice depends on factors such as the problem type and the characteristics of the training data.
      • Needs compute resources for training (see the estimator sketch after this list).
    • Evaluating the model
      • Determine whether the accuracy of the inferences is acceptable.
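
A minimal training sketch using the SageMaker Python SDK with the built-in XGBoost algorithm; the role ARN, bucket paths, and hyperparameters are hypothetical placeholders:

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()

# Hypothetical role ARN; it needs SageMaker and S3 permissions.
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

# Pre-built container image for the built-in XGBoost algorithm in this region.
container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,                # compute resources for training
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/xgboost/output/",
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

train_input = TrainingInput("s3://my-bucket/processed/train.csv", content_type="text/csv")
validation_input = TrainingInput("s3://my-bucket/processed/validation.csv", content_type="text/csv")

# Launches the training job; the validation channel helps judge whether accuracy is acceptable.
estimator.fit({"train": train_input, "validation": validation_input})
```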

Training Data Format – File mode vs Pipe mode

  • Most Amazon SageMaker algorithms work best when using the optimized protobuf RecordIO format for the training data.
  • Using the RecordIO format allows algorithms that support Pipe mode to take advantage of it during training.
  • File mode loads all of the data from S3 to the training instance volumes.
  • In Pipe mode, the training job streams data directly from S3.
  • Streaming can provide faster start times for training jobs and better throughput.
  • Pipe mode also reduces the size of the EBS volumes needed for the training instances, since it only needs enough disk space to store your final model artifacts (see the input sketch after this list).
  • File mode needs disk space to store both the final model artifacts and the full training data set.
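
For example, the input mode can be chosen per channel with the SageMaker Python SDK; the S3 prefix below is a hypothetical placeholder:

```python
from sagemaker.inputs import TrainingInput

# Pipe mode: records are streamed straight from S3, so the training volume stays small.
pipe_input = TrainingInput(
    s3_data="s3://my-bucket/train-recordio/",
    content_type="application/x-recordio-protobuf",
    input_mode="Pipe",
)

# File mode (the default): the full dataset is copied to the training instance volume first.
file_input = TrainingInput(
    s3_data="s3://my-bucket/train-recordio/",
    content_type="application/x-recordio-protobuf",
    input_mode="File",
)
```

The same setting can also be applied to all channels at once through the estimator's input_mode parameter.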

Build the Model

  • SageMaker provides several built-in machine learning algorithms that can be used for a variety of problem types.
  • Write a custom training script in a machine learning framework that AWS SageMaker supports, and use one of the pre-built framework containers to run it in SageMaker (see the script-mode sketch after this list).
  • Bring your own algorithm or model to train or host in SageMaker.
    • SageMaker provides pre-built Docker images for its built-in algorithms and the supported deep learning frameworks used for training and inference.
    • By using containers, machine learning algorithms can be trained and models deployed quickly and reliably at any scale.
  • Use an algorithm that you subscribe to from AWS Marketplace.
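
As a sketch of the script-mode option, a custom training script can run inside the pre-built PyTorch container; the script name, role ARN, and S3 paths here are hypothetical:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",          # your custom training script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    framework_version="1.13",
    py_version="py39",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    hyperparameters={"epochs": 10, "lr": 0.001},
)

# The pre-built framework container downloads the script and the channel data, then runs it.
estimator.fit({"training": "s3://my-bucket/processed/"})
```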

Deploy the model

  • Re-engineer the model as needed before integrating it with your application and deploying it.
  • SageMaker supports both hosting services (a persistent HTTPS endpoint for real-time inference) and batch transform (offline inference over an entire dataset), as in the sketch below.
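
Continuing from the estimator in the earlier training sketch, both options look roughly like this (instance types, S3 paths, and the sample payload are hypothetical):

```python
from sagemaker.serializers import CSVSerializer

# Hosting services: deploy the trained model behind a persistent HTTPS endpoint.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    serializer=CSVSerializer(),
)
result = predictor.predict("34,0,1,299.5")  # hypothetical CSV feature row

# Batch transform: offline inference over an entire dataset stored in S3.
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.large",
    output_path="s3://my-bucket/batch-output/",
)
transformer.transform(
    data="s3://my-bucket/batch-input/",
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()
```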

Amazon SageMaker Autopilot

  • Automatically create machine learning models with full visibility.
  • Autopilot automatically trains and tunes the best machine learning models for classification or regression.
  • AWS SageMaker Autopilot allows you to automatically build machine learning models without compromises.

Autopilot working principle:

  • Benefits
    • Generate high quality models quickly
    • Maintain visibility and control
    • Easy to deploy
  • Use cases
    • Price Predictions
    • Churn Prediction
    • Risk Assessment
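
A minimal Autopilot sketch via the SageMaker Python SDK, assuming the hypothetical customer dataset from earlier with a churned target column:

```python
from sagemaker.automl.automl import AutoML

automl = AutoML(
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # hypothetical role ARN
    target_attribute_name="churned",        # column Autopilot should learn to predict
    max_candidates=10,                      # cap the number of model candidates explored
    output_path="s3://my-bucket/autopilot-output/",
)

# Autopilot explores data preprocessing, algorithms, and hyperparameters automatically.
automl.fit(inputs="s3://my-bucket/processed/customers.csv", wait=False)
```

The generated candidate notebooks provide the visibility mentioned above, and the best candidate can be deployed with automl.deploy() once the job finishes.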

Features

  • SageMaker provides an HTTPS endpoint where the machine learning model is available to provide inferences.
  • SageMaker supports canary deployments by using production variants, i.e. deploying multiple variants of a model to the same SageMaker HTTPS endpoint.
  • SageMaker supports automatic scaling for production variants. Automatic scaling dynamically adjusts the number of instances provisioned for a production variant in response to changes in your workload (see the sketch after this list).
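
A sketch of a canary-style rollout with two production variants plus auto scaling on one of them, using boto3; the model, endpoint, and config names are hypothetical and the two SageMaker Models must already exist:

```python
import boto3

sm = boto3.client("sagemaker")

# Canary-style deployment: two model versions behind one endpoint, weighted 90/10.
sm.create_endpoint_config(
    EndpointConfigName="churn-config-canary",
    ProductionVariants=[
        {
            "VariantName": "current",
            "ModelName": "churn-model-v1",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 2,
            "InitialVariantWeight": 0.9,
        },
        {
            "VariantName": "canary",
            "ModelName": "churn-model-v2",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,
        },
    ],
)
sm.create_endpoint(EndpointName="churn-endpoint", EndpointConfigName="churn-config-canary")

# Automatic scaling for the canary variant via Application Auto Scaling.
aas = boto3.client("application-autoscaling")
aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/churn-endpoint/variant/canary",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)
```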

It is advisable to refer to the AWS documentation on AWS SageMaker while building your concepts.

To explore other AWS services, you can click here.