Deployment Methods

Magemaker offers multiple ways to deploy your models to AWS, GCP and Azure. Choose the method that best fits your workflow.

Interactive Deployment

Running the magemaker command with the --cloud flag launches an interactive menu that walks you through the deployment process:

magemaker --cloud [aws|gcp|azure|all]

This method is great for:

  • First-time users
  • Exploring available models
  • Testing different configurations

YAML-based Deployment

For reproducible deployments and CI/CD integration, use YAML configuration files:

magemaker --deploy .magemaker_config/your-model.yaml

This is recommended for:

  • Production deployments
  • CI/CD pipelines
  • Infrastructure as Code (IaC)
  • Team collaborations

Multi-Cloud Deployment

Magemaker supports deployment to AWS SageMaker, GCP Vertex AI, and Azure ML. Here’s how to deploy the same model (facebook/opt-125m) to different cloud providers:

AWS (SageMaker)

deployment: !Deployment
  destination: aws
  endpoint_name: opt-125m-aws
  instance_count: 1
  instance_type: ml.m5.xlarge

models:
  - !Model
    id: facebook/opt-125m
    source: huggingface

GCP (Vertex AI)

deployment: !Deployment
  destination: gcp
  endpoint_name: opt-125m-gcp
  instance_count: 1
  machine_type: n1-standard-4
  accelerator_type: NVIDIA_TESLA_T4
  accelerator_count: 1

models:
  - !Model
    id: facebook/opt-125m
    source: huggingface

Azure ML

deployment: !Deployment
  destination: azure
  endpoint_name: opt-125m-azure
  instance_count: 1
  instance_type: Standard_DS3_v2

models:
  - !Model
    id: facebook/opt-125m
    source: huggingface
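Note that the hardware fields differ by destination: AWS and Azure use instance_type, while GCP uses machine_type plus optional accelerator fields. A small pre-flight check along these lines can catch a mismatched config before deploying (a hypothetical helper, not part of Magemaker; field names are taken from the examples above):

```python
# Hypothetical pre-flight check for Magemaker-style deployment configs.
# Field names mirror the YAML examples above; this is a sketch, not
# part of Magemaker itself.

REQUIRED_FIELDS = {
    "aws": {"endpoint_name", "instance_count", "instance_type"},
    "gcp": {"endpoint_name", "instance_count", "machine_type"},
    "azure": {"endpoint_name", "instance_count", "instance_type"},
}

def validate_deployment(config: dict) -> list[str]:
    """Return a list of human-readable problems (empty means OK)."""
    dest = config.get("destination")
    if dest not in REQUIRED_FIELDS:
        return [f"unknown destination: {dest!r}"]
    problems = [f"missing field for {dest}: {field}"
                for field in sorted(REQUIRED_FIELDS[dest] - config.keys())]
    # GCP accelerators come in pairs: a type implies a count and vice versa.
    if dest == "gcp" and ("accelerator_type" in config) != ("accelerator_count" in config):
        problems.append("accelerator_type and accelerator_count must be set together")
    return problems

aws_cfg = {
    "destination": "aws",
    "endpoint_name": "opt-125m-aws",
    "instance_count": 1,
    "instance_type": "ml.m5.xlarge",
}
print(validate_deployment(aws_cfg))  # []
```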

YAML Configuration Reference

Basic Deployment

deployment: !Deployment
  destination: aws
  endpoint_name: test-bert-uncased
  instance_count: 1
  instance_type: ml.m5.xlarge

models:
  - !Model
    id: google-bert/bert-base-uncased
    source: huggingface

Advanced Configuration

deployment: !Deployment
  destination: aws
  endpoint_name: test-llama3-8b
  instance_count: 1
  instance_type: ml.g5.12xlarge
  num_gpus: 4

models:
  - !Model
    id: meta-llama/Meta-Llama-3-8B-Instruct
    source: huggingface
    predict:
      temperature: 0.9
      top_p: 0.9
      top_k: 20
      max_new_tokens: 250
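The predict block above sets generation parameters. For illustration, here is one way such parameters might map onto a request body in the common Hugging Face text-generation shape; the exact payload schema depends on the serving container behind your endpoint, so treat this as a sketch and verify against your deployment:

```python
import json

# Sketch: build a text-generation request body from Magemaker-style
# `predict` settings. The "inputs"/"parameters" shape follows the common
# Hugging Face text-generation convention; confirm the schema used by
# your endpoint's container before relying on it.

predict = {
    "temperature": 0.9,
    "top_p": 0.9,
    "top_k": 20,
    "max_new_tokens": 250,
}

def build_payload(prompt: str, params: dict) -> str:
    """Serialize a prompt plus generation parameters as a JSON body."""
    return json.dumps({"inputs": prompt, "parameters": params})

body = build_payload("What is machine learning?", predict)
print(body)
```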

Cloud-Specific Instance Types

AWS SageMaker Types

Choose your instance type based on your model’s requirements:

ml.m5.xlarge

Good for smaller models like BERT-base

  • 4 vCPU
  • 16 GB Memory
  • Available in free tier

ml.g5.12xlarge

Required for larger models like LLaMA

  • 48 vCPU
  • 192 GB Memory
  • 4 NVIDIA A10G GPUs

Remember to deactivate unused endpoints to avoid unnecessary charges!

GCP Vertex AI Types

n1-standard-4

Good for smaller models

  • 4 vCPU
  • 15 GB Memory
  • Cost-effective option

a2-highgpu-1g

For larger models

  • 12 vCPU
  • 85 GB Memory
  • 1 NVIDIA A100 GPU

Azure ML Types

Standard_DS3_v2

Good for smaller models

  • 4 vCPU
  • 14 GB Memory
  • Balanced performance

Standard_NC6s_v3

For GPU workloads

  • 6 vCPU
  • 112 GB Memory
  • 1 NVIDIA V100 GPU

Deployment Best Practices

  1. Use meaningful endpoint names that include:

    • Model name/version
    • Environment (dev/staging/prod)
    • Team identifier
  2. Start with smaller instance types and scale up as needed

  3. Always version your YAML configurations

  4. Set up monitoring and alerting for your endpoints

Make sure you set up budget monitoring and alerts to avoid unexpected charges.
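The naming convention in point 1 can be encoded in a small helper so every deployment follows the same pattern (a sketch; separators and length limits vary by provider, e.g. SageMaker endpoint names allow alphanumerics and hyphens, so adjust to your target cloud):

```python
import re

# Sketch of the endpoint naming convention above: model, environment, team.
# Allowed characters and length limits vary by cloud provider, so check
# your provider's endpoint-name rules before adopting this as-is.

def endpoint_name(model: str, env: str, team: str) -> str:
    """Compose a lowercase, hyphen-separated endpoint name."""
    raw = f"{model}-{env}-{team}".lower()
    # Keep only alphanumerics and hyphens; collapse repeats and trim ends.
    cleaned = re.sub(r"[^a-z0-9-]+", "-", raw)
    return re.sub(r"-{2,}", "-", cleaned).strip("-")

print(endpoint_name("opt-125m", "prod", "nlp"))  # opt-125m-prod-nlp
```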

Troubleshooting Deployments

Common issues and their solutions:

  1. Deployment Timeout

    • Check instance quota limits
    • Verify network connectivity
  2. Instance Not Available

    • Try a different region
    • Request quota increase
    • Use an alternative instance type
  3. Model Loading Failure

    • Verify model ID and version
    • Check instance memory requirements
    • Validate your Hugging Face token if the model requires one
  4. Endpoint Created but Deployment Failed

    • Check the deployment logs for the underlying error
    • Report the issue to us if it persists