Deployment Methods

Magemaker offers multiple ways to deploy your models to AWS, GCP and Azure. Choose the method that best fits your workflow.

Interactive Deployment

Running the magemaker command with the --cloud flag launches an interactive menu that walks you through the deployment process:

magemaker --cloud [aws|gcp|azure|all]

This method is great for:

  • First-time users
  • Exploring available models
  • Testing different configurations

YAML-based Deployment

For reproducible deployments and CI/CD integration, use YAML configuration files:

magemaker --deploy .magemaker_config/your-model.yaml

This is recommended for:

  • Production deployments
  • CI/CD pipelines
  • Infrastructure as Code (IaC)
  • Team collaborations

Multi-Cloud Deployment

Magemaker supports deployment to AWS SageMaker, GCP Vertex AI, and Azure ML. Here’s how to deploy the same model (facebook/opt-125m) to different cloud providers:

AWS (SageMaker)

deployment: !Deployment
  destination: aws
  endpoint_name: opt-125m-aws
  instance_count: 1
  instance_type: ml.m5.xlarge

models:
  - !Model
    id: facebook/opt-125m
    source: huggingface

GCP (Vertex AI)

deployment: !Deployment
  destination: gcp
  endpoint_name: opt-125m-gcp
  instance_count: 1
  machine_type: n1-standard-4
  accelerator_type: NVIDIA_TESLA_T4
  accelerator_count: 1

models:
  - !Model
    id: facebook/opt-125m
    source: huggingface

Azure ML

deployment: !Deployment
  destination: azure
  endpoint_name: opt-125m-azure
  instance_count: 1
  instance_type: Standard_DS3_v2

models:
  - !Model
    id: facebook/opt-125m
    source: huggingface
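Note that the hardware fields differ by destination: AWS and Azure use instance_type, while GCP uses machine_type plus optional accelerator fields. A small pre-flight check along these lines can catch a mismatched config before deploying (a hypothetical helper, not part of Magemaker; field names are taken from the examples above):

```python
# Hypothetical pre-flight check for Magemaker-style deployment configs.
# Field names mirror the YAML examples above; this is a sketch, not
# part of Magemaker itself.

REQUIRED_FIELDS = {
    "aws": {"endpoint_name", "instance_count", "instance_type"},
    "gcp": {"endpoint_name", "instance_count", "machine_type"},
    "azure": {"endpoint_name", "instance_count", "instance_type"},
}

def validate_deployment(config: dict) -> list[str]:
    """Return a list of human-readable problems (empty means OK)."""
    dest = config.get("destination")
    if dest not in REQUIRED_FIELDS:
        return [f"unknown destination: {dest!r}"]
    problems = [f"missing field for {dest}: {field}"
                for field in sorted(REQUIRED_FIELDS[dest] - config.keys())]
    # GCP accelerators come in pairs: a type implies a count and vice versa.
    if dest == "gcp" and ("accelerator_type" in config) != ("accelerator_count" in config):
        problems.append("accelerator_type and accelerator_count must be set together")
    return problems

aws_cfg = {
    "destination": "aws",
    "endpoint_name": "opt-125m-aws",
    "instance_count": 1,
    "instance_type": "ml.m5.xlarge",
}
print(validate_deployment(aws_cfg))  # []
```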

YAML Configuration Reference

Basic Deployment

deployment: !Deployment
  destination: aws
  endpoint_name: test-bert-uncased
  instance_count: 1
  instance_type: ml.m5.xlarge

models:
  - !Model
    id: google-bert/bert-base-uncased
    source: huggingface

Advanced Configuration

deployment: !Deployment
  destination: aws
  endpoint_name: test-llama3-8b
  instance_count: 1
  instance_type: ml.g5.12xlarge
  num_gpus: 4

models:
  - !Model
    id: meta-llama/Meta-Llama-3-8B-Instruct
    source: huggingface
    predict:
      temperature: 0.9
      top_p: 0.9
      top_k: 20
      max_new_tokens: 250
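The predict block above sets generation parameters. For illustration, here is one way such parameters might map onto a request body in the common Hugging Face text-generation shape; the exact payload schema depends on the serving container behind your endpoint, so treat this as a sketch and verify against your deployment:

```python
import json

# Sketch: build a text-generation request body from Magemaker-style
# `predict` settings. The "inputs"/"parameters" shape follows the common
# Hugging Face text-generation convention; confirm the schema used by
# your endpoint's container before relying on it.

predict = {
    "temperature": 0.9,
    "top_p": 0.9,
    "top_k": 20,
    "max_new_tokens": 250,
}

def build_payload(prompt: str, params: dict) -> str:
    """Serialize a prompt plus generation parameters as a JSON body."""
    return json.dumps({"inputs": prompt, "parameters": params})

body = build_payload("What is machine learning?", predict)
print(body)
```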

Cloud-Specific Instance Types

AWS SageMaker Types

Choose your instance type based on your model’s requirements:

ml.m5.xlarge

Good for smaller models like BERT-base

  • 4 vCPU
  • 16 GB Memory
  • Available in free tier

ml.g5.12xlarge

Required for larger models like LLaMA

  • 48 vCPU
  • 192 GB Memory
  • 4 NVIDIA A10G GPUs

Remember to deactivate unused endpoints to avoid unnecessary charges!

GCP Vertex AI Types

n1-standard-4

Good for smaller models

  • 4 vCPU
  • 15 GB Memory
  • Cost-effective option

a2-highgpu-1g

For larger models

  • 12 vCPU
  • 85 GB Memory
  • 1 NVIDIA A100 GPU

Azure ML Types

Standard_DS3_v2

Good for smaller models

  • 4 vCPU
  • 14 GB Memory
  • Balanced performance

Standard_NC6s_v3

For GPU workloads

  • 6 vCPU
  • 112 GB Memory
  • 1 NVIDIA V100 GPU

Deployment Best Practices

  1. Use meaningful endpoint names that include:

    • Model name/version
    • Environment (dev/staging/prod)
    • Team identifier
  2. Start with smaller instance types and scale up as needed

  3. Always version your YAML configurations

  4. Set up monitoring and alerting for your endpoints

Make sure you set up budget monitoring and alerts to avoid unexpected charges.
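The naming convention in point 1 can be encoded in a small helper so every deployment follows the same pattern (a sketch; separators and length limits vary by provider, e.g. SageMaker endpoint names allow alphanumerics and hyphens, so adjust to your target cloud):

```python
import re

# Sketch of the endpoint naming convention above: model, environment, team.
# Allowed characters and length limits vary by cloud provider, so check
# your provider's endpoint-name rules before adopting this as-is.

def endpoint_name(model: str, env: str, team: str) -> str:
    """Compose a lowercase, hyphen-separated endpoint name."""
    raw = f"{model}-{env}-{team}".lower()
    # Keep only alphanumerics and hyphens; collapse repeats and trim ends.
    cleaned = re.sub(r"[^a-z0-9-]+", "-", raw)
    return re.sub(r"-{2,}", "-", cleaned).strip("-")

print(endpoint_name("opt-125m", "prod", "nlp"))  # opt-125m-prod-nlp
```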

Troubleshooting Deployments

Common issues and their solutions:

  1. Deployment Timeout

    • Check instance quota limits
    • Verify network connectivity
  2. Instance Not Available

    • Try a different region
    • Request quota increase
    • Use an alternative instance type
  3. Model Loading Failure

    • Verify model ID and version
    • Check instance memory requirements
    • Validate your Hugging Face token if the model requires one
  4. Endpoint Created but Deployment Failed

    • Check the deployment logs for the underlying error
    • Report the issue to us if it persists