Deployment
Learn how to deploy models using Magemaker
Deployment Methods
Magemaker offers multiple ways to deploy your models to AWS, GCP and Azure. Choose the method that best fits your workflow.
Interactive Deployment
When you run the `magemaker --cloud [aws|gcp|azure|all]` command, you’ll get an interactive menu that walks you through the deployment process:
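For example, using the `--cloud` flag shown above with whichever provider you have configured:

```bash
# Launch the interactive deployment menu for a single provider
magemaker --cloud aws

# Or target GCP, Azure, or every configured provider at once
magemaker --cloud gcp
magemaker --cloud azure
magemaker --cloud all
```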
This method is great for:
- First-time users
- Exploring available models
- Testing different configurations
YAML-based Deployment
For reproducible deployments and CI/CD integration, use YAML configuration files:
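The workflow is to check a config file into your repository and point Magemaker at it. The `--deploy` flag and the `.magemaker_config/` path below follow Magemaker’s published examples, but confirm them against your installed version; full YAML examples appear in the sections that follow.

```bash
# Deploy straight from a version-controlled YAML file
# (path and filename are illustrative)
magemaker --deploy .magemaker_config/opt-125m-aws.yaml
```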
This is recommended for:
- Production deployments
- CI/CD pipelines
- Infrastructure as Code (IaC)
- Team collaborations
Multi-Cloud Deployment
Magemaker supports deployment to AWS SageMaker, GCP Vertex AI, and Azure ML. Here’s how to deploy the same model (facebook/opt-125m) to different cloud providers:
AWS (SageMaker)
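A sketch of an AWS config for facebook/opt-125m, using the `!Deployment`/`!Model` schema from Magemaker’s examples (the endpoint name is illustrative; verify field names against your installed version):

```yaml
# aws-opt-125m.yaml -- deploy facebook/opt-125m to SageMaker
deployment: !Deployment
  destination: aws
  endpoint_name: opt-125m-aws    # illustrative name
  instance_count: 1
  instance_type: ml.m5.xlarge    # see the SageMaker instance types below

models:
- !Model
  id: facebook/opt-125m
  source: huggingface
```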
GCP (Vertex AI)
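The same model on Vertex AI only changes the destination and the machine type (again a sketch; verify the fields against your installed version):

```yaml
# gcp-opt-125m.yaml -- deploy facebook/opt-125m to Vertex AI
deployment: !Deployment
  destination: gcp
  endpoint_name: opt-125m-gcp    # illustrative name
  instance_count: 1
  instance_type: n1-standard-4   # see the Vertex AI instance types below

models:
- !Model
  id: facebook/opt-125m
  source: huggingface
```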
Azure ML
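And the Azure ML variant, differing only in destination and instance type (sketch; confirm against your installed version):

```yaml
# azure-opt-125m.yaml -- deploy facebook/opt-125m to Azure ML
deployment: !Deployment
  destination: azure
  endpoint_name: opt-125m-azure   # illustrative name
  instance_count: 1
  instance_type: Standard_DS3_v2  # see the Azure ML instance types below

models:
- !Model
  id: facebook/opt-125m
  source: huggingface
```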
YAML Configuration Reference
Basic Deployment
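A minimal, annotated config covering the core fields. This is a sketch based on the examples above; check the schema shipped with your Magemaker version.

```yaml
deployment: !Deployment
  destination: aws             # aws | gcp | azure
  endpoint_name: my-endpoint   # name shown in the cloud console
  instance_count: 1            # number of instances behind the endpoint
  instance_type: ml.m5.xlarge  # provider-specific machine type

models:
- !Model
  id: facebook/opt-125m        # Hugging Face model ID
  source: huggingface          # where to pull the model from
```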
Advanced Configuration
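A sketch of a heavier deployment: multiple replicas and a GPU instance for a larger model, per the LLaMA guidance below. The model ID, endpoint name, and token-handling comment are illustrative assumptions; confirm the supported fields against your installed Magemaker schema.

```yaml
deployment: !Deployment
  destination: aws
  endpoint_name: llama-prod       # illustrative name
  instance_count: 2               # scale out to two replicas
  instance_type: ml.g5.12xlarge   # 4x NVIDIA A10G GPUs for larger models

models:
- !Model
  id: meta-llama/Llama-2-7b-hf    # gated model: needs a Hugging Face token in
  source: huggingface             #   your environment (e.g. HUGGING_FACE_HUB_TOKEN)
```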
Cloud-Specific Instance Types
AWS SageMaker Types
Choose your instance type based on your model’s requirements:
ml.m5.xlarge
Good for smaller models like BERT-base
- 4 vCPU
- 16 GB Memory
- Available in free tier
ml.g5.12xlarge
Required for larger models like LLaMA
- 48 vCPU
- 192 GB Memory
- 4 NVIDIA A10G GPUs
Remember to deactivate unused endpoints to avoid unnecessary charges!
GCP Vertex AI Types
n1-standard-4
Good for smaller models
- 4 vCPU
- 15 GB Memory
- Cost-effective option
a2-highgpu-1g
For larger models
- 12 vCPU
- 85 GB Memory
- 1 NVIDIA A100 GPU
Azure ML Types
Standard_DS3_v2
Good for smaller models
- 4 vCPU
- 14 GB Memory
- Balanced performance
Standard_NC6s_v3
For GPU workloads
- 6 vCPU
- 112 GB Memory
- 1 NVIDIA V100 GPU
Deployment Best Practices
- Use meaningful endpoint names (see the example after this list) that include:
  - Model name/version
  - Environment (dev/staging/prod)
  - Team identifier
- Start with smaller instance types and scale up as needed
- Always version your YAML configurations
- Set up monitoring and alerting for your endpoints
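For instance, one naming convention that encodes all three pieces of information (purely illustrative, not a Magemaker requirement):

```yaml
# <model>-<environment>-<team>
endpoint_name: opt-125m-prod-nlp
```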
Make sure you set up budget monitoring and alerts to avoid unexpected charges.
Troubleshooting Deployments
Common issues and their solutions:
- Deployment Timeout
  - Check instance quota limits
  - Verify network connectivity
- Instance Not Available
  - Try a different region
  - Request a quota increase
  - Use an alternative instance type
- Model Loading Failure
  - Verify the model ID and version
  - Check instance memory requirements
  - Validate your Hugging Face token if required
- Endpoint Deployed, but Deployment Reported as Failed
  - Check the logs, and report the issue to us if you run into it