Deploying Llama 3 to SageMaker
Introduction
This tutorial guides you through deploying Llama 3 to AWS SageMaker using Magemaker and querying it using the interactive dropdown menu. Ensure you have followed the installation steps before proceeding.
Step 1: Setting Up Magemaker for AWS
Run the following command to configure Magemaker for AWS SageMaker deployment:
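A sketch of the setup command, assuming Magemaker is installed and your AWS credentials are already configured (e.g. via `aws configure`); the `--cloud` flag value shown here is taken from Magemaker's CLI convention:

```shell
# Configure Magemaker for AWS SageMaker deployments
# (assumes AWS credentials are available in your environment)
magemaker --cloud aws
```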
This initializes Magemaker with the necessary configurations for deploying models to SageMaker.
Step 2: YAML-based Deployment
For reproducible deployments, use YAML configuration:
Example YAML for AWS deployment:
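A sketch of what such a deployment config might look like; the exact keys, tags, and file path are assumptions based on Magemaker's YAML conventions, so check the generated config in your project for the authoritative schema:

```yaml
# Hypothetical example: .magemaker_config/llama3.yaml
deployment: !Deployment
  destination: aws
  endpoint_name: llama3-8b-endpoint
  instance_count: 1
  instance_type: ml.g5.2xlarge

models:
- !Model
  id: meta-llama/Meta-Llama-3-8B-Instruct
  source: huggingface
```

You would then point Magemaker at the file, e.g. `magemaker --deploy .magemaker_config/llama3.yaml` (path and flag shown for illustration).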
For gated models such as Llama from Meta, you must accept the model's terms of use on Hugging Face and add your Hugging Face token to the environment; otherwise the deployment will not go through.
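For example, you can export the token before deploying; `HUGGING_FACE_HUB_TOKEN` is the standard environment variable read by the Hugging Face tooling (if Magemaker expects a different variable name, use that instead):

```shell
# Make your Hugging Face access token available to the deployment
# (token shown is a placeholder)
export HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxxx
```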
You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your AWS quotas before proceeding.
Step 3: Querying the Deployed Model
Once the deployment is complete, note down the endpoint ID.
You can then use the interactive dropdown menu to quickly query the model.
Querying Models
From the dropdown, select Query a Model Endpoint to see the list of deployed endpoints. Press space to select the endpoint you want to query, enter your query in the text box, and press enter to get the response.
Or you can use the following code:
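A minimal sketch of querying the endpoint programmatically with boto3 (the AWS SDK for Python). The endpoint name, region, and request payload shape are assumptions; Hugging Face text-generation containers on SageMaker commonly accept an `inputs`/`parameters` JSON body, but verify the format your deployed container expects:

```python
import json


def build_payload(prompt: str, max_new_tokens: int = 128) -> str:
    """Build a JSON request body in the format commonly expected by
    Hugging Face text-generation containers on SageMaker (assumed here)."""
    return json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    )


def query_endpoint(endpoint_name: str, prompt: str, region: str = "us-east-1") -> dict:
    """Invoke a deployed SageMaker endpoint and return the parsed response."""
    import boto3  # AWS SDK; imported lazily so the payload helper works without it

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,  # the endpoint ID you noted earlier
        ContentType="application/json",
        Body=build_payload(prompt),
    )
    return json.loads(response["Body"].read())


# Example (requires AWS credentials and a live endpoint):
# query_endpoint("your-endpoint-name", "What is machine learning?")
```

Wrapping the call in a function keeps the boto3 client creation and JSON handling in one place, so the same helper works for any endpoint you deploy.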
Conclusion
You have successfully deployed Llama 3 to AWS SageMaker with Magemaker and queried it through the interactive dropdown menu. For any questions or feedback, feel free to contact us at support@slashml.com.