Deploying Llama 3 to Azure
Introduction
This tutorial walks you through deploying Llama 3 to the Azure ML platform using Magemaker and querying it through the interactive dropdown menu. Make sure you have completed the installation steps before proceeding.
You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your Azure quotas before proceeding.
The model ids for Azure differ from those for AWS and GCP. Make sure to use the id provided by Azure in the Azure Model Catalog.
To find the relevant model id, follow the steps in the [quick start](For Azure ML)
Step 1: Setting Up Magemaker for Azure
Run the following command to configure Magemaker for Azure deployment:
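A minimal invocation might look like the following; the `--cloud` flag shown here is an assumption based on Magemaker's multi-cloud setup, so check `magemaker --help` for the exact options in your installed version:

```shell
magemaker --cloud azure
```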
This initializes Magemaker with the necessary configurations for deploying models to Azure ML Studio.
Step 2: YAML-based Deployment
For reproducible deployments, use YAML configuration:
Example YAML for Azure deployment:
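Here is a sketch of what such a configuration could contain. The keys, the `!Deployment`/`!Model` tags, and the model id shown are illustrative assumptions; consult Magemaker's documentation and the Azure Model Catalog for the values that apply to your deployment:

```yaml
deployment: !Deployment
  destination: azure
  endpoint_name: llama3-8b-endpoint
  instance_count: 1
  instance_type: Standard_NC24ads_A100_v4

models:
- !Model
  id: meta-llama-3-8b-instruct   # use the exact id from the Azure Model Catalog
  source: azure
```

You would then point Magemaker at this file when deploying (again, verify the exact flag with `magemaker --help`).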
For gated models such as Llama from Meta, you must accept the model's terms of use on Hugging Face and add your Hugging Face token to the environment before the deployment can go through.
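One way to provide the token is via an environment variable before deploying. The variable name `HUGGING_FACE_HUB_TOKEN` is the convention used by Hugging Face tooling; the name Magemaker actually reads may differ, so confirm it against Magemaker's installation docs:

```shell
export HUGGING_FACE_HUB_TOKEN=<your-hugging-face-token>
```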
Selecting an Appropriate Instance
For 8B parameter models, recommended instance types include:
- Standard_NC24ads_A100_v4 (optimal performance)
- Standard_NC24s_v3 (cost-effective option with V100)
If you encounter quota issues, submit a quota increase request in the Azure portal: search for **Quotas** in the search bar, select the subscription you are using, choose **Machine Learning** as the provider, and then select the relevant region for the quota increase.
Step 3: Querying the Deployed Model
Once the deployment is complete, note down the endpoint id.
You can use the interactive dropdown menu to quickly query the model.
Querying Models
From the dropdown, select **Query a Model Endpoint** to see the list of model endpoints. Press space to select the endpoint you want to query, enter your query in the text box, and press enter to get the response.
Alternatively, you can query the endpoint programmatically:
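The sketch below sends a chat-style request to an Azure ML online endpoint using only the Python standard library. The endpoint URL, API key, and request schema are placeholders and assumptions: copy the real URL and key from your endpoint's **Consume** tab in Azure ML Studio, which also documents the exact payload format your deployed model expects.

```python
import json
import urllib.request

# Placeholder values -- replace with the URL and key from the endpoint's
# "Consume" tab in Azure ML Studio.
ENDPOINT_URL = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
API_KEY = "<api-key>"

def build_request(prompt: str) -> urllib.request.Request:
    # Chat-style payload; the exact schema depends on the deployed model,
    # so verify it against the endpoint's "Consume" tab.
    payload = {
        "input_data": {
            "input_string": [{"role": "user", "content": prompt}],
            "parameters": {"max_new_tokens": 256, "temperature": 0.7},
        }
    }
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )

def query_endpoint(prompt: str) -> str:
    # Sends the request and returns the raw JSON response as text.
    with urllib.request.urlopen(build_request(prompt)) as response:
        return response.read().decode("utf-8")

# Example (requires a live endpoint):
# print(query_endpoint("What is the capital of France?"))
```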
Conclusion
You have successfully deployed and queried Llama 3 on Azure using Magemaker’s interactive dropdown menu. For any questions or feedback, feel free to contact us at support@slashml.com.