Optimizing LLM Inference with AWS Trainium, Ray, vLLM, and Anyscale



Webinar Details
Organizations are deploying LLMs for inference across many workloads. A common challenge is how to scale and productionize these workloads cost-effectively.

In this webinar with Anyscale and AWS, you will learn how to leverage AWS accelerator instances, including AWS Inferentia, to reliably serve LLMs at scale using vLLM and Ray, all hosted on Amazon EKS. You’ll also learn about Anyscale’s performance and enterprise capabilities to enable your most ambitious LLM and GenAI inference workloads.

Join this session to learn more about:
- How to use AWS Inferentia accelerators for leading price-performance.
- Building a complete LLM inference stack using vLLM and Ray on Amazon EKS with AWS Inferentia (a minimal sketch follows this list).
- How to leverage AWS compute instances on Anyscale for optimized LLM inference.
- Anyscale's managed enterprise LLM inference offering, with advanced cluster management optimizations including dynamic autoscaling, scale-to-zero, on-demand-to-spot fallback, fault tolerance, zero-downtime upgrades, and more.
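
To give a feel for the serving pattern the webinar covers, here is a minimal sketch that wraps a vLLM engine in a Ray Serve deployment so replicas can scale across a Ray cluster (for example, one running on Amazon EKS via KubeRay). This is an illustration, not the production stack from the session: the model name and sampling settings are assumptions, and on Inferentia-backed nodes vLLM would need to be built with its Neuron backend.

```python
# A minimal sketch, assuming Ray Serve and vLLM are installed. The model and
# sampling parameters below are illustrative, not the webinar's configuration.
from ray import serve
from vllm import LLM, SamplingParams


@serve.deployment(num_replicas=2)  # Ray Serve scales replicas across the cluster
class LLMServer:
    def __init__(self):
        # Assumption: a small Hugging Face model for illustration. On
        # Inferentia-backed nodes, vLLM would use its Neuron backend instead.
        self.llm = LLM(model="facebook/opt-125m")
        self.params = SamplingParams(temperature=0.8, max_tokens=128)

    async def __call__(self, request):
        # Each HTTP request carries a JSON body like {"prompt": "..."}.
        prompt = (await request.json())["prompt"]
        outputs = self.llm.generate([prompt], self.params)
        return {"text": outputs[0].outputs[0].text}


app = LLMServer.bind()
# serve.run(app)  # deploy onto a running Ray cluster (e.g., on EKS via KubeRay)
```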

Speakers
- Art Sedighi, Sr. Partner Solutions Architect
- Vara Bonthu, Principal OSS Specialist SA
- Akshay Malik, Engineering Manager
- Matt Connor, Product Manager

Is this webinar right for me?
This technical webinar is especially useful for AI engineers who want to explore ways to operationalize generative AI models at scale while staying cost-efficient. It is also useful for infrastructure engineers who plan to support GenAI use cases and LLM inference in their organizations.


