
Speedrun deploying LLM Embedding models into Production

If you need help with anything quantization- or ML-related (e.g. debugging code), feel free to book a 30-minute consultation session! https://calendly.com/oscar-savolainen

I'm also available for long-term freelance work, e.g. training/productionizing models, teaching AI concepts, etc.

Video Summary:
In this video we speedrun deploying an LLM embedding model from Hugging Face into production on a GPU instance.

We spin up a Runpod GPU instance, expose the required ports, install the Infinity inference server, and make API requests to the server to obtain text embeddings.
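
As a rough sketch of that last step: once the pod is running with its HTTP port exposed, Infinity can typically be installed and launched with pip install "infinity-emb[all]" followed by infinity_emb v2 --model-id BAAI/bge-small-en-v1.5 --port 7997 (check the Infinity README for the current CLI, as flags have changed between versions; the model ID and port here are illustrative). A request against its OpenAI-compatible /embeddings endpoint then looks roughly like this, with <POD_IP> standing in for your Runpod instance's address:

import requests

# Assumes an Infinity server is already running on the pod with the port exposed,
# e.g. started with: infinity_emb v2 --model-id BAAI/bge-small-en-v1.5 --port 7997
INFINITY_URL = "http://<POD_IP>:7997/embeddings"  # <POD_IP> is a placeholder

payload = {
    "model": "BAAI/bge-small-en-v1.5",  # illustrative model choice
    "input": ["Speedrun deploying LLM embedding models into production"],
}

resp = requests.post(INFINITY_URL, json=payload, timeout=30)
resp.raise_for_status()

# Infinity mirrors the OpenAI embeddings response schema:
# {"data": [{"embedding": [...], "index": 0}], "model": ..., "usage": ...}
embedding = resp.json()["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")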
