
Speedrun deploying LLM Embedding models into Production

If you need help with anything quantization- or ML-related (e.g. debugging code), feel free to book a 30-minute consultation session! https://calendly.com/oscar-savolainen

I'm also available for long-term freelance work, e.g. training/productionizing models, teaching AI concepts, etc.

Video Summary:
In this video we speedrun deploying an LLM embedding model from Hugging Face into production on a GPU instance.

We spin up a Runpod GPU instance, expose the required ports, install the Infinity inference server, and make API requests to the server to obtain text embeddings.
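
As a rough sketch of that last step: once the pod is running with its HTTP port exposed, Infinity can typically be installed and launched with pip install "infinity-emb[all]" followed by infinity_emb v2 --model-id BAAI/bge-small-en-v1.5 --port 7997 (check the Infinity README for the current CLI, as flags have changed between versions; the model ID and port here are illustrative). A request against its OpenAI-compatible /embeddings endpoint then looks roughly like this, with <POD_IP> standing in for your Runpod instance's address:

import requests

# Assumes an Infinity server is already running on the pod with the port exposed,
# e.g. started with: infinity_emb v2 --model-id BAAI/bge-small-en-v1.5 --port 7997
INFINITY_URL = "http://<POD_IP>:7997/embeddings"  # <POD_IP> is a placeholder

payload = {
    "model": "BAAI/bge-small-en-v1.5",  # illustrative model choice
    "input": ["Speedrun deploying LLM embedding models into production"],
}

resp = requests.post(INFINITY_URL, json=payload, timeout=30)
resp.raise_for_status()

# Infinity mirrors the OpenAI embeddings response schema:
# {"data": [{"embedding": [...], "index": 0}], "model": ..., "usage": ...}
embedding = resp.json()["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")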
