
LocalAI LLM Testing: Llama 3.3 70B Instruct Q8, Multi GPU 6x A4500, and PCIe Bandwidth during inference

This week we are taking Llama 3.3 70B Instruct (https://huggingface.co/bartowski/Llam...) at a Q8 quant, running 96k of context, through some tests, with a focus on showing the PCIe bandwidth during inference in a multi-GPU setup. Hopefully this provides more insight into hardware requirements and some general knowledge.
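For rough sizing: at Q8 (8 bits per weight) the 70B parameters alone take on the order of 70 GB before the KV cache for the 96k context, which is why this run leans on most of the ~120 GB combined VRAM of six 20GB A4500s. If you want to watch the PCIe traffic yourself during inference, here is a minimal sketch using the nvidia-ml-py (pynvml) bindings; this is an illustration, not necessarily the exact tooling used in the video, and nvidia-smi dmon -s t will show similar per-GPU Rx/Tx numbers from the command line.

import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    while True:
        readings = []
        for i, h in enumerate(handles):
            # Driver-reported throughput over a short sampling window, in KB/s.
            # TX = GPU -> host, RX = host -> GPU.
            tx = pynvml.nvmlDeviceGetPcieThroughput(h, pynvml.NVML_PCIE_UTIL_TX_BYTES)
            rx = pynvml.nvmlDeviceGetPcieThroughput(h, pynvml.NVML_PCIE_UTIL_RX_BYTES)
            readings.append(f"GPU{i} tx {tx / 1024:6.0f} MB/s rx {rx / 1024:6.0f} MB/s")
        print(" | ".join(readings))
        time.sleep(1)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()

Run it in one terminal while a prompt is generating in another; with layers split across the cards you should see bursts of traffic as activations hop between GPUs.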

Just a fun night in the lab, grab your favorite relaxation method and join in.
Our website: https://robotf.ai
Machine specs here: https://robotf.ai/Machine_Lab_Specs

(These are affiliate links that help the channel if you purchase through them!)
A4500 20GB https://amzn.to/3TXtAYR

Machine Components:
30cm Gen 4 PCIe Extender https://amzn.to/3Unhclh
20cm Gen 4 PCIe Extender https://amzn.to/4eEiosA
2TB NVMe https://amzn.to/3XYSokg
EVGA SuperNova 1600 G+ Power Supply https://amzn.to/3XWorBB
128GB Lexar SSD https://amzn.to/3TZYYGh
Noctua NH-U12DX i4 CPU Cooler: https://amzn.to/3TZ7O6R
G.SKILL Ripjaws V Series DDR4 128GB Kit https://amzn.to/4ev174M
Asus WS X299 SAGE/10G Motherboard https://amzn.to/4eOskz2
Intel Core i9-7960X https://amzn.to/3NhMaHy
Open Air Case https://amzn.to/3U08Y27
Remote Power Switch https://amzn.to/3BubQOg

Your results may vary due to hardware, software, model used, context size, weather, wallet, and more!
