LLM Testing Behind the Scenes: AMD Radeon 7600 XT ROCm, set flash_attention to true! 🤦‍♂️
This week in the RoboTF lab:
Quick follow-up on the interesting performance profile we saw with ROCm and the quantized model versions.
TLDR: set flash_attention to true, watch the performance difference, and watch me facepalm for not trying this earlier.
This is the previous video on the flash_attention setting with Nvidia cards: • LocalAI LLM Tuning: WTH is Flash Attention...
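If you want to try it yourself, it's a one-line change in the LocalAI model YAML. A rough sketch is below, assuming a llama.cpp-backed GGUF model; the model name, backend, and file path are placeholders, and exact field placement can differ between LocalAI versions:

name: mistral-7b-instruct            # placeholder model name
backend: llama-cpp                   # llama.cpp backend (what ROCm is running here)
parameters:
  model: mistral-7b-instruct.Q4_K_M.gguf   # whichever quant you're testing
flash_attention: true                # the setting that made the difference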
RoboTF Mistral 7B Leaderboard: https://robotf.ai/Mistral_7B_Leaderboard
Spend a ☕️'s worth of time in the RoboTF lab, and let's put some data in front of us.
Our website: https://robotf.ai
Machine specs here: https://robotf.ai/Machine_Lab_Specs
(These are affiliate-based links that help the channel if you purchase from them!)
GPU Being Used: RX 7600 XT https://amzn.to/4e0znEd
GPU Bench Node:
Open Air Case https://amzn.to/3U08Y27
30cm Gen 4 PCIe Extender https://amzn.to/3Unhclh
20cm Gen 4 PCIe Extender https://amzn.to/4eEiosA
1TB NVMe https://amzn.to/4gWFcFb
Corsair RM850x https://amzn.to/3NkITa4
128GB Lexar SSD https://amzn.to/3TZYYGh
G.SKILL Ripjaws V Series DDR4 64GB Kit https://amzn.to/4dAZrWm
Core i9-9820X https://amzn.to/47UuIST
Noctua NH-U12DX i4 CPU Cooler: https://amzn.to/3TZ7O6R
Supermicro C9X299-PGF Logic Board https://amzn.to/3BxbWVr
Remote Power Switch https://amzn.to/3BubQOg
Your results may vary due to hardware, software, model used, context size, weather, wallet, and more!