@venelin_valkov

Full text tutorial (requires MLExpert Pro): https://www.mlexpert.io/prompt-engineering/deploy-llama-2-on-runpod

@islamicinterestofficial

Thanks for the video. I'm planning to deploy the Vicuna-33B model. Can I deploy it on RunPod? If yes, roughly how much GPU RAM would I need? Your reply will be highly appreciated.

@nini_dev

Thanks for sharing an informative document. It really helps a lot!!

@DawnWillTurn

I was trying to use QLoRA to fine-tune Llama 2 but had trouble pushing it to Hugging Face. It gave me an error when I tried to merge and unload the model.
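For anyone hitting a similar merge error: a minimal sketch of the usual merge-and-push flow with `peft` (the model ID, adapter path, and repo name below are placeholders; requires `transformers` and `peft` installed). A common cause of errors here is calling `merge_and_unload()` on the 4-bit quantized model, so the sketch reloads the base model in full/half precision first:

```python
def merge_and_push(base_model_id, adapter_dir, repo_id):
    """Merge QLoRA adapter weights into the base model and push to the Hub.

    base_model_id, adapter_dir, and repo_id are placeholders -- substitute
    your own. Requires: pip install transformers peft
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Reload the base model in fp16/bf16 (NOT in 4-bit) before merging --
    # merging into a quantized model is a frequent source of errors.
    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()  # bakes the LoRA deltas into the base weights
    merged.push_to_hub(repo_id)
    AutoTokenizer.from_pretrained(base_model_id).push_to_hub(repo_id)
```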

@brianm7690

Thank you so much for the video.
I am new to the RunPod service, so I am not sure how to calculate the total cost.
Does it charge for the storage of your LLM model plus the compute time every time a request is made?
Or does it charge for the amount of time the pod is running, whether or not any requests are made?

@hocklintai3391

I followed the script exactly. However, the generated Swagger link doesn't work. There was no error, but the generated link doesn't lead anywhere. I think that because of this, the response.status_code later in the script becomes 404. Could there be a problem with the RunPod API?

@5hirish

I was unable to get it working with CodeLlama 2. I am getting an error while RunPod downloads the safetensors -> text_generation_launcher: An error occurred while downloading the model safe tensors using `hf_transfer`.

@sathvikreddy4807

hey there,
how do I create a generative AI chatbot with my own data?
Let's say I have data about a company and I want to create a "ChatGPT" kind of thing that can answer the questions I have related to that data.
I have dug through the internet today and found:
1) Data collection
2) Data preprocessing
3) Selecting a pre-trained model (because it is easier than creating one)
4) Fine-tuning the model
5) Iteration

This is my understanding as of now.
So basically, how do I preprocess the data?
Do I have to learn NLP for that?
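On the preprocessing step: for chat-over-your-data setups, the raw text is typically cleaned and split into overlapping chunks before it is indexed or used for training. A minimal sketch of word-based chunking (the chunk size and overlap are illustrative defaults, not prescriptions):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-based chunks.

    chunk_size and overlap are illustrative; tune them to your data
    and the model's context window.
    """
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already covers the end of the text
    return chunks

# Example: 500 words -> three 200-word chunks with 50 words of overlap
chunks = chunk_text(" ".join(f"w{i}" for i in range(500)))
```

You don't need deep NLP theory for this step; basic text cleaning plus chunking like the above covers most of it.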

@Ryan-yj4sd

Is it possible to do batch requests, where I could run many calls at the same time? I want to run 5 million prompts.

@romainjouhameau2764

Very cool! I learned quite a few tricks, thanks.
Do you know how we could call a model already downloaded inside our workspace, so that we don't have to download the models again and again?
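One common approach, assuming your pod has the persistent /workspace volume mounted: point the Hugging Face cache at it so models downloaded once are reused across pod restarts (these are the standard `huggingface_hub` environment variables; the path is a placeholder you can change):

```shell
# Put the Hugging Face cache on the persistent volume so models
# downloaded once survive pod restarts.
export HF_HOME=/workspace/hf-cache
# Older variable, still honored by some tools:
export HUGGINGFACE_HUB_CACHE=/workspace/hf-cache/hub
```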

@lapllacce

How do I provide an API key for Llama 2?

@NitinBhayana-h1q

How can we use the adapter model?

@myspam4194

Thank you for your interest in using Llama 2. Unfortunately, you do not meet the criteria to obtain a license at this time.

@deeplearning5408

How about protecting the API with some token or bearer or key? Probably we wouldn't deploy such an endpoint completely open.

@Danielbg9655

Great video, very informative!
I was wondering how I would give the pod my Hugging Face API key in order to use gated models, as you mentioned in the video.
Keep up the good work!
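For gated models with the TGI-based template, the token is usually passed as a pod environment variable (the variable name is the one text-generation-inference reads; the value below is a placeholder for your own token):

```shell
# Add to the RunPod template's environment variables:
HUGGING_FACE_HUB_TOKEN=hf_your_token_here
```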

@tahahuraibb5833

Can you deploy a fine-tuned model?

@clear_lake

How much is it to run this (estimated per month)?
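Back-of-the-envelope math, assuming an illustrative on-demand rate of $1.99/hr for a single 80GB GPU (actual RunPod prices vary by GPU type and change over time, so check the current pricing page):

```python
# Illustrative rate only -- substitute the current price for your GPU.
hourly_rate = 1.99        # USD per hour, assumed
hours_per_month = 730     # ~= 24 * 365 / 12

monthly_cost = hourly_rate * hours_per_month
print(f"${monthly_cost:.2f}/month")  # → $1452.70/month
```

On-demand pods bill while running, so stopping the pod when it's idle reduces this proportionally.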

@Ryan-yj4sd

Why not use inference endpoint on HF?

@DawnWillTurn

This is under Apache 2.0?

@Ryan-yj4sd

great video