Thanks for the video. I'm planning to deploy the Vicuna-33B model. Can I deploy it on RunPod? If yes, roughly how much GPU RAM would I need? Your reply will be highly appreciated.
Thanks for sharing an informative document. It really helps a lot!!
I was trying to use QLoRA to fine-tune Llama 2, but had trouble pushing it to Hugging Face. It gave me an error when I tried to unload and merge the model.
Thank you so much for the video. I am new to this RunPod service, so I am not sure how to calculate the total cost. Does it charge for the storage of your LLM model plus the computing time each time a request is made? Or does it charge for the amount of time the pod is running, whether or not any request is made while the pod is up?
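If it's the second case (billing by uptime plus storage), my back-of-envelope estimate would look something like this. All the rates here are made-up placeholders, not actual RunPod prices:

```python
# Back-of-envelope cost estimate, assuming billing = pod uptime + storage.
# All rates below are hypothetical placeholders, not real RunPod prices.
def estimate_monthly_cost(gpu_hourly_usd, uptime_hours, storage_gb,
                          storage_usd_per_gb_month=0.10):
    """Pod uptime is billed whether or not requests arrive;
    volume storage is billed separately per GB-month."""
    compute = gpu_hourly_usd * uptime_hours
    storage = storage_gb * storage_usd_per_gb_month
    return round(compute + storage, 2)

# e.g. a GPU at $1.89/hr running 24/7 for 30 days with a 100 GB volume:
monthly = estimate_monthly_cost(1.89, 24 * 30, 100)
```

So running the pod 24/7 dominates the bill; stopping the pod when idle (and paying only for the stored volume) would be much cheaper if that model holds.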
I followed the script exactly. However, the generated Swagger link doesn't work. There was no error, but the link doesn't lead anywhere. I think that's why response.status_code later in the script becomes 404. Could there be a problem with the RunPod API?
I was unable to get it working with CodeLlama 2. I get an error while RunPod downloads the safetensors -> text_generation_launcher: An error occurred while downloading the model safe tensors using `hf_transfer`.
Hey there, how do I create a generative AI chatbot with my own data? Let's say I have data about a company, and I want to create a "ChatGPT"-like thing that can answer my questions about that data. I have dug through the internet today and found: 1) data collection, 2) data preprocessing, 3) selecting a pre-trained model (because that's easier than creating one), 4) fine-tuning the model, 5) iteration. This is my understanding as of now. So basically, how do I preprocess the data? Do I have to learn NLP for that?
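From what I've gathered so far, step 2 (preprocessing) mostly means cleaning the text and splitting it into chunks a model can work with, rather than anything deeply NLP-specific. My rough attempt, with the chunk size picked arbitrarily:

```python
import re

def preprocess(raw_text, chunk_chars=500):
    """Normalize whitespace and split a document into fixed-size chunks.
    chunk_chars=500 is an arbitrary choice, not a recommendation."""
    text = re.sub(r"\s+", " ", raw_text).strip()  # collapse runs of whitespace
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

# Example: a messy company document becomes a list of clean chunks.
chunks = preprocess("Our company   was founded in 2010.\nWe sell widgets. " * 20)
```

Real pipelines usually split on sentence or paragraph boundaries instead of fixed character counts, but the idea is the same.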
Is it possible to do batch requests, where I could run many calls at the same time? I want to run ~5 million prompts.
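Right now I'm planning to just fan the requests out client-side with a thread pool, something like the sketch below. `call_endpoint` is a placeholder for the real HTTP call to the pod, and the worker count is a guess:

```python
from concurrent.futures import ThreadPoolExecutor

def call_endpoint(prompt):
    # Placeholder: the real version would POST `prompt` to the pod's API
    # and return the generated text.
    return f"response for: {prompt}"

def run_batch(prompts, max_workers=32):
    # Fan requests out client-side; results come back in input order.
    # The server still handles them individually unless it supports
    # batched/continuous inference.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_endpoint, prompts))

results = run_batch([f"prompt {i}" for i in range(100)])
```

For millions of prompts I'd also want retries and checkpointing so a crash doesn't lose progress, but that's the basic shape.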
Very cool! I learned quite a few tricks, thanks. Do you know how we could call a model already downloaded inside our workspace, so that we don't have to download the models again and again?
How do I provide an API key for Llama 2?
How can we use the adapter model?
Thank you for your interest in using Llama 2. Unfortunately, you do not meet the criteria to obtain a license at this time.
How about protecting the API with some token, bearer auth, or key? We probably wouldn't deploy such an endpoint completely open.
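Even something as simple as checking an Authorization header would help. A minimal, framework-agnostic sketch; the token value and header handling here are just illustrative:

```python
import hmac

API_TOKEN = "replace-me"  # illustrative; load from an env var in practice

def is_authorized(headers):
    """Return True iff the request carries the expected bearer token."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return False
    supplied = auth[len("Bearer "):]
    # Constant-time comparison to avoid leaking the token via timing.
    return hmac.compare_digest(supplied, API_TOKEN)
```

In a web framework this check would sit in a middleware or dependency that rejects the request with a 401 before the model is ever invoked.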
Great video, very informative! I was wondering how I would give the pod my Hugging Face API key in order to use gated models, as you mentioned in the video. Keep up the good work!
Can you deploy a fine tuned model?
How much is it to run this (estimated per month)?
Why not use an inference endpoint on HF?
This is under Apache 2.0?
great video