Fine-tuning Large Language Models (LLMs) | w/ Example Code



Want to learn more? I’m launching a 6-week live BootCamp for AI Builders. Learn more: …

36 Comments

  1. UPDATE: Someone pointed out that the fine-tuned model here is overfitting, so I created an improved example that uses transfer learning: https://youtu.be/4QHg8Ix8WWQ

    👉More on LLMs: https://www.youtube.com/playlist?list=PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0

    References
    [1] Deeplearning.ai Finetuning Large Language Models Short Course: https://www.deeplearning.ai/short-courses/finetuning-large-language-models/
    [2] arXiv:2005.14165 [cs.CL] (GPT-3 Paper)
    [3] arXiv:2303.18223 [cs.CL] (Survey of LLMs)
    [4] arXiv:2203.02155 [cs.CL] (InstructGPT paper)
    [5] PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware: https://huggingface.co/blog/peft
    [6] arXiv:2106.09685 [cs.CL] (LoRA paper)
    [7] Original dataset source — Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. Learning Word Vectors for Sentiment Analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150, Portland, Oregon, USA. Association for Computational Linguistics.

  2. Thank you so much for this content, it's so helpful and clear!

    If I don't use LoRA when fine-tuning and instead just specify `Trainer(model=AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=2))` (which is what Hugging Face does in most of its documentation), what type of parameter training is that performing? Retraining all of the parameters?
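
    Without a PEFT wrapper, `Trainer` does update every parameter (full fine-tuning), unless you freeze layers yourself. A rough, self-contained sketch of why LoRA is so much cheaper, counting trainable parameters for a single weight matrix (the sizes here are illustrative, not taken from any specific model):

    ```python
    d, k, r = 768, 768, 8          # weight matrix is d x k; LoRA rank r

    full_ft = d * k                # full fine-tuning: every entry is trainable
    lora = d * r + r * k           # LoRA: only the two low-rank factors train

    print(full_ft)                 # 589824
    print(lora)                    # 12288, ~2% of the full count
    ```

    Scaled up across all the weight matrices of a billion-parameter model, that ratio is why LoRA fits on consumer GPUs.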

  3. Thank you, this is a nice video and you give a clear explanation. I tried to do this with a GPT-Neo model (EleutherAI/gpt-neo-1.3B), but during training the training loss always shows no log values and the validation loss is always NaN (with BERT or DistilBERT it runs perfectly). Do you have any suggestions or reading resources to fix this?

  4. Hey, it was good, but did your model also take a lot of time to fine-tune when you plugged everything into the Trainer class? For me it's so slow that I lost track of how long it's been taking; 10 epochs are training at roughly 0.05 it/s.

  5. Why is the truncation side set to the left? What is the difference between choosing "right" or "left"? I'm asking in a general sense, not just for the example shown here. Can anyone provide some intuition or good reading references?
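
    In Hugging Face tokenizers this is controlled by the tokenizer's `truncation_side` attribute. Intuitively, "right" keeps the beginning of the text and drops the end, while "left" keeps the end and drops the beginning — useful when the most recent tokens matter most (e.g. the tail of a long prompt, or a review's concluding sentence). A toy sketch of the two behaviors on a plain token list, with no tokenizer dependency:

    ```python
    tokens = ["t0", "t1", "t2", "t3", "t4", "t5"]
    max_length = 4

    right_truncated = tokens[:max_length]    # truncation_side="right": drop the end
    left_truncated = tokens[-max_length:]    # truncation_side="left": drop the start

    print(right_truncated)   # ['t0', 't1', 't2', 't3']
    print(left_truncated)    # ['t2', 't3', 't4', 't5']
    ```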

  6. I needed to know how parameter-efficient fine-tuning works to fine-tune a voice encoder for an emotion detection task. This video helped me a lot. I used LoRA for it. Thanks ❤
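
    For anyone curious what "using LoRA" means numerically: the frozen pretrained weight W gets a trainable low-rank update B·A, and B is initialized to zero so the adapted model starts out identical to the pretrained one. A minimal NumPy sketch of the forward pass (shapes and initialization follow the LoRA paper [6]; this is an illustration, not the `peft` implementation):

    ```python
    import numpy as np

    d, k, r = 6, 4, 2                   # weight is d x k, adapter rank r
    rng = np.random.default_rng(0)

    W = rng.standard_normal((d, k))     # frozen pretrained weight
    A = rng.standard_normal((r, k))     # trainable, Gaussian init
    B = np.zeros((d, r))                # trainable, zero init

    x = rng.standard_normal(k)
    h_pretrained = W @ x
    h_adapted = W @ x + B @ (A @ x)     # LoRA forward pass

    # Zero-initialized B makes the adapter a no-op before training begins:
    print(np.allclose(h_pretrained, h_adapted))   # True
    ```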

  7. Amazing video!!! I have one question: when I use your code to fine-tune the model with my own dataset, the dataset is so large that just reading it causes a memory error (not GPU memory). What should I do to avoid this issue? Can I read and fine-tune in small batches?
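
    When the raw dataset doesn't fit in RAM, the usual fix is to stream it instead of loading it all at once — with the Hugging Face `datasets` library, `load_dataset(..., streaming=True)` returns an iterable dataset that reads examples lazily. The underlying idea is just chunked iteration, sketched here in plain Python (a hypothetical helper, not part of any library):

    ```python
    from itertools import islice

    def batched(examples, batch_size):
        """Yield lists of up to batch_size items without materializing everything."""
        it = iter(examples)
        while True:
            batch = list(islice(it, batch_size))
            if not batch:
                return
            yield batch

    # Works over any lazy iterable, e.g. a generator reading a file line by line:
    stream = (f"example {i}" for i in range(5))
    for batch in batched(stream, 2):
        print(batch)
    # ['example 0', 'example 1']
    # ['example 2', 'example 3']
    # ['example 4']
    ```

    Because the source is a generator, only one batch is ever held in memory at a time.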

  8. I want to use an LLM for sentiment analysis and text classification, and I require a very specific JSON output structure for further processing and storage… would this be possible via fine-tuning? Could you (or anyone) point me in the right direction?
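
    Yes — one common approach is to fine-tune on prompt/completion pairs where every target completion is a serialized instance of your schema, then validate the model's output with `json.loads` at inference time. A minimal sketch of building such training examples (the schema and prompt format here are hypothetical; adapt them to your pipeline):

    ```python
    import json

    def make_example(text, sentiment, topics):
        # Hypothetical target schema: {"sentiment": ..., "topics": [...]}
        target = {"sentiment": sentiment, "topics": topics}
        return {
            "prompt": f"Classify the following text and answer in JSON.\nText: {text}\nJSON:",
            "completion": json.dumps(target),
        }

    ex = make_example("Great screen, terrible battery life.", "mixed", ["hardware"])
    parsed = json.loads(ex["completion"])      # targets are guaranteed-valid JSON
    print(parsed["sentiment"])                 # mixed
    ```

    Serializing the targets with `json.dumps` (rather than hand-writing strings) guarantees every training label is well-formed JSON, which is what teaches the model to stay inside the schema.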

  9. Thanks for these great videos. I really love the fact that you focus on building an intuitive understanding as opposed to throwing jargon around. Could you please start a LangChain series?
