Claim your free trial on Decodo: https://visit.decodo.com/aOL4yR
🚀 Introduction:
================================
Learn how to crawl and scrape data from websites and use it to fine-tune a Large Language Model (LLM) with Python, Decodo, and Hugging Face. This tutorial walks you through the full pipeline, from web data extraction to LLM training for text classification.
You'll get practical experience with HTML parsing, DOM inspection, and building reliable web crawlers using Decodo's scraping API, complete with JavaScript rendering, proxy rotation, and anti-blocking features.
We'll then prepare the scraped dataset, clean and format it, and fine-tune a model like LLaMA for news article classification. The same workflow applies to instruction tuning, text generation, summarization, and more, just by changing the training objective.
Perfect for anyone looking to master dataset collection, custom LLM fine-tuning, and real-world NLP tasks like text classification.
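To give a flavor of the DOM-navigation step, here's a minimal sketch of pulling headlines out of scraped HTML using only Python's standard library. The sample markup and the "teaser" class name are illustrative stand-ins, not NPR's actual page structure; the video uses Decodo's scraping API to fetch the real pages.

```python
from html.parser import HTMLParser

# Minimal DOM-style extraction sketch using only the standard library.
# The sample HTML and the "teaser" class are hypothetical placeholders.
class HeadlineParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        # Treat <h2 class="teaser"> as a headline container
        if tag == "h2" and ("class", "teaser") in attrs:
            self.in_headline = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False

    def handle_data(self, data):
        if self.in_headline and data.strip():
            self.headlines.append(data.strip())

sample = """
<html><body>
  <h2 class="teaser">Economy grows in Q3</h2>
  <h2 class="teaser">New science funding announced</h2>
  <p>Unrelated text</p>
</body></html>
"""

parser = HeadlineParser()
parser.feed(sample)
print(parser.headlines)
```

In practice you'd feed the parser the HTML returned by the scraping API instead of a hard-coded string, and adapt the tag/class checks to the site's real markup.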
💡 What You'll Learn:
================================
🌐 How to crawl and scrape structured data from websites using Decodo
🔧 Understanding HTML/CSS structure for scraping and navigating the DOM
🧠 The fundamentals of large language models (LLMs) and how fine-tuning works
🧹 How to clean and prepare datasets for machine learning
📊 How to train and fine-tune an LLM text classifier using Hugging Face
🛠️ Switching a text generation model to a text classification model
🧪 How to evaluate fine-tuned LLMs for accuracy and performance
🔁 How to use the trained model for inference
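The cleaning-and-labeling step above can be sketched in plain Python. The record fields and category names below are illustrative assumptions, not the tutorial's actual dataset schema; the idea is simply normalizing whitespace, dropping empty rows and duplicates, and mapping string labels to the integer ids a classification head expects.

```python
# Sketch of cleaning scraped (text, category) records before fine-tuning.
# Field names and categories here are illustrative, not the real dataset.
raw_records = [
    {"text": "  Markets rally after rate cut \n", "category": "Business"},
    {"text": "Markets rally after rate cut", "category": "Business"},  # duplicate
    {"text": "New exoplanet discovered", "category": "Science"},
    {"text": "", "category": "Science"},  # empty, should be dropped
]

def clean(records):
    seen, out = set(), []
    for r in records:
        text = " ".join(r["text"].split())  # collapse runs of whitespace
        if not text or text in seen:        # drop empties and exact duplicates
            continue
        seen.add(text)
        out.append({"text": text, "label": r["category"]})
    return out

cleaned = clean(raw_records)

# Map string labels to integer ids, as classification models expect
label2id = {lab: i for i, lab in enumerate(sorted({r["label"] for r in cleaned}))}
dataset = [{"text": r["text"], "label": label2id[r["label"]]} for r in cleaned]
print(dataset)
```

From here the list can be wrapped in a Hugging Face `Dataset` and tokenized for training.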
🔗 Links:
================================
Decodo: https://visit.decodo.com/aOL4yR
Code Link/Google Colab: https://colab.research.google.com/dri...
News Website: https://www.npr.org/
Hugging Face LLama Model: https://huggingface.co/meta-llama/Lla...
My Trained Classifier: https://huggingface.co/AbdullahTarek/...
🔑 TIMESTAMPS
================================
0:00 - Introduction
12:30 - HTML Basics
23:47 - Dataset Crawling
1:04:08 - LLM and Fine Tuning Explained
1:13:23 - LLM Training