Skip to main content

Fine-tune an LLM

In this tutorial, you'll build a pipeline with Dagster that:

Loads a public Goodreads JSON dataset into DuckDB
Performs feature engineering to enhance the data
Creates and validates the data files needed for an OpenAI fine-tuning job
Generate a custom model and validate it

Prerequisites

To follow the steps in this guide, you'll need:

Basic Python knowledge
Python 3.9+ installed on your system. Refer to the Installation guide for information.
Familiarity with SQL and Python data manipulation libraries, such as Pandas.
Understanding of data pipelines and the extract, transform, and load process (ETL).

Step 1: Set up your Dagster environment

First, set up a new Dagster project.

Clone the Dagster repo and navigate to the project:
```
cd examples/dagster-llm-fine-tune
```

Create and activate a virtual environment:

MacOS
Windows

uv venv dagster_tutorial
source dagster_tutorial/bin/activate

uv venv dagster_tutorial
dagster_tutorial\Scripts\activate

Install Dagster and the required dependencies:
```
uv pip install -e ".[dev]"
```

Step 2: Launch the Dagster webserver

To make sure Dagster and its dependencies were installed correctly, navigate to the project root directory and start the Dagster webserver:

followed by a bash code snippet for

dagster dev

Next steps

Continue this tutorial with ingestion

Step 1: Set up your Dagster environment
Step 2: Launch the Dagster webserver
Next steps