Skip to content

How to Migrate an Existing Kedro Project

This guide shows you how to add Dagster orchestration to an existing Kedro project. Use this when you already have a working Kedro project with custom datasets, hooks, or configuration and want to run it through Dagster.

Prerequisites

  • An existing Kedro project (0.19.x or 1.x)
  • The project runs successfully with kedro run

1. Install Kedro-Dagster

Add the package to your project dependencies:

pip install kedro-dagster
uv add kedro-dagster

Verify the CLI is available:

kedro dagster --help

2. Initialize Dagster integration files

From your project root:

kedro dagster init

This creates three files:

File Purpose
conf/base/dagster.yml Orchestration configuration (jobs, executors, schedules)
src/<package>/definitions.py Dagster entry point that loads your project
dg.toml Dagster dg CLI configuration (Dagster >= 1.10.6)

If any of these files already exist (from a previous attempt), use --force to overwrite:

kedro dagster init --force

3. Check catalog compatibility

Most Kedro datasets work without changes. The translator wraps each dataset's save() and load() methods into Dagster IO managers automatically.

Datasets that need attention:

  • MemoryDataset: works, but the data lives only within the Dagster run. Cross-job data sharing is not supported.
  • Custom datasets with side effects: if your dataset's save() or load() method interacts with external services (APIs, message queues), verify it behaves correctly when called from within a Dagster op.
  • Datasets with credentials: credentials from conf/<env>/credentials.yml are loaded normally. Ensure the Dagster environment has access to the same credential files or environment variables.

4. Verify hooks work

Kedro hooks are preserved across the translation. If your project uses custom hooks, they will fire at the same lifecycle points in Dagster.

Test by starting the dev server and running a job:

kedro dagster dev

In the Dagster UI, launch a job and check the logs for your hook output.

Warning

Backfills and asset materializations triggered directly from the Dagster UI do not invoke Kedro pipeline-level hooks (before_pipeline_run, after_pipeline_run). Node-level and dataset-level hooks still fire.

5. Configure jobs

Edit conf/base/dagster.yml to define which pipelines become Dagster jobs:

jobs:
  full_pipeline:
    pipeline: __default__

  training_only:
    pipeline: data_science
    tags: [train]

If you have multiple Kedro pipelines registered in pipeline_registry.py, each can become a separate Dagster job with its own executor and schedule.

6. Verify the translation

List all generated Dagster definitions to confirm everything translated correctly:

kedro dagster list-defs

Start the UI and inspect:

kedro dagster dev

Check that:

  • All expected assets appear in the asset graph
  • Jobs contain the correct nodes
  • Parameters are visible in the job launchpad

Common migration issues

Problem: Node names contain invalid characters
Dagster requires ^[A-Za-z0-9_]+$ for names. The translator converts dots to double underscores automatically. If names still fail, check for other special characters in your node names.
Problem: Hook order differs from kedro run
Hooks fire at equivalent lifecycle points, but the exact timing may differ slightly because Dagster executes ops independently. Avoid hooks that depend on execution order between unrelated nodes.

Next steps

  • Explore the configuration: See all available dagster.yml options in the Configuration Reference.
  • Understand the translation: Learn how Kedro concepts map to Dagster in Architecture.
  • Troubleshoot issues: Consult the Troubleshooting guide for common migration problems.