How to Migrate an Existing Kedro Project
This guide shows you how to add Dagster orchestration to an existing Kedro project. Use this when you already have a working Kedro project with custom datasets, hooks, or configuration and want to run it through Dagster.
Prerequisites
- An existing Kedro project (0.19.x or 1.x)
- The project runs successfully with kedro run
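You can check the version prerequisite from Python before installing anything. This is a minimal sketch, not a Kedro-Dagster command: the helper is_supported_kedro_version is hypothetical and only encodes the "0.19.x or 1.x" range listed above. It reads the installed version with the standard library's importlib.metadata.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def is_supported_kedro_version(ver: str) -> bool:
    """Return True if `ver` is in the 0.19.x or 1.x range (hypothetical helper).

    Only the leading major.minor numbers are compared, so pre-release
    suffixes such as "1.0.0rc1" are treated like the final release.
    """
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major == 0 and minor == 19) or major == 1


if __name__ == "__main__":
    try:
        installed = version("kedro")
    except PackageNotFoundError:
        print("Kedro is not installed in this environment.")
    else:
        status = "supported" if is_supported_kedro_version(installed) else "NOT supported"
        print(f"Kedro {installed}: {status}")
```

Run it with the same interpreter you use for kedro run so that it inspects the right environment.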
1. Install Kedro-Dagster
Add the package to your project dependencies:
Verify the CLI is available:
2. Initialize Dagster integration files
From your project root:
This creates three files:
| File | Purpose |
|---|---|
| conf/base/dagster.yml | Orchestration configuration (jobs, executors, schedules) |
| src/<package>/definitions.py | Dagster entry point that loads your project |
| dg.toml | Dagster dg CLI configuration (Dagster >= 1.10.6) |
If any of these files already exist (from a previous attempt), use --force to overwrite:
3. Check catalog compatibility
Most Kedro datasets work without changes. The translator wraps each dataset's save() and load() methods into Dagster IO managers automatically.
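Conceptually, the wrapping resembles the adapter below. This is a self-contained sketch, not Kedro-Dagster's actual implementation: it borrows the method names handle_output and load_input from Dagster's IOManager interface, but DatasetIOManager and InMemoryValueDataset are hypothetical stand-ins so the snippet runs without Kedro or Dagster installed. Real Dagster IO managers also receive a context argument, which is omitted here.

```python
from typing import Any, Protocol


class DatasetLike(Protocol):
    """The two methods every Kedro dataset exposes."""

    def load(self) -> Any: ...

    def save(self, data: Any) -> None: ...


class InMemoryValueDataset:
    """Stand-in dataset for this sketch; a real project would use a Kedro dataset."""

    def __init__(self) -> None:
        self._data: Any = None

    def load(self) -> Any:
        return self._data

    def save(self, data: Any) -> None:
        self._data = data


class DatasetIOManager:
    """Hypothetical adapter that forwards Dagster-style IO calls to a dataset."""

    def __init__(self, dataset: DatasetLike) -> None:
        self.dataset = dataset

    def handle_output(self, obj: Any) -> None:
        # Called after an op produces a value: persist it through the dataset.
        self.dataset.save(obj)

    def load_input(self) -> Any:
        # Called before a downstream op runs: read the value back from the dataset.
        return self.dataset.load()


if __name__ == "__main__":
    io_manager = DatasetIOManager(InMemoryValueDataset())
    io_manager.handle_output("hello")
    print(io_manager.load_input())  # prints "hello"
```

Because an adapter of this shape relies only on save() and load(), a dataset that already works under kedro run generally needs no changes.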
Datasets that need attention:
- MemoryDataset: works, but the data lives only within the Dagster run. Cross-job data sharing is not supported.
- Custom datasets with side effects: if your dataset's save() or load() method interacts with external services (APIs, message queues), verify that it behaves correctly when called from within a Dagster op.
- Datasets with credentials: credentials from conf/<env>/credentials.yml are loaded normally. Ensure the Dagster environment has access to the same credential files or environment variables.
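One way to catch missing credentials before the first Dagster run is to scan credentials.yml for environment-variable references. This sketch assumes your credentials use OmegaConf's ${oc.env:VAR} syntax, which Kedro's OmegaConfigLoader supports in credentials files; the helper missing_env_vars is hypothetical, and secrets stored directly in the file are not checked.

```python
import os
import re
from collections.abc import Mapping
from pathlib import Path

# Matches ${oc.env:NAME} and ${oc.env:NAME,default}; group 2 holds the default, if any.
OC_ENV_PATTERN = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)\s*(,[^}]*)?\}")


def missing_env_vars(credentials_text: str, environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return env var names referenced without a default that are not set in `environ`."""
    missing: list[str] = []
    for match in OC_ENV_PATTERN.finditer(credentials_text):
        name, default = match.group(1), match.group(2)
        if default is None and name not in environ and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    creds = Path("conf") / "local" / "credentials.yml"  # adjust to your <env>
    if creds.is_file():
        print(missing_env_vars(creds.read_text()) or "All referenced env vars are set.")
    else:
        print(f"{creds} not found")
```

Run it in the same environment (container, worker, or deployment) that will execute your Dagster runs, since that is where the variables must be set.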
4. Verify hooks work
Kedro hooks are preserved across the translation. If your project uses custom hooks, they fire at the equivalent lifecycle points in Dagster, with the exceptions noted in the warning below.
Test by starting the dev server and running a job:
In the Dagster UI, launch a job and check the logs for your hook output.
Warning
Backfills and asset materializations triggered directly from the Dagster UI do not invoke Kedro pipeline-level hooks (before_pipeline_run, after_pipeline_run). Node-level and dataset-level hooks still fire.
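A simple way to see which lifecycle points fire is a hook that logs each one. The class below is a minimal sketch using Kedro's hook_impl marker; the class name LifecycleLoggingHooks is hypothetical, and the try/except fallback exists only so the snippet runs without Kedro installed (in a real project, keep the plain import). You would register it in src/<package>/settings.py with HOOKS = (LifecycleLoggingHooks(),).

```python
import logging

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # Fallback only so this sketch runs without Kedro installed.
    def hook_impl(func):
        return func

logger = logging.getLogger(__name__)


class LifecycleLoggingHooks:
    """Logs one line per lifecycle event so each hook is easy to spot in run logs."""

    @hook_impl
    def before_pipeline_run(self) -> None:
        # Pipeline-level: not invoked for UI-triggered backfills or asset materializations.
        logger.info("before_pipeline_run fired")

    @hook_impl
    def before_node_run(self, node) -> None:
        # Node-level: fires for every executed node, however the run was triggered.
        logger.info("before_node_run: %s", node.name)

    @hook_impl
    def after_node_run(self, node) -> None:
        logger.info("after_node_run: %s", node.name)
```

For a job launch you would expect all three messages; for a backfill or asset materialization triggered from the UI, expect only the node-level ones. Depending on how your Dagster instance handles Python loggers, the lines may show up in the structured event log or only in the captured stdout/stderr.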
5. Configure jobs
Edit conf/base/dagster.yml to define which pipelines become Dagster jobs:
If you have multiple Kedro pipelines registered in pipeline_registry.py, each can become a separate Dagster job with its own executor and schedule.
6. Verify the translation
List all generated Dagster definitions to confirm everything translated correctly:
Start the UI and inspect:
Check that:
- All expected assets appear in the asset graph
- Jobs contain the correct nodes
- Parameters are visible in the job launchpad
Common migration issues
- Problem: Node names contain invalid characters
  - Dagster requires names to match ^[A-Za-z0-9_]+$. The translator converts dots to double underscores automatically. If names still fail, check your node names for other special characters.
- Problem: Hook order differs from kedro run
  - Hooks fire at equivalent lifecycle points, but the exact timing may differ slightly because Dagster executes ops independently. Avoid hooks that depend on execution order between unrelated nodes.
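To find offending node names before launching Dagster, you can apply the documented rule yourself. This sketch is hypothetical (to_dagster_name and invalid_node_names are not Kedro-Dagster APIs) and reproduces only the two behaviours described above: dots become double underscores, and the result must match ^[A-Za-z0-9_]+$. The plugin itself may apply further transformations.

```python
import re

# Dagster's allowed name pattern; fullmatch() also rejects a trailing newline.
VALID_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def to_dagster_name(node_name: str) -> str:
    """Mimic the documented translation: replace each dot with a double underscore."""
    return node_name.replace(".", "__")


def invalid_node_names(node_names: list[str]) -> dict[str, set[str]]:
    """Map each name still invalid after translation to its offending characters."""
    problems: dict[str, set[str]] = {}
    for name in node_names:
        translated = to_dagster_name(name)
        if not VALID_NAME.fullmatch(translated):
            problems[name] = {
                ch for ch in translated
                if not (ch.isascii() and (ch.isalnum() or ch == "_"))
            }
    return problems


if __name__ == "__main__":
    names = ["preprocess.companies", "train-model", "evaluate model"]
    print(invalid_node_names(names))
    # prints {'train-model': {'-'}, 'evaluate model': {' '}}
```

In a project, you could pass it [node.name for node in pipeline.nodes] for each registered pipeline.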
Next steps
- Explore the configuration: See all available dagster.yml options in the Configuration Reference.
- Understand the translation: Learn how Kedro concepts map to Dagster in Architecture.
- Troubleshoot issues: Consult the Troubleshooting guide for common migration problems.