How to Use Dagster Partitions
Experimental: read before starting
Dagster partitions support in Kedro-Dagster is experimental with the following limitations:
- Only StaticPartitionsDefinition is supported (no time-window or dynamic partitions).
- Fan-out is static at translation time. Dynamic partitioning based on runtime information is not supported.
- Backfills or materializations from the Dagster UI do not trigger Kedro hooks. Kedro-MLflow integration relying on hooks will not function in those cases.
DagsterPartitionedDataset reduces to Kedro's PartitionedDataset when run outside of Dagster, so the pipeline remains runnable with kedro run.
Kedro-Dagster provides two custom datasets to enable Dagster partitions in your Kedro project: DagsterPartitionedDataset and DagsterNothingDataset.
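Because DagsterPartitionedDataset reduces to Kedro's PartitionedDataset outside of Dagster, a node that consumes it can be written against PartitionedDataset's loading convention, in which the node receives a dictionary mapping partition IDs to zero-argument load functions. The following is a minimal sketch under that assumption. The function name and the stand-in loaders are hypothetical, and what a node receives when run through Dagster may differ.

```python
from typing import Any, Callable, Dict


def summarize_partitions(partitions: Dict[str, Callable[[], Any]]) -> Dict[str, int]:
    """Return the number of rows found in each partition.

    Assumes the Kedro PartitionedDataset convention: the node receives a
    dict mapping partition IDs to zero-argument load functions.
    """
    row_counts = {}
    for partition_id, load_partition in sorted(partitions.items()):
        rows = load_partition()  # the partition is only read at this point
        row_counts[partition_id] = len(rows)
    return row_counts


# Stand-in loaders so the sketch runs without a catalog (hypothetical data).
fake_partitions = {
    "2023-01-02.csv": lambda: [{"x": 3}],
    "2023-01-01.csv": lambda: [{"x": 1}, {"x": 2}],
}
row_counts = summarize_partitions(fake_partitions)
```

Iterating over sorted partition IDs keeps the result deterministic regardless of the order in which partitions are discovered.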
Define a partitioned dataset
Add a DagsterPartitionedDataset to your Kedro catalog:
    my_partitioned_dataset:
      type: kedro_dagster.datasets.DagsterPartitionedDataset
      path: data/01_raw/my_data/
      dataset:
        type: pandas.CSVDataset
      partition:
        type: dagster.StaticPartitionsDefinition
        partition_keys: ["2023-01-01.csv", "2023-01-02.csv", "2023-01-03.csv"]
When a job includes a DagsterPartitionedDataset, Dagster schedules and materializes per-partition runs. You can select partition keys in the Dagster UI launchpad or use backfills for ranges.
If a node depends on or produces a DagsterPartitionedDataset, the translator creates one Dagster op per partition key for that node, and these ops execute in parallel. Downstream nodes are fanned out in the same way, respecting any defined partition mappings.
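The static fan-out described above can be pictured as expanding each node into one op per known partition key. The sketch below is only an illustration of that idea: the fan_out helper, the node names, and the "<node>__<key>" naming scheme are hypothetical and are not the names Kedro-Dagster generates.

```python
from typing import Dict, List


def fan_out(node_names: List[str], partition_keys: List[str]) -> Dict[str, List[str]]:
    """Expand each node into one op name per partition key.

    The "<node>__<key>" naming scheme is hypothetical and chosen only
    for illustration.
    """
    ops = {}
    for node_name in node_names:
        ops[node_name] = [f"{node_name}__{key}" for key in partition_keys]
    return ops


# The key list must be known up front, which is why fan-out is static.
partition_keys = ["2023-01-01.csv", "2023-01-02.csv", "2023-01-03.csv"]
ops = fan_out(["clean", "aggregate"], partition_keys)
```

Since the set of ops is derived entirely from the configured key list, adding a partition means changing the catalog and translating again rather than discovering keys at run time.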
Map partitions to downstream datasets
If you want to map upstream partition keys to downstream partition keys, use partition_mappings on the upstream dataset:
    my_upstream_dataset:
      type: kedro_dagster.datasets.DagsterPartitionedDataset
      path: data/01_raw/upstream/
      dataset:
        type: pandas.CSVDataset
      partition:
        type: dagster.StaticPartitionsDefinition
        partition_keys: ["1.csv", "2.csv", "3.csv"]
      partition_mappings:
        my_downstream_dataset:
          type: dagster.StaticPartitionMapping
          downstream_partition_keys_by_upstream_partition_key:
            1.csv: 10.csv
            2.csv: 20.csv
            3.csv: 30.csv

    my_downstream_dataset:
      type: kedro_dagster.datasets.DagsterPartitionedDataset
      path: data/02_intermediate/downstream/
      dataset:
        type: pandas.CSVDataset
      partition:
        type: dagster.StaticPartitionsDefinition
        partition_keys: ["10.csv", "20.csv", "30.csv"]
Pattern targets using {} syntax are also supported (e.g., {namespace}.partitioned_dataset).
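An explicit mapping like the one in the example above is easy to get out of sync with the two key lists. The following sketch shows one way to sanity-check it before translation. The check_static_mapping helper is hypothetical and is not part of Kedro-Dagster or Dagster; whether every upstream key must have a target is an assumption made here for illustration.

```python
from typing import Dict, List


def check_static_mapping(
    upstream_keys: List[str],
    downstream_keys: List[str],
    mapping: Dict[str, str],
) -> List[str]:
    """Return human-readable problems; an empty list means the mapping is consistent."""
    problems = []
    # Assumption: every upstream key should map to some downstream key.
    for key in upstream_keys:
        if key not in mapping:
            problems.append(f"upstream key {key!r} has no downstream target")
    for source, target in mapping.items():
        if source not in upstream_keys:
            problems.append(f"mapping source {source!r} is not an upstream key")
        if target not in downstream_keys:
            problems.append(f"mapping target {target!r} is not a downstream key")
    return problems


mapping = {"1.csv": "10.csv", "2.csv": "20.csv", "3.csv": "30.csv"}
problems = check_static_mapping(
    ["1.csv", "2.csv", "3.csv"], ["10.csv", "20.csv", "30.csv"], mapping
)
broken = check_static_mapping(
    ["1.csv", "2.csv"], ["10.csv"], {"1.csv": "10.csv", "2.csv": "99.csv"}
)
```

Here problems is empty for the catalog shown above, while broken reports the target that is missing from the downstream key list.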
If upstream and downstream partitions share the same keys, use IdentityPartitionMapping instead:
    my_upstream_dataset:
      type: kedro_dagster.datasets.DagsterPartitionedDataset
      path: data/02_raw/upstream/
      dataset:
        type: pickle.PickleDataset
      partition:
        type: dagster.StaticPartitionsDefinition
        partition_keys: ["A.pkl", "B.pkl", "C.pkl"]
      partition_mappings:
        my_downstream_dataset:
          type: dagster.IdentityPartitionMapping
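An identity mapping is the special case of a static mapping in which every key maps to itself. The sketch below makes that relationship concrete and shows a simple check for whether two key lists qualify. Both helper names are hypothetical and exist only for this illustration.

```python
from typing import Dict, List


def can_use_identity(upstream_keys: List[str], downstream_keys: List[str]) -> bool:
    """True when both sides define exactly the same set of partition keys."""
    return set(upstream_keys) == set(downstream_keys)


def as_explicit_mapping(keys: List[str]) -> Dict[str, str]:
    """Spell out the 1:1 mapping that an identity mapping implies."""
    return {key: key for key in keys}


keys = ["A.pkl", "B.pkl", "C.pkl"]
same = can_use_identity(keys, ["C.pkl", "B.pkl", "A.pkl"])  # order does not matter
different = can_use_identity(keys, ["A.pkl", "B.pkl"])      # "C.pkl" is missing
explicit = as_explicit_mapping(keys)
```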
See DagsterPartitionedDataset for all parameters.
Supported partition types
| Type | Description |
|---|---|
| StaticPartitionsDefinition | Fixed set of partitions (dates, regions, model variants). See the Dagster docs. |

| Mapping Type | Description |
|---|---|
| StaticPartitionMapping | Explicit upstream-to-downstream key mapping. See the Dagster docs. |
| IdentityPartitionMapping | 1:1 mapping where keys match exactly. See the Dagster docs. |
Need other Dagster partition types?
Open an issue describing your requirements.
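Since only the types listed above are supported, it can help to scan a catalog for anything else before translating the project. The following is a rough sketch that works on a catalog already parsed into a plain dictionary. The find_unsupported helper and the sample catalog are hypothetical, and the accepted type strings are assumed to be exactly those used in the examples on this page.

```python
from typing import Any, Dict, List

# Assumption: type strings are spelled as in the examples on this page.
SUPPORTED_PARTITION_TYPES = {"dagster.StaticPartitionsDefinition"}
SUPPORTED_MAPPING_TYPES = {
    "dagster.StaticPartitionMapping",
    "dagster.IdentityPartitionMapping",
}


def find_unsupported(catalog: Dict[str, Dict[str, Any]]) -> List[str]:
    """List partitioned catalog entries that use an unsupported type."""
    problems = []
    for name, entry in catalog.items():
        # Only DagsterPartitionedDataset entries carry partition config.
        if not str(entry.get("type", "")).endswith("DagsterPartitionedDataset"):
            continue
        partition_type = entry.get("partition", {}).get("type")
        if partition_type not in SUPPORTED_PARTITION_TYPES:
            problems.append(f"{name}: partition type {partition_type!r}")
        for target, mapping in entry.get("partition_mappings", {}).items():
            mapping_type = mapping.get("type")
            if mapping_type not in SUPPORTED_MAPPING_TYPES:
                problems.append(f"{name} -> {target}: mapping type {mapping_type!r}")
    return problems


catalog = {
    "static_data": {
        "type": "kedro_dagster.datasets.DagsterPartitionedDataset",
        "partition": {
            "type": "dagster.StaticPartitionsDefinition",
            "partition_keys": ["a.csv"],
        },
    },
    "daily_data": {
        "type": "kedro_dagster.datasets.DagsterPartitionedDataset",
        "partition": {"type": "dagster.DailyPartitionsDefinition"},
    },
}
unsupported = find_unsupported(catalog)
```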
Enforce execution order with DagsterNothingDataset
If you need to enforce that one node completes before another without passing data between them, use DagsterNothingDataset:
    my_nothing_dataset:
      type: kedro_dagster.datasets.DagsterNothingDataset
      metadata:
        description: "Enforces preprocessing completion before collection."
Datasets of this type appear as Nothing assets in Dagster and only enforce execution dependencies; no data flows through them.
See DagsterNothingDataset for details, and the Example Project for a practical example.
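The ordering effect comes from the dependency edge itself: declaring the signal dataset as an output of one node and an input of another is enough to place the first node before the second. The sketch below shows that idea with the standard library's topological sorter rather than Kedro or Dagster. The node names, the raw_data and report dataset names, and the execution_order helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

# node name -> (input dataset names, output dataset names); names are hypothetical
nodes: Dict[str, Tuple[List[str], List[str]]] = {
    "collect": (["my_nothing_dataset"], ["report"]),
    "preprocess": (["raw_data"], ["my_nothing_dataset"]),
}


def execution_order(nodes: Dict[str, Tuple[List[str], List[str]]]) -> List[str]:
    """Order nodes so that every producer runs before its consumers."""
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    # Map each node to the set of nodes that must finish before it.
    graph = {
        name: {producer[dataset] for dataset in inputs if dataset in producer}
        for name, (inputs, _) in nodes.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(nodes)
```

Even though collect is declared first and no data passes between the two nodes, the shared dataset name forces preprocess to come first in the resulting order.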
See also
- Configuration Reference: job and pipeline filter options in dagster.yml
- Architecture: how datasets are translated to Dagster assets and IO managers
- Concepts: partitions feature overview and limitations