How to Use Dagster Partitions

Experimental: read before starting

Support for Dagster partitions in Kedro-Dagster is experimental. Note the following limitations and behavior:

Only StaticPartitionsDefinition is supported (no time-window or dynamic partitions).
Fan-out is static and fixed at translation time. Dynamic partitioning based on runtime information is not supported.
Backfills or materializations launched from the Dagster UI do not trigger Kedro hooks, so integrations that rely on hooks (such as Kedro-MLflow) will not function in those cases.
Outside of Dagster, DagsterPartitionedDataset behaves like Kedro's PartitionedDataset, so the pipeline remains runnable with kedro run.

Kedro-Dagster provides two custom datasets to enable Dagster partitions in your Kedro project: DagsterPartitionedDataset and DagsterNothingDataset.

Define a partitioned dataset

Add a DagsterPartitionedDataset to your Kedro catalog:

my_partitioned_dataset:
  type: kedro_dagster.datasets.DagsterPartitionedDataset
  path: data/01_raw/my_data/
  dataset:
    type: pandas.CSVDataset
  partition:
    type: dagster.StaticPartitionsDefinition
    partition_keys: ["2023-01-01.csv", "2023-01-02.csv", "2023-01-03.csv"]

When a job includes a DagsterPartitionedDataset, Dagster schedules and materializes per-partition runs. You can select partition keys in the Dagster UI launchpad or use backfills for ranges.
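Because only static partition keys are supported, date-like keys have to be listed explicitly in the catalog. The following is a minimal sketch of a helper that generates such a list for pasting into `partition_keys`. The function name `daily_partition_keys` and the `.csv` suffix default are illustrative assumptions, not part of Kedro-Dagster.

```python
from datetime import date, timedelta
from typing import List


def daily_partition_keys(start: date, end: date, suffix: str = ".csv") -> List[str]:
    """Return one key per day from start to end (inclusive), e.g. '2023-01-01.csv'."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    days = (end - start).days + 1
    return [f"{(start + timedelta(days=i)).isoformat()}{suffix}" for i in range(days)]


# Hypothetical helper usage: reproduces the three keys from the catalog entry above.
keys = daily_partition_keys(date(2023, 1, 1), date(2023, 1, 3))
print(keys)
```

The keys are still static: the list is computed once, ahead of time, and written into the catalog rather than resolved at runtime.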

If a node depends on or produces a DagsterPartitionedDataset, the translator creates per-partition Dagster ops for that node. These ops execute in parallel, one per partition key. Downstream nodes are also fanned out, respecting any defined partition mappings.
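A node that consumes a partitioned dataset might look like the sketch below. It assumes the node receives the same structure Kedro's PartitionedDataset provides (a dictionary mapping each partition key to a load callable) and returns a dictionary mapping keys to the data to save. Whether DagsterPartitionedDataset passes exactly this structure inside Dagster is an assumption here. The function name and the list-of-dicts rows are stand-ins; with pandas.CSVDataset the loaded objects would be DataFrames.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def drop_incomplete_rows(
    partitions: Dict[str, Callable[[], List[Row]]],
) -> Dict[str, List[Row]]:
    """Load each partition lazily and keep only rows without missing values."""
    cleaned: Dict[str, List[Row]] = {}
    for key, load in partitions.items():
        rows = load()  # the data is only read when the callable is invoked
        cleaned[key] = [
            row for row in rows if all(value is not None for value in row.values())
        ]
    return cleaned


# Stand-in loaders so the sketch runs without Kedro or Dagster installed.
fake_partitions = {
    "2023-01-01.csv": lambda: [{"id": 1, "value": 3.5}, {"id": 2, "value": None}],
    "2023-01-02.csv": lambda: [{"id": 3, "value": 1.0}],
}
result = drop_incomplete_rows(fake_partitions)
```

Writing the node against a dictionary of partitions keeps it usable both with kedro run, where all partitions arrive together, and in a per-partition run, where the dictionary may hold a single key.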

Map partitions to downstream datasets

If you want to map upstream partition keys to downstream partition keys, use partition_mappings on the upstream dataset:

my_upstream_dataset:
  type: kedro_dagster.datasets.DagsterPartitionedDataset
  path: data/01_raw/upstream/
  dataset:
    type: pandas.CSVDataset
  partition:
    type: dagster.StaticPartitionsDefinition
    partition_keys: ["1.csv", "2.csv", "3.csv"]
  partition_mappings:
    my_downstream_dataset:
      type: dagster.StaticPartitionMapping
      downstream_partition_keys_by_upstream_partition_key:
        1.csv: 10.csv
        2.csv: 20.csv
        3.csv: 30.csv

my_downstream_dataset:
  type: kedro_dagster.datasets.DagsterPartitionedDataset
  path: data/02_intermediate/downstream/
  dataset:
    type: pandas.CSVDataset
  partition:
    type: dagster.StaticPartitionsDefinition
    partition_keys: ["10.csv", "20.csv", "30.csv"]

Mapping targets can also be dataset patterns written with {} syntax (e.g., {namespace}.partitioned_dataset).
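A typo in a static mapping is easy to miss, because every mapped key must exist in the partition keys of the corresponding dataset. The sketch below is a plain-Python sanity check that can be run on the values copied from the catalog. The function `check_static_mapping` is hypothetical and not part of Kedro-Dagster or Dagster.

```python
from typing import Dict, List, Sequence


def check_static_mapping(
    upstream_keys: Sequence[str],
    downstream_keys: Sequence[str],
    mapping: Dict[str, str],
) -> List[str]:
    """Return a list of problems; an empty list means every mapped key is known."""
    problems: List[str] = []
    for upstream, downstream in mapping.items():
        if upstream not in upstream_keys:
            problems.append(f"unknown upstream key: {upstream}")
        if downstream not in downstream_keys:
            problems.append(f"unknown downstream key: {downstream}")
    return problems


# Values copied from the static mapping example above.
upstream = ["1.csv", "2.csv", "3.csv"]
downstream = ["10.csv", "20.csv", "30.csv"]
mapping = {"1.csv": "10.csv", "2.csv": "20.csv", "3.csv": "30.csv"}

problems = check_static_mapping(upstream, downstream, mapping)
typo_problems = check_static_mapping(upstream, downstream, {"1.csv": "100.csv"})
```

Here `problems` is empty for the catalog example, while the deliberately wrong mapping reports the unknown downstream key.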

If upstream and downstream partitions share the same keys, use IdentityPartitionMapping instead:

my_upstream_dataset:
  type: kedro_dagster.datasets.DagsterPartitionedDataset
  path: data/01_raw/upstream/
  dataset:
    type: pickle.PickleDataset
  partition:
    type: dagster.StaticPartitionsDefinition
    partition_keys: ["A.pkl", "B.pkl", "C.pkl"]
  partition_mappings:
    my_downstream_dataset:
      type: dagster.IdentityPartitionMapping

See DagsterPartitionedDataset for all parameters.

Supported partition types

Type	Description
`StaticPartitionsDefinition`	Fixed set of partitions (dates, regions, model variants). Dagster docs

Mapping Type	Description
`StaticPartitionMapping`	Explicit upstream-to-downstream key mapping. Dagster docs
`IdentityPartitionMapping`	1:1 mapping where keys match exactly. Dagster docs

Need other Dagster partition types?

Open an issue describing your requirements.

Enforce execution order with `DagsterNothingDataset`

If you need to enforce that one node completes before another without passing data between them, use DagsterNothingDataset:

my_nothing_dataset:
  type: kedro_dagster.datasets.DagsterNothingDataset
  metadata:
    description: "Enforces preprocessing completion before collection."

These datasets appear as Nothing assets in Dagster. They carry no data and only enforce execution dependencies.

See DagsterNothingDataset for details, and the Example Project for a practical example.
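On the Python side, the ordering is expressed by making the dataset an output of the first node and an input of the second. The sketch below uses two hypothetical node functions, `preprocess` and `collect`. It assumes that loading a DagsterNothingDataset yields no meaningful value (treated here as None), and the wiring shown in the comments is illustrative rather than taken from the project.

```python
from typing import List

execution_log: List[str] = []


def preprocess() -> None:
    """Do side-effect-only work (e.g. write files); returns no data."""
    execution_log.append("preprocess")
    return None


def collect(_done: None) -> str:
    """Run after preprocess; the argument carries no data and is ignored."""
    execution_log.append("collect")
    return "collected"


# Hypothetical wiring in pipeline.py (node and dataset names are illustrative):
#   node(preprocess, inputs=None, outputs="my_nothing_dataset")
#   node(collect, inputs="my_nothing_dataset", outputs="collection_status")

# Plain-Python stand-in for the ordering that the wiring above requests.
status = collect(preprocess())
```

The unused `_done` parameter exists only so that Kedro sees a dependency between the two nodes.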
