How to Use Dagster Partitions

Experimental: read before starting

Dagster partitions support in Kedro-Dagster is experimental and has the following limitations:

  • Only StaticPartitionsDefinition is supported (no time-window or dynamic partitions).
  • Fan-out is static at translation time. Dynamic partitioning based on runtime information is not supported.
  • Backfills or materializations from the Dagster UI do not trigger Kedro hooks. Kedro-MLflow integration relying on hooks will not function in those cases.
  • DagsterPartitionedDataset behaves like Kedro's PartitionedDataset when run outside of Dagster, so the pipeline remains runnable with kedro run.

Kedro-Dagster provides two custom datasets to enable Dagster partitions in your Kedro project: DagsterPartitionedDataset and DagsterNothingDataset.

Define a partitioned dataset

Add a DagsterPartitionedDataset to your Kedro catalog:

my_partitioned_dataset:
  type: kedro_dagster.datasets.DagsterPartitionedDataset
  path: data/01_raw/my_data/
  dataset:
    type: pandas.CSVDataset
  partition:
    type: dagster.StaticPartitionsDefinition
    partition_keys: ["2023-01-01.csv", "2023-01-02.csv", "2023-01-03.csv"]

When a job includes a DagsterPartitionedDataset, Dagster schedules and materializes per-partition runs. You can select partition keys in the Dagster UI launchpad or use backfills for ranges.
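
Before launching per-partition runs, it can help to confirm that every partition key has a file behind it. The following is a minimal sketch, assuming each partition key is a file name relative to the dataset's path (as the ".csv" suffixes in the catalog entry above suggest). The helper name missing_partitions is hypothetical and is not part of Kedro-Dagster.

```python
from pathlib import Path


def missing_partitions(base_path: str, partition_keys: list[str]) -> list[str]:
    """Return the partition keys that have no matching file under base_path.

    Assumption: each partition key is a file name relative to the dataset path.
    """
    base = Path(base_path)
    return [key for key in partition_keys if not (base / key).is_file()]


if __name__ == "__main__":
    keys = ["2023-01-01.csv", "2023-01-02.csv", "2023-01-03.csv"]
    # Lists the keys whose files are absent from the raw data folder.
    print(missing_partitions("data/01_raw/my_data/", keys))
```

An empty result means every key in partition_keys resolves to a file on disk.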

If a node depends on or produces a DagsterPartitionedDataset, the translator creates one Dagster op per partition key for that node, and these ops execute in parallel. Downstream nodes are fanned out as well, respecting any defined partition mappings.
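
Conceptually, static fan-out means the set of per-partition callables is fixed before anything runs. The sketch below illustrates that idea in plain Python; it is not Kedro-Dagster's translator, and the fan_out helper, the preprocess function, and the "node__key" naming scheme are all hypothetical.

```python
from functools import partial
from typing import Callable


def fan_out(
    node_name: str,
    func: Callable[[str], str],
    partition_keys: list[str],
) -> dict[str, Callable[[], str]]:
    """Create one zero-argument callable per partition key, decided up front."""
    return {f"{node_name}__{key}": partial(func, key) for key in partition_keys}


def preprocess(partition_key: str) -> str:
    # Hypothetical node body: a real node would load and transform the partition.
    return f"processed {partition_key}"


# The key list is known before execution, so the fan-out cannot change at runtime.
ops = fan_out("preprocess", preprocess, ["1.csv", "2.csv", "3.csv"])
results = {name: op() for name, op in ops.items()}
```

Because the key list is consumed when the callables are built, adding a key later requires rebuilding them, which mirrors the "static at translation time" limitation described at the top of this page.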

Map partitions to downstream datasets

To map upstream partition keys to downstream partition keys, set partition_mappings on the upstream dataset:

my_upstream_dataset:
  type: kedro_dagster.datasets.DagsterPartitionedDataset
  path: data/01_raw/upstream/
  dataset:
    type: pandas.CSVDataset
  partition:
    type: dagster.StaticPartitionsDefinition
    partition_keys: ["1.csv", "2.csv", "3.csv"]
  partition_mappings:
    my_downstream_dataset:
      type: dagster.StaticPartitionMapping
      downstream_partition_keys_by_upstream_partition_key:
        1.csv: 10.csv
        2.csv: 20.csv
        3.csv: 30.csv

my_downstream_dataset:
  type: kedro_dagster.datasets.DagsterPartitionedDataset
  path: data/02_intermediate/downstream/
  dataset:
    type: pandas.CSVDataset
  partition:
    type: dagster.StaticPartitionsDefinition
    partition_keys: ["10.csv", "20.csv", "30.csv"]

Mapping targets can also be dataset patterns that use the {} placeholder syntax (e.g., {namespace}.partitioned_dataset).
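
A static mapping is easy to get out of sync with the partition_keys lists on either side. The following sketch checks that every key named in a mapping exists in the corresponding key list. The function name unknown_mapping_keys is hypothetical, and the check is an illustration rather than something Kedro-Dagster or Dagster is stated to perform.

```python
from typing import Iterable, Mapping, Union


def unknown_mapping_keys(
    upstream_keys: Iterable[str],
    downstream_keys: Iterable[str],
    mapping: Mapping[str, Union[str, Iterable[str]]],
) -> list[str]:
    """Return a description of each mapping key missing from its key list."""
    upstream, downstream = set(upstream_keys), set(downstream_keys)
    problems = []
    for source, targets in mapping.items():
        if source not in upstream:
            problems.append(f"unknown upstream key: {source}")
        # A single string target is treated as a one-element collection.
        for target in [targets] if isinstance(targets, str) else targets:
            if target not in downstream:
                problems.append(f"unknown downstream key: {target}")
    return problems


# Values copied from the catalog entries above.
mapping = {"1.csv": "10.csv", "2.csv": "20.csv", "3.csv": "30.csv"}
problems = unknown_mapping_keys(
    ["1.csv", "2.csv", "3.csv"],
    ["10.csv", "20.csv", "30.csv"],
    mapping,
)
```

For the catalog entries above, problems is an empty list.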

If upstream and downstream partitions share the same keys, use IdentityPartitionMapping instead:

my_upstream_dataset:
  type: kedro_dagster.datasets.DagsterPartitionedDataset
  path: data/01_raw/upstream/
  dataset:
    type: pickle.PickleDataset
  partition:
    type: dagster.StaticPartitionsDefinition
    partition_keys: ["A.pkl", "B.pkl", "C.pkl"]
  partition_mappings:
    my_downstream_dataset:
      type: dagster.IdentityPartitionMapping

See DagsterPartitionedDataset for all parameters.
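
To decide between the two mapping types, compare the key lists: an identity mapping only makes sense when both sides use exactly the same keys. This is a small sketch with hypothetical helper names (identity_mapping_applies, as_static_mapping), not part of either library.

```python
from typing import Iterable


def identity_mapping_applies(
    upstream_keys: Iterable[str], downstream_keys: Iterable[str]
) -> bool:
    """True when both datasets declare exactly the same partition keys."""
    return set(upstream_keys) == set(downstream_keys)


def as_static_mapping(partition_keys: Iterable[str]) -> dict[str, str]:
    """Spell out the explicit key-to-key mapping that an identity mapping implies."""
    return {key: key for key in partition_keys}


same = identity_mapping_applies(["A.pkl", "B.pkl", "C.pkl"], ["C.pkl", "B.pkl", "A.pkl"])
explicit = as_static_mapping(["A.pkl", "B.pkl", "C.pkl"])
```

When the check returns False, an explicit StaticPartitionMapping is the option this page describes for connecting the two datasets.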

Supported partition types

Type                         Description
StaticPartitionsDefinition   Fixed set of partitions (dates, regions, model variants). Dagster docs

Mapping Type                 Description
StaticPartitionMapping       Explicit upstream-to-downstream key mapping. Dagster docs
IdentityPartitionMapping     1:1 mapping where keys match exactly. Dagster docs

Need other Dagster partition types?

Open an issue describing your requirements.

Enforce execution order with DagsterNothingDataset

To make one node finish before another starts without passing data between them, use DagsterNothingDataset:

my_nothing_dataset:
  type: kedro_dagster.datasets.DagsterNothingDataset
  metadata:
    description: "Enforces preprocessing completion before collection."

Datasets of this type appear as Nothing assets in Dagster and only enforce execution dependencies.

See DagsterNothingDataset for details, and the Example Project for practical usage.

See also