Building a Custom Docker Image

Build a custom Docker image for your script tasks.

You can bake all dependencies needed for your script tasks directly into Kestra’s base image. Here is an example that installs Python dependencies:

FROM kestra/kestra:latest

USER root
RUN apt-get update -y && apt-get install pip -y

RUN pip install --no-cache-dir pandas requests boto3

Then, point to that Dockerfile in your docker-compose.yml file:

services:
  kestra:
    build:
      context: .
      dockerfile: Dockerfile
    image: kestra-python:latest

Once you start the Kestra containers with docker compose up -d, you can create a flow that runs Python tasks with your custom dependencies directly inside the Kestra container by using the Process task runner:

id: python_process
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    script: |
      import pandas as pd
      import requests
      import boto3
      print(f"Pandas version: {pd.__version__}")
      print(f"Requests version: {requests.__version__}")
      print(f"Boto3 version: {boto3.__version__}")
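
Before relying on the image in a flow, it can help to confirm that the packages are actually importable inside the container. The following is a minimal sketch that uses only the Python standard library; the helper name find_missing is hypothetical and not part of Kestra, and the package list simply mirrors the Dockerfile above.

```python
import importlib.util


def find_missing(modules):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# The three packages installed by the Dockerfile above.
missing = find_missing(["pandas", "requests", "boto3"])
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All packages are importable")
```

You could paste this into the script property of the task above, or run it inside the container, to get a clear message instead of an ImportError partway through a longer script.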

Building a custom Docker image for your script tasks

Imagine you use the following flow:

id: zip_to_python
namespace: company.team

variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"

tasks:
  - id: get_zipfile
    type: io.kestra.plugin.core.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"

  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"

  - id: parquet_output
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: ghcr.io/kestra-io/pydata:latest
    env:
      FILE_ID: "{{ render(vars.file_id) }}"
    inputFiles: "{{ outputs.unzip.files }}"
    script: |
      import os
      import pandas as pd

      file_id = os.environ["FILE_ID"]
      file = f"{file_id}-divvy-tripdata.csv"

      df = pd.read_csv(file)
      df.to_parquet(f"{file_id}.parquet")
    outputFiles:
      - "*.parquet"
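
The file_id variable in the flow above takes the execution start date, moves it back three months, and formats the result as yyyyMM. As a rough illustration of that arithmetic outside Kestra, the sketch below reproduces it in plain Python. The function name file_id_for is hypothetical, and the code only approximates the Pebble expression: it ignores time zones and the day of the month.

```python
from datetime import date


def file_id_for(start: date, months_back: int = 3) -> str:
    """Format the month that lies `months_back` months before `start` as yyyyMM."""
    # Count months from year 0 so the subtraction wraps across year boundaries.
    total_months = start.year * 12 + (start.month - 1) - months_back
    year, month_index = divmod(total_months, 12)
    return f"{year:04d}{month_index + 1:02d}"


# An execution started in February 2024 downloads the November 2023 archive.
print(file_id_for(date(2024, 2, 15)))  # 202311
```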

The Python task requires pandas to be installed, and df.to_parquet additionally needs a Parquet engine such as pyarrow. Pandas is a large library that is not included in the default python image. In this case, you have the following options:

1) Install pandas in the beforeCommands property of the Python task.
2) Use one of our pre-built images that already include pandas, such as the ghcr.io/kestra-io/pydata:latest image.
3) Build your own custom Docker image that includes pandas.

1) Installing pandas in the `beforeCommands` property

id: install_pandas_at_runtime
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    beforeCommands:
      - pip install pyarrow pandas
    script: |
      import pandas as pd
      print(f"Pandas version: {pd.__version__}")

2) Using one of our pre-built images

id: use_prebuilt_image
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: ghcr.io/kestra-io/pydata:latest
    script: |
      import pandas as pd
      print(f"Pandas version: {pd.__version__}")

3) Building a custom Docker image

If you want to build a custom Docker image for some of your scripts, first create a Dockerfile:

FROM python:3.11-slim
RUN pip install --upgrade pip
RUN pip install --no-cache-dir kestra requests pyarrow pandas amazon-ion

Then, build the image:

docker build -t kestra-custom:latest .

Finally, use that image in your flow:

id: zip_to_python
namespace: company.team

variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"

tasks:
  - id: get_zipfile
    type: io.kestra.plugin.core.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"

  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"

  - id: parquet_output
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
      pullPolicy: NEVER # ⚡️ Use the local image instead of pulling it from Docker Hub
    containerImage: kestra-custom:latest # ⚡️ Use your custom image here
    env:
      FILE_ID: "{{ render(vars.file_id) }}"
    inputFiles: "{{ outputs.unzip.files }}"
    script: |
      import os
      import pandas as pd

      file_id = os.environ["FILE_ID"]
      file = f"{file_id}-divvy-tripdata.csv"

      df = pd.read_csv(file)
      df.to_parquet(f"{file_id}.parquet")
    outputFiles:
      - "*.parquet"

Note how the pullPolicy: NEVER property makes Kestra use the local image instead of trying to pull it from Docker Hub.
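
Because the image is never pulled with this setting, the task can only start if the image already exists on the host that runs the container. The sketch below shows one possible way to check that from Python by calling the docker image inspect command, which exits with a non-zero status when the image is not found. The helper name image_exists_locally and its run parameter are hypothetical and exist only to keep the example easy to test; it assumes the docker CLI is on the PATH.

```python
import subprocess


def image_exists_locally(image, run=subprocess.run):
    """Return True if `docker image inspect` finds `image` in the local daemon."""
    try:
        result = run(
            ["docker", "image", "inspect", image],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The docker CLI itself is not installed or not on the PATH.
        return False
    return result.returncode == 0
```

For example, calling image_exists_locally("kestra-custom:latest") right after the docker build step should return True on the machine where the image was built.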
