Building a Custom Docker Image
Build a custom Docker image for your script tasks.
You can bake all dependencies needed for your script tasks directly into the Kestra’s base image. Here is an example installing Python dependencies:
FROM kestra/kestra:latest
USER rootRUN apt-get update -y && apt-get install pip -y
RUN pip install --no-cache-dir pandas requests boto3
Then, point to that Dockerfile in your docker-compose.yml
file:
services: kestra: build: context: . dockerfile: Dockerfile image: kestra-python:latest
Once you start Kestra containers using docker compose up -d
, you can create a flow that directly runs Python tasks with your custom dependencies using the PROCESS
runner:
id: python_processnamespace: company.teamtasks: - id: custom_dependencies type: io.kestra.plugin.scripts.python.Script runner: PROCESS script: | import pandas as pd import requests import boto3 print(f"Pandas version: {pd.__version__}") print(f"Requests version: {requests.__version__}") print(f"Boto3 version: {boto3.__version__}")
Building a custom Docker image for your script tasks
Imagine you use the following flow:
id: zip_to_pythonnamespace: company.team
variables: file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"
tasks: - id: get_zipfile type: io.kestra.plugin.core.http.Download uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"
- id: unzip type: io.kestra.plugin.compress.ArchiveDecompress algorithm: ZIP from: "{{ outputs.get_zipfile.uri }}"
- id: parquet_output type: io.kestra.plugin.scripts.python.Script taskRunner: type: io.kestra.plugin.scripts.runner.docker.Docker containerImage: ghcr.io/kestra-io/pydata:latest env: FILE_ID: "{{ render(vars.file_id) }}" inputFiles: "{{ outputs.unzip.files }}" script: | import os import pandas as pd
file_id = os.environ["FILE_ID"] file = f"{file_id}-divvy-tripdata.csv"
df = pd.read_csv(file) df.to_parquet(f"{file_id}.parquet") outputFiles: - "*.parquet"
The Python task requires pandas to be installed. Pandas is a large library, and it’s not included in the default python
image. In this case, you have the following options:
- Install pandas in the
beforeCommands
property of the Python task. - Use one of our pre-built images that already include pandas, such as the
ghcr.io/kestra-io/pydata:latest
image. - Build your own custom Docker image that includes pandas.
1) Installing pandas in the beforeCommands
property
id: install_pandas_at_runtimenamespace: company.teamtasks: - id: custom_dependencies type: io.kestra.plugin.scripts.python.Script taskRunner: type: io.kestra.plugin.core.runner.Process beforeCommands: - pip install pyarrow pandas script: | import pandas as pd print(f"Pandas version: {pd.__version__}")
2) Using one of our pre-built images
id: use_prebuilt_imagenamespace: company.teamtasks: - id: custom_dependencies type: io.kestra.plugin.scripts.python.Script taskRunner: type: io.kestra.plugin.scripts.runner.docker.Docker containerImage: ghcr.io/kestra-io/pydata:latest script: | import pandas as pd print(f"Pandas version: {pd.__version__}")
3) Building a custom Docker image
If you want to build a custom Docker image for some of your scripts, first create a Dockerfile:
FROM python:3.11-slimRUN pip install --upgrade pipRUN pip install --no-cache-dir kestra requests pyarrow pandas amazon-ion
Then, build the image:
docker build -t kestra-custom:latest .
Finally, use that image in your flow:
id: zip_to_pythonnamespace: company.team
variables: file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"
tasks: - id: get_zipfile type: io.kestra.plugin.core.http.Download uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"
- id: unzip type: io.kestra.plugin.compress.ArchiveDecompress algorithm: ZIP from: "{{ outputs.get_zipfile.uri }}"
- id: parquet_output type: io.kestra.plugin.scripts.python.Script taskRunner: type: io.kestra.plugin.scripts.runner.docker.Docker pullPolicy: NEVER # ⚡️ Use the local image instead of pulling it from DockerHub containerImage: kestra-custom:latest # ⚡️ Use your custom image here env: FILE_ID: "{{ render(vars.file_id) }}" inputFiles: "{{ outputs.unzip.files }}" script: | import os import pandas as pd
file_id = os.environ["FILE_ID"] file = f"{file_id}-divvy-tripdata.csv"
df = pd.read_csv(file) df.to_parquet(f"{file_id}.parquet") outputFiles: - "*.parquet"
Note how the pullPolicy: NEVER
property is used to make sure that Kestra uses the local image instead of trying to pull it from DockerHub.