Installing Dependencies at Runtime
Install dependencies at runtime using beforeCommands.
There are several ways of installing custom packages for your workflows. This page shows how to install dependencies at runtime using the beforeCommands property.
Installing dependencies using beforeCommands
While you could bake all your package dependencies into a custom container image, it’s often convenient to install a couple of additional packages at runtime without having to build separate images. The beforeCommands property can be used for that purpose.
pip install package
Here is a simple example that installs the pip packages requests and kestra before starting the script:
id: pip
namespace: company.team

tasks:
  - id: before_commands
    type: io.kestra.plugin.scripts.python.Script
    containerImage: python:3.11-slim
    beforeCommands:
      - pip install requests kestra > /dev/null
    script: |
      import requests
      import kestra

      kestra_modules = [i for i in dir(kestra.Kestra) if not i.startswith("_")]

      print(f"Requests version: {requests.__version__}")
      print(f"Kestra modules: {kestra_modules}")
pip install -r requirements.txt
This example clones a Git repository that contains a requirements.txt file. The script task uses beforeCommands to install the packages listed in that file. Lastly, its command lists the most recently installed packages to validate that the process works as expected:
id: python_requirements_file
namespace: company.team

tasks:
  - id: wdir
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: cloneRepository
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/examples
        branch: main

      - id: print_requirements
        type: io.kestra.plugin.scripts.shell.Commands
        taskRunner:
          type: io.kestra.plugin.core.runner.Process
        commands:
          - cat requirements.txt

      - id: list_installed_packages
        type: io.kestra.plugin.scripts.python.Commands
        containerImage: python:3.11-slim
        beforeCommands:
          - pip install -r requirements.txt > /dev/null
        commands:
          - ls -lt $(python -c "import site; print(site.getsitepackages()[0])") | head -n 20
And here is a simpler version where we add the requirements.txt file using the inputFiles property:
id: python_requirements_file
namespace: company.team

tasks:
  - id: list_installed_packages
    type: io.kestra.plugin.scripts.python.Script
    env:
      PIP_ROOT_USER_ACTION: ignore
    inputFiles:
      requirements.txt: |
        polars
        requests
        kestra
    containerImage: python:3.11-slim
    beforeCommands:
      - pip install --upgrade pip
      - pip install -r requirements.txt > /dev/null
    script: |
      from kestra import Kestra
      import pkg_resources
      import re

      with open('requirements.txt', 'r') as file:
          # find package names without versions
          required_packages = {re.match(r'^\s*([a-zA-Z0-9_-]+)', line).group(1) for line in file if line.strip()}

      installed_packages = [(d.project_name, d.version) for d in pkg_resources.working_set]

      kestra_outputs = {}

      for name, version in installed_packages:
          if name in required_packages:
              kestra_outputs[name] = version

      Kestra.outputs(kestra_outputs)
As shown in the examples above, the WorkingDirectory task is usually only needed if you use the git.Clone task. In most other cases, you can use the inputFiles property to add files to the script’s working directory.
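The version check in the inputFiles example relies on pkg_resources, which setuptools has deprecated. As a minimal sketch, the same lookup can be done with the standard library's importlib.metadata. The helper names requirement_names and installed_versions and the sample lines are hypothetical, chosen only for illustration:

```python
import re
from importlib import metadata

# Matches the leading package name of a requirement line,
# stopping before any version specifier such as ">=2.31".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(lines):
    """Return the set of package names found in requirements-style lines."""
    names = set()
    for line in lines:
        stripped = line.strip()
        # Skip blank lines, comments, and pip options such as "-r other.txt".
        if not stripped or stripped.startswith(("#", "-")):
            continue
        match = _NAME.match(stripped)
        if match:
            names.add(match.group(1))
    return names


def installed_versions(names):
    """Map each name to its installed version, omitting packages that are missing."""
    versions = {}
    for name in sorted(names):
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return versions


# Illustrative input; in a flow these lines would come from requirements.txt.
sample = ["polars\n", "requests>=2.31\n", "\n", "# a comment\n", "kestra\n"]
print(sorted(requirement_names(sample)))  # ['kestra', 'polars', 'requests']
print(installed_versions(requirement_names(sample)))
```

Inside a script task, the resulting dictionary could be passed to Kestra.outputs in the same way as kestra_outputs in the flow above.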
Using Kestra’s prebuilt images
Many data engineering use cases require performing fairly standardized tasks such as:
- processing data with pandas
- transforming data with dbt-core (using a dbt adapter for your data warehouse)
- making API calls with the requests library
To solve those common challenges, the kestra-io/examples repository provides several public Docker images with the latest versions of those common packages. Many Blueprints use those public images by default. The images are hosted in the GitHub Container Registry, managed by Kestra’s team, and follow the naming convention ghcr.io/kestra-io/packageName:latest.
Example: running R script in Docker
Here is a simple example using the ghcr.io/kestra-io/rdata:latest Docker image provided by Kestra to analyze the built-in mtcars dataset using the dplyr and arrow R libraries:
id: rCars
namespace: company.team

tasks:
  - id: r
    type: io.kestra.plugin.scripts.r.Script
    containerImage: ghcr.io/kestra-io/rdata:latest
    outputFiles:
      - "*.csv"
      - "*.parquet"
    script: |
      library(dplyr)
      library(arrow)
      data(mtcars) # Load mtcars data
      print(head(mtcars))
      final <- mtcars %>%
        summarise(
          avg_mpg = mean(mpg),
          avg_disp = mean(disp),
          avg_hp = mean(hp),
          avg_drat = mean(drat),
          avg_wt = mean(wt),
          avg_qsec = mean(qsec),
          avg_vs = mean(vs),
          avg_am = mean(am),
          avg_gear = mean(gear),
          avg_carb = mean(carb)
        )
      final %>% print()
      write.csv(final, "final.csv")
      mtcars_clean <- na.omit(mtcars) # remove rows with NA values
      write_parquet(mtcars_clean, "mtcars_clean.parquet")
Installation of R libraries is time-consuming. From a technical standpoint, you could install custom R packages at runtime as follows:
id: rCars
namespace: company.team

tasks:
  - id: r
    type: io.kestra.plugin.scripts.r.Script
    containerImage: ghcr.io/kestra-io/rdata:latest
    beforeCommands:
      - Rscript -e "install.packages(c('dplyr', 'arrow'))" > /dev/null 2>&1
However, the flow above might take up to 30 minutes, depending on the R packages you install.
Prebuilt Docker images such as ghcr.io/kestra-io/rdata:latest can help you iterate much faster. Before moving to production, you can build your custom images with the exact package versions that you need.
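One way to find the exact versions to pin is to record what a working runtime install actually resolved to. The following is a rough sketch using the standard library's importlib.metadata. The function names pin_lines and current_environment are hypothetical, and the version numbers in the demo are illustrative rather than recommendations:

```python
from importlib import metadata


def pin_lines(versions):
    """Format a {name: version} mapping as sorted 'name==version' lines."""
    ordered = sorted(versions.items(), key=lambda item: item[0].lower())
    return [f"{name}=={version}" for name, version in ordered]


def current_environment():
    """Collect name and version for every distribution visible to this interpreter."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata lacks a name
            found[name] = dist.version
    return found


# Illustrative versions only, not taken from any real image.
print(pin_lines({"requests": "2.31.0", "polars": "0.20.3"}))
# ['polars==0.20.3', 'requests==2.31.0']
```

Calling pin_lines(current_environment()) inside the container yields lines that could seed the requirements.txt of a custom image. From the command line, pip freeze produces a comparable list.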