How to Set Up a Local Airflow Installation With Kubernetes

Apache Airflow is a platform to programmatically author, schedule, and monitor workflows. This definition comes from Airbnb, where the platform was developed in 2014 to manage complex data engineering pipelines. It has since been widely adopted; companies using Apache Airflow include Slack, Zoom, eBay, and Adobe.

In this article, we dive into the intricacies of setting up a local Airflow installation running in Kubernetes.

By the end of this guided installation, you will have a working local Airflow installation running in Kubernetes, a setup that closely mirrors production and accelerates your Airflow development.

Don't be discouraged by the number of steps: you can follow along on your favorite OS flavor, Windows or Mac!

The adoption statistics speak for themselves:

[Figure: Airflow adoption statistics. Credit: Astronomer]

Are you ready to learn? Let's dive in!

1. Installation Prerequisites 

Please make sure that the following prerequisites are installed on your local machine (each of these tools is used later in this guide):

- Docker
- kubectl
- k3d
- helm

If you are on Mac, you can install these tools with Homebrew. If you are on Windows, follow each tool's official documentation for the Windows-specific setup.
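For reference, a typical Mac setup with Homebrew looks like this (a sketch assuming Homebrew is already installed):

    # install the Kubernetes CLI, k3d, and Helm
    brew install kubectl k3d helm
    # install Docker Desktop
    brew install --cask docker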

2. Building the Airflow Image With a Dockerfile

Docker remains the de facto standard for containerization in the industry. Below is the Dockerfile.

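A minimal sketch of the Dockerfile, assuming you only need to extend the official image (the extra provider package is an illustration; add whatever your DAGs require):

    # Dockerfile
    # Extend the official Airflow image, pinned to version 2.3.3
    FROM apache/airflow:2.3.3

    # Assumption: install any extra Python packages your DAGs need,
    # e.g. the Kubernetes provider
    RUN pip install --no-cache-dir apache-airflow-providers-cncf-kubernetes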

Here we build an image based on apache/airflow with the version tag 2.3.3.

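A build command along these lines produces the image; the name airflow-custom is a placeholder, and any name works as long as you keep it consistent in the later steps:

    docker build -t airflow-custom:2.3.3 .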

3. Set Up a Local Image Registry to Load the Airflow Image

The below command creates a local container registry named "myregistry.localhost" on port 12345. This local registry stores and serves Docker images for the Kubernetes cluster managed by k3d, which is a convenient way to manage and distribute container images locally without an external registry service.

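Based on that description, the k3d command is likely the following (note that k3d prefixes the container name, so the registry container ends up named k3d-myregistry.localhost):

    k3d registry create myregistry.localhost --port 12345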

4. Upload the Airflow Image to a Local Image Registry

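Tag the image with the registry host and port, then push it. The image name airflow-custom:2.3.3 carries over from the build step above (substitute k3d-myregistry.localhost if that is what k3d named your registry):

    docker tag airflow-custom:2.3.3 myregistry.localhost:12345/airflow-custom:2.3.3
    docker push myregistry.localhost:12345/airflow-custom:2.3.3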

5. Add the Registry to /etc/hosts

The below entry is added to the /etc/hosts file. It maps the registry hostname to the localhost IP address so that the hostname resolves locally. This is common in development environments where services run on the local machine and their hostnames need to resolve to the localhost IP.

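With the registry name used above, the entry is a single line:

    127.0.0.1 myregistry.localhost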

6. Create the Kubernetes Cluster

Set up the configuration for mounting the local DAGs folder into the Kubernetes cluster.

The below configuration mounts the DAGs folder onto each of the Kubernetes nodes. This is important for connecting the local filesystem directly to each Kubernetes pod that will run as part of the Airflow cluster.

The file k3d.yaml contains the mounts from the Mac filesystem into the k3d containers.

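A sketch of such a k3d.yaml, assuming a cluster named airflow-cluster with one server and two agents; the local DAGs path is hypothetical, so replace /Users/you/airflow/dags with your own:

    # k3d.yaml: cluster config mounting the local DAGs folder onto every node
    apiVersion: k3d.io/v1alpha4
    kind: Simple
    metadata:
      name: airflow-cluster
    servers: 1
    agents: 2
    volumes:
      # Assumption: local DAGs live at /Users/you/airflow/dags
      - volume: /Users/you/airflow/dags:/tmp/dags
        nodeFilters:
          - server:*
          - agent:*
    registries:
      use:
        - k3d-myregistry.localhost:12345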

We need a Kubernetes cluster on which to run Airflow and k3d makes that very simple.

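With the configuration file in place, creating the cluster is one command:

    k3d cluster create --config k3d.yaml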

7. Test Airflow Image Creation Over Kubernetes

The command creates a new container named airflow-test using the specified Docker image and then starts an interactive Bash shell inside the container.

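One way to do this with kubectl, assuming the image pushed earlier:

    kubectl run airflow-test --rm -it \
      --image myregistry.localhost:12345/airflow-custom:2.3.3 -- bash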

8. Deploy the Kubernetes Dashboard

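A typical service-account.yaml grants the dashboard an admin login, following the standard admin-user example from the dashboard documentation (the account name admin-user is an assumption):

    # service-account.yaml: admin user for the Kubernetes dashboard
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: admin-user
      namespace: kubernetes-dashboard
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: admin-user
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: cluster-admin
    subjects:
    - kind: ServiceAccount
      name: admin-user
      namespace: kubernetes-dashboard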

Apply the configuration in the YAML file service-account.yaml to the Kubernetes cluster:
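    kubectl apply -f service-account.yaml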


This step deploys the Kubernetes dashboard to the cluster. It applies the required configuration from the official Kubernetes dashboard repository. 

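The deployment applies the recommended manifest from the dashboard repository (v2.7.0 is an assumed version; check the repository for the current one). On recent Kubernetes versions, a login token can then be created for the service account:

    kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

    # create a login token for the admin-user service account
    kubectl -n kubernetes-dashboard create token admin-user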

9. Kubernetes Dashboard URL
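Assuming the standard dashboard deployment above, you can reach the UI through kubectl proxy:

    kubectl proxy
    # then open this URL in the browser:
    # http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/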

Sign in with the token from the above step.


10. Instantiate Airflow Over Kubernetes

The override-values.yaml file contains the mount points from the k3d container into the Airflow scheduler and workers.
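A sketch of override-values.yaml for the official Apache Airflow Helm chart; the image name, executor choice, and claim name are assumptions carried over from the earlier steps:

    # override-values.yaml
    images:
      airflow:
        repository: myregistry.localhost:12345/airflow-custom
        tag: "2.3.3"
    executor: "KubernetesExecutor"
    dags:
      persistence:
        # mount the DAGs volume into the scheduler and workers
        enabled: true
        existingClaim: airflow-dags   # defined in pvc-claim.yaml below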

The Kubernetes Executor allows you to run all the Airflow tasks on Kubernetes as separate Pods.


The pvc-claim.yaml file defines a Persistent Volume Claim, which is a request for storage resources from the cluster.

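A sketch of pvc-claim.yaml that pairs a hostPath PersistentVolume with the claim, since the DAGs live on a path mounted into each node; the names, size, and access mode are assumptions:

    # pvc-claim.yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: airflow-dags
    spec:
      capacity:
        storage: 1Gi
      accessModes:
        - ReadWriteMany
      hostPath:
        path: /tmp/dags   # the node path from k3d.yaml
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: airflow-dags
      namespace: airflow
    spec:
      accessModes:
        - ReadWriteMany
      storageClassName: ""
      volumeName: airflow-dags
      resources:
        requests:
          storage: 1Gi

Apply it with kubectl apply -f pvc-claim.yaml before installing the chart.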

The below Helm commands install Airflow in Kubernetes.

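Assuming the official chart repository and a release named airflow:

    helm repo add apache-airflow https://airflow.apache.org
    helm repo update
    helm install airflow apache-airflow/airflow \
      --namespace airflow --create-namespace \
      -f override-values.yaml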

11. Forward the Airflow UI From the k3d Container to the Local Machine

This command forwards local port 8080 to port 8080 of the service named airflow-webserver within the airflow namespace in Kubernetes. This allows the browser on your local machine to access the Airflow web server running in the Kubernetes cluster.

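Based on the service and namespace described above:

    kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow

The Airflow UI is then available at http://localhost:8080.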

12. Stop the Cluster

The below command is used to stop the Kubernetes cluster created using k3d.

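Assuming the cluster name airflow-cluster from the config above:

    k3d cluster stop airflow-cluster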

13. Start the Cluster

The below command is used to restart the stopped Kubernetes cluster using k3d.

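Again assuming the cluster name airflow-cluster:

    k3d cluster start airflow-cluster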

14. Delete the Airflow Instance

Never run this unless you want to completely delete the Airflow setup and start over with a fresh installation.

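Assuming the Helm release name airflow in the airflow namespace:

    helm uninstall airflow --namespace airflow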

15. Delete the Cluster

The below command deletes the k3d cluster and everything running on it:
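    k3d cluster delete airflow-cluster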

Conclusion 

Those are the steps to install Apache Airflow locally on Kubernetes. Widely adopted in the industry, Airflow is a valuable skill for developers, DevOps engineers, machine learning engineers, and many other roles. Data is ubiquitous, and knowing how to build and operate workflows provides significant value to companies in today's competitive tech economy.

 

 

 

 
