As the modern Data Warehouse grows in complexity, a dependable, scalable, intuitive, and simple scheduling and management tool becomes necessary to monitor the flow of data and track how transformations are completed.
Apache Airflow, which helps manage the complexities of an Enterprise Data Warehouse, is being adopted by tech companies everywhere for its ease of management, scalability, and elegant design. Airflow is rapidly becoming the go-to technology for companies scaling out large data warehouses.
The Introduction to Data Pipeline Management with Airflow training course is designed to familiarize participants with using Airflow to schedule and maintain the numerous ETL processes running on a large-scale Enterprise Data Warehouse.
Table of contents:
- Introduction to Airflow
- Introduction to Airflow core concepts (DAGs, tasks, operators, sensors; see the first sketch after this list)
- Airflow UI
- Airflow Scheduler
- Airflow Operators & Sensors
- Advanced Airflow Concepts (Hooks, Connections, Variables, Templates, Macros, XCom; see the second sketch after this list)
- SLA, Monitoring & Alerting
- Code examples
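
As a taste of the core concepts, here is a minimal sketch of an ETL-style DAG. It assumes recent Airflow 2.x import paths and parameters; the dag id, file path, and task logic are illustrative, not part of the course materials. A sensor gates the pipeline, operators do the work, and the dependency arrows define the DAG:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.sensors.filesystem import FileSensor


def transform(**context):
    # A toy transformation step; context["ds"] is the run's logical date.
    print("transforming data for", context["ds"])


with DAG(
    dag_id="example_etl",  # illustrative name
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
) as dag:
    # Sensor: wait for an upstream file to land before the pipeline runs.
    wait_for_file = FileSensor(
        task_id="wait_for_file",
        filepath="/tmp/incoming/data.csv",  # hypothetical landing path
        poke_interval=60,
        timeout=60 * 60,
    )

    # Operators: each instantiated operator becomes a task in the DAG.
    extract = BashOperator(task_id="extract", bash_command="echo extracting...")
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # Dependencies: sensor -> extract -> transform.
    wait_for_file >> extract >> transform_task
```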
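
And a second sketch touching the advanced concepts, under the same Airflow 2.x assumption: one task's return value is pulled from XCom by the next, a Variable supplies deployment-specific config (the key `etl_env` is hypothetical), and a Jinja template uses the built-in `ds` macro:

```python
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def count_rows():
    # Return values are pushed to XCom automatically (key "return_value").
    return 42


def report(ti, **context):
    # Pull what the upstream task pushed to XCom.
    rows = ti.xcom_pull(task_ids="count_rows")
    # Variables hold deployment-specific settings; this key is hypothetical.
    env = Variable.get("etl_env", default_var="dev")
    print(f"processed {rows} rows in {env}")


with DAG(
    dag_id="example_advanced",  # illustrative name
    start_date=datetime(2023, 1, 1),
    schedule=None,  # trigger manually
    catchup=False,
) as dag:
    count = PythonOperator(task_id="count_rows", python_callable=count_rows)
    rep = PythonOperator(task_id="report", python_callable=report)

    # Templates & macros: {{ ds }} renders to the run's logical date.
    log_date = BashOperator(
        task_id="log_date", bash_command="echo run date is {{ ds }}"
    )

    count >> rep >> log_date
```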
Participants should have a technology background, basic programming skills in Python, and be open to sharing their thoughts and questions.
Participants need to bring their laptops. The examples have been tested on Mac and Ubuntu machines. Participants can use any hosted Airflow solution, such as Google Cloud Composer or Astronomer.
- install sqlite3
- run `./airflow scheduler` to start the Airflow scheduler. The installation script will install all the dependencies.
- in another terminal, run `./airflow webserver`
- in your browser, visit http://localhost:8080 to access the Airflow UI
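
Once the UI is up, a trivial DAG like the sketch below, dropped into the configured dags folder, should appear in the UI after the next scheduler parse and can be triggered manually to verify the setup. The dag id, task, and Airflow 2.x import paths here are assumptions, not part of the course materials:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A trivial DAG to confirm the scheduler and webserver are wired together.
with DAG(
    dag_id="smoke_test",  # illustrative name
    start_date=datetime(2023, 1, 1),
    schedule=None,  # run it manually from the UI
    catchup=False,
) as dag:
    BashOperator(task_id="say_hello", bash_command="echo hello from airflow")
```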
Interested in contributing? Improving the documentation? Adding more examples? Check out Contributing.md.
As stated in the License file, all lecture slides are provided under Creative Commons BY-NC 4.0. The exercise code is released under the MIT license.
Author: