baolsen/README.md

Hi there 👋

I'm a data engineer with over a decade of experience working with data across various industries, including Telcos and Financial Services in South Africa and EMEA.

🔱 Like a digital Neptune, I have navigated the tumultuous seas of small and big data, from the stormy shores of SAS, through the vast ocean of on-premise Hadoop and on to the cloud-kissed heights of AWS. I've shaped raw data into pearls of wisdom for decision-makers.

Evidently, I can also use ChatGPT.

⚡ I am proficient in Python, PySpark, Terraform and SQL.

I have spent hours coaxing enterprise Hadoop into submission, wrangling data, partitioning and re-partitioning... and copying data from A to B, then B to C, then from C back to A. Yes, they pay me for that.

💬 When I'm not knee-deep in designing data architecture or processing petabytes of data, you can find me pondering life's great questions. Like whether Kerberos would produce a better error message if I first sacrificed a few lambs (unfortunately none of his three heads possess the gift and curse of speech).

I possess excellent technical skills, with the ability to pick up new technologies quickly.

🔭 Currently I head up Data Engineering at cloudandthings.io, and oversee various data engineering projects. I've also developed a tool that I'm quite proud of to sync petabytes of data in near real time.

📫 Whether you're dreaming of scalable architectures or just want someone to geek out with about the latest data tech, you can reach me at [email protected]

Pinned

  1. apache/airflow (Public)

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    Python, 37.3k stars, 14.3k forks

  2. hashicorp/terraform-provider-aws (Public)

    The AWS Provider enables Terraform to manage AWS resources.

    Go, 9.9k stars, 9.2k forks

  3. getmoto/moto (Public)

    A library that allows you to easily mock out tests based on AWS infrastructure.

    Python, 7.7k stars, 2.1k forks

  4. apache/nifi (Public)

    Apache NiFi

    Java, 4.9k stars, 2.7k forks

  5. treeverse/lakeFS (Public)

    lakeFS - Data version control for your data lake | Git for data

    Go, 4.5k stars, 359 forks

  6. cloudandthings/terraform-aws-github-runners (Public)

    Simple to use, self-hosted GitHub Action runners. Uses EC2 spot instances with configurable AutoScaling.

    HCL, 17 stars, 1 fork