- It's better if you work in Gitpod; it's easier to get started.
- Run `pipenv install`. You need the dependencies pinned in the Pipfile.lock for this project to work.
- Clone the repository to your computer (or Gitpod).
- Add your transformations into the `./transformations/<pipeline>/` folder.
- Configure the project.yml to specify each pipeline and its transformations in the order you want them executed (see the sketch after this list). Each pipeline must have at least one source and exactly one destination; multiple sources are allowed if needed.
- Add new transformation files as you need them; make sure each one includes `expected_inputs` and `expected_output` as examples (see the example after the template below). The expected inputs can be an array of dataframes when there are multiple sources.
- Update your project.yml as needed to change the order of the transformations.
- Validate your transformations by running `$ pipenv run validate`.
- Run your pipeline with `$ pipenv run pipeline --name=<pipeline_slug>`.
- If you need to clean your outputs, run `$ pipenv run clear`.
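Each pipeline is declared in project.yml. As a rough sketch (the key names below are assumptions for illustration; check the project.yml that ships with the boilerplate for the real schema):

```yaml
# Hypothetical layout; key names are assumptions.
pipelines:
  - slug: clean_publicsupport_fs_messages
    sources:                         # at least one source; several are allowed
      - messages_raw.csv
    destination: messages_clean.csv  # exactly one destination
    transformations:                 # executed in this order
      - remove_duplicates
      - normalize_text
```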
Each transformation is a Python file that exposes a `run` function:

```python
import pandas as pd
import numpy as np

def run(df):
    # ...
    return df
```
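As a minimal sketch of the `expected_inputs` / `expected_output` convention (the column names and values here are invented for illustration), a complete transformation file could look like:

```python
import pandas as pd

# Hypothetical sample data: one input dataframe per source.
expected_inputs = [
    pd.DataFrame({"first_name": ["Ana", "Bob"], "last_name": ["Lee", "Ray"]}),
]

# The dataframe the transformation is expected to produce.
expected_output = pd.DataFrame({"full_name": ["Ana Lee", "Bob Ray"]})

def run(df):
    # Combine first and last name into a single column.
    df = df.copy()
    df["full_name"] = df["first_name"] + " " + df["last_name"]
    return df[["full_name"]]
```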
Pipelines can also process streamed chunks of data. For example:

```bash
pipenv run pipeline --name=clean_publicsupport_fs_messages --stream=stream_sample.csv
```

Note: `--stream` is the path to a CSV file that contains all the streams you want to test. If the CSV contains multiple rows, each row is treated as a separate stream and the pipeline runs once per stream.
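For instance, a hypothetical stream_sample.csv with two rows (the contents are invented for illustration) would make the pipeline run twice, once per row:

```
I need help resetting my password
How do I cancel my subscription?
```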
Make sure to add the optional `stream` parameter to the transformation function:

```python
import pandas as pd
import numpy as np

def run(df, stream=None):
    # ...
    return df
```
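As a hedged sketch of what a stream-aware transformation might do (exactly what the runner passes in as `stream` is an assumption here; verify against the boilerplate):

```python
import pandas as pd

def run(df, stream=None):
    df = df.copy()
    if stream is not None:
        # Hypothetical: tag each row with the stream being processed,
        # assuming `stream` arrives as a single string.
        df["stream"] = str(stream)
    return df
```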