Table Data Structures
Workbench stores user data in various formats, on disk and in memory. Here are the most important formats.
Who stores it: a user, through HTTP form controls
What it looks like: "params" and "secrets" are dataclasses in Python, JSON Objects in JavaScript, and HTML forms from the user's point of view. Think of them as a JSON Object.
When it's stored: when a user creates a Step or changes its parameters.
When it's read: in the Step's module's fetch() and/or render() function.
Where we store it: we store "params" and "secrets" as two columns in the step table.
Why store it: so the user can make a Step do something useful.
The most obvious example of turning a parameter into a table is pastecsv. The user pastes CSV, creating params={"csv":"A,B\na,b","has_header":True}. pastecsv.render() turns this input into a table that looks like "A": "a", "B": "b".
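The pastecsv example above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: render_pasted_csv is a hypothetical stand-in for pastecsv.render(), the table is modeled as a plain list of dicts, and the placeholder column names used when has_header is False are an assumption.

```python
import csv
import io


def render_pasted_csv(params):
    """Hypothetical stand-in for pastecsv.render(): turn params into table rows."""
    rows = list(csv.reader(io.StringIO(params["csv"])))
    if not rows:
        return []
    if params["has_header"]:
        header, data = rows[0], rows[1:]
    else:
        # No header row: invent placeholder column names (an assumption).
        header = [f"Column {i + 1}" for i in range(len(rows[0]))]
        data = rows
    # One dict per table row, keyed by column name.
    return [dict(zip(header, row)) for row in data]


params = {"csv": "A,B\na,b", "has_header": True}
table = render_pasted_csv(params)
print(table)  # [{'A': 'a', 'B': 'b'}]
```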
Who stores it: a module's fetch() function.
What it looks like: whatever the module wants. For instance, loadurl and gdrive store HTTP-response headers and body in a custom format we call httpfile. Design your module's fetch-result file to look as close to "raw data" as is practical.
When it's stored: when a module's fetch() function runs.
When it's read: in a module's render() function.
Where we store it: the stored-objects object-storage bucket, with a path like {workflowId}/{stepId}/{uuid}. The UUID is stored in the StoredObjects database table.
Why store it: so a module author can edit render() after data has been fetched. The render() function parses the raw file, and if it has a bug, parsing might fail. The user shouldn't lose data when a module has a bug: the raw file is still stored, so a fixed render() can parse it again.
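The reasoning above can be sketched as follows. Everything here is illustrative and assumed rather than taken from Workbench: a dict stands in for the object-storage bucket, the JSON payload is invented, and fetch(), render_v1() and render_v2() are hypothetical functions with simplified signatures. Only the key shape {workflowId}/{stepId}/{uuid} comes from the text.

```python
import json
import uuid


def fetch_result_key(workflow_id, step_id):
    """Build a key shaped like {workflowId}/{stepId}/{uuid}."""
    return f"{workflow_id}/{step_id}/{uuid.uuid4()}"


def fetch(storage, workflow_id, step_id, raw_bytes):
    """Store the data exactly as received; do no parsing here."""
    key = fetch_result_key(workflow_id, step_id)
    storage[key] = raw_bytes
    return key


def render_v1(raw_bytes):
    """Buggy parser: wrongly assumes the top-level JSON value is a list."""
    return [record["name"] for record in json.loads(raw_bytes)]


def render_v2(raw_bytes):
    """Fixed parser: the records live under the "items" key."""
    return [record["name"] for record in json.loads(raw_bytes)["items"]]


storage = {}  # in-memory stand-in for the object-storage bucket
key = fetch(storage, 7, 42, b'{"items": [{"name": "a"}, {"name": "b"}]}')

try:
    render_v1(storage[key])
except TypeError:
    print("render_v1 failed, but the raw fetch result is still stored")

print(render_v2(storage[key]))  # ['a', 'b']
```

Because fetch() stored the bytes untouched, shipping render_v2() is enough to recover the table; nothing has to be fetched again.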
Many Workbench modules store files in Apache Parquet format. As a special case, Workbench automatically reads Parquet files. We recommend you do not pre-process fetched data to store it in Parquet format, since that defeats the purpose of this data layer. (See Why above.)
We encourage you to store using data formats that can be reused between modules. For instance, googlesheets and loadurl both store similar data: an HTTP response. They share a custom format we call "httpfile".
Fetch results are stored forever. When you deploy a fetch() function, its output will be fed to every future version of render(). Don't deploy your module until you choose a data format you can support forever. A cautionary tale: googlesheets.fetch() and loadurl.fetch() output Parquet files from 2017 to 2019. Now, their render() functions must still support Parquet fetch-result files, to handle fetch results from 2017-2019.
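One way a render() function might keep supporting old fetch results is to sniff the stored file's format before parsing. The sketch below is an assumption about how that could look, not the approach googlesheets or loadurl actually take. It relies on the fact that a Parquet file begins and ends with the four magic bytes PAR1; the function names are hypothetical, and the httpfile layout is not modeled, so anything that is not Parquet is simply treated as the current format.

```python
PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these four bytes


def detect_fetch_result_format(data: bytes) -> str:
    """Classify a stored fetch result as legacy Parquet or the current format."""
    # Smallest plausible file: leading magic + 4-byte footer length + trailing magic.
    looks_like_parquet = (
        len(data) >= 12
        and data[:4] == PARQUET_MAGIC
        and data[-4:] == PARQUET_MAGIC
    )
    return "parquet" if looks_like_parquet else "httpfile"


def choose_parser(data: bytes) -> str:
    """Hypothetical dispatch step at the top of a render() function."""
    if detect_fetch_result_format(data) == "parquet":
        return "legacy Parquet parser"  # old fetch results still need this path
    return "httpfile parser"


old_result = PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC  # fake bytes, not a real file
new_result = b"raw bytes in the current format"  # placeholder content
print(choose_parser(old_result))  # legacy Parquet parser
print(choose_parser(new_result))  # httpfile parser
```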