Create semantic dataflow graphs of data science code.
Using this package, you can convert data science code to dataflow graphs with semantic content. The package works in tandem with the Data Science Ontology and our language-specific program analysis tools. Currently Python and R are supported.
For more information, please see our research paper on "Teaching machines to understand data science code by semantic enrichment of dataflow graphs".
We provide a CLI that supports the recording, semantic enrichment, and visualization of flow graphs. To set up the CLI, install this package and add the bin directory to your PATH. Invoke the CLI by running flowgraphs.jl in your terminal.
The CLI includes the following commands:
record: Record a raw flow graph by running a script.
Requirements: To record a Python script, you must install the Julia package PyCall.jl and the Python package flowgraph. Likewise, to record an R script, you must install the Julia package RCall.jl and the R package flowgraph.
enrich: Convert a raw flow graph to a semantic flow graph.
visualize: Visualize a flow graph using Graphviz.
Requirements: To output an image using the --to switch, you must install Graphviz.
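Before running the commands, it can help to confirm that the prerequisites listed above are present. The following is a minimal sketch, not part of this package: it assumes that the CLI is found on PATH as flowgraphs.jl, that Graphviz is detected through its dot executable, and that the Python package is importable under the name flowgraph. The function name check_requirements is hypothetical.

```python
import importlib.util
import shutil


def check_requirements():
    """Return a dict mapping each prerequisite to whether it was found."""
    return {
        # The CLI script, expected on PATH once the bin directory is added.
        "flowgraphs.jl": shutil.which("flowgraphs.jl") is not None,
        # Graphviz ships the `dot` executable; needed for image output.
        "graphviz (dot)": shutil.which("dot") is not None,
        # Assumption: the Python package is importable as `flowgraph`.
        "python flowgraph": importlib.util.find_spec("flowgraph") is not None,
    }


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

The sketch does not check the Julia packages PyCall.jl and RCall.jl or the R package flowgraph; those would need to be verified from within Julia and R.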
All the commands take as primary argument either a directory, which is filtered by file extension, or a single file, arbitrarily named.
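The directory-or-file rule can be pictured with a short sketch. This is only an illustration of the behavior described above, not the CLI's actual implementation: the helper resolve_inputs and its extensions parameter are hypothetical, and whether the CLI matches extensions case-insensitively or descends into subdirectories is not stated here, so the sketch assumes a case-insensitive, non-recursive match.

```python
from pathlib import Path


def resolve_inputs(path, extensions):
    """List the files a command would process for a given primary argument.

    `extensions` is a set of lowercase suffixes such as {".py", ".r"};
    the values the real CLI uses for each command are an assumption.
    """
    path = Path(path)
    if path.is_dir():
        # Directory: keep only files whose extension matches.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in extensions
        )
    # Single file: accepted as-is, whatever its name.
    return [path]
```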
Record all Python/R scripts in the current directory, yielding raw flow graphs:
flowgraphs.jl record .
Convert a raw flow graph to a semantic flow graph:
flowgraphs.jl enrich my_script.py.graphml --out my_script.graphml
Visualize a semantic flow graph, creating and opening an SVG file:
flowgraphs.jl visualize my_script.graphml --to svg --open
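The three steps can also be chained for a single script. The sketch below only builds and runs the command lines shown in the examples; it uses no flags beyond --out and --to. It assumes that record writes the raw graph next to the script as the script's file name plus .graphml, which the enrich example suggests but does not state. The helper names pipeline_commands and run_pipeline are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pipeline_commands(script, image_format="svg"):
    """Build the record, enrich, and visualize invocations (nothing is run)."""
    script = Path(script)
    # Assumption: `record` names the raw graph <script name>.graphml.
    raw_graph = script.with_name(script.name + ".graphml")
    # The semantic graph is named after the script's stem, as in the example.
    semantic_graph = script.with_suffix(".graphml")
    return [
        ["flowgraphs.jl", "record", str(script)],
        ["flowgraphs.jl", "enrich", str(raw_graph), "--out", str(semantic_graph)],
        ["flowgraphs.jl", "visualize", str(semantic_graph), "--to", image_format],
    ]


def run_pipeline(script, image_format="svg"):
    """Run the commands in order, stopping at the first failure."""
    if shutil.which("flowgraphs.jl") is None:
        raise RuntimeError("flowgraphs.jl not found on PATH")
    for cmd in pipeline_commands(script, image_format):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in pipeline_commands("my_script.py"):
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes it possible to inspect the exact invocations before anything is recorded.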