An implementation of ROOT I/O designed to read TTrees into Spark DataFrames. It consists of the following three components:
- DataSource - Spark DataSourceV2 implementation
- ArrayInterpretation - Accepts raw TBasket byte ranges and returns deserialized arrays
- root_proxy - Deserializes ROOT metadata to locate TBasket byte ranges
The scope of this project is limited to vectorized (i.e. column-based) reads of TTrees consisting of relatively simple branches: fundamental numeric types and fixed-length or jagged arrays of those types.
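To make that mapping concrete, a scalar numeric branch would surface as a plain numeric column, while a fixed-length or jagged array branch would surface as an array column. The sketch below illustrates what such a schema could look like in PySpark; the branch names and exact Spark types are illustrative assumptions, not output produced by this library.

```python
from pyspark.sql.types import (StructType, StructField,
                               IntegerType, FloatType, ArrayType)

# Illustrative sketch only: branch names and types are hypothetical.
# It shows how simple TTree branches could map onto Spark columns.
example_schema = StructType([
    StructField("nMuon", IntegerType()),             # scalar numeric branch
    StructField("Muon_pt", ArrayType(FloatType())),  # jagged array branch
])
print(example_schema.simpleString())
```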
Note that the most recent version number can be found here. To use a different version, replace `1.0.0` below with your desired version.
```python
import pyspark.sql

# Pull Laurelin from Maven Central and start a local Spark session
spark = pyspark.sql.SparkSession.builder \
    .master("local[1]") \
    .config('spark.jars.packages', 'edu.vanderbilt.accre:laurelin:1.0.0') \
    .getOrCreate()
sc = spark.sparkContext

# Read the TTree named "tree" from the ROOT file into a DataFrame
df = spark.read.format('root') \
    .option("tree", "tree") \
    .load('small-flat-tree.root')
df.printSchema()
```
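Once the DataFrame is loaded, ordinary PySpark operations apply. The snippet below is only a sketch: the branch names are hypothetical and should be replaced with branches present in your own tree (inspect them with `df.printSchema()` above).

```python
# Branch names here are hypothetical -- substitute names from df.printSchema().
df.count()                          # number of entries in the TTree
df.select("Muon_pt").show(5)        # array branches appear as array columns
df.filter(df["nMuon"] > 0).count()  # standard DataFrame operations work as usual
```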
- The I/O is currently completely unoptimized -- there is no caching or prefetching. Remote reads will be slow as a consequence.
- Arrays (both fixed and jagged) of booleans return the wrong result
- Float16/Double32 types are currently not supported
- String types are currently not supported
- C++ standard library types are currently not supported (most importantly, std::vector)