Pixels 0.1.0

Main Features

Optimized columnar storage on HDFS, S3, MinIO, local FS, GCS, and Redis that significantly outperforms Parquet and ORC.
A distributed in-memory columnar cache that further improves I/O performance for data analytics.
External query engine integrations for Trino, Presto, Hive, and DuckDB that significantly improve query performance in these engines.
Metadata (database schema and data catalog) management for data lakes and warehouses.
Internal query accelerator (Pixels-Turbo) in serverless computing environments, including AWS Lambda and vHive (K8S+Knative+Firecracker).
REST query API that exposes Pixels as a serverless analytics service for external users.

Release Notes

What's Changed

debug hive by @bianhq in #1
make pixels-hive usable by @bianhq in #7
Finish pixels-hive by @bianhq in #8
refine docs of pixels-hive. by @bianhq in #9
refine pixels-hive doc and fix pixels-load. by @bianhq in #10
Refine code and docs. by @bianhq in #11
add comments by @bianhq in #18
Rename the prefix of packages to io.pixelsdb by @ray6080 in #22
[Issue #24]: add balancers for pixels cache. by @bianhq in #25
[Issue #24]: move file if it is not local by @bianhq in #28
HOTFIX: upgrade fastjson to version 1.2.58 for security by @bianhq in #29
[Issue #30]: make NUMA interleaved. by @bianhq in #31
[Issue #30]: add start-vmtouch and make it NUMA interleaved. by @bianhq in #32
[HOTFIX]: remove mysql-connector from dependencies due to license conflicts. by @bianhq in #33
[HOTFIX]: remove or upgrade insecure packages from dependencies. by @bianhq in #34
[HOTFIX]: rollback jackson to 2.8.1. by @bianhq in #35
[Issue #36]: add copyright and license notice. by @bianhq in #37
Revert "[Issue #36]: add copyright and licence notice." by @bianhq in #38
[Issue #36]: add copyright and license notice. by @bianhq in #39
[Issue #36]: fix copyright and licence notice. by @bianhq in #40
[Issue #36]: rename LICENCE file. by @bianhq in #41
[Issue #36]: update NOTICE. by @bianhq in #42
HOTFIX: import external jars when starting pixels. by @bianhq in #43
HOTFIX: hive docs. by @bianhq in #45
[Issue #47]: using different lock files for coordinator and datanode daemons. by @bianhq in #48
[Issue #49] fix bugs in VarcharArrayBlock by @bianhq in #50
[Issue #52]: implement direct shared memory read. by @bianhq in #53
[Issue #52]: reduce memory copy in BooleanColumnReader and IntegerColumnReader. by @bianhq in #54
[Issue #55]: reduce memory copy in column readers. by @bianhq in #56
[HOTFIX]: bug in DynamicIntArray.toArray() and redundant memory copy. by @bianhq in #57
[Issue #58]: add close method to resources. by @bianhq in #59
[Issue #58]: add gc threshold to optimize gc for small queries. by @bianhq in #62
[Issue #44]: implement Etcd metadata store. by @bianhq in #65
[Issue #67] implement cache read / write coordination. by @bianhq in #68
[Issue #67] implement three-phase cache update. by @bianhq in #69
Hotfix: fix bugs in CacheWriter initialization and cache update. by @bianhq in #70
HOTFIX: clean unused exceptions. by @bianhq in #71
[Issue #72]: optimize memory allocation and access in PixelsCacheReader. by @bianhq in #73
[Issue #72]: fix bugs in pixels-cache and implement loading radix from index file. by @bianhq in #77
[Issue #67]: Implement cache read lease and optimize read performance. by @bianhq in #79
[Issue #78]: avoid cache probing on uncached tables in Presto. by @bianhq in #80
[Issue #78]: avoid cache probing on uncached tables in Hive. by @bianhq in #81
[Issue #83]: implement JIT splitting for ordered path. by @bianhq in #84
[Issue #85]: fix list table error when schema is empty. by @bianhq in #86
[Issue #87]: remove explicit gc from PixelsReader. by @bianhq in #89
[Issue #88]: fix MAX_READER_COUNT. by @bianhq in #90
[Issue #91]: use three bytes for cache reader count. by @bianhq in #92
[Issue #94]: Support Date and Time types. by @bianhq in #95
[Issue #98]: fix insert/update related metadata service and getLayout. by @bianhq in #101
[Issue #99]: fix null value storage. by @bianhq in #102
[Issue #103]: fix and enhance predicate processing. by @bianhq in #104
[Issue #105]: fix endless execution for select count(*). by @bianhq in #106
[Issue #100]: refine type management and add varchar/char support. by @bianhq in #107
[Issue #108]: replace hdfs FileSystem api with the unified Storage api. by @bianhq in #110
[Issue #108]: implement LocalFS and global auto-increment id. by @bianhq in #112
[Issue #113]: collect the cumulative memory usage in pixels record reader. by @bianhq in #116
[Issue #115]: replace message queue implementation. by @bianhq in #117
[Issue #114]: support S3 storage and asynchronous I/O scheduling. by @bianhq in #118
[Issue #120]: support configurable S3 clients. by @bianhq in #122
[Issue #121]: fix listing objects. by @bianhq in #123
[Issue #124]: refine read path. by @bianhq in #125
[Issue #126]: add rate limit and request retry policy. by @bianhq in #127
[Issue #128]: implement request diversion and refine java package layout. by @bianhq in #130
[Issue #131]: implement projections for compact layout. by @bianhq in #134
[Issue #133]: fix and refine retry policy. by @bianhq in #135
[Issue #136]: refine thread factory for async read using sync client. by @bianhq in #137
[Issue #132]: upgrade supported Presto version from 0.192 to 0.215. by @bianhq in #138
[Issue #142]: fix mbps rate-limit. by @bianhq in #143
[Issue #145]: fix date type for Presto-0.215. by @bianhq in #148
[Issue #149]: fix configuration and dependency. by @bianhq in #150
[Issue #153]: add adaptive reading method. by @bianhq in #154
[Issue #144]: fix scripts and finish docs. by @bianhq in #157
[Issue #156]: fix empty schema. by @bianhq in #159
[Issue #158]: support loading data from and to arbitrary storage. by @bianhq in #161
[Issue #160]: compact from and to arbitrary storage, including tail files. by @bianhq in #162
[Issue #163]: fix bounded varchar/char type support. by @bianhq in #165
[Issue #164]: define storage scheme in CREATE statement. by @bianhq in #166
[Issue #167]: fix retained size calculation of VarcharArrayBlock. by @bianhq in #168
[Issue #169]: add session properties about layout-path enabling to pixels-presto. by @bianhq in #171
[Issue #172]: implement record cursor and enhance record reader. by @bianhq in #173
[Issue #174]: implement transaction server and pass query (trans) id into record reader. by @bianhq in #176
[Issue #175]: enhance transaction info of queries and pass query id into I/O schedulers. by @bianhq in #177
[Issue #179]: fix show tables from information_schema. by @bianhq in #180
[Issue #178]: enable metrics server by configuration parameter. by @bianhq in #182
[Issue #183]: fix show columns from information_schema. by @bianhq in #184
[Issue #181]: fix start and stop of pixels-daemon. by @bianhq in #185
[Issue #186]: skip cache initialization if cache is disabled. by @bianhq in #187
[Issue #139]: stop retrying the request from terminated queries. by @bianhq in #188
[Issue #190]: fix data copying between S3 buckets. by @bianhq in #191
[Issue #194]: support views. by @bianhq in #195
[Issue #196]: refine type management and support decimal. by @bianhq in #197
[Issue #196]: fix DecimalColumnVector and use decimal in TPC-H schema. by @bianhq in #200
[Issue #192]: support multi-thread compaction. by @bianhq in #201
[Issue #192]: refine compact, S3, PixelsCompactor, and support multi-thread copying. by @bianhq in #202
[Issue #205]: fix integer overflow in request merging. by @bianhq in #206
[Issue #210]: some minor improvements. by @xxchan in #204
[Issue #208]: fix column statistics for decimal. by @bianhq in #211
[Issue #207]: fix the data type metadata in the file footer. by @bianhq in #212
[Issue #209]: clean code. by @bianhq in #213
[Issue #189]: support folders on S3. by @bianhq in #215
[Issue #189]: revise docs and scripts. by @bianhq in #216
[Issue #217]: split presto and hive integrations into sub-projects. by @bianhq in #218
[Issue #220] revise license, readme, and pom. by @bianhq in #221
[Issue #222]: disable mock file locations for storage systems that do not provide data locality. by @bianhq in #223
[Issue #170]: implemented scan operator. by @TiannanSha in #229
[Issue #224]: clean and refine cache key and cache entry implementation. by @bianhq in #230
[Issue #231]: update docs and comments. by @bianhq in #232
[Issue #225]: upgrade to Hadoop 3.3.1 and clean dependencies. by @bianhq in #234
[Issue #231]: enable storage schemes in configuration. by @bianhq in #235
[Issue #233]: fix log4j configurations. by @bianhq in #236
[Issue #193]: revise the README under modules. by @bianhq in #237
[Issue #170]: fix dependencies and logging for pixels-lambda. by @bianhq in #243
[Issue #238]: Add a script to install pixels by @xxchan in #239
[Issue #170]: implement filter. by @TiannanSha in #246
[Issue #170]: clean the unused files and reformat the code. by @bianhq in #247
[Issue #170]: update poms for pixels-lambda. by @bianhq in #248
[Issue #249]: remove InvalidActivityException. by @bianhq in #250
[Issue #245]: support reading remote config file. by @bianhq in #252
[Issue #170]: optimize Pixels S3 writer and lambda. by @bianhq in #253
[Issue #170]: implement table scan filter and refine scan worker. by @bianhq in #254
[Issue #170]: support direct write back to on-premise minio. by @bianhq in #255
[Issue #170]: support lambda scan. by @bianhq in #256
[Issue #170]: fix the discrete filter for string-based columns. by @bianhq in #257
[Issue #170]: fix S3 folder deletion. by @bianhq in #259
HOTFIX: refine comments. by @bianhq in #260
[Issue #261]: move table scan predicates into pixels-executor. by @bianhq in #262
[Issue #170]: implement hash partitioned join. by @bianhq in #263
[Issue #170]: implement broadcast join. by @bianhq in #264
[Issue #265]: fix reading row group number. by @bianhq in #266
[Issue #268]: fix null value check for join. by @bianhq in #272
[Issue #271]: improve discrete column filter. by @bianhq in #273
[Issue #270]: support full outer join. by @bianhq in #275
[Issue #170]: enhance joins and implement join tree executor. by @bianhq in #281
[Issue #170]: support join endian. by @bianhq in #282
[Issue #170]: disable left full outer broadcast join. by @bianhq in #283
[Issue #170]: fix join input splits generation, refine join inputs and join operator. by @bianhq in #284
[Issue #170]: fix and refine join inputs and join workers. by @bianhq in #285
[Issue #258]: implement table and column statistics. by @bianhq in #287
[Issue #258]: add join advisor and fix join execution. by @bianhq in #288
[Issue #258]: support multi-pipeline join and fix bugs. by @bianhq in #289
[Issue #258]: implement partitioned chain join. by @bianhq in #290
[Issue #258]: implement split size capping. by @bianhq in #291
[Issue #170]: add invoker factory and get worker name from config file. by @bianhq in #292
[Issue #170]: implement work exception handling and join output collection. by @bianhq in #293
[Issue #294]: fix blocking splits when lambda scan is enabled. by @bianhq in #295
[Issue #203]: support long decimal with 38 max digit precision and scale. by @bianhq in #296
[Issue #297]: fix timestamp type, stat recorders, null value filtering, and pixels-load. by @bianhq in #298
[Issue #258]: upgrade Prometheus dependencies. by @josephhany in #300
[Issue #170] implement aggregation execution. by @bianhq in #301
[Issue #170]: fix null-pointer in scan worker. by @bianhq in #302
[Issue #170]: fix column stats recorders and split size capping. by @bianhq in #303
[Issue #305]: support deleting more than 1000 files from S3. by @bianhq in #306
[Issue #170]: support scan projection in scan worker. by @bianhq in #307
[Issue #170]: implement min/max in column stats in metadata. by @bianhq in #308
[Issue #170]: fix aggregation worker. by @bianhq in #309
[Issue #170]: improve multi-thread copying. by @bianhq in #310
[Issue #170] add metadata cache and cost-based splits index. by @bianhq in #311
[Issue #170]: merge outputs in lambda worker. by @bianhq in #312
[Issue #170]: fix loading path. by @bianhq in #313
[Issue #170]: add row count broadcast threshold. by @bianhq in #314
[Issue #170]: join optimization for very large datasets. by @bianhq in #315
[Issue #170]: optimizations for large joins. by @bianhq in #316
[Issue #170]: optimizing hash functions. by @bianhq in #317
[Issue #170]: implement partition projection. by @bianhq in #318
[Issue #170]: improve execution pipeline. by @bianhq in #319
[Issue #170]: improve join algorithm selection and broadcast split size adjustment. by @bianhq in #321
[Issue #170]: add script to run before each new instance becomes in-service. by @TiannanSha in #322
[Issue #170]: using multi-thread for column encoding in Pixels writer. by @bianhq in #323
[Issue #170]: update spot scripts. by @bianhq in #324
[Issue #170] collect performance metrics from serverless workers. by @bianhq in #325
[Issue #170]: add trans concurrency and GC monitor, and tune log level. by @bianhq in #326
[Issue #170]: update aggregation plan and spot vm user data. by @bianhq in #327
[Issue #170]: improve get num partitions. by @bianhq in #329
[Issue #170]: remove existence check from workers. by @bianhq in #330
[Issue #214]: implement multi-thread S3 output stream. by @bianhq in #331
[Issue #214]: enable retry policy in S3OutputStream. by @bianhq in #332
[Issue #170]: improve dictionary encoding and metrics collection. by @bianhq in #333
[Issue #170]: add starling executor and fix the inputs of multi-pipeline broadcast join. by @bianhq in #334
[Issue #170] fix hang in partitioned join worker. by @bianhq in #335
[Issue #170]: optimize file existence checking in getFileSchema. by @bianhq in #336
[Issue #170]: support Redis storage. by @bianhq in #337
[Issue #170]: support default user in Redis. by @bianhq in #338
[Issue #170]: fix null value processing in aggregation. by @bianhq in #339
[Issue #170]: improve string comparison and aggregation. by @bianhq in #340
[Issue #170]: fix empty file problem in aggregation. by @bianhq in #341
[Issue #170]: add partitioning to aggregation and implement starling aggregation. by @bianhq in #342
[Issue #170]: add null fraction and cardinality statistics into pixels-load. by @bianhq in #343
[Issue #170]: support cardinality estimation for aggregation. by @bianhq in #344
[Issue #345]: fix double start of retry policy. by @bianhq in #346
[Issue #347]: reconnect to S3 when fail to get object. by @bianhq in #348
[Issue #170]: support count aggregation. by @bianhq in #349
[Issue #350]: remove request division. by @bianhq in #351
[Issue #352]: support google cloud storage. by @bianhq in #353
pixels partitioned cache protocol by @Yeeef in #355
[Issue #357]: fix compilation and dependency problem. by @bianhq in #358
[Issue #357] clean unused files. by @bianhq in #359
[Issue #170]: move the code related to query planning to pixels-optimizer. by @bianhq in #360
[Issue #170]: refine query queues. by @bianhq in #361
[Issue #357]: update readme for pixels-trino. by @bianhq in #362
[Issue #357]: refine auto scaling metrics. by @bianhq in #363
[Issue #357]: update metrics collector. by @bianhq in #364
[Issue #365]: fix statistics collection. by @bianhq in #366
[Issue #367]: fix BinaryColumnVector for dictionary encoding. by @bianhq in #368
[Issue #369]: implement dictionary-encoded column vector. by @bianhq in #370
[Issue #371]: update metadata for view creation in Trino. by @bianhq in #372
[Issue #373]: support direct read on localFS. by @bianhq in #375
[Issue #374]: fix the column vectors. by @bianhq in #376
[Issue #377]: fix ByteBufferInputStream. by @bianhq in #378
[Issue #379]: support configurable direct/non-direct I/O in LocalFS. by @bianhq in #382
[Issue #380]: rename pixels-optimizer to pixels-planner. by @bianhq in #383
[Issue #381]: move out pixels-load and pixels-tools. by @bianhq in #384
[Issue #386]: fix getRowNumber in PixelsRecordReaderImpl. by @bianhq in #387
[Issue #385]: refine docs and some comments, and fix timestamp format for AWS CloudWatch metrics. by @bianhq in #389
[Issue #388]: fix encoded column vector reading. by @bianhq in #390
[Issue #394] fix non-encoded integer column reading. by @bianhq in #395
[Issue #393] support mmap in local file systems. by @bianhq in #396
[Issue #397] update install.sh. by @bianhq in #398
[Issue #399] update docs and support listing the paths and statuses of the files in multiple directories. by @bianhq in #400
[Issue #401] support different input and output storage scheme in compactor and create parent dir automatically for local fs. by @bianhq in #402
[Issue #403] support async read on local fs. by @bianhq in #404
[Issue #405] refine configuration properties. by @bianhq in #406
[Issue #405] fix comments. by @bianhq in #407
[Issue #409]: add an http server that provides restful api. by @bianhq in #411
[Issue #408]: improve exception handling in pixels-daemon. by @bianhq in #412
[Issue #410] implement the REST API for SQL execution. by @bianhq in #413
[Issue #414] pixels reads replicated content on string column after the first row batch, when the column is not encoded. by @yuly16 in #415
[Issue #417] the data in integer and long column reader is aligned by @yuly16 in #418
[Issue #416] build pixels-turbo, split storage adapters, cf invokers, and scaling handlers into separate modules by @bianhq in #420
[Issue #421] init pixels-proxy. by @bianhq in #425
[Issue #419] fix timezone offset for date column. by @bianhq in #422
[pixels-cli] fix session properties for stat. by @bianhq in #427
[Issue #421] update docs and clean Date related code. by @bianhq in #428
[Issue #421] implement basic pixels-proxy. by @bianhq in #430
[Issue #429] upgrade Prometheus and exporters. by @bianhq in #432
[Issue #433] prepare for vhive integrations. by @bianhq in #434
[Issue #431] update docs, comments, and error print. by @bianhq in #440
[Issue #441] add grpc example into pixels-server. by @bianhq in #442
[Issue #443] add add-opens into manifest. by @bianhq in #444
[docs] split the main readme into docs. by @bianhq in #445
[docs] fix links. by @bianhq in #446
[docs] fix build instructions. by @bianhq in #447
[docs] refine readme. by @bianhq in #448
[docs] refine docs for Pixels Turbo. by @bianhq in #449
[Issue #450] fix lambda invoker unit tests and minor problems in lambda workers. by @bianhq in #451
[Issue #452] enable reading other storages than s3 in serverless workers. by @bianhq in #453
[docs] update pixels-turbo settings. by @bianhq in #454
SQLglot transpile integration by @voidforall in #455
Finish pixels parser by @voidforall in #456
[Issue #431] refine transaction protocol. by @bianhq in #458
[Issue #431] update docs and add file headers. by @bianhq in #461
[Issue #459] update row count automatically. by @bianhq in #462
[Issue #423] pixels vhive invoker and worker. by @zhaoshihan in #463
[Issue #431] finish pixels query server. by @bianhq in #464
[Issue #423] refine docs and add license headers. by @bianhq in #465
[Issue #466] fix storage endpoint config. by @bianhq in #467
[Issue #468] add operator name to the input of the cloud function workers. by @bianhq in #469
[Issue #468] improve operator name setting. by @bianhq in #470
[Issue #437] redesign the schema of metadata. by @bianhq in #474
[Issue #471] implement the C++ reader for pixels. by @yuly16 in #473
[docs] add Duckdb and C++ reader introduction. by @bianhq in #475
HOTFIX: catalog metadata id by @voidforall in #476
[docs] remove outdated information for statistics collection. by @bianhq in #478
[docs] revise data compaction. by @bianhq in #479
[Issue #423] modify settings for vHive worker & invoker by @zhaoshihan in #481
[Issue #482] fix incorrect join plan for broadcast join after complete broadcast chain join. by @bianhq in #483
[Issue #484] fix post partition for right side broadcast chain join. by @bianhq in #486
[Issue #487] add argument check in pixels-planner. by @bianhq in #488
[Issue #435] remove request id from scan output path. by @bianhq in #489
[docs] Update install.md: Maven version should be above or equal to 3.6. by @yuly16 in #494
[Issue #471] Integrate pixels reader c++. by @yuly16 in #496
[Issue #391] minor changes. by @bianhq in #497
[Issue #493] Align the location of column byte buffer in pxl file. by @yuly16 in #495
[Issue #498] remove the orders array from dictionary encoding. by @bianhq in #499
[Issue #498] fix wrong offset in buffer read. by @bianhq in #500
[Issue #485] clean metadata cache and make it transactional. by @bianhq in #501
[Issue #471] some fix for c++ reader. by @yuly16 in #502
[Issue #490] support relaxed and best-effort query execution. by @bianhq in #503
[docs] Update pixels-turbo/pixels-worker-vhive/README.md by @jasha64 in #507
[Issue #491] fix query service and revise comments. by @bianhq in #509
[Issue #498] remove the orders array from StringColumnReader in C++. by @yuly16 in #510
[Issue #512] Fix the out-of-range issue caused by splitting a row by @yuly16 in #513
[Issue #514] Change pixels c++ column vector from 4k alignment to 32 byte alignment by @yuly16 in #515
[Issue #516] enhance exception handling in base workers by @bianhq in #518
[Issue #519] fix wrong column chunk offsets in a multi-row-group file by @bianhq in #520
[Issue #517] load single tbl file into multiple paths by @bianhq in #523
[Issue #524] using full path as the key in file footer cache by @bianhq in #525
[Issue #522] add layout configurations into the file format by @bianhq in #526
[Issue #527] Fix the bug that iovecs might be freed twice by @yuly16 in #528
[Issue #521] support chunk aligned compact file by @bianhq in #532
[Issue #471] pixels reader c++ code refactor by @yuly16 in #533
[Issue #531] support configurable endianness by @bianhq in #534
[Issue #535] change project version to 0.1.0 by @bianhq in #536
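The entry for #306 concerns deleting more than 1000 files from S3. A single S3 DeleteObjects request accepts at most 1,000 keys, so removing a larger set of files means splitting the keys into batches and issuing one request per batch. Below is a minimal sketch of that batching idea in Java (the language Pixels is written in). The class `BatchedDelete`, its methods, and the `deleteBatch` callback are hypothetical: the callback stands in for a real S3 client call, and this is not Pixels' actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedDelete {
    // S3's DeleteObjects API accepts at most 1,000 keys per request.
    static final int MAX_KEYS_PER_REQUEST = 1000;

    /** Splits keys into consecutive batches of at most maxBatch elements each. */
    static <T> List<List<T>> partition(List<T> keys, int maxBatch) {
        if (maxBatch <= 0) {
            throw new IllegalArgumentException("maxBatch must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += maxBatch) {
            int end = Math.min(start + maxBatch, keys.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }

    /**
     * Hypothetical helper: hands each batch to deleteBatch (standing in for
     * one DeleteObjects request) and returns the number of requests issued.
     */
    static int deleteAll(List<String> keys, Consumer<List<String>> deleteBatch) {
        List<List<String>> batches = partition(keys, MAX_KEYS_PER_REQUEST);
        for (List<String> batch : batches) {
            deleteBatch.accept(batch);
        }
        return batches.size();
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            keys.add("tpch/orders/part-" + i + ".pxl");
        }
        int requests = deleteAll(keys,
                batch -> System.out.println("deleting " + batch.size() + " keys"));
        System.out.println("requests: " + requests);
    }
}
```

Running `main` with 2,500 keys issues three requests of 1,000, 1,000, and 500 keys. Copying each `subList` view keeps a batch valid even if the caller later modifies the original key list.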

New Contributors

@bianhq and @ray6080 were the first two authors of the project. They co-authored the basic framework of Pixels
@taoyouxian made contributions to the Presto and Hive integrations of Pixels
@mzp0514 made contributions to the initial implementation of data upserts in Pixels
@xxchan made contributions to the Trino integration of Pixels and the initial implementation of snapshot query execution
@TiannanSha made contributions to serverless query acceleration in AWS Lambda
@josephhany made contributions to exploring query cost estimation solutions
@Yeeef made contributions to extending in-memory columnar cache to SSDs
@yuly16 made contributions to the implementation of the C++ reader and DuckDB integration of Pixels, and improved the query performance on SSDs
@voidforall made contributions to the query service implementation and the hybrid query execution in DuckDB
@zhaoshihan made contributions to serverless query execution and performance profiling in vHive
@jasha64 made contributions to SSD performance benchmarking and serverless query execution in vHive

Full Changelog: https://github.com/pixelsdb/pixels/commits/v0.1.0
