Skip to content
/ jibby Public

High-performance streaming JSON-to-BSON decoder in Go

License

Notifications You must be signed in to change notification settings

xdg-go/jibby

Repository files navigation

jibby - High-performance streaming JSON-to-BSON decoder in Go

Go Reference Go Report Card Github Actions codecov License

jibby: A general term to describe an exceptionally positive vibe, attitude, or influence.

~ Urban Dictionary

The jibby package provide high-performance conversion of JSON objects to BSON documents. Key features include:

  • stream decoding - white space delimited or from a JSON array container
  • no reflection
  • minimal abstraction
  • minimal copy
  • allocation-friendly

Examples

import (
	"bufio"
	"bytes"
	"log"

	"github.com/xdg-go/jibby"
)

func ExampleUnmarshal() {
	json := `{"a": 1, "b": "foo"}`
	bson := make([]byte, 0, 256)

	bson, err := jibby.Unmarshal([]byte(json), bson)
	if err != nil {
		log.Fatal(err)
	}
}

func ExampleDecoder_Decode() {
	json := `{"a": 1, "b": "foo"}`
	bson := make([]byte, 0, 256)

	jsonReader := bufio.NewReaderSize(bytes.NewReader([]byte(json)), 8192)
	jib, err := jibby.NewDecoder(jsonReader)
	if err != nil {
		log.Fatal(err)
	}

	bson, err = jib.Decode(bson)
	if err != nil {
		log.Fatal(err)
	}
}

Extended JSON

Jibby optionally supports the MongoDB Extended JSON v2 format. There is limited support for the v1 format -- specifically, the $type and $regex keys use heuristics to determine whether these are extended JSON or MongoDB query operators.

Escape sequences are not supported in Extended JSON keys or number formats, only in naturally textual fields like $symbol, $code, etc. In practice, MongoDB Extended JSON generators should never output escape sequences in keys and number fields anyway.

Limitations

  • Maximum depth defaults to 200 levels of nesting (but is configurable)
  • Only well-formed UTF-8 encoding (including optional BOM) is supported.
  • Numbers (floats and ints) must conform to formats/limits of Go's strconv library.
  • Escape sequences not supported in extended JSON keys and some extended JSON values.

Testing

Jibby is extensively tested.

Jibby's JSON-to-BSON output is compared against reference output from the MongoDB Go driver. Extended JSON conversion is tested against the MongoDB BSON Corpus.

JSON parsing support is tested against data sets from Nicholas Seriot's Parsing JSON is a Minefield article. It behaves correctly against all "y" (must support) tests and "n" (must error) tests. It passes all "i" (implementation defined) tests except for cases exceeding Go's numerical precision or with invalid/unsupported Unicode encoding.

Performance

Performance varies based on the shape of the input data.

For a 92 MB mixed JSON dataset with some extended JSON:

           jibby 283.46 MB/s
   jibby extjson 207.42 MB/s
   driver bsonrw 43.77 MB/s
naive json->bson 43.25 MB/s

For a 4.3 MB pure JSON dataset with lots of arrays:

           jibby 107.15 MB/s
   jibby extjson 123.76 MB/s
   driver bsonrw 25.68 MB/s
naive json->bson 32.78 MB/s

For a 1 GB JSON dataset with small documents (~130 bytes) with only integer values:

           jibby 97.55 MB/s
   jibby extjson 97.44 MB/s
   driver bsonrw 18.20 MB/s
naive json->bson 14.51 MB/s

The jibby and jibby extjson figures are jibby without and with extended JSON enabled, respectively. The driver bsonrw figures use the MongoDB Go driver in a streaming mode with bsonrw.NewExtJSONValueReader. The naive json->bson figures use Go's encoding/json to decode to map[string]interface{} and the Go driver's bson.Marshal function.

Copyright and License

Copyright 2020 by David A. Golden. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0