Add Variant Type #1436

Draft
wants to merge 2 commits into
base: main

@SpencerTorres SpencerTorres commented Nov 21, 2024

Summary

Draft PR to discuss the possible Variant implementations. By implementing Variant, we can re-use a large portion of the logic for Dynamic, and then JSON. Partially resolves #1430.

Implementation

This implementation adds 3 major types to the module:

  • ColVariant - the column implementation for (de)serialization
  • Variant - a container to hold variant values (optional for (de)serialization)
  • VariantWithType - an extension of Variant that lets the caller provide a preferred type when the existing column type detection would be ambiguous (such as Array(UInt8) vs String)

ColVariant

Serialization

// Variant(Array(Map(String, String)), Array(UInt8), Bool, Int64, String)
batch, err := conn.PrepareBatch(ctx, "INSERT INTO test_variant (c)")
require.NoError(t, err)
require.NoError(t, batch.Append(42)) // Accepts primitives
require.NoError(t, batch.Append(chcol.NewVariantWithType("test", "String"))) // Accepts Variants with type preference
require.NoError(t, batch.Append(true))
require.NoError(t, batch.Append(chcol.NewVariant([]uint8{0xA, 0xB, 0xC}).WithType("Array(UInt8)"))) // Accepts a Variant with a type added via WithType
require.NoError(t, batch.Append(nil)) // Accepts nil
require.NoError(t, batch.Append([]map[string]string{{"key1": "val1"}, {"key2": "val2"}})) // Accepts complex types

When a value is appended via col.AppendRow(), the type of the input v interface{} is checked. If it is nil, a Null discriminator is appended. If it is a VariantWithType, the value is appended to the column of the specified type, along with that column's discriminator. The underlying column's AppendRow function is re-used so that we don't need to re-implement its logic.

As a catch-all, the input value is tried against each column type in order until one accepts it. For example, Variant(Bool, Int64, String) will try to append as bool, then int64, then string. If the value does not fit into any column type, an error is returned.
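
The trial-and-error append described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the column interface, the boolCol/int64Col/stringCol types, the variantCol struct, and the Null discriminator value of 255 are all assumptions made for the example.

```go
package main

import "fmt"

// column is a hypothetical stand-in for a typed sub-column of a Variant.
type column interface {
	AppendRow(v any) error
}

type boolCol struct{ rows []bool }

func (c *boolCol) AppendRow(v any) error {
	b, ok := v.(bool)
	if !ok {
		return fmt.Errorf("Bool: cannot append %T", v)
	}
	c.rows = append(c.rows, b)
	return nil
}

type int64Col struct{ rows []int64 }

func (c *int64Col) AppendRow(v any) error {
	switch x := v.(type) {
	case int64:
		c.rows = append(c.rows, x)
	case int:
		c.rows = append(c.rows, int64(x))
	default:
		return fmt.Errorf("Int64: cannot append %T", v)
	}
	return nil
}

type stringCol struct{ rows []string }

func (c *stringCol) AppendRow(v any) error {
	s, ok := v.(string)
	if !ok {
		return fmt.Errorf("String: cannot append %T", v)
	}
	c.rows = append(c.rows, s)
	return nil
}

// nullDiscriminator marks a NULL row. The value 255 is an assumption for this sketch.
const nullDiscriminator uint8 = 255

// variantCol holds one sub-column per variant type (sorted by type name)
// plus one discriminator per appended row.
type variantCol struct {
	columns        []column
	discriminators []uint8
}

// AppendRow tries each sub-column in order and records the index of the
// first one that accepts the value.
func (c *variantCol) AppendRow(v any) error {
	if v == nil {
		c.discriminators = append(c.discriminators, nullDiscriminator)
		return nil
	}
	for i, col := range c.columns {
		if err := col.AppendRow(v); err == nil {
			c.discriminators = append(c.discriminators, uint8(i))
			return nil
		}
	}
	return fmt.Errorf("value %v (%T) does not fit any variant type", v, v)
}

func main() {
	// Variant(Bool, Int64, String)
	col := &variantCol{columns: []column{&boolCol{}, &int64Col{}, &stringCol{}}}
	for _, v := range []any{42, "test", nil, true} {
		if err := col.AppendRow(v); err != nil {
			fmt.Println("append failed:", err)
		}
	}
	fmt.Println(col.discriminators) // [1 2 255 0]
	fmt.Println(col.AppendRow(3.14)) // no sub-column accepts float64, so this is an error
}
```

Re-using each sub-column's own AppendRow keeps the Variant logic small: the Variant only has to remember which sub-column accepted the value.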

Sometimes types will conflict. Because the variant's types are sorted alphabetically, Array(UInt8) is tried before String, and since the Array(UInt8) column accepts string input, a Go string would land there instead of in String. I have researched different solutions to this, including a type priority system, but it would be complex to implement. For now it is easiest to let the user write NewVariantWithType(int64(42), "Int64") or NewVariant(int64(42)).WithType("Int64") if they want a specific type within the variant. For complex types like maps, reflection will be used if a type isn't specified.

After all rows are appended, the Native format is used to serialize the data into the buffer. The serializationVersion is written first, followed by the uint8 array of discriminators, and then each column's Encode function is re-used as usual (similar to Tuple).

Deserialization

The Native format deserializes the discriminators and builds a set of offsets for each column. This allows for storing multiple columns with mixed lengths. When the user wants to read a row, we can index into the correct row of each column to get the corresponding type.
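
A rough sketch of that offset calculation is shown below. buildOffsets is a hypothetical helper, and the Null discriminator value of 255 is an assumption for the example.

```go
package main

import "fmt"

// nullDiscriminator marks a NULL row. The value 255 is an assumption for this sketch.
const nullDiscriminator uint8 = 255

// buildOffsets maps every row to its index inside the sub-column selected
// by that row's discriminator. NULL rows get -1 because they are stored in
// no sub-column. Discriminators are assumed to be valid (< numTypes or NULL).
func buildOffsets(discriminators []uint8, numTypes int) []int {
	next := make([]int, numTypes) // next free index per sub-column
	offsets := make([]int, len(discriminators))
	for row, d := range discriminators {
		if d == nullDiscriminator {
			offsets[row] = -1
			continue
		}
		offsets[row] = next[d]
		next[d]++
	}
	return offsets
}

func main() {
	// Variant(Bool, Int64): four rows, but the sub-columns hold 1 and 2 values.
	discriminators := []uint8{1, 0, 255, 1}
	bools := []bool{true}
	ints := []int64{42, 84}

	offsets := buildOffsets(discriminators, 2)
	for row, d := range discriminators {
		switch d {
		case 0:
			fmt.Println(row, bools[offsets[row]])
		case 1:
			fmt.Println(row, ints[offsets[row]])
		default:
			fmt.Println(row, "NULL")
		}
	}
}
```

Precomputing the offsets once per block makes reading any row a constant-time lookup: the discriminator selects the sub-column and the offset selects the value within it.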

In practice, scanning a row looks like this:

var row chcol.Variant // Scan into variant

require.True(t, rows.Next())
err = rows.Scan(&row)
require.NoError(t, err)
require.Equal(t, int64(42), row.MustInt64()) // Variant provides convenience functions for returning a primitive

Or, if you know your types ahead of time, you can also scan directly into it:

var i int64 // Scan directly into int64
require.True(t, rows.Next())
err = rows.Scan(&i)
require.NoError(t, err)
require.Equal(t, int64(84), i)

This pattern works by simply calling the underlying column's ScanRow function. Scanning into Variant is the safest option, however, since it works regardless of which type the row holds.
If you need to switch on the type held by a Variant for your own type detection, you can use variantRow.Any() or variantRow.Interface(), which return any and interface{} respectively (both are provided so you can use whichever reads better in your code).

Variant

Variant is simply a wrapper around any. It implements the stdlib sql interfaces driver.Valuer and sql.Scanner. It also has convenience functions for primitives such as Int64. If you need to access the underlying value you can use Any(). This type can be constructed with the NewVariant(v) function.

The Variant type should be used in structs and when scanning from ColVariant. It can also be used for insertion, although VariantWithType may be required if there's overlap between types.
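
For readers unfamiliar with this wrapper pattern, here is a minimal local stand-in showing the general shape. It is not the chcol.Variant from this PR: the field name, the Int64 signature, and the describe helper are assumptions made for illustration.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"fmt"
)

// Variant is a local stand-in for the wrapper described above, not the chcol type.
type Variant struct {
	value any
}

// NewVariant wraps any value.
func NewVariant(v any) Variant { return Variant{value: v} }

// Any returns the underlying value.
func (v Variant) Any() any { return v.value }

// Int64 is a convenience accessor; the (value, ok) signature is an assumption.
func (v Variant) Int64() (int64, bool) {
	i, ok := v.value.(int64)
	return i, ok
}

// Scan implements sql.Scanner by storing whatever the driver hands over.
func (v *Variant) Scan(src any) error {
	v.value = src
	return nil
}

// Value implements driver.Valuer by returning the wrapped value.
func (v Variant) Value() (driver.Value, error) { return v.value, nil }

// Compile-time checks that the stand-in satisfies both interfaces.
var (
	_ sql.Scanner   = (*Variant)(nil)
	_ driver.Valuer = Variant{}
)

// describe shows how a caller might switch on the underlying type via Any().
func describe(v Variant) string {
	switch x := v.Any().(type) {
	case nil:
		return "NULL"
	case int64:
		return fmt.Sprintf("Int64(%d)", x)
	case string:
		return fmt.Sprintf("String(%q)", x)
	case bool:
		return fmt.Sprintf("Bool(%t)", x)
	default:
		return fmt.Sprintf("other(%T)", x)
	}
}

func main() {
	var row Variant
	if err := row.Scan(int64(42)); err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Println(describe(row))                 // Int64(42)
	fmt.Println(describe(NewVariant("test"))) // String("test")
	fmt.Println(describe(NewVariant(nil)))    // NULL
}
```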

VariantWithType

VariantWithType is the same as Variant, but with a string included to specify the preferred type. You can use this for insertion when the Variant column has types that overlap. For example, with Variant(Array(UInt8), String), a Go string would be inserted as an Array(UInt8) by default. To force it to be a ClickHouse String, you could use NewVariantWithType(v, "String") to provide the preferred type. If the preferred type is not present in the Variant, the row will fail to append to the block. A type can also be added to an existing Variant by calling exampleVariant.WithType(t string), which returns a new VariantWithType.
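
The preferred-type lookup can be sketched as follows. The types below are local stand-ins rather than the chcol types, and discriminatorFor is a hypothetical helper that only illustrates the rule "use the named type, or fail if the Variant does not contain it".

```go
package main

import "fmt"

// Variant and VariantWithType are local stand-ins, not the chcol types.
type Variant struct {
	value any
}

type VariantWithType struct {
	Variant
	chType string // preferred ClickHouse type name
}

func NewVariant(v any) Variant { return Variant{value: v} }

func NewVariantWithType(v any, chType string) VariantWithType {
	return VariantWithType{Variant: Variant{value: v}, chType: chType}
}

// WithType returns a new VariantWithType carrying the preferred type.
func (v Variant) WithType(chType string) VariantWithType {
	return VariantWithType{Variant: v, chType: chType}
}

// discriminatorFor returns the index of the preferred type within the
// variant's sorted type list, or an error if the type is not part of it.
func discriminatorFor(types []string, v VariantWithType) (uint8, error) {
	for i, t := range types {
		if t == v.chType {
			return uint8(i), nil
		}
	}
	return 0, fmt.Errorf("type %q is not part of the Variant", v.chType)
}

func main() {
	// Variant(Array(UInt8), String), already sorted by type name.
	types := []string{"Array(UInt8)", "String"}

	d, err := discriminatorFor(types, NewVariant("test").WithType("String"))
	fmt.Println(d, err) // 1 <nil>

	_, err = discriminatorFor(types, NewVariantWithType(int64(42), "Int64"))
	fmt.Println(err) // type "Int64" is not part of the Variant
}
```

Because the preferred type is matched by name, the trial-and-error pass is skipped entirely for these values.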

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG
  • For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials

@mshustov mshustov mentioned this pull request Nov 22, 2024
4 tasks