Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i8 and u8 datatypes #85

Closed
d70-t opened this issue Mar 15, 2021 · 10 comments
Closed

i8 and u8 datatypes #85

d70-t opened this issue Mar 15, 2021 · 10 comments

Comments

@d70-t
Copy link

d70-t commented Mar 15, 2021

Thanks for building zarr.js!

I just started trying the library and accidentally tried to open a dataset which used <i8 data types and it crashed 😬 Then I discovered that this is because 64 bit integer types are missing in this mapping.

I don't have a good overview about the codebase and typescript in general yet, so I don't feel confident in preparing a PR, but it would be great if i8 and u8 would be supported as well.

@manzt
Copy link
Collaborator

manzt commented Mar 15, 2021

Hi, thanks for opening an issue! This is a known issue and somewhat hidden in the documentation: https://guido.io/zarr.js/#/getting-started/remote-data?id=int64-support

"int64 support in browsers is tricky. Because all numbers are internally represented as floating point numbers, whole numbers larger than 2^53-1 can not be reliably represented. BigIntand BigInt64Array solve this, but they are not supported in every modern browser yet (in particular Edge and Safari). For maximum compatibility save yourself some hassle and try to avoid int64/uint64 zarr arrays."

This is somewhat outdated as Edge has added BigInt64Arrays, but unfortunately Safari still does not. Adding support would include extending BigInt64Array and BigUint64Array to the mapping you have found:

export const DTYPE_TYPEDARRAY_MAPPING: { [A in DtypeString]: TypedArrayConstructor<TypedArray> } = {
  /* ... */
  '<i8': BigInt64Array,
  '>i8': BigInt64Array,
  '<u8': BigUint64Array,
  '>u8': BigUint64Array,
}

as well as the string literal types here.

However, the main issue is that this script will throw an Error in browsers where there is not support. One option is to prefix these imports with globalThis, (e.g. globalThis.BigInt64Array) which would evaluate to undefined in unsupported browsers. However, this will likely lead to hard to debug undefined errors. A better solution would be to change DTYPE_TYPEDARRAY_MAPPING to a function that safely returns the TypedArray constructor (or throws an error if not available). Thoughts @gzuidhof?

PS: As the documentation notes, it's easiest to avoid using 64-bit integers in zarr.js (and on the web in general because they are difficult to work with). If you have control over the data you are generating, I would recommend thinking about whether you really require the entire range of integers that 64-bit integer offers. If you can avoid using them, you will have an easier time in your web-application and the size of data transferred to your application will be much smaller. (Apologies if this is information you've already considered!)

@gzuidhof
Copy link
Owner

gzuidhof commented Mar 15, 2021

You summarized exactly what I would have typed too, I think we can map it to the correct type directly from globalThis (but even for globalThis we will need a fallback as it's only supported since a couple years or so).

What do you think about something like this?

const INT64TYPE = globalThis.BigInt64Array || () => {throw new Error("This runtime does not support ..."};
const UINT64TYPE = ...

export const DTYPE_TYPEDARRAY_MAPPING: { [A in DtypeString]: TypedArrayConstructor<TypedArray> } = {
  /* ... */
  '<i8': INT64TYPE,
  '>i8': UINT64TYPE,
  '<u8': UINT64TYPE,
  '>u8': UINT64TYPE, 
}

Maybe somewhere deep down these arrays will have weird behaviors still though, as they will return a BigInteger.. So the best practice of avoiding them in any Javascript environment is still true

@manzt
Copy link
Collaborator

manzt commented Mar 16, 2021

I think this is on the right path, but I don't think the anonymous function will work, unfortunately. The TypedArray constructors are called with the new keyword, so I don't think they are interchangeable and you will get a different, non-informative error.

const f = () => { throw new Error("I'm an error") };
new f();
// Uncaught TypeError: f is not a constructor

One pattern I've seen recently that might fit well here is using an IIFE to encapsulate any sort of setup logic. We can dynamically add to the mapping depending on what is available.

export const DTYPE_TYPEDARRAY_MAPPING: { [A in DtypeString]: TypedArrayConstructor<TypedArray> } = (() => {
  const mapping = { /* ... */ };
  if ("BigInt64Array" in globalThis) {
    mapping['<i8'] = globalThis.BigInt64Array;
    mapping['>i8'] = globalThis.BigInt64Array;
    mapping['<u8'] = globalThis.BigUint64Array;
    mapping['>u8'] = globalThis.BigUint64Array;
  } else {
    // Do something else? Options:
    //   1.) Wrap mapping in proxy that throws when accessing key `<i8`, ....
    //   2.) Don't do anything and will get unsupported dtype error like before.
  }
  return mapping;
})();

I like this pattern because it removes the constants from the global scope (not a big deal here bc we are using esm), but moreover because it can be dynamic and keeps the constants/logic in one place.

@d70-t
Copy link
Author

d70-t commented Mar 16, 2021

I'd like to give a general 👍, your suggestions look like a very good way forward. I tried to find out about how one could possible create some alternative implementation of BigInts, but as I learned, operator overloading is not a thing in JS, so this way seems to be infeasible as a third option. Thus, for now I'd be happy to get a clear notification that 64bit integers are not supported on the currently used browser once such an array is being accessed.

There's one thing regarding this comment:

Don't do anything and will get unsupported dtype error like before.

Currently the behavior I see is:

Unhandled Rejection (TypeError): Cannot read property 'BYTES_PER_ELEMENT' of undefined

from here. To me this is a very indirect way of indicating an "unsupported dtype error" 😬

@manzt
Copy link
Collaborator

manzt commented Mar 16, 2021

Ah, yes – thanks for pointing that our @d70-t . That is certainly not the behavior that we want! Loud and clear "dtype is not supported, got X", and not something cryptic. I'll work on a fix / better error msg.

@gzuidhof
Copy link
Owner

I'm not too picky, I think it is mostly a matter of flavor, any of the proposed solutions I would be happy with:

  • A plain function instead of lookup object (makes the code that uses it slightly more ugly).
  • An object that gets setup with IIFE (good idea! Does execute some code upon import).
  • A mock class (constructor) that throws in the constructor (the least changes required).

@manzt
Copy link
Collaborator

manzt commented Mar 17, 2021

Ok took a stab at this today, and of course it is not so simple. Things blew up in TypeScript (thankfully we are using TS), mostly due to setting/getting scalar values because we need to handle the bigint case now. For the time being, I'll make a fix that at least has a much clearer error message, but adding support for BigInt/BigUint will be much more challenging and not something I'm able to take on at the moment...

@ilan-gold
Copy link
Contributor

@manzt I think we should be able to support this now, no? https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt64Array

@manzt
Copy link
Collaborator

manzt commented Mar 14, 2023

Right, should not be as much of a blocker now since implemented in many browsers. However, introducing bigint arrays will require some more work to integrate with TypeScript and will likely result in a breaking change. For example,

import { openArray } from "zarr";

function calculate(scalar: number) {
   // a function that requires a number
}

let arr = await openArray({ /* ... */ });
let { data } = await arr.getRaw(null);
calculate(data[0]); 
               //^? error: data[0] is bigint | number

@dsteinmo
Copy link

dsteinmo commented Aug 2, 2023

Are there plans to merge #143 at any time in the near future? This limitation impacts our ability to use dates whose underlying datatype is an int (without running into the 2038 problem).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants