Example files are using legacy timezone names (US/Pacific) #2049

Closed
bdice opened this issue Oct 16, 2024 · 8 comments

Comments

@bdice

bdice commented Oct 16, 2024

The example ORC files use a timezone of US/Pacific which is no longer included in all Linux distributions. Ubuntu 24.04, for example, has moved this to a separate tzdata-legacy package. This can cause issues for ORC file readers on systems missing that legacy time zone data.

Should the example ORC files be updated to use a more current time zone name, like America/Los_Angeles?

Verifying the time zone in the stripe footers:

wget https://github.com/apache/orc/raw/refs/heads/main/examples/TestOrcFile.testDate1900.orc
orc-metadata -v TestOrcFile.testDate1900.orc
# Shows stripe footers with "timezone": "US/Pacific"
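A quick way to tell whether a given machine is affected is to check whether a zone file for the name exists in the system tz database. The sketch below assumes the reader resolves names as files under the directory named by the TZDIR environment variable, falling back to /usr/share/zoneinfo; both the helper name tz_available and that lookup rule are assumptions for illustration, not taken from the ORC sources.

```python
import os
from pathlib import Path


def tz_available(name, tzdir=None):
    """Return True if a zone file for `name` exists under the tz database directory."""
    # Assumption: zone names map to files under TZDIR or /usr/share/zoneinfo.
    base = Path(tzdir or os.environ.get("TZDIR", "/usr/share/zoneinfo"))
    return (base / name).is_file()


if __name__ == "__main__":
    # On a system without the legacy names, the first line is expected to print False.
    for zone in ("US/Pacific", "America/Los_Angeles"):
        print(zone, tz_available(zone))
```
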

Additional context

https://bugs.launchpad.net/ubuntu/+source/tzdata/+bug/2058249
apache/arrow#40633
pandas-dev/pandas#56292
rapidsai/cudf#16998 (comment)

@dongjoon-hyun
Member

Thank you for reporting, @bdice .

cc @williamhyun , @wgtmac , too.

@dongjoon-hyun
Member

dongjoon-hyun commented Oct 23, 2024

@bdice, according to our official Java tool, the type of the time column is timestamp without time zone, isn't it?

$ orc-tools version
ORC 2.0.2

$ orc-tools meta ./examples/TestOrcFile.testDate1900.orc | grep Type
Processing data file examples/TestOrcFile.testDate1900.orc [length: 30941]
Type: struct<time:timestamp,date:date>

Please see here. Given that there is no timezone, I'm not sure if the root cause is the file.

ORC includes two different forms of timestamps from the SQL world:

  • Timestamp is a date and time without a time zone, which does not change based on the time zone of the reader.
  • Timestamp with local time zone is a fixed instant in time, which does change based on the time zone of the reader.

Instead, it looks like an issue on the C++ library side, because orc-metadata is based on the C++ library. BTW, ORC-1481 was already fixed in Apache ORC 2.0.0. Do you mean that you hit this issue with Apache ORC 2.0+?
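The difference between the two timestamp forms quoted above can be illustrated with Python's naive and aware datetimes. This is only an analogy under stated assumptions: the sample value is arbitrary (not read from the example file), fixed UTC offsets stand in for named zones so no tz database is needed, and render_for_reader is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for named zones so the sketch needs no tz database.
PACIFIC_STANDARD = timezone(timedelta(hours=-8), "PST")
UTC = timezone.utc

# "Timestamp": a wall-clock value with no zone attached.
wall_clock = datetime(1900, 5, 5, 12, 34, 56)

# "Timestamp with local time zone": a fixed instant in time.
instant = datetime(1900, 5, 5, 12, 34, 56, tzinfo=UTC)


def render_for_reader(value, reader_zone):
    """Show how a reader in `reader_zone` would display `value`."""
    if value.tzinfo is None:
        # No zone attached: the fields are the same for every reader.
        return value.isoformat()
    # Fixed instant: convert to the reader's zone before displaying.
    return value.astimezone(reader_zone).isoformat()


if __name__ == "__main__":
    for zone in (UTC, PACIFIC_STANDARD):
        print(render_for_reader(wall_clock, zone), render_for_reader(instant, zone))
```

The wall-clock value prints identically for both readers, while the instant shifts by eight hours for the reader at UTC-8.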

@wgtmac
Member

wgtmac commented Oct 24, 2024

It looks like a breaking change to timezone names in TZDB. I will take a look. cc @ffacs

@dongjoon-hyun
Member

Thank you so much, @wgtmac .

@wgtmac
Member

wgtmac commented Oct 26, 2024

https://bugs.launchpad.net/ubuntu/+source/tzdata/+bug/2058249 explains the root cause: tzdata has moved timezone files like US/Pacific to a separate tzdata-legacy package and intentionally provides no symlinks, so this is a breaking change for legacy ORC files. At the same time, some downstream projects that depend on the Apache ORC C++ library use ORC files from https://github.com/apache/orc/tree/main/examples for CI validation. These CI jobs start to fail once they upgrade to Ubuntu 24.04, which ships the new version of tzdata without tzdata-legacy installed.

IMO, we should not change TestOrcFile.testDate1900.orc, as it is a good example for checking whether tzdata-legacy is required. One thing that I don't understand is that we have CI jobs running on Ubuntu 24.04, but they do not fail.
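For downstream code that cannot install tzdata-legacy, one conceivable stopgap is to translate a legacy name to its current equivalent before looking it up. The sketch below is a hypothetical illustration, not something the ORC libraries are stated to do in this thread. The US/Pacific pairing comes from the issue description; the other entries are well-known links from tzdata's backward-compatibility list, and LEGACY_ALIASES and canonical_name are made-up names.

```python
# Hypothetical alias table; only a few legacy names are listed for illustration.
LEGACY_ALIASES = {
    "US/Pacific": "America/Los_Angeles",  # pairing named in this issue
    "US/Mountain": "America/Denver",
    "US/Central": "America/Chicago",
    "US/Eastern": "America/New_York",
}


def canonical_name(name):
    """Map a legacy zone name to its current name; pass other names through unchanged."""
    return LEGACY_ALIASES.get(name, name)


if __name__ == "__main__":
    print(canonical_name("US/Pacific"))
```
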

@bdice
Author

bdice commented Oct 28, 2024

IMO, we should not change TestOrcFile.testDate1900.orc as it is a good example to check if tzdata-legacy is required.

That is fine with me! I have worked around this by installing tzdata-legacy on Ubuntu 24.04. I can see the potential value here. I am okay with closing this issue with no action, if that is acceptable to others.

Another possible course of action would be to leave TestOrcFile.testDate1900.orc as-is, and update the timezone names in TestOrcFile.testDate2038.orc (currently also using US/Pacific).

2038 test file output

Using orc 2.0.2:

$ orc-metadata -v TestOrcFile.testDate2038.orc
{ "name": "TestOrcFile.testDate2038.orc",
  "type": "struct<time:timestamp,date:date>",
  "attributes": {},
  "rows": 212000,
  "stripe count": 28,
  "format": "0.12", "writer version": "HIVE-8732", "software version": "ORC Java",
  "compression": "zlib", "compression block": 10000,
  "file length": 95787,
  "content": 94762, "stripe stats": 686, "footer": 314, "postscript": 24,
  "row index stride": 10000,
  "user metadata": {
  },
  "stripes": [
    { "stripe": 0, "rows": 15000,
      "offset": 3, "length": 6410,
      "index": 153, "data": 6194, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 3, "length": 21 },
        { "id": 1, "column": 1, "kind": "index", "offset": 24, "length": 78 },
        { "id": 2, "column": 2, "kind": "index", "offset": 102, "length": 54 },
        { "id": 3, "column": 1, "kind": "data", "offset": 156, "length": 507 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 663, "length": 5416 },
        { "id": 5, "column": 2, "kind": "data", "offset": 6079, "length": 271 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 1, "rows": 5000,
      "offset": 6413, "length": 2214,
      "index": 76, "data": 2075, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 6413, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 6425, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 6462, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 6489, "length": 171 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 6660, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 8463, "length": 101 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 2, "rows": 10000,
      "offset": 8627, "length": 4321,
      "index": 76, "data": 4182, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 8627, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 8639, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 8676, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 8703, "length": 340 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 9043, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 12651, "length": 234 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 3, "rows": 10000,
      "offset": 12948, "length": 4326,
      "index": 77, "data": 4186, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 12948, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 12960, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 12998, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 13025, "length": 341 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 13366, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 16974, "length": 237 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 4, "rows": 5000,
      "offset": 17274, "length": 2229,
      "index": 76, "data": 2090, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 17274, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 17286, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 17323, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 17350, "length": 174 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 17524, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 19327, "length": 113 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 5, "rows": 10000,
      "offset": 19503, "length": 4401,
      "index": 77, "data": 4261, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 19503, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 19515, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 19553, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 19580, "length": 416 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 19996, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 23604, "length": 237 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 6, "rows": 5000,
      "offset": 23904, "length": 2268,
      "index": 76, "data": 2129, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 23904, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 23916, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 23953, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 23980, "length": 210 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 24190, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 25993, "length": 116 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 7, "rows": 10000,
      "offset": 26172, "length": 4397,
      "index": 77, "data": 4257, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 26172, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 26184, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 26222, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 26249, "length": 419 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 26668, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 30276, "length": 230 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 8, "rows": 5000,
      "offset": 30569, "length": 2269,
      "index": 76, "data": 2130, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 30569, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 30581, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 30618, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 30645, "length": 213 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 30858, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 32661, "length": 114 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 9, "rows": 10000,
      "offset": 32838, "length": 4390,
      "index": 77, "data": 4250, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 32838, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 32850, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 32888, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 32915, "length": 411 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 33326, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 36934, "length": 231 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 10, "rows": 5000,
      "offset": 37228, "length": 2268,
      "index": 76, "data": 2129, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 37228, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 37240, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 37277, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 37304, "length": 211 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 37515, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 39318, "length": 115 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 11, "rows": 10000,
      "offset": 39496, "length": 4399,
      "index": 77, "data": 4259, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 39496, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 39508, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 39546, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 39573, "length": 414 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 39987, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 43595, "length": 237 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 12, "rows": 5000,
      "offset": 43895, "length": 2266,
      "index": 76, "data": 2127, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 43895, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 43907, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 43944, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 43971, "length": 211 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 44182, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 45985, "length": 113 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 13, "rows": 10000,
      "offset": 46161, "length": 4395,
      "index": 77, "data": 4255, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 46161, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 46173, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 46211, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 46238, "length": 412 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 46650, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 50258, "length": 235 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 14, "rows": 5000,
      "offset": 50556, "length": 2267,
      "index": 76, "data": 2128, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 50556, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 50568, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 50605, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 50632, "length": 211 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 50843, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 52646, "length": 114 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 15, "rows": 10000,
      "offset": 52823, "length": 4401,
      "index": 77, "data": 4261, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 52823, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 52835, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 52873, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 52900, "length": 414 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 53314, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 56922, "length": 239 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 16, "rows": 5000,
      "offset": 57224, "length": 2272,
      "index": 76, "data": 2133, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 57224, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 57236, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 57273, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 57300, "length": 211 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 57511, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 59314, "length": 119 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 17, "rows": 10000,
      "offset": 59496, "length": 4396,
      "index": 76, "data": 4257, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 59496, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 59508, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 59545, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 59572, "length": 414 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 59986, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 63594, "length": 235 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 18, "rows": 10000,
      "offset": 63892, "length": 4399,
      "index": 77, "data": 4259, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 63892, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 63904, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 63942, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 63969, "length": 416 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 64385, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 67993, "length": 235 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 19, "rows": 5000,
      "offset": 68291, "length": 2265,
      "index": 76, "data": 2126, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 68291, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 68303, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 68340, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 68367, "length": 210 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 68577, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 70380, "length": 113 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 20, "rows": 10000,
      "offset": 70556, "length": 4398,
      "index": 77, "data": 4258, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 70556, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 70568, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 70606, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 70633, "length": 413 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 71046, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 74654, "length": 237 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 21, "rows": 5000,
      "offset": 74954, "length": 2263,
      "index": 76, "data": 2124, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 74954, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 74966, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 75003, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 75030, "length": 206 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 75236, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 77039, "length": 115 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 22, "rows": 10000,
      "offset": 77217, "length": 4403,
      "index": 77, "data": 4263, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 77217, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 77229, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 77267, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 77294, "length": 417 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 77711, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 81319, "length": 238 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 23, "rows": 5000,
      "offset": 81620, "length": 2266,
      "index": 77, "data": 2126, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 81620, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 81632, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 81670, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 81697, "length": 207 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 81904, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 83707, "length": 116 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 24, "rows": 5000,
      "offset": 83886, "length": 2267,
      "index": 77, "data": 2127, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 83886, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 83898, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 83936, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 83963, "length": 213 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 84176, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 85979, "length": 111 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 25, "rows": 5000,
      "offset": 86153, "length": 2265,
      "index": 76, "data": 2126, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 86153, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 86165, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 86202, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 86229, "length": 211 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 86440, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 88243, "length": 112 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 26, "rows": 10000,
      "offset": 88418, "length": 4399,
      "index": 77, "data": 4259, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 88418, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 88430, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 88468, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 88495, "length": 414 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 88909, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 92517, "length": 237 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 27, "rows": 2000,
      "offset": 92817, "length": 1945,
      "index": 76, "data": 1808, "footer": 61,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 92817, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 92829, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 92866, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 92893, "length": 89 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 92982, "length": 1661 },
        { "id": 5, "column": 2, "kind": "data", "offset": 94643, "length": 58 }
      ],
      "timezone": "US/Pacific"
    }
  ]
}
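Rather than scanning 28 stripe entries by eye, the writer time zones can be pulled out of the JSON programmatically. A minimal sketch, assuming the orc-metadata -v output is valid JSON shaped like the listing above; the SAMPLE string is abridged from that listing to the fields the sketch reads, and writer_timezones is a hypothetical helper name.

```python
import json

# Abridged from the orc-metadata output above: only the fields this sketch reads.
SAMPLE = """
{ "name": "TestOrcFile.testDate2038.orc",
  "stripe count": 28,
  "stripes": [
    { "stripe": 0, "rows": 15000, "timezone": "US/Pacific" },
    { "stripe": 1, "rows": 5000, "timezone": "US/Pacific" }
  ]
}
"""


def writer_timezones(metadata_text):
    """Return the sorted distinct "timezone" values found in the stripe entries."""
    meta = json.loads(metadata_text)
    return sorted({s["timezone"] for s in meta.get("stripes", []) if "timezone" in s})


if __name__ == "__main__":
    print(writer_timezones(SAMPLE))
```

To run it against a real file, the tool output could be captured to a text file and passed to writer_timezones in place of SAMPLE.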

@wgtmac
Member

wgtmac commented Oct 30, 2024

@bdice I think we can keep those files, as they were created by legacy writers: "format": "0.12", "writer version": "HIVE-8732", "software version": "ORC Java". We can use the latest writer to generate a new file with equivalent data but with new timezone names.

@dongjoon-hyun
Member

Thank you all. Let me close this issue, because it seems we agree that the old files should be kept as-is. Feel free to make a PR for the newly proposed file.
