Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize size of TzOffset #165

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

pitdicker
Copy link
Contributor

Three changes that bring the size of DateTime<Tz> down from 48 bytes to 20 bytes:

  • The build script collects all abbreviations and concatenates them into one string. We compactly store an index and length into this string in one i16. All abbreviations are at most 6 characters and the entire string is ca. 650 characters. So it fits easily.
  • The base offset from UTC doesn't fit into an i16, it needs 17 bits. We give it 18 bits of an i32. The DST offset from the base offset is much smaller, it can fit into the remaining 14 bits. This spares one i32.
  • There are 2 bytes of padding in the FixedTimespan type, and 2 bytes of padding in the TzOffset type. If we don't wrap the FixedTimespan but copy its two fields into TzOffset we get rid of the padding, which is 33% of the type.

A binary compiled with these size optimizations is 1,5Mb smaller 🎉.

Fixes #27.

Copy link
Member

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the integers containing bitshifted fields should be wrapped in a newtype with a const API that can handle both construction and getting out the individual values.

Instead of simply concatenating all abbrevations, how much smaller does the full string get if you concatenate only abbreviations that aren't already in the earlier string?

let mut abbreviations: Vec<_> = abbreviations.iter().collect();
abbreviations.sort();
let mut abbreviations_str = String::new();
for abbr in abbreviations.drain(..) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is abbrevations.join("")?

@@ -208,16 +225,17 @@ impl FromStr for Tz {{
first: FixedTimespan {{
utc_offset: {utc},
dst_offset: {dst},
name: \"{name}\",
abbreviation: {index_len},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's suffix the field name with _idx?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Very large memory size for DateTime carrying Tz from chrono-tz
2 participants