Perf: unicode-data-names #110

Merged
merged 11 commits into from
Jun 5, 2024
Conversation

wismill
Collaborator

@wismill wismill commented Mar 11, 2023

Follow-up of #107.

Added comparison to ICU.

@wismill
Collaborator Author

wismill commented Mar 11, 2023

Results (Linux x64, 8 × AMD Ryzen 5 2500U, GHC 9.2.7):

All
  Unicode.Char.General.Names
    name
      String: OK (2.61s)
        40.4 ms ± 318 μs, 155 MB allocated, 1.6 KB copied,  35 MB peak memory, 36% less than baseline
    correctedName
      String: OK (0.68s)
        44.7 ms ± 744 μs, 155 MB allocated, 1.7 KB copied,  35 MB peak memory, 34% less than baseline
    nameOrAlias
      String: OK (20.75s)
        40.6 ms ± 504 μs, 155 MB allocated, 1.5 KB copied,  35 MB peak memory, 36% less than baseline
    nameAliasesByType
      String: OK (6.83s)
        109  ms ± 1.4 ms, 518 MB allocated, 3.0 KB copied,  35 MB peak memory, 21% less than baseline
    nameAliasesWithTypes
      String: OK (1.19s)
        18.8 ms ± 302 μs,  52 MB allocated, 343 B  copied,  35 MB peak memory, 58% less than baseline
    nameAliases
      String: OK (1.97s)
        15.4 ms ± 182 μs,  51 MB allocated, 285 B  copied,  35 MB peak memory,  9% less than baseline

@harendra-kumar
Member

How does it compare to ICU?

@wismill
Collaborator Author

wismill commented Mar 11, 2023

> How does it compare to ICU?

unicode-data is about 5 times faster. However, I did not check the implementation much. It does not rely on an established library such as text-icu: I added the cbits first to enable testing. The benchmarks give us a rough idea; otherwise we would have nothing to compare to!

@wismill
Collaborator Author

wismill commented Mar 11, 2023

@harendra-kumar Any idea how to allow failure of GHC-head in the CI? The only results I found are ongoing discussions about implementing the feature, but no temporary fix.

How could we mark it as always succeeding? We should then always check the details.

@harendra-kumar
Member

> How could we mark it as always succeeding? We should then always check the details.

Use ignore_error: true, like here. Note that you will have to set ignore_error to false for all the other CI jobs.

@wismill wismill force-pushed the wip/perf/unicode-data-names branch 5 times, most recently from ce623d6 to e0ebf65 Compare March 11, 2023 22:35
@wismill
Collaborator Author

wismill commented Mar 11, 2023

I updated the CI. We should probably use fail-fast: true, shouldn’t we?

@harendra-kumar
Member

> I updated the CI. We should probably use fail-fast: true, shouldn’t we?

It will save on resources, by cancelling all CIs on first error, but it will not show you errors in all CIs.

@wismill wismill force-pushed the wip/perf/unicode-data-names branch from e0ebf65 to 62e6331 Compare March 12, 2023 13:02
@wismill
Collaborator Author

wismill commented Mar 12, 2023

I added 2 optional APIs: ByteString and Text. The ByteString one is slightly less performant than Text, which I did not expect. I guess it is good enough for a first version.

Latest benchmarks:

All
  Unicode.Char.General.Names
    name
      String
        unicode-data: OK (5.28s)
          41.1 ms ± 597 μs, 155 MB allocated, 2.7 KB copied,  35 MB peak memory
        icu:          OK (1.43s)
          204  ms ± 7.3 ms, 524 MB allocated, 8.4 KB copied,  51 MB peak memory, 4.96x
      ByteString
        unicode-data: OK (0.82s)
          26.3 ms ± 496 μs,  68 MB allocated, 1.0 KB copied,  51 MB peak memory
      Text
        unicode-data: OK (1.55s)
          24.4 ms ± 685 μs,  64 MB allocated, 875 B  copied,  51 MB peak memory
        icu:          OK (4.02s)
          129  ms ± 2.1 ms, 277 MB allocated, 3.7 KB copied,  51 MB peak memory, 5.30x
    correctedName
      String
        unicode-data: OK (0.38s)
          54.7 ms ± 2.1 ms, 151 MB allocated, 3.1 KB copied,  51 MB peak memory
        icu:          OK (4.76s)
          316  ms ± 6.2 ms, 729 MB allocated, 9.6 KB copied,  52 MB peak memory, 5.78x
      ByteString
        unicode-data: OK (0.95s)
          30.4 ms ± 848 μs,  68 MB allocated, 957 B  copied,  52 MB peak memory
      Text
        unicode-data: OK (3.66s)
          28.8 ms ± 872 μs,  64 MB allocated, 741 B  copied,  52 MB peak memory
        icu:          OK (1.71s)
          245  ms ± 8.9 ms, 475 MB allocated, 6.2 KB copied,  52 MB peak memory, 8.50x
    nameAliasesByType
      String
        unicode-data: OK (1.06s)
          149  ms ± 4.9 ms, 514 MB allocated, 6.3 KB copied,  52 MB peak memory
      ByteString
        unicode-data: OK (2.28s)
          150  ms ± 3.7 ms, 516 MB allocated, 5.7 KB copied,  52 MB peak memory
      Text
        unicode-data: OK (0.45s)
          148  ms ± 3.6 ms, 508 MB allocated, 9.3 KB copied,  52 MB peak memory
    nameAliasesWithTypes
      String
        unicode-data: OK (1.50s)
          23.7 ms ± 713 μs,  52 MB allocated, 577 B  copied,  52 MB peak memory
      ByteString
        unicode-data: OK (0.36s)
          23.3 ms ± 774 μs,  51 MB allocated, 821 B  copied,  52 MB peak memory
      Text
        unicode-data: OK (1.47s)
          23.2 ms ± 329 μs,  52 MB allocated, 558 B  copied,  52 MB peak memory
    nameAliases
      String
        unicode-data: OK (1.21s)
          18.8 ms ± 367 μs,  51 MB allocated, 531 B  copied,  52 MB peak memory
      ByteString
        unicode-data: OK (4.66s)
          18.3 ms ± 497 μs,  51 MB allocated, 452 B  copied,  52 MB peak memory
      Text
        unicode-data: OK (2.41s)
          18.8 ms ± 721 μs,  51 MB allocated, 460 B  copied,  52 MB peak memory
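
As a rough check on the ratios reported next to the `icu` rows, the slowdown factor can be recomputed from the printed means. This is a minimal sketch; `slowdown` and `round2` are illustrative names that are not part of the PR, and since the printed means are already rounded, the recomputed factor can differ slightly from the reported one (for example for `name`/Text).

```haskell
-- | How many times slower the candidate is than the reference,
-- given the two mean running times in the same unit.
slowdown :: Double -> Double -> Double
slowdown reference candidate = candidate / reference

-- | Round to two decimals, matching the precision of the report.
round2 :: Double -> Double
round2 x = fromIntegral (round (x * 100) :: Integer) / 100

-- | name/String: unicode-data at 41.1 ms versus icu at 204 ms.
nameStringFactor :: Double
nameStringFactor = round2 (slowdown 41.1 204)

-- | correctedName/String: unicode-data at 54.7 ms versus icu at 316 ms.
correctedNameStringFactor :: Double
correctedNameStringFactor = round2 (slowdown 54.7 316)
```

Both values agree with the 4.96x and 5.78x printed in the report above.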

@wismill wismill force-pushed the wip/perf/unicode-data-names branch 3 times, most recently from 5904a4a to 871e6f2 Compare March 15, 2023 13:51
@wismill wismill marked this pull request as ready for review March 15, 2023 14:21
@wismill
Collaborator Author

wismill commented Mar 15, 2023

@harendra-kumar I am satisfied with the current state, although there are some points that surprise me:

  • Text is faster than ByteString for name, nameOrAlias and correctedName, but only when requiring templates. The ByteString API allocates much more than Text on templates.
  • ByteString < Text < String for nameAliases and nameAliasesWithTypes (expected), but the 3 perform almost equally for nameAliasesByType.

Anyway, the perf is already good and I spent a good amount of time on this, so if there is no suggestion to improve these, I think we can merge as it is.

Note: this time the benchmarks are run on a smaller set of characters:

  • char with a name or alias for name-oriented functions: name, nameOrAlias and correctedName;
  • char with an alias for alias-oriented functions.

So the results cannot be compared with the previous ones.

Latest benchmarks:

All
  Unicode.Char.General.Names
    name
      String
        unicode-data: OK (1.37s)
          189  ms ± 3.5 ms, 813 MB allocated, 9.3 KB copied,  89 MB peak memory, 23% less than baseline
        icu:          OK (4.66s)
          654  ms ± 9.3 ms, 2.1 GB allocated,  15 KB copied, 119 MB peak memory, 3.46x
      ByteString
        unicode-data: OK (1.74s)
          52.8 ms ± 364 μs, 157 MB allocated, 991 B  copied, 119 MB peak memory
      Text
        unicode-data: OK (1.39s)
          41.3 ms ± 1.2 ms, 125 MB allocated, 805 B  copied, 120 MB peak memory
        icu:          OK (6.07s)
          194  ms ± 2.1 ms, 380 MB allocated, 2.0 KB copied, 120 MB peak memory, 4.69x

All
  Unicode.Char.General.Names
    correctedName
      String
        unicode-data: OK (0.61s)
          196  ms ± 4.7 ms, 804 MB allocated,  12 KB copied,  89 MB peak memory, 21% less than baseline
        icu:          OK (2.35s)
          768  ms ±  27 ms, 2.3 GB allocated,  18 KB copied, 112 MB peak memory, 3.92x
      ByteString
        unicode-data: OK (1.79s)
          55.7 ms ± 914 μs, 157 MB allocated, 994 B  copied, 112 MB peak memory
      Text
        unicode-data: OK (0.37s)
          43.1 ms ± 1.5 ms, 118 MB allocated, 1.4 KB copied, 119 MB peak memory
        icu:          OK (0.90s)
          294  ms ± 6.3 ms, 568 MB allocated, 5.4 KB copied, 119 MB peak memory, 6.84x

All
  Unicode.Char.General.Names
    nameOrAlias
      String
        unicode-data: OK (1.41s)
          194  ms ± 3.0 ms, 813 MB allocated, 8.7 KB copied,  89 MB peak memory, 23% less than baseline
      ByteString
        unicode-data: OK (3.65s)
          56.0 ms ± 1.2 ms, 158 MB allocated, 783 B  copied, 116 MB peak memory
      Text
        unicode-data: OK (3.05s)
          46.7 ms ± 1.6 ms, 125 MB allocated, 625 B  copied, 120 MB peak memory

All
  Unicode.Char.General.Names
    nameAliasesWithTypes
      String
        unicode-data: OK (2.28s)
          147  ms ± 2.0 ms, 643 MB allocated, 7.6 KB copied,  87 MB peak memory, 16% less than baseline
      ByteString
        unicode-data: OK (2.16s)
          66.6 ms ± 2.3 ms, 248 MB allocated, 1.4 KB copied, 101 MB peak memory
      Text
        unicode-data: OK (1.52s)
          96.7 ms ± 2.1 ms, 286 MB allocated, 1.5 KB copied, 101 MB peak memory

All
  Unicode.Char.General.Names
    nameAliasesByType
      String
        unicode-data: OK (1.00s)
          134  ms ± 2.8 ms, 541 MB allocated, 6.2 KB copied,  87 MB peak memory, 47% less than baseline
      ByteString
        unicode-data: OK (2.03s)
          131  ms ± 1.3 ms, 541 MB allocated, 5.7 KB copied, 111 MB peak memory
      Text
        unicode-data: OK (0.42s)
          129  ms ± 3.9 ms, 541 MB allocated, 7.2 KB copied, 111 MB peak memory

All
  Unicode.Char.General.Names
    nameAliases
      String
        unicode-data: OK (1.89s)
          121  ms ± 3.1 ms, 541 MB allocated, 5.8 KB copied,  87 MB peak memory, 11% less than baseline
      ByteString
        unicode-data: OK (0.63s)
          37.5 ms ± 1.4 ms, 146 MB allocated, 1.0 KB copied, 107 MB peak memory
      Text
        unicode-data: OK (1.04s)
          66.4 ms ± 2.1 ms, 186 MB allocated, 1.1 KB copied, 107 MB peak memory

@wismill
Collaborator Author

wismill commented Mar 16, 2023

Well, I gave it a final try: going low level with primops proved to be more efficient than FFI.

The downside is that if primops or ByteString internals change, we will have to adapt. But:

  • For ByteString internals we rely on package bounds.
  • The CI passes, so these primops seem very stable.

This low-level work could be applied to the Text API as well, but let’s call it a day.

New benchmark results, which show that the ByteString and Text APIs are now much closer:

All
  Unicode.Char.General.Names
    name
      String
        unicode-data: OK (1.31s)
          182  ms ± 3.2 ms, 813 MB allocated, 8.5 KB copied,  91 MB peak memory, 27% less than baseline
      ByteString
        unicode-data: OK (0.69s)
          40.7 ms ± 1.0 ms, 127 MB allocated, 922 B  copied, 119 MB peak memory
      Text
        unicode-data: OK (0.64s)
          37.5 ms ± 915 μs, 122 MB allocated, 877 B  copied, 119 MB peak memory

All
  Unicode.Char.General.Names
    correctedName
      String
        unicode-data: OK (2.85s)
          186  ms ± 697 μs, 815 MB allocated, 8.3 KB copied,  91 MB peak memory, 28% less than baseline
      ByteString
        unicode-data: OK (0.72s)
          43.5 ms ± 1.4 ms, 127 MB allocated, 920 B  copied, 115 MB peak memory
      Text
        unicode-data: OK (0.70s)
          41.8 ms ± 938 μs, 122 MB allocated, 877 B  copied, 115 MB peak memory

All
  Unicode.Char.General.Names
    nameOrAlias
      String
        unicode-data: OK (1.35s)
          187  ms ± 1.4 ms, 813 MB allocated, 8.7 KB copied,  91 MB peak memory, 24% less than baseline
      ByteString
        unicode-data: OK (0.75s)
          47.0 ms ± 944 μs, 127 MB allocated, 925 B  copied, 117 MB peak memory
      Text
        unicode-data: OK (0.74s)
          44.0 ms ± 1.4 ms, 122 MB allocated, 877 B  copied, 117 MB peak memory
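
To illustrate what "going low level with primops" can look like, here is a minimal sketch that reads bytes directly from a static literal with `indexCharOffAddr#` instead of going through FFI. It is not the code from this PR: `unpackSlice`, `demoName` and the two-entry table layout are hypothetical, and the real implementation builds ByteString values rather than String.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Addr#, Char (C#), Int (I#), indexCharOffAddr#, (+#))

-- | Unpack @len@ bytes starting at offset @off@ of a static byte literal.
-- Assumes the bytes are ASCII, which holds for Unicode character names.
unpackSlice :: Addr# -> Int -> Int -> String
unpackSlice addr# (I# off#) len = go 0
  where
    go i@(I# i#)
      | i >= len = []
      | otherwise = C# (indexCharOffAddr# addr# (off# +# i#)) : go (i + 1)

-- | Hypothetical table: two names stored back to back in one literal,
-- addressed by hard-coded (offset, length) pairs.
demoName :: Int -> Maybe String
demoName i = case i of
  0 -> Just (slice 0 5)   -- "SPACE"
  1 -> Just (slice 5 16)  -- "EXCLAMATION MARK"
  _ -> Nothing
  where
    slice = unpackSlice "SPACEEXCLAMATION MARK"#
```

The literal lives in the executable's static data, so a lookup only needs an offset and a length; that is the property the primop approach relies on.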

@wismill wismill requested a review from harendra-kumar March 16, 2023 05:21
, text >= 2.0 && < 2.1
include-dirs: cbits
c-sources: cbits/icu.c
cc-options: -Wall -Wextra -pedantic
Member

Should use some optimization option e.g. -O2 or -Ofast.

Collaborator Author

Again, these are added automatically when passing -O2 to Cabal, which I always do. cabal check complains about them in the .cabal file.

-fwarn-identities
-fwarn-incomplete-record-updates
-fwarn-incomplete-uni-patterns
-fwarn-tabs
Member

Did you try -O2 to see whether it has any impact on perf?

Collaborator Author

I always run cabal with -O2.

name :: Char -> Maybe String
name = fmap unpack . DerivedName.name
name (C# c#) = case DerivedName.name c# of
Member

My only concern is that we are making the implementation significantly more complex to make these performance gains. Is the performance gain required for unicode-data-names? Are these APIs to be used at places where the additional performance matters? Maybe that is the reason why ICU performance is not so good, because they do not care about the perf of these APIs.

Member

I have the same concern about Text/ByteString APIs. If the performance does not matter just the String APIs are enough. More APIs add significantly more maintenance overhead.

I know I am a bit late about this, you have done a lot of hard work on this. But in general we should optimize only if it matters. I do not have much idea where these APIs are used, so I might be wrong.

Collaborator Author

  1. The hard work is already done! It was a good exercise to start with a package with a relatively small API. Next comes unicode-data.
  2. You should not worry about the special cases for the names. These are specific (huge) ranges of characters related to CJK. Any addition with the same template will be processed automatically, because I designed the generator and the optimisation to require no maintenance effort. Any new template will be treated as a standard name; a manual change would be required only if the optimisation is worth it; otherwise it will work normally.
  3. The implementation is quite similar for the 3 APIs.
  4. It is not only about speed, but also about library/executable size.
  5. You suggested from the beginning to use CStrings. The current code is not a simple lookup any more and requires byte array manipulation. So it seems quite natural to use the standard bytestring and text, doesn’t it?
  6. The only maintenance I see is updating at each new Unicode version. The Text and ByteString API rely on internal modules of third-party libs, but these are already considered almost public and seem quite stable.
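
To make the "template" idea in point 2 concrete: names in some large CJK ranges are not stored individually but derived from a fixed prefix plus the code point in uppercase hexadecimal. The sketch below covers a single range for illustration only; `templateName` is a hypothetical helper, not the generated code from this PR, and the range bounds are an assumption limited to the original CJK Unified Ideographs block.

```haskell
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- | Derive a name from a template instead of looking it up in a table.
-- Only U+4E00..U+9FA5 is handled here; real data covers more ranges.
templateName :: Char -> Maybe String
templateName c
  | '\x4E00' <= c && c <= '\x9FA5' = Just (prefix ++ hex)
  | otherwise = Nothing
  where
    prefix = "CJK UNIFIED IDEOGRAPH-"
    -- showHex emits lowercase digits; Unicode names use uppercase.
    hex = map toUpper (showHex (ord c) "")
```

For example, `templateName '\x4E00'` yields `Just "CJK UNIFIED IDEOGRAPH-4E00"`, so tens of thousands of names cost one prefix string rather than one table entry each, which is where the size saving mentioned in point 4 would come from.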

Collaborator Author

We can also mark the new API as experimental and without guarantee to be maintained.

Collaborator Author

@wismill wismill Mar 16, 2023

As the author of this package, I would like to have the best performance, but obviously I will not be optimising it forever. This is only the first round after its creation and I am very happy with it.
I am also the current maintainer, so it is designed with low maintenance effort in mind.

Member

Ok. I went through the code only at a high level. Did not dig deep into details. I am assuming your tests will cover any issues.

Would have been nice if we had a single interface which could be adapted to string/bytestring/text. And the users of this library themselves could do it. We are using internals of bytestring/text so whenever they change we will have to adapt to the change. And it is unlikely that users will use these flags. You are enthusiastic and have time at this point, but at some point of time you will get busier and this will become hard for you to maintain. But I will leave this to your judgement, and I know it can be removed if we cannot maintain it.
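
One possible shape for such a single interface, sketched here purely as an assumption and not as anything proposed in this PR, is to expose each name as a fold over its characters and let callers choose the target representation. `NameFold`, `nameFold`, `toString` and `nameLength` are hypothetical names, and the two-entry lookup is a toy stand-in for the real data.

```haskell
{-# LANGUAGE RankNTypes #-}

-- | A name represented as a right fold over its characters.
newtype NameFold = NameFold (forall r. (Char -> r -> r) -> r -> r)

-- | Build a fold from a literal; stands in for the generated tables.
fromLiteral :: String -> NameFold
fromLiteral s = NameFold (\cons nil -> foldr cons nil s)

-- | Toy lookup covering two characters only.
nameFold :: Char -> Maybe NameFold
nameFold ' ' = Just (fromLiteral "SPACE")
nameFold '!' = Just (fromLiteral "EXCLAMATION MARK")
nameFold _   = Nothing

-- | Adapter to String; other adapters would consume the same fold.
toString :: NameFold -> String
toString (NameFold f) = f (:) []

-- | The same fold can also answer questions without building a string.
nameLength :: NameFold -> Int
nameLength (NameFold f) = f (\_ n -> n + 1) 0
```

A downstream user could write their own adapter for text or bytestring on top of the same fold, which would keep the library free of those packages' internals, probably at some cost in speed compared with the specialised APIs benchmarked above.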

@wismill wismill force-pushed the wip/perf/unicode-data-names branch from c8b25cd to 2349c54 Compare June 3, 2024 16:34
@wismill
Collaborator Author

wismill commented Jun 3, 2024

Rebased

@wismill wismill force-pushed the wip/perf/unicode-data-names branch from 2349c54 to 4d34300 Compare June 4, 2024 11:25
wismill added 8 commits June 4, 2024 13:34
Encode the explicit length of the aliases.

Add filter “WithNameOrAlias” to benchmark [skip ci]
Create internal experimental package `icu`.
Previously we skipped the entire test on Unicode version mismatch between
ICU & `unicode-data`, but we can run the test for all characters defined
in both libs.
@wismill wismill force-pushed the wip/perf/unicode-data-names branch from 4d34300 to e8dd9f6 Compare June 4, 2024 11:39
@wismill
Collaborator Author

wismill commented Jun 4, 2024

We should use text-builder-linear for Text and ByteString. I am going to leave this for the moment, because this repo really needs some love. I will open an issue so that we do not forget about it.

@wismill wismill merged commit b93e747 into master Jun 5, 2024
18 checks passed