[native_assets_builder] Use file content hashing #1750

Merged (5 commits), Nov 28, 2024


@dcharkes (Collaborator) commented Nov 26, 2024

This PR changes the caching behavior for hooks to be file content hashing instead of last modified timestamps.

Closes: #1593.

In addition to hashing file contents, the build start time is passed in so that files modified during a build can be detected. Contents are hashed after the build has finished, but any file changed after the build started should invalidate the build. If this happens, the build still succeeds, but the cache is invalidated and a warning is logged.
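A minimal sketch of the "modified during the build" check, assuming the caller records `buildStart` right before invoking the hook. The helper name `findModifiedDuringBuild` is hypothetical and not the PR's API; it uses `FileStat.stat` from `dart:io`, which works for files and directories alike.

```dart
import 'dart:io';

/// Hypothetical helper (not the PR's API): returns the first entity that was
/// modified after [buildStart], or `null` if none was touched during the
/// build. Entities that do not exist are skipped in this sketch.
Future<Uri?> findModifiedDuringBuild(
  List<Uri> entities,
  DateTime buildStart,
) async {
  for (final uri in entities) {
    // FileStat.stat works for both files and directories.
    final stat = await FileStat.stat(uri.toFilePath());
    if (stat.type == FileSystemEntityType.notFound) continue;
    if (stat.modified.isAfter(buildStart)) {
      return uri;
    }
  }
  return null;
}
```

If this returns a URI, the caller can still report the build as successful while skipping the cache update and logging a warning that names the offending file.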

The implementation was modeled after the FileStore in flutter_tools. However, it was adapted to support caching directories and to match the code style in this repository.

Directory caching is defined as follows: a directory's hash is the hash of the names of its direct children. It excludes recursive descendants and the contents of the children. For recursive caching (and glob patterns), whoever populates the cache must perform the glob/recursion and add all directories and files individually.
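A sketch of the directory-hash definition, assuming a hypothetical `hashDirectoryListing` helper. To stay dependency-free it uses a 32-bit FNV-1a hash as a stand-in; the PR itself uses md5 from package:crypto.

```dart
import 'dart:convert';
import 'dart:io';

/// Stand-in 32-bit FNV-1a hash (the PR uses md5 from package:crypto).
int fnv1a32(List<int> bytes) {
  var hash = 0x811c9dc5;
  for (final byte in bytes) {
    hash ^= byte;
    hash = (hash * 0x01000193) & 0xffffffff;
  }
  return hash;
}

/// Hashes only the names of the direct children of [dir]: no recursion and
/// no file contents.
Future<int> hashDirectoryListing(Directory dir) async {
  final names = <String>[];
  await for (final entity in dir.list(recursive: false, followLinks: false)) {
    names.add(entity.path.split(Platform.pathSeparator).last);
  }
  // Listing order is not guaranteed, so sort for a stable hash.
  names.sort();
  // jsonEncode keeps the encoding unambiguous even if a name contains '\n'.
  return fnv1a32(utf8.encode(jsonEncode(names)));
}
```

Editing a child file or adding files inside a subdirectory leaves this hash unchanged; only adding, removing, or renaming a direct child changes it. That is why recursive dependencies have to be listed entry by entry.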

Testing

  • The existing caching tests (such as pkgs/native_assets_builder/test/build_runner/build_runner_caching_test.dart) cover caching behavior.
  • Now that last-modified timestamps are no longer used, some sleeps have been removed from tests. 🎉

Performance

Measured by adding a stopwatch to pkgs/native_assets_builder/test/build_runner/build_runner_caching_test.dart around the second invocation of the hooks, which hits the cache:

  • lastModified timestamps: 0.028 seconds (before this PR)
  • package:crypto md5: 0.047 seconds (this PR)
  • package:xxh3 xxh3: 0.042 seconds

The implementation does not load files with parallel file-system I/O (no Future.wait), but it does use async I/O so that flutter_tools can run other Targets in parallel.
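A sketch of that sequential-but-async pattern, assuming a hypothetical `hashFilesSequentially` helper and the same FNV-1a stand-in instead of md5. A parallel variant would instead use `Future.wait(files.map(...))`.

```dart
import 'dart:io';

/// Stand-in 32-bit FNV-1a hash (the PR uses md5 from package:crypto).
int fnv1a32(List<int> bytes) {
  var hash = 0x811c9dc5;
  for (final byte in bytes) {
    hash ^= byte;
    hash = (hash * 0x01000193) & 0xffffffff;
  }
  return hash;
}

/// Hashes files one after another with async I/O. Each `await` yields to the
/// event loop, so other asynchronous work in the same isolate can progress
/// between reads, but only one file is read at a time (no Future.wait).
Future<Map<Uri, int>> hashFilesSequentially(List<Uri> files) async {
  final hashes = <Uri, int>{};
  for (final uri in files) {
    final bytes = await File.fromUri(uri).readAsBytes();
    hashes[uri] = fnv1a32(bytes);
  }
  return hashes;
}
```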

Both the old and the new implementation are fast enough for a handful of packages with native assets in a flutter run. Neither is fast enough for hot restart / hot reload with 10+ packages with native assets, so we'll need to revisit the implementation when exploring support for that.

Related issues not addressed in this PR


PR Health

Breaking changes ✔️ (no package version changes reported)

Changelog Entry ✔️ (no changed files reported)
Changes to files need to be accounted for in their respective changelogs.

API leaks ✔️ (no leaked API symbols reported)
Packages containing symbols that are visible in the public API but not exported by the library would be listed here. Export such symbols or remove them from your publicly visible API.

License Headers ✔️ (no missing headers)
All source files should start with a license header:
// Copyright (c) 2024, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.

Unrelated files missing license headers:
pkgs/ffigen/example/libclang-example/generated_bindings.dart
pkgs/ffigen/example/shared_bindings/generate.dart
pkgs/ffigen/example/shared_bindings/lib/generated/a_gen.dart
pkgs/ffigen/example/shared_bindings/lib/generated/a_shared_b_gen.dart
pkgs/ffigen/example/shared_bindings/lib/generated/base_gen.dart
pkgs/ffigen/example/simple/generated_bindings.dart
pkgs/ffigen/lib/src/header_parser/clang_bindings/clang_bindings.dart
pkgs/ffigen/test/collision_tests/expected_bindings/_expected_decl_decl_collision_bindings.dart
pkgs/ffigen/test/collision_tests/expected_bindings/_expected_decl_symbol_address_collision_bindings.dart
pkgs/ffigen/test/collision_tests/expected_bindings/_expected_decl_type_name_collision_bindings.dart
pkgs/ffigen/test/collision_tests/expected_bindings/_expected_reserved_keyword_collision_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_comment_markup_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_dart_handle_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_enum_int_mimic_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_forward_decl_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_functions_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_imported_types_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_native_func_typedef_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_opaque_dependencies_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_packed_structs_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_regress_384_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_sort_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_struct_fptr_fields_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_typedef_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_unions_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_varargs_bindings.dart
pkgs/ffigen/test/large_integration_tests/_expected_cjson_bindings.dart
pkgs/ffigen/test/large_integration_tests/_expected_libclang_bindings.dart
pkgs/ffigen/test/large_integration_tests/_expected_sqlite_bindings.dart
pkgs/ffigen/test/native_test/_expected_native_test_bindings.dart
pkgs/jni/lib/src/third_party/generated_bindings.dart
pkgs/jni/lib/src/third_party/global_env_extensions.dart
pkgs/jni/lib/src/third_party/jni_bindings_generated.dart
pkgs/jnigen/android_test_runner/lib/main.dart
pkgs/jnigen/example/in_app_java/lib/android_utils.dart
pkgs/jnigen/example/kotlin_plugin/example/lib/main.dart
pkgs/jnigen/example/kotlin_plugin/lib/kotlin_bindings.dart
pkgs/jnigen/example/kotlin_plugin/lib/kotlin_plugin.dart
pkgs/jnigen/example/pdfbox_plugin/lib/pdfbox_plugin.dart
pkgs/jnigen/example/pdfbox_plugin/lib/src/third_party/org/apache/pdfbox/pdmodel/PDDocument.dart
pkgs/jnigen/example/pdfbox_plugin/lib/src/third_party/org/apache/pdfbox/pdmodel/PDDocumentInformation.dart
pkgs/jnigen/example/pdfbox_plugin/lib/src/third_party/org/apache/pdfbox/pdmodel/_package.dart
pkgs/jnigen/example/pdfbox_plugin/lib/src/third_party/org/apache/pdfbox/text/PDFTextStripper.dart
pkgs/jnigen/example/pdfbox_plugin/lib/src/third_party/org/apache/pdfbox/text/_package.dart
pkgs/jnigen/lib/src/bindings/descriptor.dart
pkgs/jnigen/lib/src/bindings/printer.dart
pkgs/jnigen/lib/src/elements/elements.g.dart
pkgs/jnigen/test/jackson_core_test/third_party/bindings/com/fasterxml/jackson/core/_package.dart
pkgs/jnigen/test/type_path_test.dart
pkgs/jnigen/tool/command_runner.dart
pkgs/native_assets_builder/test_data/native_dynamic_linking/bin/native_dynamic_linking.dart
pkgs/objective_c/lib/src/ns_input_stream.dart
pkgs/swift2objc/lib/src/config.dart
pkgs/swift2objc/lib/src/generate_wrapper.dart
pkgs/swift2objc/lib/src/generator/_core/utils.dart
pkgs/swift2objc/lib/src/generator/generator.dart
pkgs/swift2objc/lib/src/parser/parsers/declaration_parsers/parse_initializer_declaration.dart
pkgs/swift2objc/lib/src/transformer/transformers/transform_globals.dart
pkgs/swift2objc/lib/src/transformer/transformers/transform_variable.dart

@coveralls commented Nov 26, 2024

Coverage Status: coverage 88.818% (+0.008%) from 88.81% when pulling 0f89668 on xxh3-hashing into 4069de3 on main.

@dcharkes dcharkes marked this pull request as ready for review November 26, 2024 19:00
@dcharkes dcharkes requested a review from mkustermann November 26, 2024 19:00
@dcharkes (Collaborator, Author) commented Nov 26, 2024

@jtmcdole @jonahwilliams I can't seem to add you as reviewers on this PR (possibly because this repository is not in the flutter org). But PTAL, as Martin is currently OOO.

N.B. This will add package:xxh3 as a dependency to flutter_tools upon rolling.

@jonahwilliams left a comment

Do we want to pull in xxh3 as a flutter_tools dependency? Do we know/trust the maintainer?

@jtmcdole

@jonahwilliams - I trust it as much as any other package that we've pulled in. Here are the performance numbers:

#1593 (comment)

@dcharkes (Collaborator, Author)

> Do we want to pull in xxh3 as a flutter_tools dependency? Do we know/trust the maintainer?

I guess this is a question for @jtmcdole, based on #1593 (comment).

@mraleph and I both looked through the source code while playing around with it. But of course we don't know about future versions of the package. Do we have a policy for how we vet third-party packages?

(If we are not happy with adding this third party dependency, I can swap the PR to use a slower hashing algorithm.)

@jonahwilliams

If that's OK with you, it's OK with me.

@jtmcdole

I don't think we want the slower hashing algorithm when it comes to building native assets.

If the package changes in the future to pull in new deps, we can revisit, but it's a dependency-free package today: https://github.com/SamJakob/xxh3/blob/master/pubspec.yaml

@jtmcdole

I would LGTM, but I don't have access?

import '../utils/file.dart';
import '../utils/uri.dart';

class FileSystemCache {
Member

Consider fusing API calls that are always used together; this leads to simpler and more robust APIs:

  • cache.readCacheFile and findOutdatedFileSystemEntity
  • reset, hashFiles and persist.

Also, I think the name "cache" implies something else.

This yields something like this as an API:

import 'dart:io';

abstract final class DependenciesHashFile {
  DependenciesHashFile({required File storage});

  /// Returns `true` if no recorded file has changed since the last build.
  Future<bool> isUpToDate();

  /// Records hashes of files; returns `true` if any file has changed after
  /// [buildStart].
  Future<bool> updateAfterBuild({required DateTime buildStart});
}
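A minimal, dependency-free sketch of how the proposed API could be implemented. The `dependencies` constructor parameter, the JSON layout of the storage file, the files-only scope (no directories), and the FNV-1a stand-in hash are all assumptions. Unlike the PR's dummy-hash approach, this sketch simply deletes the stored hashes when a file changed during the build.

```dart
import 'dart:convert';
import 'dart:io';

/// Stand-in 32-bit FNV-1a hash (the PR uses md5 from package:crypto).
int _fnv1a32(List<int> bytes) {
  var hash = 0x811c9dc5;
  for (final byte in bytes) {
    hash ^= byte;
    hash = (hash * 0x01000193) & 0xffffffff;
  }
  return hash;
}

final class DependenciesHashFile {
  /// [dependencies] is an assumption: the proposed signature does not say
  /// how the list of files is supplied.
  DependenciesHashFile({required this.storage, required this.dependencies});

  final File storage;
  final List<Uri> dependencies;

  /// `true` only if every dependency has a stored hash that still matches
  /// the file's current contents.
  Future<bool> isUpToDate() async {
    if (!await storage.exists()) return false;
    final Object? decoded = jsonDecode(await storage.readAsString());
    if (decoded is! Map<String, Object?>) return false;
    for (final uri in dependencies) {
      final file = File.fromUri(uri);
      if (!await file.exists()) return false;
      final current = _fnv1a32(await file.readAsBytes());
      if (decoded[uri.toString()] != current) return false;
    }
    return true;
  }

  /// Records hashes of all dependencies. Returns `true` if any dependency
  /// was modified after [buildStart]; the stored hashes are then deleted so
  /// that the next [isUpToDate] call reports `false`.
  Future<bool> updateAfterBuild({required DateTime buildStart}) async {
    final hashes = <String, int>{};
    for (final uri in dependencies) {
      final file = File.fromUri(uri);
      // Missing files get no entry, so isUpToDate keeps returning false.
      if (!await file.exists()) continue;
      if ((await file.lastModified()).isAfter(buildStart)) {
        if (await storage.exists()) await storage.delete();
        return true;
      }
      hashes[uri.toString()] = _fnv1a32(await file.readAsBytes());
    }
    await storage.writeAsString(jsonEncode(hashes));
    return false;
  }
}
```

Fusing the calls this way means a caller cannot forget to reset or persist: one call answers "rebuild?" and one call records the result.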

@dcharkes dcharkes changed the title [native_assets_builder] Use XXH3 hashing [native_assets_builder] Use file content hashing Nov 27, 2024
@dcharkes (Collaborator, Author) commented Nov 27, 2024

Swapped the hashing algorithm to md5 from package:crypto. See perf numbers in the PR description.

@dcharkes dcharkes requested a review from mraleph November 27, 2024 18:02
@jtmcdole

> Swapped the hashing algorithm to md5 from package:crypto. See perf numbers in the PR description.

How did you test the perf? Will native assets be a large set of small files, smaller set of larger files, or something else? In my linked comment, I had xxh3 at 100ms vs >600ms for md5.

MD5 should be fine as we're not using it for crypto reasons.

@jtmcdole

> xxh3 at 100ms vs >600ms for md5.

This only mattered for "large" data, like 165 MB. It might matter for something media-heavy (game assets?), but if all our libraries are small, then this is a wash.

@auto-submit bot merged commit 081b195 into main on Nov 28, 2024 (37 checks passed).
@auto-submit bot deleted the xxh3-hashing branch on November 28, 2024 at 10:22.
@mkustermann (Member) left a comment

A few pending comments

mustCompile = sourceLastChange == kernelLastChange ||
sourceLastChange.isAfter(kernelLastChange);
mustCompile =
(await dependenciesHashes.findOutdatedFileSystemEntity()) != null;
Member

Consider logging the name of the file that caused the recompile.

@@ -696,8 +709,28 @@ ${e.message}
kernelFile,
depFile,
);

if (success) {
Member

nit: Consider using early returns and less nesting (in the lines above); it makes this code easier to read:

if (!mustCompile) {
  return (true, kernelFile, dependenciesHashFile);
}

final success = await _compileHookForPackage(...);
if (!success) {
  return (false, ...);
}

...
return (true, ...);


if (success) {
// Format: `path/to/my.dill: path/to/my.dart, path/to/more.dart`
final depFileContents = await depFile.readAsString();
Member

nit: Consider moving this code (which knows about the format of .d files) into its own function, e.g. parseDepFile(<file-content>).
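A sketch of such a helper, assuming the comma-separated format stated in the code comment (`path/to/my.dill: path/to/my.dart, path/to/more.dart`). Make-style depfiles are usually space-separated and escape special characters, so this hypothetical `parseDepFile` accepts commas and/or whitespace as separators but does not handle escaping.

```dart
/// Hypothetical helper: parses `<target>: <dep>, <dep> ...` and returns the
/// dependency paths. Dependencies may be separated by commas and/or
/// whitespace. Paths that themselves contain spaces or commas are NOT
/// supported by this sketch.
List<String> parseDepFile(String contents) {
  // A trailing space lets a target without dependencies still match ': '.
  final line = '${contents.trim()} ';
  // Split on ': ' rather than ':' so Windows drive letters (C:\) survive.
  final separator = line.indexOf(': ');
  if (separator == -1) {
    throw FormatException('Expected "<target>: <dependencies>"', contents);
  }
  return line
      .substring(separator + 2)
      .split(RegExp(r'[,\s]+'))
      .where((path) => path.isNotEmpty)
      .toList();
}
```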

for (final uri in fileSystemEntities) {
int hash;
if (validBeforeLastModified != null &&
(await uri.fileSystemEntity.lastModified())
Member

Do we have any indication that we benefit from using async/await here? (It's more complicated and may also be slower.)

Collaborator Author

In flutter_tools, the native assets builder runs in a Target concurrently with other Targets. Sync I/O prevents flutter_tools from making progress on the other Targets.

A while back, I had to change some sync I/O to async I/O in other flutter_tools Targets to speed up flutter builds for this reason.

/// If [validBeforeLastModified] is provided, any entities that were modified
/// after [validBeforeLastModified] will get a dummy hash so that they will
/// show up as outdated. If any such entity exists, its uri will be returned.
Future<Uri?> hashFiles(
Member

This code seems to handle directories as well, so should this be hashFileAndDirectoryContents?

/// [Directory] hashes are a hash of the names of the direct children.
class FileSystemHashes {
FileSystemHashes({
this.version = 1,
Member

Why do we need a version?

In the current PR, if this field ever changed to a backwards-incompatible version, it seems nothing would change, because the code doesn't use this field at all (apart from serializing and deserializing it).
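One way a version field could be put to use, sketched under assumptions: data written with any other version is discarded, which forces every entry to be treated as outdated. `currentFormatVersion`, `readHashes`, and the `{"version": ..., "files": {...}}` layout are hypothetical, not the PR's format.

```dart
import 'dart:convert';

/// Hypothetical current format version; bump on incompatible changes.
const currentFormatVersion = 1;

/// Returns the stored hashes, or `null` if the JSON was written with a
/// different format version or does not have the expected shape, so that
/// the caller treats everything as outdated instead of misreading the data.
Map<String, int>? readHashes(String jsonText) {
  final Object? decoded = jsonDecode(jsonText);
  if (decoded is! Map<String, Object?>) return null;
  if (decoded['version'] != currentFormatVersion) return null;
  final files = decoded['files'];
  if (files is! Map<String, Object?>) return null;
  final hashes = <String, int>{};
  for (final entry in files.entries) {
    final hash = entry.value;
    if (hash is! int) return null;
    hashes[entry.key] = hash;
  }
  return hashes;
}
```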

this.hash,
);

factory FilesystemEntityHash._fromJson(Map<String, dynamic> json) =>
Member

nit: It's best to type JSON maps as Map<String, Object>, because then one gets static type errors if a lookup like json['foo'] is used the wrong way (with dynamic, json['foo'].doesNotExist() compiles without error).
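A sketch of the typed-JSON style this nit suggests. It uses `Map<String, Object?>`, the nullable variant of the suggested `Map<String, Object>`, because JSON values may be null; `EntityHash` and its `path`/`hash` fields are hypothetical stand-ins for the PR's class. With `Object?` values, a call like `json['hash'].foo()` is a compile-time error, so each field must be type-checked before use.

```dart
import 'dart:convert';

/// Hypothetical stand-in for the PR's per-entity hash class.
final class EntityHash {
  EntityHash(this.path, this.hash);

  final Uri path;
  final int hash;

  /// Every field is checked explicitly; the `is!` tests also promote the
  /// locals to `String` and `int` for the constructor call.
  factory EntityHash.fromJson(Map<String, Object?> json) {
    final path = json['path'];
    final hash = json['hash'];
    if (path is! String || hash is! int) {
      throw FormatException('Invalid entity hash', json);
    }
    return EntityHash(Uri.parse(path), hash);
  }

  static EntityHash parse(String source) =>
      EntityHash.fromJson(jsonDecode(source) as Map<String, Object?>);

  Map<String, Object?> toJson() => {'path': path.toString(), 'hash': hash};
}
```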

graphs: ^2.3.1
logging: ^1.2.0
# native_assets_cli: ^0.9.0
native_assets_cli:
path: ../native_assets_cli/
package_config: ^2.1.0
xxh3: ^1.1.0
Member

Do we really need this dependency on a 3rd party package?

@@ -659,29 +684,17 @@ ${e.message}
final depFile = File.fromUri(
outputDirectory.resolve('../hook.dill.d'),
);
final dependenciesHashFile = File.fromUri(
outputDirectory.resolve('../hook.dependencies_hash_file.json'),
Member

Will we rebuild the kernel file if the Dart SDK has changed? (Otherwise, running the old kernel on a newer SDK may just crash.)

Collaborator Author

This is why you review the code! 🤓 🙏
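One possible approach, sketched with hypothetical names (`encodeHashes`, `compiledWithCurrentSdk`): record `Platform.version` from `dart:io` alongside the dependency hashes and force a recompile whenever it differs. This is not necessarily how the follow-up commit resolved the comment.

```dart
import 'dart:convert';
import 'dart:io';

/// Hypothetical: store the SDK version next to the dependency hashes.
String encodeHashes(Map<String, int> hashes) =>
    jsonEncode({'sdkVersion': Platform.version, 'files': hashes});

/// A kernel file compiled by a different SDK must be recompiled, even when
/// every dependency hash still matches. [currentSdkVersion] defaults to the
/// running SDK and exists only to make the mismatch case testable.
bool compiledWithCurrentSdk(String storedJson, {String? currentSdkVersion}) {
  final Object? decoded = jsonDecode(storedJson);
  return decoded is Map<String, Object?> &&
      decoded['sdkVersion'] == (currentSdkVersion ?? Platform.version);
}
```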

auto-submit bot pushed a commit that referenced this pull request Nov 29, 2024
Addresses the remaining comments on #1750.

I'm unsure how to test the recompiling of the kernel file if Dart changes without writing a test that downloads a different Dart SDK.
Development

Successfully merging this pull request may close these issues.

[native_assets_builder] Caching strategy for hook invocations
6 participants