Explore replacing build/project setup and modularizing codebase #10011

Open
niloc132 opened this issue Oct 17, 2024 · 0 comments

@niloc132
Member

While I'm not familiar with the history within Google of this project and how it was built, externally it has used Ant since first being open sourced.

It appears that at least part of the build used Bazel/Blaze, but this was never formally used in the open source CI tooling (Jenkins while Google managed the project, and GitHub Actions since then). It was used at least in a limited fashion for https://github.com/google/j2cl to depend on some GWT classes, but those classes have since been copied into the j2cl repository. At one point there was a Bazel rules repository for GWT (https://github.com/bazelbuild/rules_gwt/), but I never saw it get real maintenance, and it pulled GWT and its dependencies from Maven Central rather than actually using the Bazel build files in the GWT repository.

Additionally, the build files are pretty un-idiomatic Bazel, with huge globs that include hundreds or thousands of classes (rather than dozens) in a single java_library. Bazel was likely added after the fact, with Ant as the primary way to build.

The Ant build surprised me 15+ years ago when I first started working on building GWT itself: it is slow, there are a lot of undocumented properties that can be set to control it, IDEs are difficult to configure to use it, and the "build" code has historically been a blend of XML, Python, Java, and shell. The outputs are generally consumable from other platforms, but it can often take substantial decoding work to understand why some detail of the output is the way that it is.

Some specific goals for this task:

  • Support running the compiler without com.google.gwt.user.User and its associated classpath. This will require separating the "gwt language" and "java internal implementation" details out into their own artifact. Multiple benefits:
    • Allows decoupling "gwt-user" and "gwt-dev" versions from each other, by declaring which parts of GWT are required by the compiler, and which parts are suggested runtime code.
    • Allows dropping legacy com.google.gwt client/shared packages in favor of the externally migrated org.gwtproject artifacts, with no unused classpath elements (flute, tapestry, gwt-user itself, etc.)
  • Update long-out-of-date dependencies, and let automated tooling help keep them up to date (e.g. the "Selenium" support uses 1.0-beta-1 of selenium-java-client-drivers, a release that can't even be found in Maven Central). Most of these dependencies can be obtained from Maven Central with no modifications, and the rest we can continue to source from the tools repo.
  • Build artifacts by composing their inputs, rather than the current approach of putting most/all bytecode and dependencies in a large jar, then removing unnecessary components (as we do to send SDK jars to maven). Then shade late - we don't need to produce shaded artifacts just to run tests or build samples (though it could still be worth having integration tests that make use of shaded jars).
  • Migrate to build tooling that is well supported by IDEs to make it easier for contributors to import the project into their IDE, build and run tests, and expect that this outcome will be reflected in CI.
  • Limit the size of artifacts in the build, allowing for faster builds.
  • Simplify build/CI commands so that the steps to create and validate a build are easier (or at least less confusing).
  • Use this opportunity to untangle package dependencies, and prune dead code:
    • There are four top-level types called Util, and another called Utility. Many/most of these are specific to a given class or package, and can be renamed to be less ambiguous. Other calls can be replaced by java.nio.file.Files, or by Guava/Commons artifacts.
    • The com.google.gwt.util package is a dumping ground - mostly compiler CLI arg handlers, but also server classes, and Utility to silently close or easily copy files, and the com.google.gwt.dev.util package is used not just for the compiler ("dev"), but also for user generators.
    • Splitting these packages between modules is ostensibly a "Bad" thing to do, but com.google.gwt.core itself is already split between gwt-dev and gwt-user.
    • Relatedly, com.google.gwt.core.ext.linker and com.google.gwt.core.linker are closely related, but not quite the same: the former is the "API for linkers", and the latter is the "standard linkers" (except SoycReportLinker, which requires that some of its internals be used by the compiler directly).
  • requestfactory should be built as separate modules, optionally shaded together, rather than relying on a bytecode processing tool to extract the required classes for each.
  • Continue to produce the same zip of shaded jars, and ensure their contents are as close to the same as we can manage.
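As a rough illustration of replacing the Util/Utility helpers with java.nio.file.Files, the sketch below does the kind of "copy a file" and "close without fuss" work described above using only standard JDK calls. The class and method names (FilesInsteadOfUtil, copyFile, copyStream) are hypothetical and do not mirror any specific GWT method; it assumes Java 11+.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical example class: standard JDK replacements for ad-hoc file helpers.
public class FilesInsteadOfUtil {
    // Copies a file, creating parent directories and replacing any existing
    // target, instead of a hand-rolled "open streams, loop, close" helper.
    static void copyFile(Path from, Path to) throws IOException {
        Files.createDirectories(to.toAbsolutePath().getParent());
        Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING);
    }

    // Stream-to-stream copy without a "silently close" helper: try-with-resources
    // closes both streams, and a failure in close() is propagated (or recorded
    // as a suppressed exception) rather than swallowed.
    static long copyStream(Path from, Path to) throws IOException {
        try (InputStream in = Files.newInputStream(from);
             OutputStream out = Files.newOutputStream(to)) {
            return in.transferTo(out); // returns the number of bytes copied
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("util-demo");
        Path src = dir.resolve("in.txt");
        Files.writeString(src, "hello", StandardCharsets.UTF_8);

        Path dst = dir.resolve("out/copy.txt");
        copyFile(src, dst);
        System.out.println(Files.readString(dst, StandardCharsets.UTF_8)); // hello
        System.out.println(copyStream(src, dir.resolve("copy2.txt")));      // 5
    }
}
```

Whether a given call site should use Files directly or a Guava/Commons equivalent depends on what the surrounding module already depends on.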

Non-goal, though it was part of the original goals when I first started looking at this: extract the Generator and Linker APIs to their own artifacts, so projects that use them can avoid dependencies on the compiler's internals. This may still be a helpful goal, and is still attainable, but in the years since I started playing with this giant refactor, it has become less important than it once was.

Also out of scope is substantially splitting up gwt-user. The existing gwt-user jar more or less will continue to exist as a single project (more on that below).


The evolving implementation consists of several pieces:

  • A set of Maven pom files (notes on Gradle below), declaring internal and external dependencies.
  • A shell script that uses git mv to move files into a hierarchy that allows for much smaller modules (which in turn build more quickly).
  • A set of other changes necessary to untangle inter-module dependencies.

This approach has been built up incrementally over many years and rebased without major upset as Java and the GWT codebase have changed, and it should allow this work to continue until we're ready to actually land it all at once. When reviewing, the huge move commit only needs to be checked to confirm that no source file contents change (only build files are added and removed) and can otherwise be ignored; the other changes should merge cleanly before or after it. Spending a few hours every year or so has been enough to keep this moving.

The goal is that the other changes can be merged before the eventual refactor; then the shell script can be run and its output, together with the pom files, squashed into a single commit.


The current status of this project is that the build artifacts can be used for any project that doesn't use requestfactory and doesn't run tests: the compiler and dev mode tooling all work, and a subset of samples are built. There are still 900+ files in the compiler, and user remains (and will continue to remain) a beast at 3544 files. No tests are compiled or run (same as ant dist and ant dist-dev), but I'm working on moving those now. No shading happens yet either.

Total build time with no tests is 3.5 minutes, and roughly 3 minutes of that is the subset of samples that have been migrated (showcase alone takes almost 2 minutes). Compare with ant dist-dev, which also takes about 3.5 minutes locally, with no samples (though it does produce at least some shaded artifacts).

[INFO] Reactor Summary for gwt-parent 1.0-SNAPSHOT:
[INFO] 
[INFO] gwt-parent ......................................... SUCCESS [  0.224 s]
[INFO] external tools ..................................... SUCCESS [  0.315 s]
[INFO] javaemul ........................................... SUCCESS [  2.258 s]
[INFO] emul ............................................... SUCCESS [  0.006 s]
[INFO] emul-base .......................................... SUCCESS [  0.090 s]
[INFO] emul-jre ........................................... SUCCESS [  0.183 s]
[INFO] gwt-core ........................................... SUCCESS [  0.629 s]
[INFO] lang ............................................... SUCCESS [  0.376 s]
[INFO] util ............................................... SUCCESS [  1.325 s]
[INFO] ext ................................................ SUCCESS [  0.594 s]
[INFO] linkers ............................................ SUCCESS [  0.633 s]
[INFO] gwt-dev-parent ..................................... SUCCESS [  0.010 s]
[INFO] compiler ........................................... SUCCESS [  8.327 s]
[INFO] devmode ............................................ SUCCESS [  2.469 s]
[INFO] tools parent ....................................... SUCCESS [  0.006 s]
[INFO] api-checker ........................................ SUCCESS [  0.423 s]
[INFO] gwt-user ........................................... SUCCESS [  8.583 s]
[INFO] requestfactory ..................................... SUCCESS [  0.779 s]
[INFO] dynatable .......................................... SUCCESS [ 18.356 s]
[INFO] hello .............................................. SUCCESS [ 13.817 s]
[INFO] json ............................................... SUCCESS [ 17.295 s]
[INFO] mail ............................................... SUCCESS [ 18.452 s]
[INFO] showcase ........................................... SUCCESS [01:54 min]
[INFO] samples ............................................ SUCCESS [  0.004 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  03:30 min
[INFO] Finished at: 2024-10-16T14:32:29-05:00
[INFO] ------------------------------------------------------------------------
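To see where the 3:30 goes, a small sketch can parse the Reactor Summary lines into per-module seconds. It assumes exactly the line format shown above (only SUCCESS lines, with times as "N.NNN s" or "MM:SS min"); ReactorTimes is a hypothetical name. Fed the five sample modules, it totals about 182 seconds, roughly 3 minutes of the build, with showcase alone at 114 seconds.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: sums module times from a Maven "Reactor Summary".
public class ReactorTimes {
    // One SUCCESS module line, timed either as "[  0.224 s]" (seconds)
    // or "[01:54 min]" (minutes:seconds).
    private static final Pattern LINE = Pattern.compile(
            "^\\[INFO\\] (.+?) \\.+ SUCCESS \\[\\s*(?:(\\d+):(\\d+) min|([\\d.]+) s)\\]$");

    // Maps each module name to its build time in seconds, in log order.
    static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> seconds = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // headers, separators, "BUILD SUCCESS", etc.
            }
            double s = m.group(4) != null
                    ? Double.parseDouble(m.group(4))
                    : Integer.parseInt(m.group(2)) * 60 + Integer.parseInt(m.group(3));
            seconds.put(m.group(1), s);
        }
        return seconds;
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "[INFO] dynatable .......................................... SUCCESS [ 18.356 s]",
                "[INFO] hello .............................................. SUCCESS [ 13.817 s]",
                "[INFO] json ............................................... SUCCESS [ 17.295 s]",
                "[INFO] mail ............................................... SUCCESS [ 18.452 s]",
                "[INFO] showcase ........................................... SUCCESS [01:54 min]");
        double total = parse(samples).values().stream().mapToDouble(Double::doubleValue).sum();
        System.out.println(String.format(Locale.ROOT, "samples: %.1f s", total)); // samples: 181.9 s
    }
}
```

Running the same parse over every module line above gives about 209 seconds in total, consistent with the reported 03:30.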

Why not Gradle?

I'm not against Gradle, and use it daily for other projects, but the fact is that I've been rebasing this for about nine years and the Maven wiring hasn't changed in any meaningful way. I don't think I can go more than a few point releases with Gradle on nontrivial projects without hitting some new "this won't work in the next release".

Similarly, Maven is well enough understood by many developers that they can make incremental changes to the build.

Additionally, there is a de facto standard GWT plugin for Maven (which already works with the refactored compiler) and standard project layouts (including archetypes to easily create projects that follow best practices). By contrast, there are many Gradle plugins, but none maintained consistently enough to work in as many cases as the one Maven plugin, and no project templates, effectively leaving users with "here are several plugins, figure it out for yourself".

With that said, the actual modularization done here would work with Gradle build files instead of Maven, and arguably would work better (Gradle allows "circular" dependencies where A-test can depend on A-main and B-main while B-main depends on A-main). Gradle's parallelization would probably be of limited use here (the slowest part of the build will be either memory-intensive compiles of samples or many test runs all in the user project, and tasks in the same module can't race each other IIRC), but caching could still provide some benefits. If there were a maintained, standardized Gradle plugin and a set of guidance for using GWT correctly with Gradle, I would definitely be open to changing this.
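The "circular" dependency point can be made concrete as a graph problem. In this sketch, using the hypothetical modules A and B from the paragraph above, Kahn's algorithm finds no valid build order when main and test code share one node per module (as in a Maven module, where the reactor rejects cyclic references), but does find one when each source set is its own node, as Gradle allows. It only models the dependency graph; it is not how either tool actually schedules work, and BuildOrder is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical model: topological build order over a dependency graph.
public class BuildOrder {
    // Kahn's algorithm. deps maps each node to the nodes it depends on.
    // Returns a build order, or empty if the graph contains a cycle.
    static Optional<List<String>> buildOrder(Map<String, List<String>> deps) {
        Map<String, Integer> unmet = new TreeMap<>();           // node -> unbuilt deps
        Map<String, List<String>> dependents = new TreeMap<>(); // dep -> nodes waiting on it
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            unmet.putIfAbsent(e.getKey(), 0);
            for (String d : e.getValue()) {
                unmet.putIfAbsent(d, 0);
                unmet.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        unmet.forEach((node, count) -> { if (count == 0) ready.add(node); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            order.add(node);
            for (String waiting : dependents.getOrDefault(node, List.of())) {
                if (unmet.merge(waiting, -1, Integer::sum) == 0) ready.add(waiting);
            }
        }
        // Any node never reaching zero unmet deps is part of (or behind) a cycle.
        return order.size() == unmet.size() ? Optional.of(order) : Optional.empty();
    }

    public static void main(String[] args) {
        // One node per Maven module: A's tests need B, and B needs A's main code.
        Map<String, List<String>> modules = Map.of(
                "A", List.of("B"),
                "B", List.of("A"));
        // One node per source set.
        Map<String, List<String>> sourceSets = Map.of(
                "A-main", List.of(),
                "B-main", List.of("A-main"),
                "A-test", List.of("A-main", "B-main"));
        System.out.println(buildOrder(modules).map(Object::toString).orElse("cycle"));    // cycle
        System.out.println(buildOrder(sourceSets).map(Object::toString).orElse("cycle")); // [A-main, B-main, A-test]
    }
}
```

In Maven terms, the usual workaround is to move A's tests into a separate module that depends on both A and B.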

Why not break up user?

Part of user will be broken up, but only the simple parts: core, lang, emulation, and junit3 are already split out into their own projects, and most/all of the rest is in the process of being split out into separately versioned projects. It likely doesn't make sense to invest heavily here; instead, we could consider a build flag to skip building user entirely, if needed.

The junit3 project will depend on user, to get RPC and the other tools it needs to build and to communicate with the server. Removing this dependency and/or planning for JUnit 4/5 support might make a good future issue.

On the other hand, the core/lang/emul projects will be what user depends on. Some of these are required to exist in all programs but must not be accessible by user code, others are required to be in all programs and be accessible by user code, and others still are expected by most/all user programs even though the compiler does not technically require them. Some of these categories have dependencies that they must be built with, which muddies the water a bit. The issue #9922 specifically exists to break some of these dependencies, and to make it easier to limit the scope of classes that any given project must have (e.g. Impl depends on StackTraceCreator, which depends on GWT, which in turn depends on Impl).
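One way to track cycles like the Impl/StackTraceCreator/GWT chain while untangling these projects is a depth-first search that reports the first cycle it finds. The sketch below hard-codes only the three edges named above, as simple class names; CycleFinder is hypothetical, and real edges would have to come from a tool such as jdeps or from parsing the sources, which this does not do.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: finds a dependency cycle among classes via DFS.
public class CycleFinder {
    // Returns the first cycle found, e.g. [Impl, StackTraceCreator, GWT, Impl],
    // or an empty list if the graph is acyclic.
    static List<String> findCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>(); // nodes fully explored, known cycle-free
        for (String start : deps.keySet()) {
            List<String> cycle = dfs(start, deps, done, new ArrayList<>(), new HashSet<>());
            if (!cycle.isEmpty()) return cycle;
        }
        return List.of();
    }

    private static List<String> dfs(String node, Map<String, List<String>> deps,
                                    Set<String> done, List<String> path, Set<String> onPath) {
        if (onPath.contains(node)) {
            // Back edge: the cycle is the path from the earlier visit of node.
            List<String> cycle = new ArrayList<>(path.subList(path.indexOf(node), path.size()));
            cycle.add(node);
            return cycle;
        }
        if (done.contains(node)) return List.of();
        onPath.add(node);
        path.add(node);
        for (String next : deps.getOrDefault(node, List.of())) {
            List<String> cycle = dfs(next, deps, done, path, onPath);
            if (!cycle.isEmpty()) return cycle;
        }
        path.remove(path.size() - 1);
        onPath.remove(node);
        done.add(node);
        return List.of();
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("Impl", List.of("StackTraceCreator"));
        deps.put("StackTraceCreator", List.of("GWT"));
        deps.put("GWT", List.of("Impl"));
        System.out.println(findCycle(deps)); // [Impl, StackTraceCreator, GWT, Impl]

        // Hypothetically removing the GWT -> Impl edge breaks the cycle.
        deps.put("GWT", List.of());
        System.out.println(findCycle(deps)); // []
    }
}
```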
