diff --git a/blog/modules/ROOT/pages/5-executable-jars.adoc b/blog/modules/ROOT/pages/5-executable-jars.adoc index cbcf46a4255..15e79d53faa 100644 --- a/blog/modules/ROOT/pages/5-executable-jars.adoc +++ b/blog/modules/ROOT/pages/5-executable-jars.adoc @@ -9,8 +9,8 @@ _{author}, {revdate}_ include::mill:ROOT:partial$gtag-config.adoc[] -One feature of the Mill JVM build tool is that the assembly jars it creates are -directly executable: +One feature of the https://mill-build.org[Mill JVM build tool] is that the +assembly jars it creates are directly executable: ```bash > ./mill show foo.assembly @@ -20,11 +20,14 @@ directly executable: Hello World ``` -Most other JVM build tools require you to run their assemblies via `java -jar`, -or require you to use `jlink` or `jpackage` which are much more heavyweight. While not +Other JVM build tools can also generate assemblies, but most need you to run them +via `java -jar` or `java -cp`, +or require you to use https://docs.oracle.com/en/java/javase/11/tools/jlink.html[jlink] or +https://docs.oracle.com/en/java/javase/17/docs/specs/man/jpackage.html[jpackage] +which are much more heavyweight and troublesome to set up. While not groundbreaking, Mill's executable assemblies are a nice convenience that makes your JVM code built with Mill fit more nicely into command-line centric workflows common in modern -systems. +software systems. This article will discuss how Mill's executable assemblies are implemented, so perhaps other build tools and toolchains will be able to provide the same convenience @@ -77,9 +80,15 @@ possible, we first need to understand what a `.jar` file is. ## What is an Assembly Jar? -An "assembly" jar is just a jar file that includes all transitive dependencies, and a jar file -is just a zip file. 
You can see that from the command line, where although you normally use -`jar tf` to list the contents of a `.jar` file, `unzip -l` works as well: +An "assembly" jar is just a jar file that includes all transitive dependencies. +What makes an assembly different from a "normal" jar is that it should (in theory) contain +everything needed to run your JVM program. In contrast, most "normal" jars do not contain +their dependencies, and you need to separately go download those dependencies before you +can run your jar. + +One thing that many people don't know is that jar files are just zip files. You can see +that from the command line, where although you normally use `jar tf` to list the contents +of a `.jar` file, `unzip -l` works as well: ```bash > jar tf /Users/lihaoyi/test/out/foo/assembly.dest/out.jar @@ -87,7 +96,8 @@ META-INF/MANIFEST.MF META-INF/ foo/ foo/Foo.class - +``` +```bash > unzip -l /Users/lihaoyi/test/out/foo/assembly.dest/out.jar Archive: /Users/lihaoyi/test/out/foo/assembly.dest/out.jar warning [/Users/lihaoyi/test/out/foo/assembly.dest/out.jar]: 203 extra bytes at beginning or within zipfile @@ -106,7 +116,7 @@ In this case, the example project only has one `Foo.java` source file, compiled `Foo.class` JVM class file. Larger projects will have multiple class files, including those from upstream modules and third-party dependencies. -In addition to the class files, jars also can contain metadata. For example, we can see +In addition to the compiled class files, jars can also contain metadata. For example, we can see this generated `out.jar` contains a `META-INF/MANIFEST.MF` file, which contains some basic metadata including the `Main-Class: foo.Foo` which is the entrypoint of the Java program: @@ -126,15 +136,205 @@ that we use to make the `out.jar` file executable ## What is a Zip file? 
-A zip file is an archive made of multiple smaller files, individually compressed, -concatenated together followed by a "directory" listing out the starting offsets of -every file within the archive. This diagram from Wikipedia lists it out: - -https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/ZIP-64_Internal_Layout.svg/2880px-ZIP-64_Internal_Layout.svg.png +A https://en.wikipedia.org/wiki/ZIP_(file_format)[zip file] is an archive made of multiple smaller files, individually compressed, +concatenated together followed by a "directory listing" containing the _reverse offsets_ of +every file within the archive, relative to the directory listing. + +```graphviz +digraph G { + label="archive.zip" + node [shape=box width=0 height=0 style=filled fillcolor=white] + zip [shape=record label="<f0> Foo.class | <f1> MANIFEST.MF | ... | <f2> directory"] + zip:f2:n -> zip:f1:n [label="reverse offset"] + zip:f2:n -> zip:f0:n +} +``` The typical way someone reads from a zip file is as follows: * Seek to the end of the zip, which contains the directory listing +* Find the metadata containing the offset for the entry you want in the directory listing +* Seek backwards using that offset to the start of the entry you want +* Read and decompress your entry + +Unlike `.tar.gz` files, the entries within a `.zip` file are compressed individually. This +is convenient for use cases like Java classfiles where you want to lazily load them +individually on-demand without having to first decompress the whole archive up front. + +## Executable Zip Archives + +One quirk of the above Zip format is that _the zip data does not need to start at the +beginning of the file_! The zip data can be at the end of an arbitrarily long file, and +as long as programs can scan to the end of the zip to find the directory listing, they +will be able to extract the zip. 
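The "seek to the end" step can be checked by hand. The sketch below is only an illustration, not how Mill or `unzip` is implemented: it builds the smallest possible zip (an empty archive, which is just a 22-byte end-of-directory record starting with the bytes `PK\005\006`), prepends some arbitrary text, and confirms that the record is still found by reading the last 22 bytes of the file. The file names are made up for the example, and a real archive with a trailing comment would need a backwards scan for the signature rather than a fixed 22-byte read.

```bash
# Work in a scratch directory (hypothetical file names throughout).
workdir=$(mktemp -d)

# An empty zip archive: the 4-byte signature "PK\005\006" plus 18 zero bytes.
printf 'PK\005\006' > "$workdir/empty.zip"
head -c 18 /dev/zero >> "$workdir/empty.zip"

# Prepend arbitrary non-zip data in front of the zip data.
{ printf 'echo "extra data before the zip"\n'; cat "$workdir/empty.zip"; } > "$workdir/prefixed.zip"

# Read the last 22 bytes and print the first 4 of them as hex.
signature=$(tail -c 22 "$workdir/prefixed.zip" | od -An -tx1 -N4 | tr -d ' \n')
echo "$signature"   # prints 504b0506, i.e. "PK\005\006"
```

The same `PK^E^F` marker is what a zip reader looks for near the end of a real jar, no matter how many bytes come before the zip data.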
+ +```graphviz +digraph G { + node [shape=box width=0 height=0 style=filled fillcolor=white] + label="archive.zip" + extra_label:s -> zip:fe:n [color=red penwidth=3] + extra_label [color=white style=invisible] + zip [shape=record label="<fe> ...extra data... | <f0> Foo.class | <f1> MANIFEST.MF | ... | <f2> directory"] + zip:f2:n -> zip:f1:n + zip:f2:n -> zip:f0:n +} +``` + +Thus, we can actually use the `.zip` format in two ways: + +1. As a `.zip` file, which is read and extracted starting from the end of the file on the right +2. As something else, such as a bash script, which is read and executed starting from the start of the file on the left + +This technique is used in common Zip +https://en.wikipedia.org/wiki/Self-extracting_archives[self-extracting archives], and although +this article is about Jars, `.jar` files are really just ``.zip``s with a different name! +So we can prepend a bash script to our `.jar` file, one that will run `java` with the current +executable `"$0"` as the classpath, and any of the current executable's command-line +arguments `"$@"` as the Java program's command-line arguments: + +```graphviz +digraph G { + label="out.jar" + left [shape=plaintext label="bash script starts executing here"] + right [shape=plaintext label="zip/jar is unpacked starting from here"] + + node [shape=box width=0 height=0 style=filled fillcolor=white] + zip [shape=record label=" exec java -cp \"$0\" 'foo.Foo' \"$@\" | Foo.class | MANIFEST.MF | ... 
| directory"] + zip:f2:n -> zip:f1:n + zip:f2:n -> zip:f0:n + left -> zip:fe:n [color=red penwidth=3] + zip:f2:s -> right [dir=back color=red penwidth=3] +} +``` + +If you use `less out.jar` to look at what's inside the Jar file, it looks like this: + +```bash +exec java $JAVA_OPTS -cp "$0" 'foo.Foo' "$@" +PK^C^D^T^@^H^H^H^@`"Z^@^@^@^@^@^@^@^@^@^@^@^@^T^@^Q^@META-INF/MANIFEST.MFUT^M^@^G<97>^Pvg<97>^Pvg<97>^Pvgeɱ +<80> ^P^@^?^_81s1-^OR^P^CuESC^Z{<8B>JNcҕ(<.=L7<8F>XjE^W^]ٕln +N<91>%3ri^T*<8F>1<8B>CD<81><82>^WPK^G^HB?^Xo[^@^@^@n^@^@^@PK^C^D +^@^@^@^@^@`"Z^@^@^@^@^@^@^@^@^@^@^@^@ ^@^Q^@META-INF/UT^M^@^G<97>^Pvg<97>^Pvg<97>^PvgPK^C^D +^@^@^@^@^@`"Z^@^@^@^@^@^@^@^@^@^@^@^@^D^@^Q^@foo/UT^M^@^G<97>^Pvg<97>^Pvg<97>^PvgPK^C^D^T^@^H^H^H^@`"Z^@^@^@^@^@^@^@^@^@^@^@^@^M^@^Q^@foo/Foo.classUT^M^@^G<97>^Pvg<97>^Pvg<97>^Pvgm<90>J@^T<86>[<9A>4z-BH]<<98>^G^Q<8A><8B>B.\XL2^R^S҅<82>^K^_<87>^R^DA<85>^C<^C ^NJ([Fh-WQ2/'^K1^Hc<99><94>P^FcESCu^V^^\^W^M!^S1S:gQ7(~R<99>da<96><8A>(^^ֱJh<9C>^KލNA^Kk^V.:X't<96><88>^HֽT®^gap0cNk^?5rg<82>^Ld".x"hxR<89>#&=v<99>^K u<<9E>NH^Z^G^F>|#J s%<8E>ESC9^SESC<99>^K&Z1,^V^?ЃB +/+hN]^ZNeESCPK^G^H<94>r+6 ^A^@^@<9F>^A^@^@PK^A^B^T^@^T^@^H^H^H^@`"ZB?^^Xo[^@^@^@n^@^@^@^T^@ ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@META-INF/MANIFEST.MFUT^E^@^G<97>^PvgPK^A^B +^@ +^@^@^@^@^@`"Z^@^@^@^@^@^@^@^@^@^@^@^@ ^@ ^@^@^@^@^@^@^@^@^@^@^@^@^@^@META-INF/UT^E^@^G<97>^PvgPK^A^B +^@ +^@^@^@^@^@`"Z^@^@^@^@^@^@^@^@^@^@^@^@^D^@ ^@^@^@^@^@^@^@^@^@^@^@^@^@^@foo/UT^E^@^G<97>^PvgPK^A^B^T^@^T^@^H^H^H^@`"Z<94>r+6 ^A^@^@<9F>^A^@^@^M^@ ^@^@^@^@^@^@^@^@^@^@^@^Y^A^@^@foo/Foo.classUT^E^@^G<97>^PvgPK^E^F^@^@^@^@^D^@^D^@ +^A^@^@<85>^B^@^@^@^@ +/Users/lihaoyi/test/out/foo/assembly.dest/out.jar (END) +``` + +Here, you can see a single line of `exec java $JAVA_OPTS -cp "$0" 'foo.Foo' "$@"` which +is the bash script we prepended to the zip, followed by the un-intelligible compressed +class file data that makes up the `.jar`. 
Since you are now running the Java program +via `./out.jar` instead of `java -jar`, we expose the `JAVA_OPTS` environment variable +as a way to pass flags to the `java` command that ends up being run. + +## What about Windows? + +The self-executing jar file above works by prepending a shell script. This works on Unix +environments like Linux or Mac, but not on the Windows machines which are also very common. + +To fix this, we can replace our shell script zip prefix with a "universal" script that +is both a valid `.sh` program as well as a valid `.bat` program, the latter being the +standard Windows command line language. Thus, instead of: + +```bash +exec java $JAVA_OPTS -cp "$0" 'foo.Foo' "$@" +``` + +We can instead use: + +```bash +@ 2>/dev/null # 2>nul & echo off & goto BOF +: +exec java $JAVA_OPTS -cp "$0" 'foo.Foo' "$@" +exit + +:BOF +setlocal +@echo off +java %JAVA_OPTS% -cp "%~dpnx0" foo.Foo %* +endlocal +exit /B %errorlevel% +``` + + +This universal launcher script is worth digging into. + +In a `sh` shell: + +* `@ 2>/dev/null # 2>nul & echo off & goto BOF` is an invalid command, but we ignore + the error because we redirect its error output to `/dev/null` + +* It then runs the `exec java -cp` command + +* We `exit` the script before we hit the invalid shell code below + +In a `bat` environment: + +* We run the first line, doing nothing, until we hit `goto BOF`. This jumps over the `exec java` + line which is not valid `bat` code, to go straight to the `:BOF` label + +* We then run `java -cp`, but with slightly different syntax from the Unix/shell version above + (e.g. `%~dpnx0` instead of `$0`) for Windows/bat compatibility + +* We then `exit` the script, using `/B %errorlevel%` which is the Windows syntax for propagating + the exit code, before we hit the compressed data below which is not valid `bat` code. 
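The `sh` half of this walkthrough can be exercised without a JDK. The sketch below writes the launcher to a file, appends some stand-in bytes where the zip data would go, and runs it with a fake `java` on the `PATH` that only prints its arguments and exits with status 7. Everything apart from the launcher text itself (the `fake-bin` directory, the stub, the placeholder payload, the exit status) is invented for the illustration, and the `bat` half is not exercised here.

```bash
workdir=$(mktemp -d)
mkdir "$workdir/fake-bin"

# Hypothetical stub standing in for the real `java`:
# print the arguments it received and return a known exit status.
cat > "$workdir/fake-bin/java" <<'EOF'
#!/bin/sh
echo "java called with: $*"
exit 7
EOF
chmod +x "$workdir/fake-bin/java"

# The universal launcher, written verbatim (quoted heredoc, so nothing is expanded).
cat > "$workdir/out.jar" <<'EOF'
@ 2>/dev/null # 2>nul & echo off & goto BOF
:
exec java $JAVA_OPTS -cp "$0" 'foo.Foo' "$@"
exit

:BOF
setlocal
@echo off
java %JAVA_OPTS% -cp "%~dpnx0" foo.Foo %*
endlocal
exit /B %errorlevel%
EOF

# Placeholder for the zip data; `sh` never reads this far because of `exec`.
printf 'PK\003\004 not real zip data' >> "$workdir/out.jar"

# Run the file with `sh`, capturing what the stub saw and the status it returned.
status=0
output=$(PATH="$workdir/fake-bin:$PATH" JAVA_OPTS=-Xmx1g sh "$workdir/out.jar" hello world) || status=$?
echo "$output"
echo "exit status: $status"   # prints "exit status: 7"
```

The stub receives `-Xmx1g -cp <path to out.jar> foo.Foo hello world`, so `JAVA_OPTS`, the script's own path and the forwarded arguments all arrive as described, and because of `exec` the stub's exit status becomes the status of the script.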
+ +As a result, we have a short script that we can call either from `sh` or `bat`, +that forwards arguments and the script itself (which is also a `.jar` file) to `java -cp`, +and then forwards the exit code back from `java -cp` to the caller. Although the script may +look fragile, the strong backwards compatibility of `.sh` and `.bat` scripts means that +once working it is unlikely to break in future versions of Mac/Linux/Windows. + +If we look at the file using `less -n20`, we can now see our universal launcher script +pre-pended to the blobs of compressed classfile data that make up the rest of the jar: + +```bash +@ 2>/dev/null # 2>nul & echo off & goto BOF +: +exec java $JAVA_OPTS -cp "$0" 'foo.Foo' "$@" +exit + +:BOF +setlocal +@echo off +java %JAVA_OPTS% -cp "%~dpnx0" foo.Foo %* +endlocal +exit /B %errorlevel% +PK^C^D^T^@^H^H^H^@`"Z^@^@^@^@^@^@^@^@^@^@^@^@^T^@^Q^@META-INF/MANIFEST.MFUT^M^@^G<97>^Pvg<97>^Pvg<97>^Pvgeɱ +<80> ^P^@^?^_81s1-^OR^P^CuESC^Z{<8B>JNcҕ(<.=L7<8F>XjE^W^]ٕln +N<91>%3ri^T*<8F>1<8B>CD<81><82>^WPK^G^HB?^Xo[^@^@^@n^@^@^@PK^C^D +^@^@^@^@^@`"Z^@^@^@^@^@^@^@^@^@^@^@^@ ^@^Q^@META-INF/UT^M^@^G<97>^Pvg<97>^Pvg<97>^PvgPK^C^D +^@^@^@^@^@`"Z^@^@^@^@^@^@^@^@^@^@^@^@^D^@^Q^@foo/UT^M^@^G<97>^Pvg<97>^Pvg<97>^PvgPK^C^D^T^@^H^H^H^@`"Z^@^@^@^@^@^@^@^@^@^@^@^@^M^@^Q^@foo/Foo.classUT^M^@^G<97>^Pvg<97>^Pvg<97>^Pvgm<90>J@^T<86>[<9A>4z-BH]<<98>^G^Q<8A><8B>B.\XL2^R^S҅<82>^K^_<87>^R^DA<85>^C<^C ^NJ([Fh-WQ2/'^K1^Hc<99><94>P^FcESCu^V^^\^W^M!^S1S:gQ7(~R<99>da<96><8A>(^^ֱJh<9C>^KލNA^Kk^V.:X't<96><88>^HֽT®^gap0cNk^?5rg<82>^Ld".x"hxR<89>#&=v<99>^K u<<9E>NH^Z^G^F>|#J s%<8E>ESC9^SESC<99>^K&Z1,^V^?ЃB +/+hN]^ZNeESCPK^G^H<94>r+6 ^A^@^@<9F>^A^@^@PK^A^B^T^@^T^@^H^H^H^@`"ZB?^^Xo[^@^@^@n^@^@^@^T^@ ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@META-INF/MANIFEST.MFUT^E^@^G<97>^PvgPK^A^B +^@ +^@^@^@^@^@`"Z^@^@^@^@^@^@^@^@^@^@^@^@ ^@ ^@^@^@^@^@^@^@^@^@^@^@^@^@^@META-INF/UT^E^@^G<97>^PvgPK^A^B +^@ +^@^@^@^@^@`"Z^@^@^@^@^@^@^@^@^@^@^@^@^D^@ 
^@^@^@^@^@^@^@^@^@^@^@^@^@^@foo/UT^E^@^G<97>^PvgPK^A^B^T^@^T^@^H^H^H^@`"Z<94>r+6 ^A^@^@<9F>^A^@^@^M^@ ^@^@^@^@^@^@^@^@^@^@^@^Y^A^@^@foo/Foo.classUT^E^@^G<97>^PvgPK^E^F^@^@^@^@^D^@^D^@ +^A^@^@<85>^B^@^@^@^@ +/Users/lihaoyi/test/out/foo/assembly.dest/out.jar (END) +``` + +On Windows you cannot run `./out.jar` from the command line, as Windows uses the file +extension to determine how to execute a program. But if you rename the Jar file from +`out.jar` to `out.bat`, you can then use `./out.bat` or `./out` to execute it. + +## Conclusion + +The executable assembly jars that Mill generates are very convenient: +you can use Mill to compile (almost) any Java program into an executable you can run with +`./out.jar`, as long as you have the appropriate version of Java globally installed. This +is much easier than setting up JLink or JPackage. + +The Mill JVM build tool provides these executable assembly jars out of the box, and the SBT +build tool provides them as part of the https://github.com/sbt/sbt-assembly[SBT Assembly] plugin. +Maven and Gradle do not provide this by default, but it is pretty easy to set up yourself +simply by concatenating a shell script with an assembly jar, as described above. +Although running Java programs via +`java -jar` or `java -cp` is not a huge hardship, removing that friction really helps your +Java programs and codebase feel like first-class citizens on the command line. -## What about Windows? \ No newline at end of file