This will hopefully serve as the documentation for the website once/if executed.
High-level plan
-
Kill all use and trace of the Apache CMS
-
Publish html directly to git
-
Allow for several sources to publish html
The result will be several sources, that can be run and managed independently, feeding content into the git repo housing our live html website.
This is a pragmatic perspective that sets us up to get a best-of-breed outcome acknowledging trends in all our website endevors:
-
All tools we’ve used have been heavily extended
-
Content takes a hit each tool change
-
All tools have limitations (strenghts/weaknesses)
-
Filling gaps involves extensions (bullet one)
-
Tools last on average 2-5 years
-
Many types of content actually exist: javadoc, release notes, download pages
-
We will always be in a hybrid situation
Think of it as "microservices for content" and avoiding a monolith.
Ideally this sets us up to acknowledge and embrace evolving our website tech without many of the above disadvantages. If we have a clean CSS and simple menu, we should be able to take HTML from anywhere.
When we want to add a new content source we do not need to figure out how to get it to work "through" the existing generator or redo everything that already works, we simply have it generate content directly to html directly to our site git.
As long as we maintain a common CSS and look and feel, we’re good.
That’s one point of view. I agree with a lot of it, but I’m worried… Perhaps I can characterize it as thinking that more options are better. However, nothing stopped anyone from migrating all the content from CMS to an external .md based system, or from .md to .adoc. But no one did it. Instead, people piled system on top of system, duplicating content in a partially intelligable state, and reducing the amount of organization and coherence at every step. I think reasonable priorities are:
-
as few systems involved in the website as possible
-
avoid extensions and writing our own systems, we don’t have time to maintain them let alone document them.
-
organize the input to the website generation process.
-
discourage adding more systems.
My most profound teacher said to pay attention to the balance between structure and behavior. We need simple and fairly rigid structure here so behavior can go towards improving the documentation rather than wondering how or why.
I’m very worried about having more than one process push content to the git "website" repo. Right now, we effectively have that process with the svn repo, and there’s some content that seems to have been added and abandoned. No one will tell me about it, including the person who put it there.
As a result, a requirement in my mind is that publishing the website needs to start with deleting everything previously there. The next step is adding the entire current content.
Also, "publishing straight to git" actually means committing to a local repository (with tooling this could be bare). This gives a preview opportunity, followed by pushing to the apache repo.
Apache allows a project to designate a git repository as their "website." All files in that repository are published as-is to the internet as the project’s website. HTML must be committed to this repository as it does not offer any generation of any kind.
TODO: what is the process for getting one of these repos?
TODO: can we get Infra to do a svn-git migration of our current flat-html?
In the new architecture each content generator publishes rendered html directly to the site git.
Perhaps…. I think that something more like a maven build is more likely to be useful. If there needs to be more than one content generator, then there should be some sort of process to build the entire site and push it to git. Without this, no one will be able to understand how the site is built.
For Antora, it’s easy to set up a gitlab ci pipeline that builds a site and publishies it on netlify. I don’t know if anything similar is possible for github, but I thnk it’s worth investigating. My idea is something like any commit results in updating a preview site. If you like it you can push to "master", so asf shows it.
The following is a rough outline of the types of content:
-
Versioned documentation for a software distribution
-
Community/Developer documentation
-
Website front-page and "marketing" pages such as major features, benefits, etc
-
Examples
-
Javadoc
-
Release notes and download pages
-
Contributors page
Another possible division would be
-
asciidoc content
-
human written
-
generated
-
-
non-asciidoc content
-
human written
-
generated
-
A perpendicular categorization is:
-
Stuff that fits well in Antoras’s information architecture
-
Stuff that is outside the scope of Antrora’s infomration architecture.
While it’s easy to regard Antora as an asciidoc processor, IMO its more important contribution is providing a consistent and easy to understand information architecture framework.
All of our "product documentation" efforts to date have been in some way wiki-like in nature. They allow any kind of content to take any shape and do not encourage structure.
As a result our content is all miscellaneous odds and ends that do not fit together in any significant chapters or flow. Said another way we’re all "blog" and no "book."
The proposal for this is to use Antora tied to an effort to create a documentation outline that encourages contribution on-rails. Gaps in the documentation should be obvious, which hopefully encourages contribution
Learning how our community works and how to contribute (be a developer) is also an experience that really needs to be on-rails.
The proposal for this is to use Antora tied to an effort to create a deliberately smaller outline of how to get involved.
This content should be very focused on "developer onboarding", something all open source projects must nail to grow.
When people come to the website they must get a human-perfect orientation that gives them the most important information in highlighted form with the least clicking.
There is no proven structure for gaining someone’s immediate attention and not losing them. They need to know "why TomEE", ideally with some pictures or video. There also needs to be a very small handful of pages to highlight features and further pull people in.
The proposal for this is to use the existing Jbake setup as it is free-form and enforces no structure. These pages must be enabled to continuously discard/reinvent (revolve vs evolve) and keep trying different ways to get people’s attention.
The examples section of the website are arguably the only truly successful part of the site in its current form. Both the Front-page and product-documentation parts of the site fall short of accomplishing what they should do.
The current library of examples is 180 and growing as the #1 place where new contributors find success contributing to TomEE. After improvements made in Dec 2018, contributions over the next 12 months doubled bringing in over 40 contributors all the examples.
The proposal for this is to continue the existing Jbake setup as it has proven to be very successful for this application and more enhancements are planned, such as:
-
Adding contributors faces to each example page
-
Automatically linking code to related online javadoc
-
Automatically suggesting related examples
The current "tomee-site-generator" will clone 34 repositories and branches across TomEE, Jakarta EE and MicroProfile to generate clean javadoc trees of each one.
The Javadoc tree for TomEE is created taking all modules and combining them into one tree so people get a single, fully-linked javadoc tree and do not need to be burdened by several small modules.
The Javadoc tree for Jakarta EE is created in the same spirit, grabbing the correct release branch of each API and version in Jakarta EE 8 and combining it together into one fully-linked "jakartaee-8.0" tree spanning the full platform.
The Javadoc tree for MicroProfile is created in the same spirit, grabbing the correct release branch of each API and version in MicroProfile 2.0 and combining it together into one fully-linked "microprofile-2.0" tree spanning the full MicroProfile umbrella spec.
Several motivations exist to grabbing the Jakarta EE and MicroProfile javadoc and publishing it on the TomEE site.
-
Oracle will no longer publish "javaee" docs. There is no plan current in the Jakarta EE side of the fence to publish unified javadoc. There is an industry gap we can fill that will generate website traffic to TomEE.
-
MicroProfile does not current publish fully-combined javadoc. There is a gab currently. We can fill this as well to provide value to the industry and generate traffic to TomEE.
-
A future plan for our examples is to link code to javadoc. Linking to javadoc on our own site has the advantage that they never leave the site and links are guaranteed stable.
-
Reverse linking. The javadoc itself can have links to the relevant examples that show how that class is used. This can be done having an index of each example, what api classes it uses and then inserting multiple
@see
links in the source prior to javadoc generation.
The proposal is to decouple this code from the current
tomee-site-generator
code as it is a separate concern, does take a
very long time to generate, and following the spirit of this overall
proposal should be fully independent and not be mixed in with anything
JBake-related.
I applaud this. Once the javadoc is generated, which I would expect only really needs to happen for a release, there’s the question of how to get it into the site. An advantage of including it in the Antora processed content is that links into it will be checked by Antora while building the site. However, at the moment including pregenerated javadoc is not built into Antora although it’s an experiment I plan to make soon, and definitely in Antora’s future.
The release notes and download page data at one point came entirely from https://svn.apache.org/repos/asf/tomee/sandbox/release-tools/
When this process was working at its best, release notes and download page entries were generated automatically as part of the release process.
Release cadence slowed and these tools decayed due to lack of knowledge transfer in their existence and how to maintain them.
As we increase our release cadence we have renewed need to automate the release overhead of updating download pages and creating release notes.
The proposal is to move this code from svn "sandbox" to a proper git repo and employ automation techniques to cause download pages and release notes to be automatically updated. This time not by a tool run by the person doing the release, but by a CI job based on the same technique we will need to automate publishing of docs or examples when they are updated.
The automated job will run on a timer and simply check dist.apache.org for a new release. It can also be manually triggered and re-run at any time via the corresponding CI job.
We have had several attempts at maintaining a contributors page, none of them successful.
Manual attempts only reflected some individuals. Automated attempts were too clever and have broken over time.
The proposal is to create code to run via a CI job triggered via a git webhook that simply screen-scrapes this page when the TomEE repo is updated:
This will allow us to ensure all 98 and growing contributors are listed and the page is updated when the contributor list changes as PRs are merged.
In the future we can potentially do more to encourage contributors by highlighting them on the TomEE website.