In which it is explained what monolithic codebases are, what monorepos are, and how monorepos solve the problems monolithic codebases have.
A monolithic codebase is a codebase in which modularization is done via source code.
So, for example, if one module wants to import another module,
it will just require('../another-module')
and thus directly import it from its source code.
Modularization exists in monolithic codebases,
but it is implicit in the folder structure of the codebase, and not so much in explicit
mechanisms like NPM packages. So if we want to "package" multiple files into one "package",
we just create a directory for them. But nothing stops other files from importing any file
in that package, and peeking at its internals.
There are many problems with monolithic codebases:
- Monolithic codebases tend to have tightly-coupled code. Because it is so easy to call code from one module to another, people do that, thus generating a sphaghetti set of relations between all modules in the codebase. This is the main problem of monolithic codebases, and is the main reason for the other problems listed below.
- Monolithic code is hard to reason about, given that every module links to every other module. We either understand all the code, or none of it.
- If the code is built (for example, if it's in TypeScript), building it takes a long time, and as the codebase grows, so does the build time.
- Tests take a long time to run. Because it is hard to reason about which tests need to run and which can be skipped, it is common to run them all, when a codebase changes, even if only a small subset needs to run because of that change.
- To mitigate these problems, some monolithic codebases separate parts of their code into separate packages having their own lifecycle: their own git repo, their own CI jobs, and their own publishing lifecycle. Unfortunately, managing all those packages is a chore, and handling the dependencies between them is difficult. This ensures that the number of such separate packages is usually small (delegated to "common componenents"-style packages).
Please take this critique as it should be taken: many monolithic codebases are wonderfully good, designed in a good way, and are mostly loosely-coupled. But the structure of a monolithic codebase tends to make this goodness an uphill struggle—we do not get loose-coupling because this is a monolithic codebase, but despite it.
Monorepos solve the main problem that monolithic code bases have: tightly-coupled code. How they do that is simple: your git repo holds not one codebase, but many (a couple, tens, hundreds, even thousands!). Since Bilt is used for NPM monorepos, each codebase in a monorepo is an NPM package.
So a Bilt monorepo is just one git repo with lots of independent NPM packages, each linked with one another using the regular NPM mechanism: dependencies. That is all it is.
How does that solve the above problems in monolithic codebases? By enabling a code structure that allows for separating your codebase into separate NPM packages, your codebase can now be more easily modularized and loosely-coupled. In other words, monorepos by themselves don't solve the problem, but by enabling us to easily manage tens and hundreds of NPM packages, they encourage us to do so and thus modularize our codebase well.
Monorepos encourage us to create a lot of packages by making it easy to create one:
just create a folder in the monorepo, npm init
it, and if it need to use other packages,
create dependencies on them the regular NPM way: through dependencies
in the package.json
.
If we do that, our codebase becomes more readable, more maintainable, and we can enjoy less friction in our development. Why?
- Packages have a specific way of communicating between them: the "main" module in the package exports a set of functions/classes/constants/whatever, and that is the only interface to the package. Whatever happens inside the package, is that package's domain. Because it is so easy to create a package in a monorepo, it is done often: separating out chunks of code that can be loosely-coupled to other code and packaging them into a separate package in your monorepo. And the ease in which packages are created in monorepos means that packages are be used for that purpose.
- Packages are a good way of hiding information, so a person reading the codebase can understand only a subset of those packages (especially if the other packages they're using have good documentation 😉).
- Builds are fast, because each package is independently packaged, and so if we change just one package, we can build only that package (and its dependents).
- Similarly, tests are separated by package, so if we change just one package, we need to run the tests only for that package (and its dependents). No need to run all the tests in the monorepo.
- Separating code into packages is easy, and there is no hassle in managing all of them. (Except for one problem: building and publishing them. But this is the problem that Bilt solves: building, testing, and publishing packages in a monorepo.)
Monorepos can also hold more than one project. Typically today, monorepo git repositories are uni-project (see, for example, babel)—they hold multiple packages of one project only. Due to the limitations of current build tools for monorepos, a monorepo cannot be scaled to more than tens of packages. But a monorepo can (and should) hold all the code of multiple projects of a company, thus allowing them to share some of the packages between them. This multi-project monorepo approach was made popular by companies such as Google and Facebook. And Bilt was built specifically to handle multi-project monorepos.
In monolith codebases, building, running the tests, and publishing, is easy: we build the whole codebase, run all the tests, and then publish the generated artifacts (static directory, docker image, or npm package). Very slow to run, because we need to build and test everything, but very easy to manage.
In monorepos, this won't cut it anymore. Yes, we can build all the packages, but first we need to figure out the order we need to build them. Why?
Because packages are linked to one another using NPM dependencies. So if, for example, package A has a dependency on package B, we need to first build package B, publish it, and then build package A.
Note: Why not just use
npm link
to link all the packages (like Lerna does)? Becausenpm link
is a wonderful development tool, but is problematic in a number of edge cases, and doesn't work in the large, for example, when generating Docker images, because Docker doesn't recognize host symlinks.
And if we have many packages (as we should if we're using a monorepo), figuring out the build order according to the dependencies, and doing it manually, is practically impossible.
But even if we had that build order, then building all the packages anytime just one package is built negates a lot of the usefulness of monorepos. We want to build only that package, and all the packages that depend on it (recursively).
And that is what Bilt does. Deal with building, testing, and publishing whatever needs to be built, and doing it in the correct order.