-
Notifications
You must be signed in to change notification settings - Fork 53
Spec
npm's installation mechanism uses a nested
directory structure relying on Node's node_modules
fallback.
Due to npm's popularity as a package manager, technology, as well as a company, this algorithm has been popularized as the only "viable" option for installing Node.js packages in the past.
While npm 3 famously introduced a flattened dependency graph, a lot of the original problems remain, including, but in no way limited to:
- Unpredictable dependency graphs after subsequent installations
- Significant performance penalty due to "flattening" of dependency graph
- Redundant dependencies due to partially flattened dependency graph
ied
, an alternative package manager
for Node.js, as well as its fork pnpm
, rely on a different, symlink-based
algorithm, which aims to address most of these issues.
As part of this document, this algorithm will be formally specified and documented.
In order to understand the proposed algorithm, it's essential to understand Node's module system and way of resolving dependencies.
Node looks for dependencies in node_modules
, when the specified dependency
couldn't be located, the parent directory's node_modules
folder will
recursively be traversed until the root of the file system is reached.
This enabled implementors of package managers to create circular dependency
graphs by relying on Node's require
call "falling back" to the parent
directory.
npm exploits this implementation by creating nested dependency graphs.
Especially npm 2 famously created deeply nested node_modules
directories that
caused issues due to a limit on the maximum possible file system path under
Windows. Consequently, npm 3 introduced an additional step following a
successful package installation that flattened the dependency graph in question.
The proposed symlink-based installation mechanism handles each package
installation in an isolated manner, which is fundamentally different from the
traditional node_modules
-fallback based approach, in which the directory
structure implies the dependency graph.
Instead, symbolic links can be used in order to explicitly define dependencies.
E.g. a dependency graph in which a depends on b and c and b, can be expressed using the following directory structure:
[I] ~/g/s/g/a/i/sandbox (master) $ tree node_modules/
node_modules/
├── a
│ └── node_modules
│ ├── b -> ../../b
│ └── c -> ../../c
├── b
└── c
6 directories, 0 files
Linking packages instead of copying gives several advantages:
- packages can be shared between several projects
- the structure of node_modules is the same as the structure of the dependency tree
- there are no long path names in node_modules (there is a limit on the path length on Windows)
The following section outlines a "MVP" installation mechanism. The goal is not to focus on the underlying logic of the used resolvers (such as GitHub, npm, tarball, etc.), but to describe handling of circular dependencies and other edge cases that might arise from using the proposed algorithm.
Installing dependencies is by definition a recursive problem. A dependency depends on n dependencies that might themselves depend on more dependencies. Including, but not limited to the dependency itself or an "ancestor" dependency.
An "ancestor" dependency is a dependency that (implicitly) depends on a the current dependency, thus causing a circular dependency.