Storing big dependencies in one single global location #2125

Closed
ORESoftware opened this issue Dec 2, 2016 · 39 comments

@ORESoftware

ORESoftware commented Dec 2, 2016

Feature request, on the latest versions of most everything.

When creating libraries for NPM, we often see locally installed modules taking up a lot of space on disk (see tapjs/tapjs#333).

IMO there is no reason NPM cannot act more like Maven and put all modules and every different version of each module in the same global location on disk. That way we don't store the same version of the same package twice on the same machine.

Just curious if Yarn considers this an actual problem to be solved and if anyone is working on solving it. Thanks!

@ORESoftware
Author

ORESoftware commented Dec 2, 2016

So if people are interested in this, there are some constraints: tools like NVM will switch the NPM global package location. So what I am thinking is that Yarn or NPM or whoever should choose another, truly global location to store modules in one place. In other words, npm install -g x should still put x in the same global location as it does today. But there would be another option, let's call it npm install -gg, to install a package x in some truly global location that could be picked up by any local project...

@notpushkin

@ORESoftware Your proposal is awesome, and in fact, there is a global location where Yarn stores the modules, it is ~/.yarn-cache. Modules are stored there in the raw form though, with no dependencies installed.

What can we do with this?

  1. Let's say we depend on module a@1.0.9 and it doesn't depend on anything, or its dependencies are satisfied by other versions we have in the global tree. Then, we can just symlink node_modules/a to the ~/.yarn-cache/npm-a-1.0.9 directory (given it's installed from npm, that is).[1]

  2. Let's say we depend on modules r@1.1.0 and x@0.4.0, and x also depends on r@2.0.0. Our node_modules looks like this now:

    node_modules/
      r/                        (1.1.0)
        ...r files
      x/
        ...x files              (no pun intended :)
        node_modules/
          r/                    (2.0.0)
            ...r files
    

    We can still symlink node_modules/r to ~/.yarn-cache/npm-r-1.1.0 and node_modules/x/node_modules/r to ~/.yarn-cache/npm-r-2.0.0 (given r itself has no dependencies). What about x? We can symlink every single one of its files to the cache, but this looks a bit too hackish.

Now, the ultimate problem with all this is that symlinks are broken in Node right now. However, this is being fixed right now! See #2133 and nodejs/node#10107. /cc @phestermcs


[1] The npm- prefix in the folder name doesn't indicate anything right now: I've installed react from https://github.com/preact-compat/react, and it was stored in ~/.yarn-cache/npm-react-0.1.0. A (semi-)related issue: #2033
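
A minimal sketch of the first step described in this comment, for readers who want to try it. It is not the Python script mentioned later in the thread. The helper name linkFromCache is hypothetical, the cache layout (npm-<name>-<version>) is taken from the comment, and the demo runs in a throwaway temp directory rather than a real ~/.yarn-cache. It assumes the cached package needs no nested node_modules of its own.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: replace node_modules/<name> with a symlink to
// <cacheDir>/npm-<name>-<version>. Assumes the package has no nested deps.
function linkFromCache(projectDir, cacheDir, name, version) {
  const target = path.join(cacheDir, `npm-${name}-${version}`);
  const link = path.join(projectDir, 'node_modules', name);
  fs.mkdirSync(path.dirname(link), { recursive: true });
  fs.rmSync(link, { recursive: true, force: true }); // drop any local copy
  fs.symlinkSync(target, link, 'junction'); // the type argument only matters on Windows
  return link;
}

// Demo in a temp directory, so no real cache or project is touched.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'link-demo-'));
const cacheDir = path.join(tmp, 'cache');
const projectDir = path.join(tmp, 'project');
fs.mkdirSync(path.join(cacheDir, 'npm-a-1.0.9'), { recursive: true });
fs.writeFileSync(
  path.join(cacheDir, 'npm-a-1.0.9', 'index.js'),
  "module.exports = 'a@1.0.9';\n"
);

const link = linkFromCache(projectDir, cacheDir, 'a', '1.0.9');
const loaded = require(link); // loads index.js through the symlink
console.log(loaded);
```

The local copy is removed before linking, so this should only be pointed at a node_modules directory that can be regenerated.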

@notpushkin

Apart from that, just the first step shrank my node_modules on Songbee/desktop from 188,4 MB down to 137,6 MB. Here's a simple Python script I used.

@ORESoftware
Author

I am going to try to understand what you said as best I can. What I was thinking would require a structure similar to that in ~/.npm, where the structure is like this:

[image: screenshot from 2016-12-03 22-20-43]

I am not sure why every local dependency could not be symlinked to something like

~/.npm/foo/<version>/package.tgz
~/.yarn-cache/foo/<version>/package.tgz

although with a .tgz we'd have limitations because the code is compressed, so it would be better to just have this instead:

~/.npm/foo/<version>/foo
~/.yarn-cache/foo/<version>/foo

so it would be something like

cd project && ln -s ~/.yarn-cache/foo/<version-in-project's-pkg.json>/foo ./node_modules/foo

I am not sure I follow the reasoning you have about when it would be ok to symlink and when it wouldn't, I just want to make sure it's clear that I am talking about having all relevant versions of all relevant NPM packages in the cache directory.

@notpushkin

notpushkin commented Dec 4, 2016

Updated the script, now my node_modules is only 83,5 kB (what?!).

@ORESoftware, that's because Yarn doesn't store (or symlink) packages' dependencies in ~/.yarn-cache. If you have the following structure:

node_modules/
  r/                        (1.1.0)
    ...r files
  x/
    ...x files
    node_modules/
      r/                    (2.0.0)
        ...r files

and replace node_modules/r and node_modules/x with symlinks, you'd now have

node_modules/
  r/                        (symlink to ~/.yarn-cache/npm-r-1.1.0/)
    ...r files
  x/                        (symlink to ~/.yarn-cache/npm-x-0.4.0/)
    ...x files
    # no node_modules here!

This can be solved by adding symlinks in ~/.yarn-cache/*/node_modules, though.
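
The situation above can be reproduced in a temp directory. This is a sketch under assumptions: the package names and versions are the ones from the example, the cache layout mimics ~/.yarn-cache but lives under os.tmpdir(), and Node is run without --preserve-symlinks, so x is resolved from its real location inside the cache. On such a setup the first require of x is expected to fail because r is not reachable from the cache, and it should succeed once a symlink is added under the cached x's own node_modules.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'nested-demo-'));
const cache = path.join(tmp, 'cache');     // stands in for ~/.yarn-cache
const project = path.join(tmp, 'project'); // stands in for a local project

function writePackage(dir, source) {
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, 'index.js'), source);
}

function link(target, linkPath) {
  fs.mkdirSync(path.dirname(linkPath), { recursive: true });
  fs.symlinkSync(target, linkPath, 'junction');
}

// Cached packages are stored without their dependencies.
writePackage(path.join(cache, 'npm-r-1.1.0'), "module.exports = 'r@1.1.0';\n");
writePackage(path.join(cache, 'npm-r-2.0.0'), "module.exports = 'r@2.0.0';\n");
writePackage(path.join(cache, 'npm-x-0.4.0'), "module.exports = require('r');\n");

// The project's node_modules contains only symlinks into the cache.
link(path.join(cache, 'npm-r-1.1.0'), path.join(project, 'node_modules', 'r'));
link(path.join(cache, 'npm-x-0.4.0'), path.join(project, 'node_modules', 'x'));

const xPath = path.join(project, 'node_modules', 'x');

// x has no node_modules of its own, so its require('r') has nowhere to look.
let firstError = null;
try {
  require(xPath);
} catch (err) {
  firstError = err.code;
}

// The fix suggested above: add the symlink inside the cached copy of x.
link(
  path.join(cache, 'npm-r-2.0.0'),
  path.join(cache, 'npm-x-0.4.0', 'node_modules', 'r')
);
const resolved = require(xPath);
console.log(firstError, resolved);
```

Note that the added symlink lives in the shared cache, so every project that links to this copy of x would get the same r.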

@ORESoftware
Author

ORESoftware commented Dec 4, 2016

Yeah, I see no reason to have anything but symlinks in the <local-project>/node_modules dir.... would love to see a clear example of that not working, because I have been wondering why NPM chose to do local installations preferred since the beginning.

@notpushkin

> I have been wondering why NPM chose to do local installations preferred

That's because node isn't working with symlinks correctly right now. Again, see nodejs/node#10107 for more info on that.

@ORESoftware
Author

cool thanks

@ORESoftware
Author

ORESoftware commented Dec 4, 2016

The symlink discussion is a little over my head ATM but I will follow it

@ORESoftware
Author

Well, regarding the symlink discussion - the pnpm package manager seems to be using the symlink methodology we are discussing here.. I am sure you have heard of pnpm - so maybe if y'all haven't, open up talks with pnpm. TBH, I think it's in pnpm's best interest to merge with this lib somehow, at least merge features.

https://github.com/rstacruz/pnpm

@ghost

ghost commented Dec 5, 2016

@ORESoftware FWIW when installing with pnpm, it's still creating copies of modules for that install, but only one per module. It does use symlinks, but only from within the local install to the local copy, and not to a machine level copy. Being able to symlink to a machine level store would mean after the first install, the second would take literally about 1-2 seconds (i.e. once a module is on your machine, it can then be symlinked to whenever used again, in however many projects used).

Using a machine level store is not possible with node currently. The changes in the issue @iamale referenced enable node to symlink to machine level stores.

Further, because of an inefficiency in how node internally keeps track of modules, it will create new instances of a module for every place there's a symlink to it, which can consume a fair amount more memory than is necessary, and if a module is an add-on, it may crash when symlinked to more than once.

Lastly, because of how node incorrectly handles any symlinked directories used in the "main.js" path passed on its command line to start a program, certain tools and other things still don't work correctly.

pnpm and ied both use a flat storage structure with symlinks, which can save room and decrease install times a bit, but we're all fundamentally suffering from how node doesn't really work well with symlinks right now. The referenced issue fixes node for everyone.

@ORESoftware
Author

@phestermcs interesting

@zkochan

zkochan commented Dec 5, 2016

@phestermcs actually pnpm uses a global store, so it saves a package only once per machine. More details about it here

ied currently uses a local store, so it does make a single copy of a dependency per package. However, pnpm and ied collaborators are working on some shared specs and we all agreed that a single store per machine is the way to go.

Here's the store spec (still in draft)

Here we discuss the collaboration and specs

P.S. @phestermcs what you are doing is awesome! ❤️

@notpushkin

A few ideas:

  1. While we should use symlinks with caution, what about hardlinks? I think they should work, although there are a few caveats: the dependencies should be on the same device, and every file needs to be hardlinked instead of a single directory. Also, no idea how it'd work on Windows.

  2. What if we scrap all the node_modules stuff and instead just patch require so that it'd look into our package.json (or a lockfile), determine the version required and find it in the global store? I don't think it would be too hard to implement.
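
A rough sketch of the hardlink idea from the first item, assuming a store directory on the same device as the project. The helper name hardlinkTree and the temp-directory layout are hypothetical. Directories cannot be hardlinked, so the tree is recreated and each file is linked individually, which is the caveat mentioned above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: recreate the directory tree of `src` under `dest`
// and hardlink every non-directory entry instead of copying it.
function hardlinkTree(src, dest) {
  fs.mkdirSync(dest, { recursive: true });
  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
    const from = path.join(src, entry.name);
    const to = path.join(dest, entry.name);
    if (entry.isDirectory()) {
      hardlinkTree(from, to);
    } else {
      fs.linkSync(from, to); // fails if src and dest are on different devices
    }
  }
}

// Demo in a temp directory standing in for the store and the project.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'hardlink-demo-'));
const stored = path.join(tmp, 'store', 'npm-a-1.0.9');
fs.mkdirSync(path.join(stored, 'lib'), { recursive: true });
fs.writeFileSync(path.join(stored, 'lib', 'index.js'), "module.exports = 'a';\n");

const installed = path.join(tmp, 'project', 'node_modules', 'a');
hardlinkTree(stored, installed);

// Both paths should now refer to the same file on disk.
const original = fs.statSync(path.join(stored, 'lib', 'index.js'));
const linked = fs.statSync(path.join(installed, 'lib', 'index.js'));
const sameInode = original.ino === linked.ino;
console.log(sameInode, linked.nlink);
```

Because both paths share one file, editing a file inside node_modules would also change the stored copy.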

@ORESoftware
Author

ORESoftware commented Dec 5, 2016

honestly, this might sound crazy, but just have the user change NODE_PATH in the bashrc or zshrc.

what that allows for, is if the user wants to install something local, so be it, and that overrides everything. but otherwise it would fall back to the modules found in the NODE_PATH

I don't know how you could hot patch require reliably, for every possible entry point of an application

NODE_PATH is reliable, and is no longer going to be deprecated

however, I don't know how this would work for the root user

@notpushkin

> honestly, this might sound crazy, but just have the user change NODE_PATH in the bashrc or zshrc.

The problem would then be that it's impossible to store several versions of a package in the global store.

> I don't know how you could patch require reliably, for every possible entry point of an application

There aren't that many entry points in applications nowadays, and that's why require hooks work (e. g. see babel-register). Patching require isn't really a new thing, too.
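
To make the require-hook idea concrete, here is a minimal sketch. It wraps Module._resolveFilename, which is an undocumented Node internal that hooks commonly patch, so treat it as an assumption rather than a stable API. The store layout (<store>/<name>/<version>/index.js), the installStoreHook helper and the version map are all hypothetical; a real tool would read versions from package.json or a lockfile. Unlike the precedence discussed in this thread, this sketch lets the store override local node_modules for the pinned names.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const Module = require('module');

// Hypothetical hook: redirect bare requests for pinned packages to
// <storeDir>/<name>/<version>. Only exact names match, not "name/sub/path".
function installStoreHook(storeDir, versions) {
  const originalResolve = Module._resolveFilename; // undocumented internal
  Module._resolveFilename = function (request, parent, isMain, options) {
    if (Object.prototype.hasOwnProperty.call(versions, request)) {
      const pinned = path.join(storeDir, request, versions[request]);
      return originalResolve.call(this, pinned, parent, isMain, options);
    }
    return originalResolve.call(this, request, parent, isMain, options);
  };
  // Return a function that removes the hook again.
  return () => {
    Module._resolveFilename = originalResolve;
  };
}

// Demo store in a temp directory with one package, foo@1.2.3.
const store = fs.mkdtempSync(path.join(os.tmpdir(), 'store-hook-'));
const fooDir = path.join(store, 'foo', '1.2.3');
fs.mkdirSync(fooDir, { recursive: true });
fs.writeFileSync(path.join(fooDir, 'index.js'), "module.exports = 'foo@1.2.3';\n");

const uninstall = installStoreHook(store, { foo: '1.2.3' });
const foo = require('foo'); // resolved from the store, no node_modules needed
uninstall();
console.log(foo);
```

The hook has to be installed before the first require of a pinned package, which is why the number of entry points matters.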

@ORESoftware
Author

ORESoftware commented Dec 5, 2016

well you might have to add 1000 entries to NODE_PATH

.global_store/
    babel-cli/
        3.3.4/
        4.5.5/
        4.5.6/
    istanbul/
        2.3.4/
        3.4.5/

so NODE_PATH would look like:

wherever/.global_store/babel-cli/3.3.4/node_modules:
wherever/.global_store/istanbul/2.3.4/node_modules

maybe you would only add items to NODE_PATH if they were in your package.json file
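
As a sketch of that last idea, the NODE_PATH value could be derived from the dependencies listed in package.json. The buildNodePath helper is hypothetical, the store path is the placeholder used above, and the sketch assumes exact versions rather than semver ranges.

```javascript
const path = require('path');

// Hypothetical helper: one NODE_PATH entry per dependency, following the
// <store>/<name>/<version>/node_modules layout shown above.
// Assumes `dependencies` maps names to exact versions, not ranges.
function buildNodePath(storeDir, dependencies) {
  return Object.entries(dependencies)
    .map(([name, version]) => path.join(storeDir, name, version, 'node_modules'))
    .join(path.delimiter); // ':' on POSIX, ';' on Windows
}

const nodePath = buildNodePath('/wherever/.global_store', {
  'babel-cli': '3.3.4',
  istanbul: '2.3.4',
});
console.log(nodePath);
```

This only covers direct dependencies; the dependencies of babel-cli or istanbul themselves would still need their own entries or some other lookup.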

@ORESoftware
Author

ORESoftware commented Dec 5, 2016

@iamale Well, is patching require fundamentally different than adding to NODE_PATH? Not as far as I can tell, all it does is change where node looks for deps, right?

Unless, of course, you change the precedence rules of the require function, then I guess it would be different. But I don't see the need to change the precedence rules, if you have local modules, those should probably be picked up first.

@notpushkin

@ORESoftware Interesting. But crazy. But interesting.

@ORESoftware
Author

I frankly just don't know how performant it would or wouldn't be to add a lot of entries to NODE_PATH, I'd have to look at the source

@notpushkin

notpushkin commented Dec 5, 2016

I think it would be really bad in terms of performance. Although a simple test is what we need here.

@ORESoftware
Author

Well, as long as there aren't that many folders in each NODE_PATH entry, it should be OK, not great, but doable

@ljharb

ljharb commented Dec 5, 2016

You don't ever want to set NODE_PATH. That makes global modules requireable, which they should never be.

@ORESoftware
Author

ORESoftware commented Dec 5, 2016

No, not at all @ljharb

if you do this in your bashrc / zshrc / bash_profile

NODE_PATH=x

you overwrite the existing NODE_PATH

what you just said is false :)

@notpushkin

notpushkin commented Dec 5, 2016

@ORESoftware Also, it wouldn't solve a situation with multiple dependency versions, would it? If our app depends on a, b and c@2 and a and b both depend on c@1, what do we do?

@ORESoftware
Author

ORESoftware commented Dec 5, 2016

AFAICT it would be an abusable system to depend on users to properly set NODE_PATH. But if the package manager told users exactly what to do, it would be reasonable to depend on the users to play along. I think it would all work with NODE_PATH, but I just don't know how well it would work

@ljharb

ljharb commented Dec 5, 2016

@ORESoftware there is no existing NODE_PATH by default in node - if you have one, your user profile is setting it. If you do that, then you'll be able to require what's in x even though it's not in node_modules, which is a very bad thing.

@ORESoftware
Author

ORESoftware commented Dec 5, 2016

ok, but you were just saying that using NODE_PATH would point to global modules, I don't know about that. And of course, x, is just an example. I was talking about using NODE_PATH to point to globally installed cache of modules, for example, ~/.global_store, not the modules installed by npm install -g...

@ljharb

ljharb commented Dec 5, 2016

Right. I'm saying that all global modules should be in npm root -g, and NODE_PATH should never be set.

@ORESoftware
Author

well, this whole conversation is about moving all local modules into a single store to prevent duplication, I think we just are arguing about semantics right now. I was suggesting, merely as an idea, to use NODE_PATH to solve the problem.

@ORESoftware
Author

I don't think NODE_PATH would make anything much more complicated than a bunch of symlinks, which sounds no less complicated :)

@ORESoftware
Author

ORESoftware commented Dec 5, 2016

@iamale I think I know what you are saying. As a POC, I would just add every single path in .global_store to NODE_PATH.

So, ATM, npm install will put all the dependencies in ./node_modules, let's assume that was a flat structure, using yarn makes it flat apparently, so let's assume it can be made flat easily.

instead of writing those flat directories to ./node_modules, you'd write those flat dirs to .global_store, in a flat way, then you'd have to add all the new dirs to NODE_PATH.

the problem, of course, is now multiple babel versions are now on the NODE_PATH, so which will your program require from? LOL IDK.

Now I am seeing a little bit more why NPM works the way it does :)
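
The ambiguity can be checked with a small experiment. This is a sketch under assumptions: the store is a temp directory using the hypothetical <store>/<name>/<version>/node_modules/<name> layout from earlier in the thread, the packages are one-line stubs rather than the real babel-cli, and a child Node process is started with NODE_PATH set so the variable is read at startup. NODE_PATH entries are searched in order, so which version gets loaded should depend only on which entry comes first.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { execFileSync } = require('child_process');

const store = fs.mkdtempSync(path.join(os.tmpdir(), 'global-store-'));

// Create a stub package in the store and return its NODE_PATH entry.
function addToStore(name, version) {
  const root = path.join(store, name, version, 'node_modules');
  const pkgDir = path.join(root, name);
  fs.mkdirSync(pkgDir, { recursive: true });
  fs.writeFileSync(
    path.join(pkgDir, 'index.js'),
    `module.exports = '${name}@${version}';\n`
  );
  return root;
}

const entries = [addToStore('babel-cli', '4.5.5'), addToStore('babel-cli', '3.3.4')];

// Run `node -p "require('babel-cli')"` in a child process with NODE_PATH set.
function requireWithNodePath(nodePathEntries) {
  return execFileSync(process.execPath, ['-p', "require('babel-cli')"], {
    cwd: store,
    env: { ...process.env, NODE_PATH: nodePathEntries.join(path.delimiter) },
    encoding: 'utf8',
  }).trim();
}

const forward = requireWithNodePath(entries);
const reversed = requireWithNodePath([...entries].reverse());
console.log(forward, reversed);
```

So a flat NODE_PATH cannot give two packages in the same process different versions of a shared dependency, which is the case nested node_modules handles.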

@ORESoftware
Author

@ljharb

actually, from my experience, NODE_PATH is set by default, try it on your system. It appears to be set to $(npm root -g)

This could be overridden like I said above

@ORESoftware
Author

ORESoftware commented Dec 5, 2016

I am not sure if mine is set because I use NVM or not

echo $NODE_PATH
/usr/lib/nodejs:/usr/lib/node_modules:/usr/share/javascript

wonder wth /usr/share/javascript is !?

@ORESoftware
Author

ORESoftware commented Dec 5, 2016

@ljharb I see this issue

nvm-sh/nvm#586

do you happen to know if NVM reversed its course and does set NODE_PATH ?

@ljharb

ljharb commented Dec 5, 2016

@ORESoftware no, nvm will never set NODE_PATH again. node itself considers it legacy, and it won't work whenever node supports ES modules.

@ORESoftware
Author

ORESoftware commented Dec 5, 2016

first of all, I really doubt they will ever fully deprecate NODE_PATH. Such a thing is a mainstay in pretty much every programming environment that I am familiar with. I doubt there is some technical reason that it won't work with ES6 modules.

e.g. Java classpath => https://en.wikipedia.org/wiki/Classpath_(Java)

but yep, you appear to be correct that $NODE_PATH is empty upon a fresh install of node, just tested it. so for some reason some foreign agent is changing my NODE_PATH on my system as is, weird

@zikaari

zikaari commented Aug 9, 2017

Another concern to consider: referencing deps in <local-project>/node_modules that live in a global folder, by any method (hardlinks, symlinks, etc.), would fail when the user has two machines, the project lives on one of them and is shared over the network (SMB, FTP, etc.), and the user also works on the same project from the second machine.

I for one have a Mac and a Windows PC. The project lives on the Windows PC and the folder is shared over the network; I have this setup to test Electron apps on both platforms.

@arcanis
Member

arcanis commented Jun 2, 2019

Closing; we won't support symlinks by default, as we've designed Plug'n'Play, which fits our bill better by entirely removing node_modules from the equation.

More info: https://yarnpkg.github.io/berry/features/pnp

arcanis closed this as completed Jun 2, 2019