It depends. The art of dependency management in Javascript

This is Part 1 of a series of blog posts I entitled “It depends. Harden your Javascript deployment process”, in which I’d like to shed some light on how you can greatly reduce the risks of Javascript project development and deployment, using techniques and tools that are already widely used in other IT communities. Sadly, I’ve come across many projects that still lack these crucial building blocks. In most cases, such mistakes are not intentional; they’re caused by a lack of knowledge of how things work and which traps to avoid. So I encourage you to read on, and keep an eye out for the upcoming posts in this series.

In short

Always lock and version-control your project’s dependency graph, either with npm shrinkwrap (for older NPM versions) or with the auto-generated lockfiles (for NPM 5+ or Yarn). Don’t be fooled into thinking that exact dependency versions in package.json are enough.

Javascript and dependency hell

We all know the jokes about dependency management in Javascript, right? Even if you’re not a JS developer by day, you must have heard some of them: the infamous node_modules directory that contains thousands of packages even for the simplest webpage sprinkled with a tiny amount of Javascript; npm install taking ages to download half of the Internet; packages where the infrastructure code (package.json, dotfiles etc.) is several times bigger than the actual package code (the notorious left-pad and other single-function packages); and the multitude of package managers, as in:

- What’s Yarn/Bower?
- Package manager for Javascript.
- How to install it?
- Via NPM.
- What’s NPM?
- Package manager for Javascript.

…and so on and so forth. But the fact is, you need dependencies so as not to reinvent the wheel, and to be able to use dedicated, proven, well-tested libraries and tools that somebody else has already written. The trick is to keep your dependency tree in a known state and under your full control, and that is exactly what many projects are missing.

How does the dependency tree grow?

Every conventional Javascript project that uses dependencies in some form (be it a Node.js application or a frontend app built with React, webpack etc.) has a package.json file that specifies the packages the project makes use of. I won’t dive into the details of, and differences between, dependencies, devDependencies, peerDependencies etc. here; this post applies to all of them.

Let’s init a simple NPM project and add some dependencies (via a regular npm install <dep_name>):

// package.json

{
  "name": "it-depends",
  "version": "1.0.0",
  "main": "index.js",
  "dependencies": {
    "express": "^4.16.2",
    "moment": "^2.20.1",
    "mongodb": "^3.0.2"
  },
  "devDependencies": {
    "nodemon": "^1.14.12"
  }
}

If you don’t specify an exact version, you’ll get the latest available one installed (the version tagged latest on the NPM registry). Moreover, by default every version is saved with a ^ (caret) prefix. In the SEMVER versioning scheme (MAJOR.MINOR.PATCH), this means that MINOR and PATCH changes are allowed, but MAJOR ones are not. So, for example, a dependency declared as ^2.2.4 means that both 2.2.5 and 2.3.0 will be considered valid, and the latest of them will be used if available, while 3.0.0 will not. (For 0.x versions the caret is stricter: ^0.2.4 only allows PATCH changes.)

To complete the story, similar but slightly more restrictive logic applies to ~ (tilde). When placed in front of the version number, it means: allow only PATCH changes. So, in the example above, if we had ~2.2.4, the only valid candidate of the two mentioned would be 2.2.5.
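To make the difference between ^ and ~ concrete, here is a minimal JavaScript sketch of how the two prefixes match plain MAJOR.MINOR.PATCH versions. The names satisfies and maxSatisfying mirror functions of the same name in the semver package that NPM relies on, but this is not that package’s code: it ignores pre-release tags, compound ranges and the 0.x rules, and the list of published versions is made up.

```javascript
// Minimal sketch of caret (^) and tilde (~) matching for plain
// MAJOR.MINOR.PATCH versions. It ignores pre-release tags, compound ranges
// and the stricter 0.x caret rules that the real "semver" package handles.
function parse(version) {
  const [major, minor, patch] = version.split(".").map(Number);
  return { major, minor, patch };
}

// Negative, zero or positive, like a sort comparator.
function compare(a, b) {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

function satisfies(version, range) {
  const v = parse(version);
  if (range.startsWith("^") || range.startsWith("~")) {
    const base = parse(range.slice(1));
    if (compare(v, base) < 0) return false; // never below the declared version
    if (range[0] === "^") return v.major === base.major; // MINOR and PATCH may move
    return v.major === base.major && v.minor === base.minor; // only PATCH may move
  }
  return compare(v, parse(range)) === 0; // no prefix: exact match only
}

// Highest published version inside the range, i.e. what a fresh install picks.
function maxSatisfying(available, range) {
  const matching = available.filter((v) => satisfies(v, range));
  matching.sort((a, b) => compare(parse(a), parse(b)));
  return matching.length > 0 ? matching[matching.length - 1] : null;
}

// Hypothetical list of published versions.
const published = ["2.2.4", "2.2.5", "2.3.0", "3.0.0"];
console.log(maxSatisfying(published, "^2.2.4")); // 2.3.0
console.log(maxSatisfying(published, "~2.2.4")); // 2.2.5
```

The same published list yields two different picks depending only on the prefix, which is exactly why the prefix matters once new releases appear.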

These range prefixes are really useful, e.g. when you depend on a package that your teammates develop in parallel and you want to pick up every new development version they release. But they can also bite you hard, as we’ll see in a moment.

Direct dependencies are just a tiny part of the problem

You may think: OK, ~ and ^ let me depend on a range of versions, so if I pin the exact versions of all my dependencies, I’ll always get the same dependencies no matter what. So I’m done, right?

I asked that question during one of my conference talks on this topic and was really surprised by how many people answered yes. I have bad news: it doesn’t work that way.

Even if you lock down your own dependencies by pinning their exact versions, your dependencies might not do the same for theirs. In fact, you must assume they don’t, as you have no way to control or enforce that. What does that mean in practice? Consider the following example:

  • You depend directly on express at exactly version 4.16.2
  • express depends on accepts with the range ~1.3.4
  • accepts in turn depends on mime-types with the range ~2.1.16

Can you see the problem yet? Even if you didn’t touch your package.json at all, two independent runs of npm install may have resolved two different dependency trees!

Hypothetically, you may have gotten e.g. accepts 1.3.5 or mime-types 2.1.17 in one build and completely different versions (still satisfying the ranges) in another. What if one of these packages you don’t control introduced a bug affecting your application? What if it changed its public API (remember, SEMVER is just an agreement that cannot really be enforced) in a way that, when called indirectly, broke your app at runtime? And all that with zero changes in your list of direct dependencies! Imagine you develop a feature or fix a tiny bug (like changing the label of a text field), rebuild and redeploy the app and BANG! Your app is now broken in an area unrelated to the fix, one you haven’t touched for ages.
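The scenario can be simulated with a small hypothetical sketch: the same exactly pinned express@4.16.2 resolved against two invented snapshots of the registry. The snapshots, the resolve function and the simplification that each package keeps requiring the same range of its child are assumptions for illustration only; real resolution also deduplicates and hoists packages.

```javascript
// Hypothetical sketch: resolving the express -> accepts -> mime-types chain
// against two invented registry snapshots. Only tilde (~) ranges are modelled,
// and each package is assumed to always require the same range of its child.
const toNums = (v) => v.split(".").map(Number);

// ~X.Y.Z accepts the same MAJOR and MINOR with PATCH >= Z
function tildeMatches(version, range) {
  const [maj, min, pat] = toNums(version);
  const [bMaj, bMin, bPat] = toNums(range.slice(1));
  return maj === bMaj && min === bMin && pat >= bPat;
}

// Ranges from the example: express requires accepts, accepts requires mime-types.
const transitive = [
  { name: "accepts", range: "~1.3.4" },
  { name: "mime-types", range: "~2.1.16" },
];

function resolve(registry) {
  const tree = { express: "4.16.2" }; // pinned exactly in package.json
  for (const { name, range } of transitive) {
    const candidates = registry[name].filter((v) => tildeMatches(v, range));
    // Inside a tilde range only PATCH differs, so sort by PATCH and take the highest.
    candidates.sort((a, b) => toNums(a)[2] - toNums(b)[2]);
    tree[name] = candidates[candidates.length - 1];
  }
  return tree;
}

// Invented snapshots: a few days later, new patch releases have appeared.
const before = { accepts: ["1.3.4"], "mime-types": ["2.1.16"] };
const after = { accepts: ["1.3.4", "1.3.5"], "mime-types": ["2.1.16", "2.1.17"] };

console.log(JSON.stringify(resolve(before)));
// {"express":"4.16.2","accepts":"1.3.4","mime-types":"2.1.16"}
console.log(JSON.stringify(resolve(after)));
// {"express":"4.16.2","accepts":"1.3.5","mime-types":"2.1.17"}
```

Nothing in the project changed between the two calls, yet two of the three resolved versions did.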

Have you ever seen a GitHub issue or a StackOverflow post in which a feature that worked just a day ago no longer works? Or in which the app doesn’t start because of undefined is not a function? I suspect a huge number of these are caused by this tiny, often overlooked fact.

Surprisingly (and frighteningly) often the answer to these is

rm -rf node_modules && npm cache clean && npm install

which really scares me, as it’s just like a game of Russian roulette: you never know what you’ll end up with the next time you pull the npm install trigger.

Let’s see how to avoid this kind of trap. In subsequent posts I’m also going to show other complementary techniques for hardening the build and deployment process of the app, so keep an eye out for updates.

We’ve got tools for that now

Now that we’ve identified the issue, it’s time to find a weapon to destroy it. The weapon is called dependency locking, and it’s available to most of you (unless you work with a really old version of NPM).

How does it work? Basically, the tool walks the tree of dependencies you’ve just installed, records the exact version of every dependency (direct and transitive), and saves this resolution to a file. Now, when anyone (another developer, the CI server etc.) installs the project’s dependencies with this file available, they get exactly the same dependencies you did. It doesn’t matter how many times they do that or how many new releases of your transitive dependencies have been published since the initial npm install: they’ll always get the same versions for their build! An easy, predictable solution that greatly lowers the risk of unanticipated failure.
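A toy model helps show why this works. In the hypothetical sketch below, createLock and installFromLock are illustrative names, not NPM or Yarn APIs: the first serializes an already resolved tree into the text you would commit, the second installs straight from that text, so newer releases on the (made-up) registry make no difference. Real lockfiles record more than versions, such as where each package was downloaded from and a checksum.

```javascript
// Toy model of dependency locking, not how NPM or Yarn implement it.
// Here the "lock" is just the exact versions of an already resolved tree.

// Serialize the resolved tree: this text is what you would commit to VCS.
function createLock(resolvedTree) {
  return JSON.stringify(resolvedTree, null, 2);
}

// Install from the lock: versions come from the file, not from ranges, so newer
// releases on the registry are ignored. A locked version that has disappeared
// from the registry is reported instead of being silently replaced.
function installFromLock(lockText, registry) {
  const installed = {};
  for (const [name, version] of Object.entries(JSON.parse(lockText))) {
    if (!(registry[name] || []).includes(version)) {
      throw new Error(`${name}@${version} is not available in the registry`);
    }
    installed[name] = version;
  }
  return installed;
}

// The tree resolved at lock time, and a hypothetical registry some weeks later.
const lockText = createLock({ express: "4.16.2", accepts: "1.3.4", "mime-types": "2.1.16" });
const laterRegistry = {
  express: ["4.16.2", "4.16.3"],
  accepts: ["1.3.4", "1.3.5"],
  "mime-types": ["2.1.16", "2.1.17"],
};

console.log(JSON.stringify(installFromLock(lockText, laterRegistry)));
// {"express":"4.16.2","accepts":"1.3.4","mime-types":"2.1.16"}
```

Failing loudly on a missing version, rather than falling back to a range, is the design choice that keeps the build predictable.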

How do you get that for your project? Good news: if you work with NPM 5+ or Yarn, you get this feature out of the box. These tools create or update their lockfiles (package-lock.json and yarn.lock respectively) every time you install a package (with npm install or yarn add). Just remember to keep those files in your VCS.
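Since package-lock.json is plain JSON, you can inspect what it actually pins. The following is a sketch under the assumption that you only need name@version pairs; listLockedVersions is a hypothetical helper. It reads the flat "packages" map that npm 7+ writes (lockfileVersion 2 and 3) and falls back to the nested "dependencies" tree written by NPM 5 and 6 (lockfileVersion 1).

```javascript
// Sketch: list every name@version pinned in a parsed package-lock.json.
// lockfileVersion 2 and 3 (npm 7+) keep a flat "packages" map keyed by
// install path; lockfileVersion 1 (NPM 5 and 6) nests a "dependencies" tree.
function listLockedVersions(lock) {
  const found = new Set();
  const marker = "node_modules/";

  if (lock.packages) {
    for (const [installPath, meta] of Object.entries(lock.packages)) {
      const at = installPath.lastIndexOf(marker);
      // Paths outside node_modules (the root "" and workspace folders) and
      // link entries are skipped: they are not versions fetched from a registry.
      if (at === -1 || meta.link) continue;
      // "node_modules/a/node_modules/@scope/b" -> "@scope/b"
      found.add(`${installPath.slice(at + marker.length)}@${meta.version}`);
    }
  } else {
    const walk = (deps = {}) => {
      for (const [name, meta] of Object.entries(deps)) {
        found.add(`${name}@${meta.version}`);
        walk(meta.dependencies); // nested copies that could not be hoisted
      }
    };
    walk(lock.dependencies);
  }
  return [...found].sort();
}

// Usage, assuming the script runs next to the project's lockfile:
// const fs = require("fs");
// console.log(listLockedVersions(JSON.parse(fs.readFileSync("package-lock.json", "utf8"))));
```

Running it on two checkouts of the same commit should print identical lists, which is a cheap way to convince yourself the lockfile is doing its job.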

If you work with an older version of NPM and cannot upgrade to 5+, you need to do some manual work every time you install a new dependency. There is the npm shrinkwrap command, which analyzes your node_modules directory to resolve all the dependency versions and writes them to an npm-shrinkwrap.json file. There are a few drawbacks here. First, you need to run the command after every new dependency. Second, npm shrinkwrap may fail if you mess up your node_modules, e.g. by installing a package your package.json doesn’t know about (e.g. without --save).
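To get a feel for what npm shrinkwrap has to do, here is a simplified illustration (not npm’s implementation) that walks a project’s node_modules, including nested node_modules folders and @scope directories, and records the version from each installed package.json. The snapshotInstalled name and the path-keyed output are assumptions made for this sketch.

```javascript
// Simplified illustration of a shrinkwrap-style walk, not npm's implementation:
// record the version of every package installed under node_modules, keyed by
// its install path, including nested node_modules and @scope folders.
const fs = require("fs");
const path = require("path");

function snapshotInstalled(projectDir) {
  const found = {};

  const walk = (nodeModulesDir) => {
    if (!fs.existsSync(nodeModulesDir)) return;
    for (const entry of fs.readdirSync(nodeModulesDir)) {
      if (entry.startsWith(".")) continue; // skip .bin and other dot folders
      const entryPath = path.join(nodeModulesDir, entry);
      // Scoped packages sit one level deeper: node_modules/@scope/name
      const packageDirs = entry.startsWith("@")
        ? fs.readdirSync(entryPath).map((name) => path.join(entryPath, name))
        : [entryPath];
      for (const dir of packageDirs) {
        const manifest = path.join(dir, "package.json");
        if (!fs.existsSync(manifest)) continue;
        const { version } = JSON.parse(fs.readFileSync(manifest, "utf8"));
        // Use forward slashes so keys look the same on every OS.
        const key = path.relative(projectDir, dir).split(path.sep).join("/");
        found[key] = version;
        walk(path.join(dir, "node_modules")); // nested, non-hoisted copies
      }
    }
  };

  walk(path.join(projectDir, "node_modules"));
  return found;
}

// Usage: console.log(snapshotInstalled(process.cwd()));
```

It also shows why such a snapshot is only as good as node_modules itself: whatever happens to be installed there, listed in package.json or not, ends up in the result.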

The only problem we had some time ago with NPM 5 (before we switched to Yarn) was that the checksums generated during locking on macOS differed from those generated on Linux (which also matters from a CI perspective), and that made it harder to keep the dependency management process on track.

As an alternative, you may also consider storing the entire node_modules directory in the VCS, but in my opinion that’s totally impractical because of its size, the noisy diffs of commits that add new dependencies, and binary extensions that aren’t portable between operating systems.

A fair question may pop up now: why do NPM and Yarn keep adding the caret (^) SEMVER range prefix when installing dependencies via npm install or yarn add, even though all the versions are already locked to exact numbers in the respective lockfiles?

The answer is that having ranges in package.json lets you use tools like npm outdated or npm update to track and update your dependencies within the ranges defined. So, for example, when updating express declared as ^4.16.2, you’ll never accidentally jump outside the 4.x range.
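The gap between the range in package.json and the newest release can be sketched as follows. This hypothetical outdated function loosely mirrors the Wanted and Latest columns that npm outdated prints; npm itself takes Latest from the registry’s latest dist-tag rather than from the highest version number, and the list of published versions here is made up.

```javascript
// Rough model of the "Wanted" vs "Latest" columns of npm outdated:
// wanted = highest published version inside the caret range from package.json,
// latest = highest published version overall (npm really uses the "latest" dist-tag).
// Caret handling is simplified: plain X.Y.Z versions with MAJOR >= 1 only.
const nums = (v) => v.split(".").map(Number);
const byVersion = (a, b) => {
  const [x, y] = [nums(a), nums(b)];
  return x[0] - y[0] || x[1] - y[1] || x[2] - y[2];
};

function caretAllows(version, range) {
  const base = range.slice(1); // strip the leading ^
  return nums(version)[0] === nums(base)[0] && byVersion(version, base) >= 0;
}

function outdated(range, published) {
  const sorted = [...published].sort(byVersion);
  const wanted = sorted.filter((v) => caretAllows(v, range)).pop() || null;
  return { wanted, latest: sorted[sorted.length - 1] };
}

// Made-up list of published express versions, for illustration only.
const published = ["4.16.2", "4.16.3", "4.17.1", "5.0.0"];
console.log(JSON.stringify(outdated("^4.16.2", published)));
// {"wanted":"4.17.1","latest":"5.0.0"}
```

An update that respects the range moves you to the wanted version and stays on 4.x; jumping to the next MAJOR remains an explicit decision.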

Are we safe now?

Understanding the issue and correctly using the tools mentioned above keeps you safe when it comes to resolving all the dependencies of your project (even the transitive ones), which is a step in the right direction: building a safe and predictable development and deployment process for your application. But that’s just one step; I’ll show you more issues and bumps along the road in subsequent posts.

But for now… live long and keep your dependencies locked. Stay tuned!