It depends. On the registry

Michał Ostruszka
SoftwareMill Tech Blog
8 min read · Feb 26, 2018

--

Welcome to the second part of the series “It depends. Harden your JavaScript deployment process”. Here is Part 1, on dependency version management and how to keep your dependency graph stable and consistent. In this part I'd like to focus on another key aspect of hardening the project setup, namely minimizing the risk of failure while fetching the dependencies themselves.

In short

Don't let the main package registry (the NPM registry, whether you reach it via npm or yarn) be the only source of your dependencies.

It may crash, go down, lose packages, etc. On the other hand, you may not be able to reach it, for example due to network issues. Avoid package registries being a single point of failure for your project. Use dedicated proxies or local repositories to minimize these risks. Read on for the details.

Millions of downloads

We all know JavaScript is crazy popular. There are tons of projects running it, not only on traditional servers, but also on desktops, IoT devices, robots, etc. They all have their own graphs of dependencies. But have you ever wondered how it is possible that some NPM packages see thousands or even millions of downloads per day, not to mention the monthly stats? And it's not only a thing for packages currently on the rise (React, Angular, etc.). Even ordinary packages, like simple jQuery plugins, lodash, etc., have huge download stats.

For example, at the time of writing, React had the following figures:

  • 102,270 downloads in the last day
  • 1,995,642 downloads in the last week
  • 7,812,191 downloads in the last month

How is that even possible? Do you really think there are over 102 thousand new projects being bootstrapped with React every day? I bet that's not the case at all. The reason is quite simple: in the vast majority of projects, every single npm install hits the NPM registry directly, downloading packages and bumping their counters.

All right, but still: how come the numbers are so high? People don't spend their lives issuing npm install all the time, right? They usually don't, unless they call rm -rf node_modules because something doesn't click together (see Part 1 of the series). But machines do. CI servers do their work on every git push: running unit tests, doing builds, running acceptance tests, all in separate, clean environments, usually with sources freshly pulled from git. Taking into account that it's free (or relatively cheap) to set up CI servers in the cloud for OSS projects, nearly everyone can get one. Now imagine how many projects use that opportunity and you'll easily spot the correlation with these crazy NPM download counts.
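To make this concrete, here is a minimal sketch of what such a CI job typically runs on every push (the repository URL and the scripts are hypothetical placeholders):

# executed in a fresh, clean workspace on every push
git clone https://github.com/your-org/your-app.git
cd your-app
npm install     # hits the main NPM registry directly, every single time
npm test        # unit tests
npm run build   # production build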

If it works, why bother?

Sure, it usually works, but “usually” is the keyword here, and it's definitely not enough for a serious, commercial project. NPM is just software running on some servers on the Internet. It may render itself unavailable for various reasons: be it a network outage, a storage failure, or even a simple bug in the registry code itself that sneaked in with a recent update. Also, there may be packages that got removed (remember the left-pad story?) or just became temporarily unavailable for technical reasons; it has happened several times in recent months (see NPM's status Twitter account and look for the issues related to missing packages).

On the other hand, even when NPM works fine, you may have trouble reaching it because of network errors, changes to proxy or corporate firewall settings, etc. Most of the time these are things you have no control over.

Would you like things like these to impact your development, builds and in consequence your production deployments?

Would you like to end up with a failed build because of unmet dependencies at the moment you’re about to deploy your app to production?

Surely not. Your entire pipeline should be rock-solid and this is what we aim for with the recipes in this series.

So what options do we have here?

Avoid Single Points of Failure

As we've already seen, we can't rely on NPM entirely if we want a stable and bullet-proof pipeline. It would be great to have some kind of secondary registry to mitigate the risks above, to be used in case of NPM unavailability. Theoretically, one option would be to set up our own mirror of the entire NPM registry. Unfortunately, replicating the registry means loads of complicated work and infrastructure to set up, and it's definitely not the way to go for the great number of smaller commercial or OSS projects.

What if we set up a kind of lazy registry, or caching proxy, sitting between our environment and NPM itself, so that it stores aside the packages we download from NPM? If you come from the JVM world (as I do) you may recall that exactly the same approach was introduced years ago in the form of Nexus or Artifactory sitting in front of the Maven Central repository. The way it works is pretty simple: instead of hitting the original registry, you configure your environment to use the proxy. Now, every time you ask for a package and there is no matching one (by name, version, etc.) stored in this middleman registry, it transparently hits NPM, downloads the package for you AND caches it on its end for further use. That way, if you need the package later, it'll be served from the proxy without hitting the main registry at all.
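In practice the flow looks something like this (the proxy address is a hypothetical placeholder, and the lodash version is picked just for illustration):

# first request: the proxy has no copy of this package version yet,
# so it fetches it from the main NPM registry and caches it
npm install lodash@4.17.4 --registry=http://your.proxy:4873/

# any later request for the same version is served straight from the
# proxy's cache, even if the main registry is down or unreachable
npm install lodash@4.17.4 --registry=http://your.proxy:4873/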

Solutions available

As said before, the two most popular solutions in the JVM world are Nexus and Artifactory. The good news is that both of them can handle other repository types too, including NPM! Chances are that in your company there are JVM-based projects already using one of these, so it's even easier to get it integrated with your JS projects.

But what if you would like to stick to JavaScript land for that too? Unsurprisingly, there are JS-based solutions available out there. One of the most popular is Verdaccio (born as a spinoff of the discontinued Sinopia). It does exactly the same thing as the ones above, and is an extensively configurable Node.js-based application. There are also other projects like npm-proxy-cache, which is, well… just a cache (not a formal registry) but may be worth trying at a smaller scale (or at least as a localhost-based development proxy). Nexus and Verdaccio, being the most powerful ones, are also easy to install and try out, as they provide official Docker images and require only minimal changes on the clients' side: setting the correct registry address in the NPM configuration (instead of the defaults).
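For example, here is a minimal sketch of trying Verdaccio out locally using its official Docker image (4873 is Verdaccio's default port; check the Verdaccio docs for current image details):

# run Verdaccio in the background, exposing its default port
docker run -d --name verdaccio -p 4873:4873 verdaccio/verdaccio

# point your npm client at the freshly started local registry
npm config set registry http://localhost:4873/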

Private registry for free

These tools have way more to offer. Have you ever worked on a project where you used private packages/libraries developed by another team in the company, packages you didn't want to share publicly? That's not easy to do by default, unless you have a paid NPM account which lets you push private packages.

As it turns out, when you set up a proper local registry/proxy (like Verdaccio or Nexus) you get these things for free, out of the box.

You can publish your packages to the local, private registry and make use of them as usual, without even hitting NPM.
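A minimal sketch, assuming your npm client is already pointed at the local registry (configuration is covered in the “Use it” section below) and using a hypothetical package name:

# in your private package's directory: publish to the local registry
npm publish

# consumers then install it like any other package
npm install your-internal-lib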

That way you are the only one controlling your private packages. Sure, it's another incarnation of a Single Point of Failure, but as long as it's under your control you can set up clusters, backups, etc. in order to mitigate the aforementioned risks. You can also still choose to publish your packages privately to the main NPM registry, using your paid account, and deal with them as with any other package, storing them locally on your side with Nexus or Verdaccio.

Use it

Setting up your own Nexus (or Verdaccio) registry is one thing, but you need to configure your NPM clients to use it — as they use the main NPM registry by default.

You can set the client's registry resolution globally for all projects using the ~/.npmrc file, or use per-project settings (with an .npmrc file in the project's root directory). These settings are merged together, which allows for some flexibility and a sane default setup. You can do this via the npm command line by calling:

npm config set registry http://your.nexus:8081/repository-address

which effectively adds the following line to the ~/.npmrc file:

registry = http://your.nexus:8081/repository-address
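You can verify which registry npm will actually use, after the global and per-project settings are merged, by calling:

npm config get registry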

For security purposes, you may want to have your repository protected, allowing only authenticated users to download/publish packages. If that's the case, and you have your user accounts set up properly (e.g. on the Nexus server), you should first log in with npm via:

npm login --registry=http://your.nexus:8081/repository-address

which again adds the corresponding entries to the NPM configuration (they differ depending on the version of NPM you're using).
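For recent npm versions this typically means a per-registry auth token line in ~/.npmrc, roughly like the one below (host and token are placeholders; older npm versions store _auth and email entries instead):

//your.nexus:8081/repository-address/:_authToken=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx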

That’s basically it, if you want to download packages from your local repository (and publish private packages if using Verdaccio).

In the case of Nexus there is one more step required in order to be able to publish your private packages. That's because Nexus internally has two types of repositories: one for downloads (the proxy) and another for publishing private packages (the hosted repository). That's why you need to tell your NPM client which registry should be used when pushing new releases of your modules. You can do that by either:

  • appending the --registry flag to the npm publish command, pointing it at your local registry,
  • or by adding the following section to package.json:

"publishConfig": {
  "registry": "http://your.nexus:8081/private-packages-repo"
}

Adding a flag on every publish can be tedious; that's why publishConfig is, in my opinion, the best way to go, as it doesn't require any other changes in the publication process.
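For completeness, the flag-based, one-off variant looks like this (the URL is a placeholder for your private, hosted repository):

npm publish --registry=http://your.nexus:8081/private-packages-repo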

Gains

Although setting up your own NPM proxy requires some initial and ongoing maintenance effort (not that much, really), it's worth having one if you consider your project a serious one and don't want to risk a Single Point of Failure by relying on the main NPM registry only. If you make your pipeline resistant to external systems' outages, you can build and ship your application at any time, and it doesn't matter if NPM is unavailable at the moment, or the package you're asking for just got removed. If a given package has ever been used in your project/company, it's there in your local registry. It's priceless to have full control over the availability of all your dependencies, because it's your decision when you e.g. migrate to a new package (say, because the one you use got discontinued or removed from the main registry).

Another advantage is savings on package installation time. You've probably seen projects where hitting npm install caused long minutes of NPM trying to download half of the Internet from its main registry. By hitting a proxy in your local network instead of the main NPM registry, you may greatly speed up that process, which can dramatically reduce both your local and (most importantly) your CI job build times.

So, as you can see, some effort is required to make your project's development and deployment pipeline more stable and predictable, but in my opinion the cost is worth paying if you're serious about crafting software the professional way. We all depend on external services in our work, but wherever it's possible to mitigate the risks of their failures, we should go for it. Especially when it impacts our project's delivery. After all, we are professionals, right?
