You don’t have to go through hell managing software dependencies


Software supply chains have risen to the forefront of application security and legislative attention following several high-profile cybersecurity incidents in recent years, with Log4Shell perhaps the most infamous example.

Let’s face it: a big reason behind these incidents is what developers call ‘dependency hell’, the bottomless pit of trying to manage every dependency in one’s software. It’s high time we looked at how to do it better.

Software developers use a lot of dependencies: third-party software components taken from freely available open source repositories. It’s more efficient than writing original software from the ground up, but the volume of dependencies being adopted is growing rapidly. The average Java application in 2022 contained around 148 dependencies, compared to 128 in 2021. Developers are being asked to track an average of 1,500 dependency changes per year per application, in addition to producing new software that meets business requirements.
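To put those averages in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes, for illustration, that the 148-dependency and 1,500-change figures describe the same typical application, and it spreads the changes evenly over 52 weeks.

```python
# Averages quoted in the article (Java applications).
DEPS_2021 = 128
DEPS_2022 = 148
CHANGES_PER_YEAR = 1_500   # dependency changes to track per application per year
WEEKS_PER_YEAR = 52

# Year-on-year growth in the number of dependencies, as a percentage.
growth_pct = (DEPS_2022 - DEPS_2021) / DEPS_2021 * 100

# Assumption: both averages describe the same typical application.
changes_per_week = CHANGES_PER_YEAR / WEEKS_PER_YEAR
changes_per_dependency = CHANGES_PER_YEAR / DEPS_2022

print(f"Dependency growth: {growth_pct:.1f}% per year")
print(f"About {changes_per_week:.0f} changes to review every week")
print(f"About {changes_per_dependency:.1f} changes per dependency per year")
```

That works out to roughly 16% more dependencies in a year and close to 30 changes to review every week, on top of feature work.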

Today’s developers have lived through a silent industrial revolution of open source adoption, and as an industry we’re facing vast, unaddressed technical debt from the work needed to maintain software dependencies. It’s often tedious, complicated work involving interconnected pieces of code, and unless an update is functionally required, organizations tend to push it aside.

Most importantly, non-action can inadvertently lead to security vulnerabilities.

Dependency confusion: the fastest-growing low-skill, high-reward malicious attack type

The 2022 holiday season brought an inconvenient reminder of this when a new incident affected PyTorch, a popular machine learning framework. Disclosed by the PyTorch team itself, the supply chain attack targeted users of the nightly build and exfiltrated a whole host of information from their systems, using a tactic called ‘dependency confusion’ (or ‘namespace confusion’).

How did this work? The PyTorch nightly build, unlike the stable release, depended on a package called torchtriton, served from PyTorch’s own package index. The attackers registered the torchtriton name on the official pypi.org registry, then published a malicious package with a higher version number. pip, Python’s package installer, does not give an extra index priority over pypi.org: it gathers candidate versions from every configured index and installs the highest one. As a result, people installing nightly builds automatically downloaded the malicious package because it had the higher version number.
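The core of the trick can be modelled in a few lines. This is a deliberately simplified sketch, not pip’s actual resolver (which also weighs platform compatibility, hashes and more): candidates from all indexes are pooled and the highest version wins. The index names and version numbers are hypothetical.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '2.0.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def resolve(package: str, indexes: dict[str, dict[str, list[str]]]) -> tuple[str, str]:
    """Return (index_name, version) for the highest version across ALL indexes.

    No index is preferred over another: every candidate is pooled and the
    highest version number wins, which is what the attackers relied on.
    """
    candidates = [
        (parse_version(version), index_name, version)
        for index_name, packages in indexes.items()
        for version in packages.get(package, [])
    ]
    if not candidates:
        raise LookupError(f"{package} not found on any index")
    _, index_name, version = max(candidates)
    return index_name, version


indexes = {
    # The project's own (alternative) index hosting the legitimate package.
    "project-nightly-index": {"torchtriton": ["2.0.0"]},
    # The public index, where an attacker has registered the same name.
    "pypi.org": {"torchtriton": ["3.0.0"]},
}

print(resolve("torchtriton", indexes))  # ('pypi.org', '3.0.0')
```

Nothing in the model checks *who* published a version; the version number alone decides, so squatting the name on the public index is enough.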

Sounds deviously simple, right? There’s a reason dependency confusion is the fastest-growing form of supply chain attack: bad actors tend to take the path of least resistance. Many open source registries don’t have explicit namespace protection, meaning anyone can register any package name that isn’t already taken. So any name an organization uses only on its private registry is free for someone else to claim on the public one.
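The difference namespace protection makes can be shown with two toy registries. This is an illustrative model only, not how any particular registry is implemented; the account names and the `acme` namespace are hypothetical.

```python
from __future__ import annotations


class OpenRegistry:
    """First come, first served: any unclaimed name can be registered by anyone."""

    def __init__(self) -> None:
        self.owners: dict[str, str] = {}

    def register(self, name: str, account: str) -> bool:
        if name in self.owners:
            return False  # name already taken
        self.owners[name] = account
        return True


class NamespacedRegistry(OpenRegistry):
    """Names look like 'namespace/package'; only the namespace owner may publish."""

    def __init__(self, namespace_owners: dict[str, str]) -> None:
        super().__init__()
        self.namespace_owners = namespace_owners

    def register(self, name: str, account: str) -> bool:
        namespace, _, _ = name.partition("/")
        if self.namespace_owners.get(namespace) != account:
            return False  # caller does not own this namespace (or name is unscoped)
        return super().register(name, account)


# "acme-billing" is only used on Acme's private registry, so it is free publicly.
public = OpenRegistry()
print(public.register("acme-billing", "attacker"))      # True: nothing stops this

protected = NamespacedRegistry({"acme": "acme-corp"})
print(protected.register("acme/billing", "attacker"))   # False
print(protected.register("acme/billing", "acme-corp"))  # True
```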

What makes the PyTorch incident special is that it’s the first time we’ve seen dependency confusion used for mass distribution, as opposed to the more typical targeted attack against a single organization.

Ilkka Turunen

Ilkka Turunen is Field CTO at Sonatype.

How do I protect my company from dependency confusion attacks?

As a maintainer of internal packages, you should always reserve their names on public registries, even if you don’t intend to publish them there. Use namespacing or scoping in every package you publish, and reserve that scope on every available upstream registry. Attackers will also try to exploit misspelled (“typoed”) versions of your package names, so it’s worth reserving some easy-to-make mistakes too.
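Deciding which typos to reserve can be partly automated. The sketch below generates one-keystroke slips (a dropped, doubled or swapped character) for a hypothetical internal package name; it is a starting point for a reservation list, not a complete typosquatting model.

```python
from __future__ import annotations


def typo_variants(name: str) -> set[str]:
    """Return simple one-keystroke misspellings of a package name.

    Covers three common slips: a dropped character, a doubled character,
    and two adjacent characters swapped.
    """
    variants: set[str] = set()
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])        # dropped character
        variants.add(name[:i] + name[i] + name[i:])   # doubled character
        if i + 1 < len(name):
            # adjacent characters swapped
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    variants.discard(name)  # swapping identical letters reproduces the original
    return variants


candidates = typo_variants("acme-billing")  # hypothetical internal package name
print(len(candidates), sorted(candidates)[:5])
```

Registries differ in how they compare names: PyPI, for instance, normalizes case and treats `-`, `_` and `.` as equivalent (PEP 503), so some variants collapse into the same project there, while other ecosystems treat them as distinct names.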

If this seems like a lot of work, that’s because it is. Unfortunately, upstream registries offer little protection on their own, so your options are to prefer ecosystems with strict namespace protection, to make defensive registrations, or to deploy proactive defenses in your own environment, such as dependency firewalls.
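One rule a dependency firewall can enforce is that internal package names are only ever fetched from the internal registry. The sketch below shows that single check under stated assumptions: the `acme-` prefix, the registry URL and the `allowed_source` function are all hypothetical, and real firewall products apply many more policies.

```python
# Hypothetical policy: names with these prefixes are internal and must only
# ever be fetched from the internal registry.
INTERNAL_PREFIXES = ("acme-", "acme_")
INTERNAL_REGISTRY = "https://packages.internal.example/simple"  # hypothetical URL


def allowed_source(package: str, source_url: str) -> bool:
    """Decide whether a download of `package` from `source_url` may proceed."""
    is_internal = package.lower().startswith(INTERNAL_PREFIXES)
    if is_internal:
        # An internal name resolved from anywhere else is treated as an attack.
        return source_url == INTERNAL_REGISTRY
    # Public packages pass this rule; other checks (licenses, known CVEs)
    # would be applied here in a fuller policy.
    return True


print(allowed_source("acme-billing", "https://pypi.org/simple"))  # False
print(allowed_source("acme-billing", INTERNAL_REGISTRY))          # True
```

Blocking by name rather than by version number is what defeats the higher-version trick: the public copy never reaches the resolver in the first place.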

Beyond the specific risk of dependency confusion, any time you use externally developed software, you’re relying on an actively changing project. As those changes stack up over time, you may find that upgrading a component has become prohibitively time-consuming, unless you actively manage how you use it.

Imagine adopting the open source package “foo” at version 1.0. Nothing is functionally wrong with it, so you don’t touch it. One day, a new security vulnerability is disclosed in that version, and the fix is only available in “foo” 5.1 and later, many releases away from where your software is. To get to a safe state, you first have to pay down the technical debt of not keeping this component up to date, and only then can you mitigate the vulnerability itself. This work, often painful, can prevent a team from delivering new value for weeks.
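The size of that debt can be made concrete by listing the releases you would have to cross. The release history for “foo” below is invented for illustration, and versions are assumed to be plain dotted numbers.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '4.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def upgrade_gap(releases: list[str], current: str, first_fixed: str) -> list[str]:
    """Return every release after `current`, up to and including `first_fixed`."""
    low, high = parse_version(current), parse_version(first_fixed)
    ordered = sorted(releases, key=parse_version)
    return [v for v in ordered if low < parse_version(v) <= high]


# Hypothetical release history for the "foo" package from the example above.
foo_releases = ["1.0", "1.1", "2.0", "2.1", "3.0", "4.0", "4.2", "5.0", "5.1", "5.2"]
gap = upgrade_gap(foo_releases, current="1.0", first_fixed="5.1")
major_jumps = parse_version("5.1")[0] - parse_version("1.0")[0]

print(f"{len(gap)} releases to cross, including {major_jumps} major versions: {gap}")
```

If the project follows semantic versioning, each of those major jumps may carry breaking changes, which is why the catch-up work can dwarf the security fix itself.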

That’s why good dependency management matters so much. Keeping your software up to date is not optional, and neither is managing the parts you use to assemble it. In manufacturing, sub-optimal parts hurt the end product; software development is no different.

The latest isn’t the greatest

There’s another wrinkle to optimizing dependency management: it’s better to update to a version close to the most recent release, but not necessarily the bleeding edge. Even when excluding experimental ‘beta’ or ‘release candidate’ versions, the latest version is rarely ideal.
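The first step, excluding experimental releases, is easy to automate. This sketch uses a keyword heuristic for pre-release markers; it is an assumption for illustration, since each ecosystem defines pre-releases in its own way (PEP 440 for Python, SemVer for npm, Maven’s own conventions), and a real tool should use the ecosystem’s rules.

```python
from __future__ import annotations

import re

# Heuristic: common pre-release markers, matched case-insensitively.
PRERELEASE_MARKERS = re.compile(r"alpha|beta|rc|dev|snapshot|preview|milestone", re.IGNORECASE)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '3.3.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def stable_releases(versions: list[str]) -> list[str]:
    """Drop pre-release versions and return the rest, oldest first."""
    stable = [v for v in versions if not PRERELEASE_MARKERS.search(v)]
    return sorted(stable, key=parse_version)


published = ["3.1.0", "3.2.0", "3.3.0", "3.4.0-rc1", "3.4.0-beta.2", "3.3.1"]
print(stable_releases(published))  # ['3.1.0', '3.2.0', '3.3.0', '3.3.1']
```

Even after this filter, the newest entry in the list is only a candidate; as the rest of this section argues, it is rarely the best choice.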

One reason this strategy works is that it can inadvertently avoid newly introduced security vulnerabilities. That proved true for 2014’s “Heartbleed” vulnerability in OpenSSL, the widely used security library: it exposed millions of users’ secure connections, but programs that hadn’t updated to the then-latest 1.0.1 release line, whether deliberately or accidentally, avoided exposing customer data. Likewise, in the 2020 SolarWinds attack, staying on an older release proved safer than upgrading to the latest, compromised version.
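The Heartbleed case shows how narrow the affected range was. Per the public advisory for CVE-2014-0160, OpenSSL 1.0.1 through 1.0.1f were affected and 1.0.1g contained the fix, while the older 1.0.0 and 0.9.8 lines never had the bug. The check below encodes that range; it is a sketch that deliberately ignores the 1.0.2 beta releases, which were also affected.

```python
import re


def heartbleed_affected(version: str) -> bool:
    """True if an OpenSSL version string falls in 1.0.1 through 1.0.1f.

    Simplification: 1.0.2 beta releases (also affected) are not handled.
    """
    match = re.fullmatch(r"1\.0\.1([a-z]?)", version)
    if match is None:
        return False  # a different release line, e.g. 1.0.0x or 0.9.8x
    letter = match.group(1)
    # Plain "1.0.1" has no letter; patch letters 'a' to 'f' were also affected.
    return letter == "" or letter <= "f"


for v in ["0.9.8y", "1.0.0l", "1.0.1", "1.0.1f", "1.0.1g"]:
    print(v, heartbleed_affected(v))
```

A team still running 1.0.0 was outside the affected range entirely, which is exactly the “older release proved safer” effect described above.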

This contradicts the accepted wisdom that the latest is the greatest, so let’s look at the cost of upgrading. The average Java project, for example, publishes 10 releases a year. Which one should you choose?

Choosing good versions

The way organizations typically handle software dependencies isn’t particularly encouraging: across the commercial engineering teams observed, only 25% of components were actively updated. At a finer level, Sonatype’s most recent State of the Software Supply Chain report found that 69% of all upgrade decisions were suboptimal.

There are objectively and subjectively bad upgrade decisions. An objectively bad choice is simply taking the latest version, which is rarely the safest. On average, the optimal version is 2.7 versions behind the latest.
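A crude way to act on that finding is a “lag” policy: target a release a few steps behind the newest stable one and skip anything with a known vulnerability. This is a hypothetical heuristic, not Sonatype’s method; the default lag of 3 simply rounds the 2.7 average, and the function assumes its input already contains only stable, plain dotted versions.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '4.3.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def pick_version(stable: list[str], known_vulnerable: set[str], lag: int = 3) -> str | None:
    """Pick the release `lag` steps behind the newest, avoiding vulnerable ones.

    If the target is vulnerable, walk toward newer releases first, then older
    ones. Returns None if every release is known to be vulnerable.
    """
    ordered = sorted(stable, key=parse_version, reverse=True)  # newest first
    start = min(lag, len(ordered) - 1)
    search_order = list(range(start, -1, -1)) + list(range(start + 1, len(ordered)))
    for i in search_order:
        if ordered[i] not in known_vulnerable:
            return ordered[i]
    return None


releases = ["4.0.0", "4.1.0", "4.2.0", "4.3.0", "4.4.0", "4.5.0"]
print(pick_version(releases, known_vulnerable=set()))      # 4.2.0
print(pick_version(releases, known_vulnerable={"4.2.0"}))  # 4.3.0
```

A fixed lag ignores everything that makes a version good or bad in practice, so treat it as a floor for automation rather than a substitute for judgment.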

Objective measures are easy to distill into bite-sized wisdom, but what makes dependency hell so problematic is that some upgrade choices are highly specific to the organization and application, depending on usage and environment. Unfortunately, simple measures like download popularity don’t necessarily lead to better decisions.

It’s generally best to upgrade proactively and on a consistent schedule, before there’s a problem, rather than reactively, through unplanned work triggered by a published security issue like the one in Log4j. It’s safer, saves time and money, and lets developers focus on creative software development, improving both morale and the value they deliver.

Automation will save developers

The sheer number of components, and of criteria for choosing the best version of each, is already a heavy burden. Even with best practices in place, doing this work at the speed modern development teams require will inevitably call for some automated dependency management tooling. Such tools can reduce the number of component upgrades required over time, saving time and money and sparing developers frustration.
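At its simplest, that tooling turns “upgrade proactively” into a scheduled report. The sketch below flags pinned dependencies that have fallen more than a chosen number of releases behind; the pins, release lists and threshold are hypothetical, and a real tool would read a lockfile and query registries instead of using hard-coded data.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '2.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def staleness_report(
    pinned: dict[str, str], available: dict[str, list[str]], max_behind: int = 3
) -> dict[str, int]:
    """Return {package: releases_behind} for pins more than `max_behind` releases old."""
    report: dict[str, int] = {}
    for package, version in pinned.items():
        current = parse_version(version)
        newer = [v for v in available.get(package, []) if parse_version(v) > current]
        if len(newer) > max_behind:
            report[package] = len(newer)
    return report


# Hypothetical pins and release lists for three dependencies.
pinned = {"foo": "1.0", "bar": "2.3", "baz": "0.9"}
available = {
    "foo": ["1.0", "1.1", "2.0", "3.0", "4.0", "5.1"],
    "bar": ["2.3", "2.4"],
    "baz": ["0.9"],
}
print(staleness_report(pinned, available))  # {'foo': 5}
```

Run on a schedule, for example in CI, a report like this surfaces drift while it is still a small, planned task instead of an emergency.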

Failing that, we’ll continue to see direct attacks on dependencies, because they’re easy and they work. Open source consumers, not maintainers, take on most of the risk involved, so organizations will need a more holistic approach to software supply chain hygiene.

Open source adoption isn’t slowing down, and 2023 will likely bring new supply chain attacks. As we continue to rely on the goodness of open source, we must manage software supply chains in a thoughtful, architected way. That preparation lets teams react swiftly and in a practiced manner when a new incident occurs.


Turunen is a software engineer with a knack for rapid web development and cloud computing, and has technical experience on multiple levels of the XaaS cake.