Are We Vulnerable? (e.g. Log4Shell) - Identifying Open-Source Library Risk Using MergeStat (Part 2)

Guest Post

This is a guest post by Peter Freiberg, a DevSecOps and application security consultant based in Melbourne, Australia.

I am a consultant who specialises in Application Security and DevSecOps. I've built teams and capabilities for organisations around Penetration Testing (ethical hacking), Secure Code Reviews, DevSecOps, Security Test Automation, Security Development Lifecycle and training.

Previously, we looked at a few ways we can look at open-source library risk across our code. In this article, we’ll look at how we can search for new known vulnerabilities once details become available.

So, an email, article or colleague disrupts whatever you were doing and proclaims “XYZ library is vulnerable and it’s horrible!”. How do you tell if you’re vulnerable to XYZ vulnerability?

If you can query your code, you can start finding where you might be exploitable. If you’re new to MergeStat, the tldr is it's a tool to aggregate multiple Git sources, organisations and repos in a literal SQL interface (with some other magic sauce utilities too).

In this case, let’s take a look at Log4Shell. Here’s a rundown of the advisory: https://www.cisa.gov/news-events/cybersecurity-advisories/aa21-356a

Log4J Logo

One of their first steps was “Identifying assets affected by Log4Shell and other Log4j-related vulnerabilities.”

Now, assets is going to involve a lot of things that are off the shelf, but from a “knowing our code” angle there’s a few things we can do.

Our high-level process using MergeStat to identify potential locations is pretty simple:

Find all locations where we might include Log4J. In this example, we’ll look at the the package manager maven)
Search for Log4J (we can get more specific, but let’s start here)
Find the last author or committer who touched the file to assist with assessing and remediation

Find all the maven package manager files:

SELECT repo, path 
FROM git_files
INNER JOIN repos ON git_files.repo_id = repos.id
WHERE path LIKE '%pom.xml'

We’re going to get something that looks like the following

repo	path
https://github.com/ORG-A/repo1	service-discovery/blah/pom.xml
https://git.internal.xyz/TEAM-B/repoY	pom.xml
...	...

That’s pretty straight forward, but what we really want is to find all the maven files which look like they use Log4J.

SELECT repo, path 
FROM git_files
INNER JOIN repos ON git_files.repo_id = repos.id
WHERE path LIKE '%pom.xml' AND contents LIKE '%log4j%'

This will produce a similar table of results as above, this time with files that contain Log4J. Now, this is pretty broad keyword search, but if you take a look at the Log4J documentation this should catch most of the situations where Log4J appears, even if you change the package manager filename (e.g. build.gradle, build.sbt etc.) and even the Clojure build tool if that’s your thing.

Now to find out who can help us determine if we’re actually vulnerable. Hopefully we have a version number in the pom.xml or equivalent build tool. If we have to dive deeper to check deployment status then we probably need to find the last person who touched the package manager file.

Here’s a query to find all developers and committers who last modified the maven files which look like they use Log4J:

SELECT DISTINCT ON (repo, path) repo, path, author_name, author_email, author_when, committer_name, committer_email, committer_when
FROM git_files
INNER JOIN git_commit_stats ON git_files.repo_id = git_commit_stats.repo_id AND git_files.path = git_commit_stats.file_path
INNER JOIN git_commits ON git_commit_stats.repo_id = git_commits.repo_id AND git_commit_stats.commit_hash = git_commits.hash
INNER JOIN repos ON git_files.repo_id = repos.id
WHERE path LIKE '%pom.xml' AND contents LIKE '%log4j%' AND parents < 2
ORDER BY repo, path, committer_when DESC

Our results should look something like:

repo	path	author_name	author_email	author_when	committer_name	committer_email	committer_when
https://github.com/ORG-A/repo1	service/pom.xml	Bob	bob@xyz.com	2021-06-01T08:30:36Z	Alice	alice@alice.com	2021-06-01T08:30:36Z
https://int.xyz/TEAM-B/repoY	pom.xml	Josue	josue@aaa.com	2021-11-01T09:42:36Z	Patrick	pd@there.com	2021-06-01T08:30:36Z
…	…	…	…	…	…	…	…

The above query has some complexity, but this should be easy(ish) for anyone to modify, changing the path and searching on strings based on your use case.

If the above query has found people who are no longer in your organisation, then you can follow up by finding whoever was the last to author or commit to the repo:

SELECT *
FROM git_commits
INNER JOIN repos ON git_commits.repo_id = repos.id
WHERE repo = 'https://github.com/YOUR_ORG/YOUR_REPO' -- replace with your repo
ORDER BY committer_when DESC
LIMIT 1

What are the limitations of this process?

This example is only Maven, it doesn’t include other ecosystems (but could)

The above process and queries could be easily tweaked to query Gradle, Scala build tool and other JVM based languages or integrations. You could run your analysis on build.gradle or whatever your JVM package manager file is. As above take a look at the Log4J and Clojure build tool documentation for some other package managers.

It may be included in another library as a dependency (AKA Transitive Dependency)

This one is a bit trickier. If we don’t have that extracted and queryable already, we would then need to query the broader libraries and check if they use the library. However, we can at least start from a “where are all the maven, gradle, sbt etc” package manager files and run the analysis from that starting point.

We’re only looking at our code, what about 3rd party and COTS?

The above won’t help with “off the shelf” software that has something like Log4J bundled (e.g. Tomcat, or a commercial application built on top of Tomcat). Most likely, these will be discovered using other vulnerability management tools looking at virtual machines or containers.

This code may not be actually deployed

Is it test code? Proof of Concept? Decommissioned?

Who knows? At least you have a starting point with some names to start asking questions. As above, if you find the last author or committer and ask them you can then rule in or out whether it’s an issue for you.

It may be fixed in code, but not released to production

To actually confirm we’re not vulnerable across our organisation and applications, we still need to check the deployed artefact. We have the starting point of code, but can then follow up through developers we found who’ve updated those files.

Next Steps

As you can see, being able to query your code is extremely useful when looking for something which is “known bad”. We can quickly query our code bases to start the initial analysis of our posture. We’ve used Log4J here, but these types of queries could be used for Dockerfiles (e.g. some binary install through curl that’s compromised, different added packages with vulnerabilities), other language package managers (e.g. Ruby Gems, Python pip, Go mod etc).

Join our Slack

If you found this interesting, hop in our community Slack! We're always happy to chat about MergeStat there 🎉.

What are the limitations of this process?​

Next Steps​

What are the limitations of this process?

Next Steps