Dependency Mapping With Graphs
Ian 'z0r0' Abreu / January 2024 (272 Words, 2 Minutes)
Preamble ramble
It’s a hilarious example of what we all know is true about the modern digital ecosystem: it’s built on the backs of a small handful of under-resourced projects. These projects are maintained by weekend warriors, developers with some niche interest, or just a guy who wrote a thing 10 years ago that has somehow become his passion project. Some of these packages are used in tens or even hundreds of thousands of installs, underpinning billions of dollars in commercial revenue, and yet only a tiny fraction of that money gets funneled back to the maintainers of those core software packages.
…yet if one of those projects were to have a critical flaw, it would have an outsized impact on the security of the world around us.
- CVE-2016-3714: ImageTragick
- CVE-2014-6271: Shellshock
- CVE-2021-44228: Log4Shell
These are just some of the vulnerabilities that rocked the security community, and in every one of these cases the affected project was heavily used and under-resourced for the critical infrastructure it provided.
Intro
Why does any of this matter, right? Well, a while back I began wondering to myself
So today is the day that I begin my quest to come up with solutions for these (non) trivial problems, and hopefully learn some new things along the way.
My Gameplan
- In order to solve the complex dependency graph problem, I’m going to have to learn to use graph databases.
- I’m going to iterate a bit on a data model that works for a repository/build repository that’s easy to parse.
- I’ll add weights for things like complexity, age of project, number of contributors, etc.
- Based on my outputs, I’ll do a really light impact analysis of a hypothetical package or two with a critical CVE (10/10 on the CVSS scale).
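To make the impact-analysis step a little more concrete, here is a minimal sketch in plain Python rather than a real graph database. It is not the final data model: the package names and edges are made up for illustration, and `impacted_by` is a hypothetical helper. It only shows the basic idea of walking the dependency graph backwards from a vulnerable package.

```python
from collections import defaultdict, deque

# Hypothetical edges: (package, dependency it requires).
# None of these names refer to real projects.
DEPENDS_ON = [
    ("web-app", "http-lib"),
    ("web-app", "image-lib"),
    ("cli-tool", "http-lib"),
    ("http-lib", "tls-lib"),
    ("image-lib", "codec-lib"),
]


def build_reverse_index(edges):
    """Map each dependency to the set of packages that directly require it."""
    dependents = defaultdict(set)
    for package, dependency in edges:
        dependents[dependency].add(package)
    return dependents


def impacted_by(vulnerable, edges):
    """Return every package that directly or transitively depends on `vulnerable`."""
    dependents = build_reverse_index(edges)
    seen = set()
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:  # also guards against dependency cycles
                seen.add(parent)
                queue.append(parent)
    return seen
```

Calling `impacted_by("tls-lib", DEPENDS_ON)` collects everything upstream of the hypothetical `tls-lib`, which is the set a critical CVE in that package would reach. A graph database would presumably answer the same question with a path query, but the breadth-first walk is the underlying idea.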
My expectation is that there are a small number of projects that meet the following criteria
- They’re dependencies of a “large” number of projects.
- The complexity of their codebases is statistically significant.
- The age of their codebases is ~7 years.
- The number of regular contributions is extremely small.
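The four criteria above could be expressed as a simple filter over per-package statistics. This is only a sketch under loud assumptions: the `PackageStats` fields, every threshold value, and the sample data are placeholders I made up, and I am reading "~7 years" as "at least 7 years" and "statistically significant" complexity as "above some cutoff", neither of which is settled yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageStats:
    """Hypothetical per-package measurements; the field names are assumptions."""
    name: str
    dependent_count: int       # how many projects depend on this package
    complexity: float          # some complexity score, metric still undecided
    age_years: float           # age of the codebase
    regular_contributors: int  # people contributing on a regular basis


def is_at_risk(pkg, min_dependents=1000, min_complexity=50.0,
               min_age_years=7.0, max_contributors=2):
    """True when a package meets all four criteria. Thresholds are placeholders."""
    return (
        pkg.dependent_count >= min_dependents
        and pkg.complexity >= min_complexity
        and pkg.age_years >= min_age_years
        and pkg.regular_contributors <= max_contributors
    )


# Made-up sample data, purely for illustration.
SAMPLE = [
    PackageStats("old-codec", 25_000, 80.0, 11.0, 1),
    PackageStats("shiny-framework", 40_000, 90.0, 2.0, 35),
    PackageStats("tiny-helper", 12, 5.0, 9.0, 1),
]

at_risk = [pkg.name for pkg in SAMPLE if is_at_risk(pkg)]
```

With this sample data only `old-codec` passes all four checks. The real thresholds would have to come out of the data rather than being hard-coded like this.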
We’ll see what happens after that. I’ve got quite a bit of work ahead of me.