Discovery for Software Bill of Materials (SBOM)

By: admin
Date: Jan 29, 2021
Category: General

The concept of a bill of materials originally comes from manufacturing, where the plan for a piece of hardware lists all the parts needed to construct and build the asset.

A software bill of materials (SBOM) is the same concept applied to software: a list of all components that make up a piece of software. This includes open-source and commercial components and libraries, but also the infrastructure and application services that a system is composed of.
Of course, we can segment a system into different layers. At the bottom there are off-the-shelf components, either open-source or commercial, that are used to develop an application or service. The application itself might consist of multiple services that together compose the solution. Finally, at runtime, multiple applications run on the servers that make up the complete system.

The component technology varies widely depending on how a service is implemented: Java JAR files, C# assemblies, Node.js, Python or Go packages, and much more.

https://youtu.be/GLcSx8fKYd0

BoM Sources

Anyone who has looked at the dependencies of a modern Java, C# or Node.js application will understand that you do not want to assemble a BoM by hand. This requires automation and tools. But where do these tools get their information from?

At the lowest level, the libraries and components usually come from the dependency management of a build system, backed by artifact repositories that can deliver the name, version and dependencies of each component. The tools here are Maven, Gradle, pip/PyPI, Yarn, npm, godep, etc., which every developer uses daily. The only exception might be C developers, who typically do not manage component dependencies this way yet.
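
To illustrate what such a dependency manager knows about each component, here is a minimal sketch for Python that reads the metadata of the installed packages using only the standard library (`importlib.metadata`, Python 3.8+). It is not how any SBOM tool is implemented; it only shows that name, version and declared dependencies are readily available. The function name `installed_components` is our own.

```python
from importlib.metadata import distributions


def installed_components():
    """Return name, version and declared requirements for every installed distribution."""
    components = []
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name is None:  # skip broken or incomplete installations
            continue
        components.append({
            "name": name,
            "version": dist.version,
            # dist.requires is None when a package declares no dependencies
            "requires": list(dist.requires or []),
        })
    # Sort case-insensitively so the output is stable and easy to diff
    return sorted(components, key=lambda c: c["name"].lower())
```

Calling `installed_components()` inside a virtual environment gives roughly the same information that `pip freeze` writes to `requirements.txt`, plus the dependency declarations of each package.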

The second layer, the application itself, which today is often composed of microservices, usually cannot be generated automatically. Build dependency management only goes down to the level of a single artifact, not to the level of which artifacts an application is composed of. So here the development team might be forced to write the BoM by hand, but on the other hand this shouldn't be too much effort to maintain.
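
A hand-written application BoM can still be kept in a machine-readable form. The sketch below assumes a small, manually maintained list of services (the application and service names are hypothetical) and turns it into a minimal CycloneDX-style JSON document, with the application in the BoM `metadata` and each service as a component. It covers only a few fields of the format.

```python
import json
import uuid

# Hypothetical, hand-maintained description of one application and its microservices.
APPLICATION = {"name": "acme-shop", "version": "1.4.0"}
SERVICES = [
    {"name": "order-service", "version": "2.3.1"},
    {"name": "payment-service", "version": "1.0.7"},
    {"name": "catalog-service", "version": "3.2.0"},
]


def application_bom(app, services):
    """Build a minimal CycloneDX-style JSON BoM describing which services make up an application."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.2",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,
        # The application being described goes into the BoM metadata ...
        "metadata": {"component": {"type": "application", **app}},
        # ... and every service it is composed of becomes a component of its own.
        "components": [{"type": "application", **svc} for svc in services],
    }


if __name__ == "__main__":
    print(json.dumps(application_bom(APPLICATION, SERVICES), indent=2))
```

Keeping the list in version control next to the deployment descriptors makes it more likely that it is updated when a service is added or removed.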

Finally, at the upper level, we have the application running on a server or in a cloud environment. Here we can query the operating system's package manager for the installed libraries and services, or check well-known locations in the file system.
Ideally you have a good discovery system that takes this burden off you in a large, distributed and heterogeneous environment.
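
As a minimal example of asking the operating system's package manager, the following sketch assumes a Debian or Ubuntu host and uses `dpkg-query` with a custom output format. Other distributions need other tools (for example `rpm`), and a discovery system covers far more than this; the helper names are our own.

```python
import shutil
import subprocess


def parse_dpkg_output(text):
    """Parse 'status<TAB>package<TAB>version' lines, keeping only installed ('ii') packages."""
    packages = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # ignore empty or malformed lines
        status, name, version = fields
        if status.startswith("ii"):
            packages.append((name, version))
    return packages


def installed_debian_packages():
    """Ask dpkg-query for all known packages; returns [] on systems without dpkg."""
    if shutil.which("dpkg-query") is None:
        return []
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${db:Status-Abbrev}\t${Package}\t${Version}\n"],
        capture_output=True, text=True, check=True,
    )
    return parse_dpkg_output(result.stdout)
```

The status column is included so that packages that were removed but still have configuration files left (status `rc`) are not reported as installed.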

All these sources deliver pieces of a complete system BoM, and you want to collect them in a central system for managing these assets. This would usually be an IT asset management system.
Potentially your CMDB can serve that purpose, although CMDBs do not tend to go down to the level of components. Or it can be a general-purpose asset management system that understands software BoMs, such as OWASP Dependency Track, or a commercial solution.
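
Collecting BoMs from several sources means the same component can show up more than once. A minimal sketch of merging CycloneDX-style BoMs, assuming components are identified by their Package URL (`purl`) when present and otherwise by group, name and version (our own fallback rule, not part of any standard):

```python
def component_key(component):
    """Identify a component by its purl if present, else by (group, name, version)."""
    if component.get("purl"):
        return component["purl"]
    return (component.get("group", ""), component["name"], component.get("version", ""))


def merge_boms(*boms):
    """Collect the components of several BoMs, keeping the first occurrence of each."""
    merged = {}
    for bom in boms:
        for component in bom.get("components", []):
            merged.setdefault(component_key(component), component)
    return list(merged.values())
```

A real asset management system does this (and much more) for you; the point is that a stable identifier such as a purl is what makes the merge possible at all.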

SBOM Use Cases

So what can an SBOM be used for, and why should we go to the effort of collecting all the bits and pieces? A single development project usually manages its OSS and commercial components using a dependency manager such as Maven or Gradle for Java, or npm or Yarn for Node.js; similar tools exist for most languages other than C. But we want to capture more than the lowest level, and not just for an isolated project: we want to understand where a component is used. How are the packages that are distributed as software composed, and how is a service defined in terms of the underlying applications? We want to check the authenticity, integrity and provenance (source) of components in order to manage the risk of the software supply chain. With a central IT asset inventory we can understand how components are used across projects (dependencies) and foster reuse of packages that have proven to be of good quality and well supported.
Finally, and most importantly, components can be poisoned with malware, trojans, RATs or crypto-mining code. When we detect such a problem, as has actually happened in the Node.js ecosystem in the past, we want to immediately warn the other projects that use the same component.
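
The "where is this component used?" question boils down to an index from component to projects. A minimal sketch, assuming each project's BoM is available as a CycloneDX-style dictionary and components carry a `purl`; the project names and purls in the example are hypothetical:

```python
from collections import defaultdict


def build_usage_index(project_boms):
    """Map each component purl to the set of projects whose BoM contains it."""
    index = defaultdict(set)
    for project, bom in project_boms.items():
        for component in bom.get("components", []):
            purl = component.get("purl")
            if purl:
                index[purl].add(project)
    return index


def projects_to_warn(index, compromised_purl):
    """Return the projects using a component reported as compromised, sorted by name."""
    return sorted(index.get(compromised_purl, set()))
```

For example, with `boms = {"webshop": {...}, "backoffice": {...}}`, `projects_to_warn(build_usage_index(boms), "pkg:npm/some-lib@1.2.3")` lists every project that has to be alerted about the hypothetical `some-lib` 1.2.3.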

With the SBOM and an IT asset management system we can do exactly that. This gives us a cross-application vulnerability management system for the whole company. But IT security is not the only consumer of the SBOM data: OSS compliance requires a company to produce a disclosure document that lists all components with their licenses, copyrights and sources. The same applies to commercial off-the-shelf components (COTS), where the SBOM enables software license management and helps prevent expensive incidents when a commercial software vendor, such as Oracle or Microsoft, requests a software audit.
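
A very first draft of such a disclosure list can be generated straight from the SBOM. The sketch below assumes a CycloneDX-style BoM where licenses are given either as `{"license": {"id": ...}}`, `{"license": {"name": ...}}` or `{"expression": ...}` entries; it only covers licenses, not copyrights or source locations, and the helper names are our own.

```python
def license_names(component):
    """Collect SPDX ids, license names or license expressions of one component."""
    names = []
    for entry in component.get("licenses", []):
        if "expression" in entry:
            names.append(entry["expression"])
        elif "license" in entry:
            lic = entry["license"]
            names.append(lic.get("id") or lic.get("name", "UNKNOWN"))
    return names or ["UNKNOWN"]


def disclosure_lines(bom):
    """Render one 'name version: licenses' line per component, sorted by name."""
    lines = []
    for component in sorted(bom.get("components", []), key=lambda c: c["name"].lower()):
        licenses = ", ".join(license_names(component))
        lines.append(f'{component["name"]} {component.get("version", "")}: {licenses}')
    return lines
```

Components without license information are flagged as `UNKNOWN`, which is exactly the list a compliance team needs to follow up on.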

Anatomy of an SBOM

Let’s look at what is in a BoM. Before we do that, let's quickly see which standards are used in the market to represent SBOMs. This may not be a complete list, of course, but the two main players are:

SPDX – Software Package Data Exchange®
CycloneDX

SPDX, a project of the Linux Foundation, is the older but also larger specification, with several serialization formats such as tag-value, RDF/XML and JSON. It captures components, licenses and dependencies down to the level of individual source files and even snippets of code of a development project.

SPDX in particular has a large list of license identifiers that are also used by other projects. SPDX originally comes from the OSS license compliance space.

CycloneDX, on the other hand, is the new kid on the block and aims to be a modern and lightweight BoM specification that is available in both XML and JSON serialization. It uses SWID tags (according to ISO/IEC 19770-2:2015) for identifying installed software and Package URLs (PURL) for identifying packages and components, in addition to the CPE specification from MITRE, which is widely used in the security domain.
SPDX license identifiers are also reused in CycloneDX.
This is what a CycloneDX BoM file looks like:

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.2",
  "serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
  "version": 1,
  "components": [ {
    "type": "application",
    "name": "Acme Application",
    "version": "9.1.1",
    "cpe": "cpe:/a:acme:application:9.1.1",
    "swid": {
      "tagId": "swidgen-242eb18a-503e-ca37-393b-cf156ef09691_9.1.1",
      "name": "Acme Application",
      "version": "9.1.1", …
    }
  }, {
    "type": "library",
    "group": "org.apache.tomcat",
    "name": "tomcat-catalina",
    "version": "9.0.14",
    "purl": "pkg:maven/org.apache.tomcat/tomcat-catalina@9.0.14"
  } ]
}
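
The `purl` of the tomcat-catalina entry packs type, namespace, name and version into one string of the form `pkg:type/namespace/name@version?qualifiers#subpath`. A simplified parser, written for illustration only (it does not implement every rule of the Package URL specification, such as type-specific normalization), could look like this:

```python
from urllib.parse import unquote


def parse_purl(purl):
    """Split a Package URL into its parts (simplified, not a full implementation of the spec)."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a package URL: {purl!r}")
    rest, _, subpath = rest.partition("#")          # optional '#subpath'
    rest, _, qualifier_text = rest.partition("?")   # optional '?key=value&...'
    segments = rest.strip("/").split("/")
    if len(segments) < 2:
        raise ValueError(f"package URL needs a type and a name: {purl!r}")
    # The last segment holds 'name' or 'name@version'; anything in between is the namespace.
    name, at, version = segments[-1].partition("@")
    namespace = "/".join(segments[1:-1])
    qualifiers = {}
    for pair in qualifier_text.split("&"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            qualifiers[key.lower()] = unquote(value)
    return {
        "type": segments[0].lower(),
        "namespace": unquote(namespace) or None,
        "name": unquote(name),
        "version": unquote(version) if at else None,
        "qualifiers": qualifiers,
        "subpath": subpath or None,
    }
```

`parse_purl("pkg:maven/org.apache.tomcat/tomcat-catalina@9.0.14")` yields type `maven`, namespace `org.apache.tomcat`, name `tomcat-catalina` and version `9.0.14`, the same coordinates Maven itself uses.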

It should be a requirement for every development project to deliver a BoM in the format of choice as part of the build process. And it is so easy, as we will see in the next section.
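
To make the BoM a build requirement, a pipeline step can refuse to continue when the BoM is missing or incomplete. The sketch below applies a few house rules of our own to a CycloneDX JSON BoM (format marker, spec version, at least one component, and type, name and version per component). It is not a replacement for validation against the official CycloneDX schema, and the script name `check_bom.py` is hypothetical.

```python
import json
import sys


def bom_problems(bom):
    """Return a list of problems found in a CycloneDX JSON BoM (an empty list means it passed)."""
    problems = []
    if bom.get("bomFormat") != "CycloneDX":
        problems.append("bomFormat is not 'CycloneDX'")
    if not bom.get("specVersion"):
        problems.append("specVersion is missing")
    components = bom.get("components")
    if not components:
        problems.append("no components listed")
    for i, comp in enumerate(components or []):
        for field in ("type", "name", "version"):
            if not comp.get(field):
                problems.append(f"component {i} has no {field}")
    return problems


def main(argv):
    """Check the BoM file given as first argument; return a process exit code."""
    with open(argv[1], encoding="utf-8") as fh:
        issues = bom_problems(json.load(fh))
    for issue in issues:
        print("BOM check:", issue)
    return 1 if issues else 0

# In a build step, e.g.: python check_bom.py bom.json
# with a final line such as: sys.exit(main(sys.argv))
```

A non-zero exit code is enough for most CI systems to fail the build.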

SBOM Tooling

For CycloneDX there are libraries and command-line tools for most programming languages, available in the CycloneDX Tool Center. These are mainly tools that scan a development project's file tree and generate an SBOM for the OSS and other components used in the project, by means of the corresponding build dependency manager (Maven, Gradle, Python requirements.txt, npm, Yarn, godep, etc.).
Let's run a quick example on a Python project:

$ pip freeze > requirements.txt
$ cyclonedx-py.exe
Input file: requirements.txt
Output BOM: bom.xml
JSON output: False
Package info url: https://pypi.org/pypi/{package_name}/{package_version}/json
Generating CycloneDX BOM
Validating BOM
Complete
$ ll bom.xml
-rw-r--r-- 1 Peter Klotz 197121 17526 Jan 8 14:05 bom.xml

This generates a BoM as an XML file based on the requirements.txt that we created in our virtual environment. It works similarly for projects in other languages.
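
Conceptually, the first step of such a tool is to turn pinned requirements into identifiable components. The following sketch is not how cyclonedx-py works internally; it only shows the idea of mapping `name==version` lines from `pip freeze` to Package URLs, following the purl rule for PyPI that names are lowercased and `_` is replaced by `-`. The function name is our own.

```python
import re


def requirements_to_components(text):
    """Map pinned 'name==version' lines (as written by pip freeze) to CycloneDX-style components."""
    components = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]          # drop comments
        line = line.split(";", 1)[0].strip()  # drop environment markers
        if "==" not in line:
            continue  # unpinned, editable or URL requirements are skipped in this sketch
        name, version = (part.strip() for part in line.split("==", 1))
        name = re.sub(r"\[.*\]", "", name)     # drop extras such as requests[security]
        # purl rules for PyPI: lowercase the name and replace '_' with '-'
        purl_name = name.lower().replace("_", "-")
        components.append({
            "type": "library",
            "name": name,
            "version": version,
            "purl": f"pkg:pypi/{purl_name}@{version}",
        })
    return components
```

The real tool additionally looks up package details such as licenses, for example via the PyPI JSON API shown in its output above.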

On the other hand, a discovery system such as JDisc Discovery holds all the data about the applications installed on a server, gathered from package managers and the file system. We just need to get it out in the right format, such as CycloneDX or SPDX.
For this we have written a small Python script that uses the new JDisc GraphQL API to extract the installed applications from previous discovery scans, converts that data to CycloneDX using the CycloneDX Python library and writes the SBOM to a file. This is a sample invocation from the command line:

$ python main.py -u user -p secret 192.168.185.54 -o bom.json

The script can be found in the JDisc GitHub organization in the project discovery-api-python-sample.
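
The conversion step of such a script can be sketched independently of the API. The example below does not use the JDisc GraphQL API or the CycloneDX Python library; it assumes hypothetical records with `name`, `manufacturer` and `version` fields (not the actual JDisc schema) and builds the CycloneDX JSON structure by hand, mapping the manufacturer to the component's `publisher`.

```python
import json
import uuid

# Hypothetical records describing installed applications on one host. The field names
# ("name", "manufacturer", "version") are assumptions for this sketch and are NOT the
# actual JDisc GraphQL schema.
DISCOVERED = [
    {"name": "Apache Tomcat", "manufacturer": "Apache Software Foundation", "version": "9.0.14"},
    {"name": "OpenSSL", "manufacturer": "OpenSSL Project", "version": "1.1.1i"},
]


def to_cyclonedx(applications):
    """Wrap a list of discovered applications into a CycloneDX 1.2-style JSON BoM."""
    components = []
    for app in applications:
        component = {
            "type": "application",
            "name": app["name"],
            "version": app.get("version") or "unknown",
        }
        if app.get("manufacturer"):
            component["publisher"] = app["manufacturer"]
        components.append(component)
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.2",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,
        "components": components,
    }


def write_bom(bom, path):
    """Write the BoM as pretty-printed JSON, e.g. to bom.json."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(bom, fh, indent=2)
```

`write_bom(to_cyclonedx(DISCOVERED), "bom.json")` produces a file comparable in structure to the one written with `-o bom.json` above, though the real script may fill in more fields.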

For our setup we use OWASP Dependency Track, an OWASP open-source project, as our standalone IT asset management tool. But we could also use a CMDB, as long as it accepts SPDX or CycloneDX as input, which might still be rare.
In Dependency Track we first create a project and then upload the SBOM into it, either manually or, for later automation, via the REST API.
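
For the automated path, the following sketch prepares an upload request using only the standard library. It assumes Dependency Track's BoM upload endpoint `PUT /api/v1/bom`, which takes a JSON body with the project UUID and the Base64-encoded BoM and authenticates via an `X-Api-Key` header; please verify this against the API documentation of your Dependency Track version. The server URL, API key and project UUID in the comments are placeholders.

```python
import base64
import json
import urllib.request


def bom_upload_request(base_url, api_key, project_uuid, bom_bytes):
    """Prepare (but do not send) a PUT /api/v1/bom request for Dependency Track."""
    payload = {
        "project": project_uuid,
        # The BoM file content is transferred Base64-encoded inside the JSON body
        "bom": base64.b64encode(bom_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/bom",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="PUT",
    )

# Usage with placeholder values:
#   with open("bom.json", "rb") as fh:
#       req = bom_upload_request("http://localhost:8081", "YOUR_API_KEY", "PROJECT-UUID", fh.read())
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status, resp.read().decode())
```

Separating request construction from sending keeps the function easy to test and lets you swap in another HTTP client later.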

After loading the SBOM, we see the components discovered by JDisc Discovery as assets, and Dependency Track starts checking its vulnerability sources for CVEs affecting these components (this will take a while).

Summary

In contrast to the custom CSV files that are often used, an SBOM format like SPDX or CycloneDX is standardized rather than company-specific, and it usually contains more detail, especially about licenses. Furthermore, all components have a clear identification via SWID or PURL, which makes it easier to manage the assets across projects and applications.

Now we can start building the previously mentioned use cases on top of the IT asset management system, e.g. using the Dependency Track REST APIs, either by scripting as we did here or with integration code.
The hope is that these standard formats will also find their way from software development into IT asset management, ITSM and ITIL, as they are really useful.
