Why You Need an Open Source Software Governance & Provenance framework
First, lest there be any doubt, open source and the Open Source movement are
good. Even if you choose the most conservative route and do not use open source,
the code tends to be well written, well documented, and well factored, and
simply studying it can
serve as a best-practices guide for your own software development efforts.
The issue is that an alarming amount of open source usage does not involve
the source at all but rather pre-built binaries, typically (but not always)
downloaded from the same site hosting the source. Platform-independent
binaries like Java jar files make it tantalizingly easy to just grab one
from the internet, drop it into your classpath, and begin to use it.
You probably wouldn't download somethingYouFound.exe from
someplaceOnTheNet.co.br and just run it on your PC.
Remember that the argument of "everyone
else is using it so it must be OK" is flawed for a number of reasons, not least
that "everyone else is using it" is not acceptable corporate governance.
Why would you do it for Java?
Because the Java Duke character is cute?
Because Java and the Open Source movement are hip?
Because the Open Source community is beyond malfeasance?
You think Java is immune from virus and malware transmission?
Perl, Python, and Ruby environments also tend to run afoul of this problem,
especially Perl, with the thousands of modules hosted in the CPAN environment.
Although there is typically more compile activity in using open source
from these environments than
with Java, the maturity of the highly lubricated download/build/install
frameworks in these environments leads to the same sort of "blindly capture,
process, and install" behavior.
In The Old Days, code was either built
in-house or purchased from a vendor. In both cases, the software development
lifecycle (SDLC) including design
review, testing of all kinds (functional, performance, security/robustness),
and build/release process were well known. Yes, the design and utility
of some of that
code was poor. But the governance and certainly the provenance
of the code were clear.
Increasingly, internal and external audit functions are requiring a higher
standard of code stewardship and management for a number of reasons, not the
least of which is an increased level of cyber-borne threats. It is very
likely that nearly 100% of Java applications today contain significant
(>10%) amounts of open source-originating software for which no clear
provenance exists. For example, exactly where did that jxpath.jar come from,
and how are you sure it was built cleanly from the proper source code? In some
cases, the unprovenanced percentage is much higher.
In a large development organization, diligent groups that embrace the
tenets and value of open source management will end up duplicating effort
by building and managing the same component of open source software in multiple
places. It is therefore vital that, if you use open source, you create a comprehensive
enterprise-wide process to categorize and manage it.
Steps to Take To Provide Open Source Governance & Provenance
With such a framework in place, you will be able to accurately and
reliably attest to the provenance of open source software running in production.
- First (and this is probably the hardest nut to crack), modify your
software practices policy to forbid the downloading of any binary
executable content that does not pass through an authorized channel
intended to provide some sort of front door protection/inspection of the
content. At the absolute minimum, the activity needs to be logged and
reviewed on a weekly basis as a reactive control.
- Create a component repository that is independent of consumption and build. This
can be a filesystem or a blob-oriented database. It is not Maven; Maven is a
Java-oriented, build and build-dependency oriented ecosystem involving
both a repo component and a client-side build utility (mvn) and specific
handling of dependency resolution.
The purpose of the repository is to carry source, build instructions, build output,
and governance/provenance info (more on this in a moment). The repository
should be granular to version of source and support holding multiple
versions of build for that source. There is a nuance here in that although
you can use an SCM (git, clearcase, cvs, etc.), in theory you don't
need to because you're not doing comprehensive work on the source.
In fact, it's more about
associating a single, unchanging image of source with a compiled output, and
that is more than the traditional boundary of SCM.
As a stretch goal, the component repository should declare the component
versions of dependencies used at both compile-time and link-time (test
drivers and other actual executables). This is also similar to Maven
functionality but, importantly, the repository has no built-in logic or
assumptions about how to perform transitive closure on the dependency
graph, and especially none about how to resolve version mismatches among
common dependencies.
- Develop a provenance model that objectively captures aspects of
the code base such as:
  - Compiled in house
  - Declared external dependencies match the same revs used to compile internally
  - Not compiled in house, but SHA1/MD5 digest captured and matches a
    reputable source for the component
  - License type rationalization (GPL, MIT, etc.)
  - Test driver coverage / quality
  - Static / lexical analysis performed
  - Dynamic / path / mutation analysis performed
Together, these objective inputs can be used to form
a subjective interpretation of the "level of security." Different
subjective models are permitted ("one man's red flag is another man's yellow")
but the objective inputs are the same. A key
feature of the model is that it accommodates components that in fact have
not been compiled in house but rather have been imported directly
as binaries, which is undesirable but nevertheless transparent and manageable
in this model. This enables the entire footprint to be managed consistently
and offers plenty of runway for a "Continuous Open Source Improvement" program
wherein, periodically, the set of "less secure" components is assessed
for sensitivity/uptake and the highest priority items are beefed up.
If the component repository contains compile-time and link-time dependencies,
then a utility can be easily crafted to determine the overall posture
of any dependency graph of components.
- Establish a sandbox compilation environment to ensure that all source
is compiled, scanned, analyzed, and tested in exactly the same environment,
no matter who is performing the work. Many frameworks (such as
Vagrant) can be used to do this.
- Modify software build environments to consume components from the
component repository. Ideally, the build should be able to access the
repo directly, but it is acceptable to copy components from the repo
as long as sufficient indexing material is also copied to provide an
unambiguous link back to the repo. Specifically, a Maven repo server
becomes the slave to the master repo. Do not fall into the trap of
letting a Maven repo server be the master component repository for
open source! Open source management is both broader than Java and
deeper than just housing source and build artifacts.
- If you have a sufficiently large development team footprint (more
than 50 active developers) and open source is a large and vital component
of your software, consider making one of the developers a full time
Provenance Man to continuously improve the quality of the footprint,
address version conflict issues, security issues, etc.
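
To make the provenance model and the dependency-graph utility from the steps above more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the three-level Posture scale, the particular rate() policy, and the sample components and versions (including the hypothetical "logkit") are not part of any real tool. The point is only that the objective inputs are recorded once per component, while the subjective policy is a separate, swappable function.

```python
from dataclasses import dataclass
from enum import IntEnum


class Posture(IntEnum):
    """Hypothetical subjective scale; lower is weaker."""
    RED = 0
    YELLOW = 1
    GREEN = 2


@dataclass(frozen=True)
class Provenance:
    """Objective facts recorded for one component version (illustrative subset)."""
    compiled_in_house: bool = False
    deps_match_build_revs: bool = False
    digest_matches_reputable_source: bool = False
    license_reviewed: bool = False
    static_analysis_done: bool = False


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    provenance: Provenance
    depends_on: tuple = ()  # keys of the form "name:version"

    @property
    def key(self):
        return f"{self.name}:{self.version}"


def rate(p):
    """One possible subjective policy; another team could plug in its own."""
    if (p.compiled_in_house and p.deps_match_build_revs
            and p.license_reviewed and p.static_analysis_done):
        return Posture.GREEN
    if p.compiled_in_house or p.digest_matches_reputable_source:
        return Posture.YELLOW
    return Posture.RED


def graph_posture(root_key, repo, policy=rate):
    """Walk the dependency graph from root_key and return the weakest
    posture found, plus the sorted keys of the components at that posture."""
    seen, stack = set(), [root_key]
    worst, weakest = Posture.GREEN, []
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        comp = repo.get(key)
        # A component missing from the repository has no provenance at all.
        posture = policy(comp.provenance) if comp else Posture.RED
        if posture < worst:
            worst, weakest = posture, [key]
        elif posture == worst:
            weakest.append(key)
        if comp:
            stack.extend(comp.depends_on)
    return worst, sorted(weakest)


# Sample data: names and versions are made up for the example.
repo = {c.key: c for c in [
    Component("app", "1.0",
              Provenance(compiled_in_house=True, deps_match_build_revs=True,
                         license_reviewed=True, static_analysis_done=True),
              depends_on=("jxpath:1.3", "logkit:2.0")),
    Component("jxpath", "1.3",
              Provenance(digest_matches_reputable_source=True)),
    Component("logkit", "2.0", Provenance(), depends_on=("jxpath:1.3",)),
]}

worst, weakest = graph_posture("app:1.0", repo)
```

With this sample data the application itself rates well, but the graph as a whole is only as strong as its weakest member: the binary-only component with no recorded digest. A list like `weakest` is the sort of input a "Continuous Open Source Improvement" review could work from.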
Site copyright © 2013-2021 Buzz Moschetti. All rights reserved