feature article
Subscribe Now

Anatomy of a Software Code Audit Process

Software has become a major component of products that are produced by most technology companies and is rarely written from scratch. Resourceful software development organizations and developers use a combination of previously-created code, commercial software, open-source software, and their own creative content to produce the desired software product or functionality. Anytime a product containing software changes hands, there is a need to understand its composition, its pedigree, its ownership, and any third-party (including open-source software) licenses or obligations that govern its use by its new owners.

Avoiding Uncertainties in a Technology Transaction

Technology transactions that involve software may include the launch of a product into the market, mergers & acquisitions (M&A) of companies with software development operations, or technology transfer between organizations, whether they are commercial, academic or otherwise public.

Any uncertainty around either ownership of software or compliance with the licenses associated with software can:

  • deter downstream users,
  • reduce the ability to create partnerships,
  • create litigation risk to the company and downstream users,
  • increase risk and threaten closures in funding deals,
  • negatively impact M&A activities,
  • increase product time to market, and
  • affect company valuation.

So how can all of this be avoided?

A software code audit is a good way to determine what is in your software product. A software code audit should not be confused with the more commonplace software audit process: the latter generally has to do with making sure you have paid for the software applications (e.g. Microsoft Office) you are using in your organization. Software code audits identify building blocks (files or software modules or packages, or even five lines of external code) that are used in a product or exist in the code inventory of an organization.

The audit process establishes code ownership, licensing, or copyright obligations around any third-party content in the code portfolio; it also identifies code authorship, package versions, and export restrictions. Software code audits can also highlight alignment with the policies around either use or delivery of software in a particular organization. Software code audits can also pinpoint code reuse between different portfolios within or across organizations.

The common mistake is to start a code audit process only in the last step of a transaction.  Starting the audit in anticipation of a transaction allows for timely correction of any shortcomings detected during the audit. You certainly do not want to delay a transaction because of uncertainties uncovered during the audit.

What Is Software Code Audit Process?

Except in the simplest, smallest code portfolios, a manual audit of a company’s code portfolio takes time, is inaccurate, and is expensive. Automated code scanning solutions can sift through large portfolios quickly and efficiently, detecting outside code and retrieving licensing and other attributes of external components. While automated code scanning solutions operate very fast, there is still a human element to a thorough audit project. Our experience has shown that most of the time an audit is taken by the front-end process of audit preparation and back-end results interpretation and report generation.

The software code audit process usually involves:

  • Establishing a legal framework (NDA) between the parties involved and the auditor.
  • Questions and answers between the parties involved to establish:
    • the objectives of the audit, to understand the company or product that is audited;
    • the specific business of the target companies; 
    • their third-party software practices;
    • the software environment that is used in the target company; and
    • their open source adoption policy (if any).

In some cases, all code that must be audited is not in one place, or must be “assembled” before an audit is carried out. Depending on the size of the project, the front-end process can take can take 1-5 days.

Software Code Scanning and Detection

Once the legal framework is in place, the code is available, and the environment discovery process is complete, an automated scanning application is set up. The complete job is broken into logically-meaningful segments (for example, identifiable subprojects and modules), and then the actual automated code scanning is carried out. Ownership warnings generated by the automated application (such as proprietary code without appropriate headers or copyright information, or conflicting license information) are brought to the attention of the staff, and either resolved or marked for further action.

The reports created by the automated solution are reviewed by an expert audit staff, and a final executive report is assembled. Depending on the size of the audit project, this step can take as little as a couple of days (small project containing thousands of files) and up to two weeks (for a very large portfolio of hundreds of thousands of software files). 

End-Results of a Software Code Audit

The end result of a software code audit is a combination of two reports.

The first is a high-level executive overview report that is custom-created by the audit staff. This report defines the software code audit environment, the process used, and the major findings, in simple graphical and tabular format. Attention is drawn to specific packages, files, or licenses. Information on commercial or open-source software components, a description what each piece of software does, who created it, and related references on public-domain project websites should be provided.

Important information such as copyright owners, licenses associated with the discovered software packages, and, optionally, encryption or export obligations, are tabulated. The text of all licenses that are discovered is included with this report. The report lists all external content, including complete third-party software files, modules, or projects as well as snippets of code that have a code structure similar to known open source projects. The findings of a software code audit must be verifiable; therefore references or hyperlinks to all information that is discovered should be provided.

The second report is a detailed machine-generated report that lists:

  • all packages, files, licenses, copyrights, etc. associated with all software files in the target portfolio, and,
  • optionally, a license obligation report, summarizing the obligations associated with all licenses found in the portfolio.

The detailed report can be very large and is normally provided as a back up to the high-level executive report. This report is normally consulted if a specific project or file requires further scrutiny.

How Much Does a Software Code Audit Cost?

Generally the cost of an audit is proportional to the complexity of the project, which in turn can be roughly defined as the number of files in the target portfolio, the nature of the packages (commercial or public domain) used in the portfolio, and the information that is available about those packages. Most audits (thousands and up to hundreds of thousands of software files) fall within a $5-$40K range.

If you’re planning a specific transaction involving software assets, whether it’s an M&A, equity investment, product introduction, demand for IP indemnity, commercialization of research, or other event, conduct a software code audit as early as possible in the transaction. Knowing what’s in the code can speed up transaction times and reduce costs associated with fixing problems at the last minute.

Kamal Hassin, Director, R&D and Product Management at Protecode, is a thought-leader in the area of open source licensing and is the author or co-author of a number of papers on Software Intellectual Property management.  Kamal has a Bachelor of Engineering degree and a Masters degree in Technology Innovation Management from Carleton University.  He can be reached at khassin@protecode.com.

5 thoughts on “Anatomy of a Software Code Audit Process”

  1. Pingback: GVK BIO
  2. Pingback: DMPK
  3. Pingback: www.mine-craft.me
  4. Pingback: Continued

Leave a Reply

featured blogs
Nov 25, 2020
It constantly amazes me how there are always multiple ways of doing things. The problem is that sometimes it'€™s hard to decide which option is best....
Nov 25, 2020
[From the last episode: We looked at what it takes to generate data that can be used to train machine-learning .] We take a break from learning how IoT technology works for one of our occasional posts on how IoT technology is used. In this case, we look at trucking fleet mana...
Nov 25, 2020
It might seem simple, but database units and accuracy directly relate to the artwork generated, and it is possible to misunderstand the artwork format as it relates to the board setup. Thirty years... [[ Click on the title to access the full blog on the Cadence Community sit...
Nov 23, 2020
Readers of the Samtec blog know we are always talking about next-gen speed. Current channels rates are running at 56 Gbps PAM4. However, system designers are starting to look at 112 Gbps PAM4 data rates. Intuition would say that bleeding edge data rates like 112 Gbps PAM4 onl...

featured video

Improve SoC-Level Verification Efficiency by Up to 10X

Sponsored by Cadence Design Systems

Chip-level testbench creation, multi-IP and CPU traffic generation, performance bottleneck identification, and data and cache-coherency verification all lack automation. The effort required to complete these tasks is error prone and time consuming. Discover how the Cadence® System VIP tool suite works seamlessly with its simulation, emulation, and prototyping engines to automate chip-level verification and improve efficiency by ten times over existing manual processes.

Click here for more information about System VIP

featured paper

Reducing Radiated EMI

Sponsored by Maxim Integrated

This application note explains how to reduce the radiated EMI emission in the MAX38643 nanopower buck converter. It also explains the sources of EMI noise, and provides a few simple methods to reduce the radiated EMI and make the MAX38643 buck converter compliant to the CISPR32 standard Class B limit.

Click here to download the whitepaper

Featured Chalk Talk

Smart Embedded Vision with PolarFire FPGAs

Sponsored by Mouser Electronics and Microchip

In embedded vision applications, doing AI inference at the edge is often required in order to meet performance and latency demands. But, AI inference requires massive computing power, which can exceed our overall power budget. In this episode of Chalk Talk, Amelia Dalton talks to Avery Williams of Microchip about using FPGAs to get the machine vision performance you need, without blowing your power, form factor, and thermal requirements.

More information about Microsemi / Microchip PolarFire FPGA Video & Imaging Kit