Systems and methods for detecting copied computer code using fingerprints

Inventors

Rogers, Daniel Jordan; Moore, Michael Alan

Assignees

Deloitte Development LLC

Publication Number

US-9218466-B2

Publication Date

2015-12-22

Expiration Date

2034-06-25

Abstract

Systems and methods of detecting copying of computer code or portions of computer code involve generating unique fingerprints from compiled computer binaries. The unique fingerprints are simplified representations of the compiled computer binaries and are compared with each other to identify similarities between the compiled computer binaries. Copying can be detected when there are sufficient similarities between at least portions of two compiled computer binaries.

Core Innovation

The invention is directed to systems and methods for detecting the copying of computer code or portions of computer code by generating unique fingerprints from compiled computer binaries. These fingerprints are simplified representations of the compiled binaries, allowing comparisons between different binaries in order to identify similarities. Copying is detected when there are sufficient similarities between at least portions of two compiled computer binaries.

The problem addressed is that existing techniques for combating software theft, such as software keys, online activation, encryption, and watermarking, are either vulnerable to circumvention or require undesirable code modification. These methods also typically focus on entire software packages and are ineffective when only portions of code are stolen or reused. For example, watermarking and hash comparison cannot detect copying when just part of the code is misappropriated, leaving valuable and unique code portions unprotected.

The proposed solution generates fingerprints by disassembling compiled computer binaries, building a function call graph for each binary and a control flow graph for each function. From these graphs, unique spectra are calculated, consisting of attributes such as block size, in-degree, and out-degree measured along selected edge-connected paths within each control flow graph. Comparing these unique spectra across the fingerprints of different binaries allows the system to identify similarities at the function level, detecting both full and partial code copying even when the source code is unavailable or the overall operations of the binaries differ.
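To make the notion of a spectrum concrete, the following Python sketch computes block-size, in-degree, and out-degree sequences along one path through a toy control flow graph. It is an illustration under stated assumptions, not the patented implementation: the `ControlFlowGraph` class and `spectra_along_path` function are hypothetical names, block size is assumed to be an instruction count, and the path is supplied by hand rather than derived from a disassembled binary.

```python
from dataclasses import dataclass, field


@dataclass
class ControlFlowGraph:
    """Hypothetical toy control flow graph for a single function."""

    block_sizes: dict[int, int]  # basic-block id -> size (assumed: instruction count)
    edges: dict[int, list[int]] = field(default_factory=dict)  # block id -> successor ids

    def out_degree(self, block: int) -> int:
        """Number of edges leaving the block."""
        return len(self.edges.get(block, []))

    def in_degree(self, block: int) -> int:
        """Number of edges entering the block from any block."""
        return sum(succs.count(block) for succs in self.edges.values())


def spectra_along_path(cfg: ControlFlowGraph, path: list[int]) -> dict[str, list[int]]:
    """Collect the block-size, in-degree and out-degree sequences along a path."""
    return {
        "block_size": [cfg.block_sizes[b] for b in path],
        "in_degree": [cfg.in_degree(b) for b in path],
        "out_degree": [cfg.out_degree(b) for b in path],
    }


# An if/else diamond: block 0 branches to block 1 or 2, and both rejoin at block 3.
cfg = ControlFlowGraph(
    block_sizes={0: 5, 1: 3, 2: 7, 3: 2},
    edges={0: [1, 2], 1: [3], 2: [3]},
)
print(spectra_along_path(cfg, [0, 2, 3]))
# {'block_size': [5, 7, 2], 'in_degree': [0, 1, 2], 'out_degree': [2, 1, 0]}
```

Because each spectrum is just a short numeric sequence, two functions can be compared without looking at their instructions, which is what makes the representation independent of the original source language.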

Claims Coverage

The patent includes four independent claims, each covering distinct aspects of the invention's method and system for detecting copied computer code using fingerprints.

Method for detecting code copying using fingerprints of compiled binaries

This inventive feature covers:
- Receiving a first and a second compiled computer binary by a computer.
- Disassembling the binaries into a form independent of programming language, operating system, and architecture.
- Generating a fingerprint for each compiled binary by:
  - Creating a call graph describing the relationships between functions and sub-functions.
  - Creating a control flow graph for each function in the call graph.
  - Selecting functions and calculating the leading eigenvector of the adjacency matrix of each selected function's control flow graph.
  - Generating edge-connected paths using the largest elements of the leading eigenvector.
  - Calculating unique spectra (such as block size, in-degree, and out-degree) of the control flow graphs of the selected functions.
- Comparing the fingerprints and determining, based on the comparison, whether at least some of one binary is present in the other.
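The eigenvector and path steps above could be approximated as in the following sketch. It is a hypothetical reading, since this summary does not say how the adjacency matrix is built or how the path is walked. The sketch symmetrizes the directed adjacency matrix (a loop-free control flow graph has a nilpotent directed adjacency matrix, so all of its eigenvalues are zero and it has no meaningful leading eigenvector), takes the eigenvector of the largest eigenvalue with NumPy's `numpy.linalg.eigh`, and then greedily follows outgoing edges toward the unvisited successor with the largest eigenvector element. The function names `adjacency_matrix`, `leading_eigenvector`, and `edge_connected_path` are illustrative.

```python
import numpy as np


def adjacency_matrix(n_blocks: int, edges: dict[int, list[int]]) -> np.ndarray:
    """Directed adjacency matrix: a[i, j] = 1 if control can flow from block i to block j."""
    a = np.zeros((n_blocks, n_blocks))
    for src, succs in edges.items():
        for dst in succs:
            a[src, dst] = 1.0
    return a


def leading_eigenvector(a: np.ndarray) -> np.ndarray:
    """Leading eigenvector of the symmetrized adjacency matrix (an assumption, see text)."""
    sym = a + a.T  # symmetric, so eigenvalues are real and eigh applies
    _, vectors = np.linalg.eigh(sym)  # eigenvalues are returned in ascending order
    # Drop the arbitrary sign; for a connected graph all entries share one sign.
    return np.abs(vectors[:, -1])


def edge_connected_path(edges: dict[int, list[int]], weights: np.ndarray) -> list[int]:
    """Greedy walk (hypothetical rule): start at the largest element, then repeatedly
    follow an outgoing edge to the unvisited successor with the largest element."""
    current = int(np.argmax(weights))
    path = [current]
    while True:
        candidates = [s for s in edges.get(current, []) if s not in path]
        if not candidates:
            return path
        current = max(candidates, key=lambda s: weights[s])
        path.append(current)


# Block 0 enters a loop between blocks 1 and 2; block 3 is the exit.
edges = {0: [1], 1: [2, 3], 2: [1], 3: []}
vec = leading_eigenvector(adjacency_matrix(4, edges))
path = edge_connected_path(edges, vec)
print(path)  # [1, 2]
```

Symmetrizing treats a jump in either direction as a connection, so the eigenvector itself ignores edge direction; the walk restores direction by only following outgoing edges, which keeps every step of the path an actual control-flow edge.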

Method for comparing fingerprints using cross-correlation of unique spectra

This inventive feature includes:
- Comparing the first and second fingerprints by computing distances between every possible pair of functions drawn from the two binaries.
- Selecting a subset of function pairs based on the computed distances.
- Calculating the cross-correlation of the unique spectra (block size, in-degree, out-degree) for each selected function pair.
- Producing correlation coefficients for the block-size, in-degree, and out-degree spectra that reflect the functional similarity between portions of the binaries.
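One way to sketch this comparison in Python is shown below. The summary does not specify the distance measure, the rule for choosing the subset of pairs, or how the cross-correlation is normalized, so all three are assumptions here: each function is reduced to a hypothetical feature vector (path length plus the mean of each spectrum), pairs are ranked by Euclidean distance and only the `keep` closest are retained, and each spectrum pair is scored by its peak normalized cross-correlation over all lags, which lies in [-1, 1]. The names `summary`, `xcorr_coefficient`, and `compare_fingerprints` are illustrative.

```python
import itertools
import math

import numpy as np

SPECTRA = ("block_size", "in_degree", "out_degree")


def summary(spectra: dict[str, list[float]]) -> np.ndarray:
    """Hypothetical feature vector: path length plus the mean of each spectrum."""
    return np.array([len(spectra["block_size"])] + [np.mean(spectra[k]) for k in SPECTRA])


def xcorr_coefficient(a, b) -> float:
    """Peak normalized cross-correlation over all lags, bounded to [-1, 1]."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        # Zero-variance spectra have no defined correlation; assume a match only if identical.
        return 1.0 if np.array_equal(a, b) else 0.0
    az = (a - a.mean()) / a.std()
    bz = (b - b.mean()) / b.std()
    full = np.correlate(az, bz, mode="full") / math.sqrt(len(az) * len(bz))
    return float(full.max())


def compare_fingerprints(fp1: dict, fp2: dict, keep: int = 2) -> dict:
    """fp1, fp2 map function name -> spectra dict; returns coefficients for the kept pairs."""
    pairs = [
        (float(np.linalg.norm(summary(s1) - summary(s2))), f1, f2)
        for (f1, s1), (f2, s2) in itertools.product(fp1.items(), fp2.items())
    ]
    pairs.sort()  # smallest distance first
    return {
        (f1, f2): {k: xcorr_coefficient(fp1[f1][k], fp2[f2][k]) for k in SPECTRA}
        for _, f1, f2 in pairs[:keep]
    }


fp1 = {
    "parse": {"block_size": [5, 7, 2, 9], "in_degree": [0, 1, 2, 1], "out_degree": [2, 1, 1, 0]},
    "log": {"block_size": [3, 3], "in_degree": [0, 1], "out_degree": [1, 0]},
}
fp2 = {
    "parse_copy": {"block_size": [5, 7, 2, 9], "in_degree": [0, 1, 2, 1], "out_degree": [2, 1, 1, 0]},
    "other": {"block_size": [20, 1, 15], "in_degree": [0, 3, 1], "out_degree": [3, 1, 0]},
}
print(compare_fingerprints(fp1, fp2, keep=1))
```

In this example `parse` has an identical copy in the second binary, so that pair has distance zero, is selected first, and scores a coefficient of (numerically) 1.0 for all three spectra. Pre-filtering by distance avoids running the more expensive cross-correlation on every one of the possible function pairs.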

System for generating, comparing, and indicating code copying from compiled binaries

This inventive feature includes:
- An input configured to receive two compiled computer binaries.
- A microprocessor configured to disassemble the first binary into a form independent of programming language, operating system, and architecture, and to generate fingerprints for both binaries.
- The microprocessor is further configured to:
  - Generate call graphs and control flow graphs.
  - Calculate leading eigenvectors and generate edge-connected paths.
  - Calculate the unique spectra of the control flow graphs.
  - Compare the fingerprints to determine whether at least some of one binary is present in the other.
- An output to indicate whether such copying has been detected.

System for comparing function pairs and indicating code similarity based on cross-correlation

This inventive feature includes:
- The microprocessor computes distances between each possible function pair from the two binaries.
- A subset of function pairs is selected for evaluation.
- The microprocessor calculates the cross-correlation of the unique spectra (block size, in-degree, and out-degree) for the selected function pairs.
- The resulting correlation coefficients are used to determine whether at least some of the first binary is present in the second binary.
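The final determination could be reduced to a thresholding rule over the correlation coefficients, as in this sketch. The 0.9 threshold, the requirement that all three spectra clear it, the `min_matches` parameter, and the coefficient values in the example are hypothetical; the summary does not state how the coefficients are turned into a yes/no decision.

```python
THRESHOLD = 0.9  # hypothetical cut-off, not taken from the patent


def matched_pairs(coefficients: dict, threshold: float = THRESHOLD) -> list:
    """Function pairs whose block-size, in-degree and out-degree coefficients all clear the threshold."""
    return [pair for pair, coeffs in coefficients.items() if min(coeffs.values()) >= threshold]


def copying_detected(coefficients: dict, threshold: float = THRESHOLD, min_matches: int = 1) -> bool:
    """Report copying when at least `min_matches` function pairs match."""
    return len(matched_pairs(coefficients, threshold)) >= min_matches


# Made-up coefficients for illustration only.
coefficients = {
    ("parse", "parse_copy"): {"block_size": 0.99, "in_degree": 0.97, "out_degree": 0.98},
    ("log", "other"): {"block_size": 0.95, "in_degree": 0.40, "out_degree": 0.61},
}
print(matched_pairs(coefficients))     # [('parse', 'parse_copy')]
print(copying_detected(coefficients))  # True
```

Requiring a match on all three spectra keeps one coincidentally similar spectrum from triggering a report, and because the rule is applied per function pair, it can flag partial copying even when most functions in the two binaries differ.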

Together, these inventive features define methods and systems for fingerprinting compiled computer binaries, comparing their structural attributes at the function level, and determining whether partial or full code copying has occurred.

Stated Advantages

The invention provides an effective and efficient method to discover, remediate, and enforce intellectual property rights by detecting both entire and partial copying of compiled computer binaries.

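It operates directly on compiled binaries without needing access to source code, and because the binaries are disassembled into a form independent of programming language, operating system, and architecture, it is not tied to any particular development environment.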

The system does not require modification of the computer code, avoiding drawbacks of watermarking and other pre-existing techniques.

By identifying similarities at the function level, it reduces false positives from common code libraries and enhances detection accuracy for unique code portions.

The processes can be automated and scheduled, providing a cost- and time-effective solution.

The technique serves as a deterrent against software code theft.

Documented Applications

Detection of unauthorized copying of software code or code segments by comparing compiled computer binaries.

Automated discovery and enforcement of intellectual property rights related to software code by identifying copied code in binaries collected from various sources, including those obtained via web crawling.

Identifying partial as well as entire copying of software code to assist in software theft prevention and remediation.
