Systems and methods for detecting copied computer code using fingerprints
Inventors
Rogers, Daniel Jordan • Moore, Michael Alan
Assignees
Publication Number
US-8997256-B1
Publication Date
2015-03-31
Expiration Date
2034-06-25
Abstract
Systems and methods of detecting copying of computer code or portions of computer code involve generating unique fingerprints from compiled computer binaries. The unique fingerprints are simplified representations of the compiled computer binaries and are compared with each other to identify similarities between the compiled computer binaries. Copying can be detected when there are sufficient similarities between at least portions of two compiled computer binaries.
Core Innovation
The invention provides a system and method for detecting copying of computer code or portions of code by generating unique fingerprints from compiled computer binaries. These fingerprints are simplified representations that can be compared across different binaries to identify similarities that indicate possible code theft. The technique can recognize similarities at the function level, even if only portions of the code have been copied, and it operates directly on compiled binaries without requiring access to source code.
Existing solutions to software theft, such as watermarking and comparing hash values, are often inadequate because they either require code modifications or operate only at the whole-file level. Watermarks can be removed or circumvented, and hash comparisons fail when only parts of the code are copied, which makes partial code theft difficult to identify, especially theft of unique code segments.
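The whole-file limitation of hash comparison can be illustrated with a minimal sketch. The byte strings below are hypothetical stand-ins for compiled binaries, not real program output; SHA-256 is used only as an example of a whole-file hash.

```python
import hashlib

# Hypothetical byte strings standing in for two compiled binaries.
# Both embed the same "function" bytes, surrounded by different code.
shared_function = b"\x55\x48\x89\xe5\x89\x7d\xfc\x8b\x45\xfc\x0f\xaf\xc0\x5d\xc3"
binary_a = b"\x90" * 8 + shared_function + b"\xc3" * 4
binary_b = b"\xcc" * 16 + shared_function + b"\x90" * 2

digest_a = hashlib.sha256(binary_a).hexdigest()
digest_b = hashlib.sha256(binary_b).hexdigest()

# A whole-file comparison reports "different" ...
whole_file_match = digest_a == digest_b
# ... even though one function is byte-identical in both binaries.
shares_function = shared_function in binary_a and shared_function in binary_b

print(whole_file_match, shares_function)  # False True
```

A byte-level substring search like the one above only works for unmodified copies at known boundaries, which is why a function-level structural fingerprint is attractive.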
The invention addresses these limitations by disassembling compiled binaries, constructing control flow and function call graphs, and analyzing each function to extract unique spectra. These spectra are used to form fingerprints that reveal overlaps between code in different binaries. The fingerprints enable effective comparison, so both full and partial code copying can be detected, enhancing the ability to protect intellectual property in software.
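As a rough illustration of the graph data this approach works on, the sketch below models one function's control flow graph and derives the per-block quantities named in the claims (block size, in-degree, out-degree). The `BasicBlock` class and the helper names are hypothetical, and the blocks are written by hand; a real pipeline would obtain them from a disassembler, which is not shown.

```python
from dataclasses import dataclass, field

@dataclass
class BasicBlock:
    """One node of a control flow graph (hypothetical representation)."""
    name: str
    size: int                                   # instructions in the block
    successors: list = field(default_factory=list)

def adjacency_matrix(blocks):
    """Return (names, matrix); matrix[i][j] == 1 if block i can jump to block j."""
    names = [b.name for b in blocks]
    index = {n: i for i, n in enumerate(names)}
    matrix = [[0] * len(blocks) for _ in blocks]
    for b in blocks:
        for s in b.successors:
            matrix[index[b.name]][index[s]] = 1
    return names, matrix

def node_attributes(blocks):
    """Map each block name to (size, in-degree, out-degree)."""
    names, m = adjacency_matrix(blocks)
    n = len(names)
    attrs = {}
    for i, b in enumerate(blocks):
        in_deg = sum(m[r][i] for r in range(n))   # column sum
        out_deg = sum(m[i])                       # row sum
        attrs[b.name] = (b.size, in_deg, out_deg)
    return attrs

# A small if/else function: entry -> then | else -> exit
cfg = [
    BasicBlock("entry", 4, ["then", "else"]),
    BasicBlock("then", 2, ["exit"]),
    BasicBlock("else", 3, ["exit"]),
    BasicBlock("exit", 1, []),
]
attrs = node_attributes(cfg)
print(attrs["entry"], attrs["exit"])  # (4, 0, 2) (1, 2, 0)
```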
Claims Coverage
The patent includes multiple independent claims covering at least four distinct inventive features.
Method for generating and comparing fingerprints of compiled computer binaries
This feature covers:
- Receiving a first and second compiled computer binary via a computer.
- Generating a fingerprint for each binary by:
  1. Generating a call graph using the compiled binary.
  2. Creating a control flow graph for each function in the call graph.
  3. Selecting a function, calculating its leading Eigenvector from the control flow graph's adjacency matrix.
  4. Generating an edge-connected path starting from the node with the largest element in the leading Eigenvector.
  5. Calculating unique spectra for the function based on this path.
- Comparing the fingerprints and determining whether at least some of the first binary is present in the second.
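Steps 3 to 5 above can be sketched for a single function as follows. This is an illustration under stated assumptions, not the patented procedure: the summary does not say how the Eigenvector is computed or how the path is extended, so the sketch assumes power iteration on the symmetrized adjacency matrix (with an identity shift for stable convergence) and a greedy walk that always moves to the unvisited neighbour with the largest Eigenvector element. All function names and the example graph are hypothetical.

```python
def leading_eigenvector(adj, iterations=200):
    """Power iteration on (A_sym + I). Assumption: edges are treated as undirected."""
    n = len(adj)
    sym = [[1 if (adj[i][j] or adj[j][i]) else 0 for j in range(n)] for i in range(n)]
    vec = [1.0] * n
    for _ in range(iterations):
        nxt = [vec[i] + sum(sym[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec

def edge_connected_path(adj, vec):
    """Greedy walk (assumed rule) starting at the node with the largest element."""
    n = len(adj)
    current = max(range(n), key=lambda i: vec[i])
    path, seen = [current], {current}
    while True:
        neighbours = [j for j in range(n)
                      if (adj[current][j] or adj[j][current]) and j not in seen]
        if not neighbours:
            return path
        current = max(neighbours, key=lambda j: vec[j])
        path.append(current)
        seen.add(current)

def spectra_along_path(adj, sizes, path):
    """Block size, in-degree, and out-degree sequences read along the path."""
    n = len(adj)
    return {
        "block_size": [sizes[i] for i in path],
        "in_degree": [sum(adj[r][i] for r in range(n)) for i in path],
        "out_degree": [sum(adj[i]) for i in path],
    }

# Hypothetical six-block control flow graph (directed edges) and block sizes.
edges = [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
sizes = [3, 2, 5, 4, 1, 2]
adj = [[0] * 6 for _ in range(6)]
for src, dst in edges:
    adj[src][dst] = 1

vec = leading_eigenvector(adj)
path = edge_connected_path(adj, vec)
spectra = spectra_along_path(adj, sizes, path)
print(path)     # [3, 1, 2]
print(spectra)
```

Under these assumptions the walk stops once every neighbour of the current block has been visited, so the path need not cover the whole graph.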
Method for fingerprint comparison using function-level correlations
This feature encompasses:
- Receiving two compiled computer binaries and generating their fingerprints.
- Comparing the fingerprints by:
  - Computing distances between all possible pairs of functions from both binaries.
  - Selecting a subset of function pairs for further analysis.
  - Calculating a cross-correlation for unique spectra (count block size, in-degree, out-degree) along edge-connected paths for each selected pair.
  - Using the correlation coefficients (block size, in-degree, out-degree) to assess similarity and the presence of copied code.
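The comparison steps can be sketched as below. The summary does not define the distance measure or the exact form of the cross-correlation, so both are assumptions here: the distance is Euclidean over per-spectrum sums, and the correlation is the peak of an uncentered, normalized cross-correlation over all lags. The function names, spectra, and the `max_distance` value are hypothetical.

```python
from itertools import product
from math import sqrt

KEYS = ("block_size", "in_degree", "out_degree")

def summary(fp):
    """Assumed coarse summary of a function: the sum of each spectrum."""
    return [float(sum(fp[k])) for k in KEYS]

def distance(fp_a, fp_b):
    """Assumed distance: Euclidean distance between the summaries."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(summary(fp_a), summary(fp_b))))

def peak_cross_correlation(a, b):
    """Largest normalized (uncentered) cross-correlation of a and b over all lags."""
    norm = sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0:
        return 0.0
    best = 0.0
    for lag in range(-(len(b) - 1), len(a)):
        total = sum(a[i] * b[i - lag] for i in range(len(a)) if 0 <= i - lag < len(b))
        best = max(best, total / norm)
    return best

def compare(functions_a, functions_b, max_distance=2.0):
    """Correlate only those function pairs whose distance is within max_distance."""
    results = {}
    for (name_a, fa), (name_b, fb) in product(functions_a.items(), functions_b.items()):
        if distance(fa, fb) > max_distance:
            continue
        results[(name_a, name_b)] = tuple(
            peak_cross_correlation(fa[k], fb[k]) for k in KEYS)
    return results

# Hypothetical per-function spectra for two binaries.
binary_a = {
    "parse": {"block_size": [4, 2, 5], "in_degree": [2, 1, 1], "out_degree": [1, 2, 1]},
    "checksum": {"block_size": [9, 9, 7, 8], "in_degree": [1, 1, 2, 1], "out_degree": [1, 2, 1, 1]},
}
binary_b = {
    "sub_401000": {"block_size": [4, 2, 5], "in_degree": [2, 1, 1], "out_degree": [1, 2, 1]},
    "sub_402000": {"block_size": [1, 1], "in_degree": [0, 1], "out_degree": [1, 0]},
}

results = compare(binary_a, binary_b)
print(results)  # {('parse', 'sub_401000'): (1.0, 1.0, 1.0)}
```

Filtering by distance first keeps the more expensive cross-correlation off the full set of function pairs, which grows as the product of the two function counts.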
System for detecting presence of code in binaries
This feature includes:
- An input for receiving first and second compiled binaries.
- A microprocessor that:
  - Generates fingerprints for both binaries as described above (using call graphs, control flow graphs, Eigenvector computation, and unique spectra extraction).
  - Compares the fingerprints to determine if any part of one binary exists in the other.
- An output configured to indicate whether copying or similarity has been detected.
System for fingerprint comparison using function pair correlation
This feature entails:
- An input for compiled binaries, a microprocessor, and an output.
- The microprocessor compares fingerprints by:
  - Computing distances between all possible function pairs of the binaries.
  - Selecting function pairs below a threshold distance.
  - Calculating cross-correlations of unique spectra for each pair (block size, in-degree, out-degree).
  - Using ratios of correlation coefficients above certain thresholds to determine code presence and potential infringement.
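The final thresholding step might look like the sketch below. The summary does not specify how the ratios are formed or which thresholds apply, so this is one possible reading: a function of the first binary counts as matched when all three coefficients of some pair reach `min_coefficient`, and code is reported as present when the matched fraction reaches `min_fraction`. Both threshold values, the helper names, and the sample coefficients are hypothetical.

```python
def matched_fraction(correlations, function_count, min_coefficient=0.9):
    """Fraction of the first binary's functions with a pair whose three
    coefficients (block size, in-degree, out-degree) all reach the threshold."""
    matched = {
        name_a
        for (name_a, _name_b), coeffs in correlations.items()
        if all(c >= min_coefficient for c in coeffs)
    }
    return len(matched) / function_count if function_count else 0.0

def code_present(correlations, function_count,
                 min_coefficient=0.9, min_fraction=0.25):
    """Assumed decision rule: report presence when enough functions match."""
    return matched_fraction(correlations, function_count, min_coefficient) >= min_fraction

# Hypothetical coefficients for selected function pairs:
# (block size, in-degree, out-degree)
correlations = {
    ("parse", "sub_401000"): (1.0, 1.0, 1.0),
    ("parse", "sub_402000"): (0.62, 0.80, 0.71),
    ("checksum", "sub_403000"): (0.95, 0.40, 0.88),
}

fraction = matched_fraction(correlations, function_count=4)
present = code_present(correlations, function_count=4)
print(fraction, present)  # 0.25 True
```

Requiring all three coefficients to pass is a conservative choice; a rule that averages or weights them would flag more pairs at the cost of more false positives.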
The inventive features center on novel methods and systems for generating and comparing function-level fingerprints of compiled binaries using graph-based and statistical techniques to detect both full and partial code copying.
Stated Advantages
Operates on compiled computer binaries, eliminating the need for access to source code.
Detects copying of both entire binaries and partial code segments, enabling identification of code theft even when only unique functions or sections are stolen.
Reduces false positives by relying on unique spectra of control flow graphs and function-level analysis rather than simple hash or watermark techniques.
Supports automated and scheduled code collection, fingerprint generation, and comparison, providing a cost- and time-effective approach for intellectual property protection.
Documented Applications
Detecting and identifying theft or unauthorized copying of computer code in compiled binaries for enforcement of intellectual property rights.