Systems and methods for detecting copied computer code using fingerprints
Inventors
Rogers, Daniel J. • Moore, Michael • Blazakis, Dionysus
Assignees
Publication Number
US-9459861-B1
Publication Date
2016-10-04
Expiration Date
2036-03-24
Abstract
Systems and methods of detecting copying of computer code or portions of computer code involve generating unique fingerprints from compiled computer binaries. The unique fingerprints are simplified representations of functions in the compiled computer binaries and are compared with each other to identify similarities between functions in the respective compiled computer binaries. Copying can be detected when there are sufficient similarities between fingerprints of two functions.
Core Innovation
The invention is directed to systems and methods for detecting copying of computer code or portions of computer code by generating unique fingerprints from compiled computer binaries. The fingerprints are simplified representations of functions within the compiled binaries and are compared to identify similarities between functions across different binaries. This process enables the detection of copied code by determining whether sufficient similarity exists between fingerprints of respective functions.
Conventional techniques for combating software theft, such as software keys, online activation, encryption, and watermarking, are insufficient because they can often be bypassed and typically only address copying of entire software packages. They are not capable of identifying theft when only portions of code are copied, particularly at the function level, nor do they work reliably when software is modified or compiled differently.
The present invention addresses these deficiencies by creating fingerprints based on the internal structure of functions in compiled binaries. The process disassembles the binary, generates a control flow graph for each function, calculates block rank scores using a modified PageRank algorithm run on the graph with its edges reversed, and synthesizes a fingerprint as an ordered list of 3-tuples (number of instructions, in-degree, and out-degree), one per block along a characteristic path through the function. Comparing such fingerprints identifies copied code even if the binaries were compiled differently or only part of the code was copied, and it does so without access to source code.
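As an illustration of the 3-tuple described above, the following minimal sketch derives (number of instructions, in-degree, out-degree) for every block of a toy control flow graph. The `Block` type, the dictionary layout, and the block names are hypothetical choices made for this example; the patent summary does not prescribe a data structure, and a real pipeline would obtain the graph from a disassembler.

```python
from collections import namedtuple

# Hypothetical CFG representation (an assumption for this sketch):
# block id -> Block(instruction count, list of successor block ids).
Block = namedtuple("Block", ["num_instructions", "successors"])


def block_tuples(cfg):
    """Return {block_id: (num_instructions, in_degree, out_degree)}."""
    in_degree = {block_id: 0 for block_id in cfg}
    for block in cfg.values():
        for succ in block.successors:
            in_degree[succ] += 1
    return {
        block_id: (block.num_instructions, in_degree[block_id], len(block.successors))
        for block_id, block in cfg.items()
    }


# A toy diamond-shaped function: the entry block branches to two blocks
# that rejoin at a common exit block.
cfg = {
    "entry": Block(4, ["then", "else"]),
    "then": Block(2, ["exit"]),
    "else": Block(3, ["exit"]),
    "exit": Block(1, []),
}
tuples = block_tuples(cfg)
```

A fingerprint would then be the list of these tuples taken in the order of the characteristic path, for example `[tuples[b] for b in path]` for some path `path` through the function.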
Claims Coverage
The patent contains multiple independent claims covering distinct inventive features related to generating and comparing fingerprints of functions within compiled computer binaries to detect code copying.
Generation and comparison of function fingerprints using block rank scores and ordered paths
A method in which a computer receives compiled computer binaries and generates fingerprints for functions by:
1. Generating a block rank score for each block in each function by reversing all edges in the control flow graph and assigning scores based on blocks with forward paths.
2. Generating a path of blocks based on block rank scores.
3. Constructing the fingerprint from this path.
The method then compares fingerprints between functions from different binaries to determine if code from one function is present in another.
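The first two steps can be sketched as follows. This is an approximation under stated assumptions, not the claimed algorithm: the summary does not say how the PageRank algorithm is modified or how the path is chosen, so the sketch uses plain PageRank on the edge-reversed graph (damping factor 0.85 and 50 iterations are arbitrary) and a greedy walk from the entry block that always steps to the highest-ranked unvisited successor. The function names and the example graph are hypothetical.

```python
def block_rank(successors, damping=0.85, iterations=50):
    """PageRank-style scores on the CFG with every edge reversed.

    `successors` maps each block id to its successor ids in the
    original (forward) graph.
    """
    nodes = list(successors)
    n = len(nodes)
    in_degree = {b: 0 for b in nodes}
    for succs in successors.values():
        for s in succs:
            in_degree[s] += 1

    rank = {b: 1.0 / n for b in nodes}
    for _ in range(iterations):
        # Once edges are reversed, a block with no predecessors has no
        # outgoing edge; spread its score evenly (standard dangling fix).
        dangling = sum(rank[b] for b in nodes if in_degree[b] == 0)
        new_rank = {}
        for b in nodes:
            # In the reversed graph each successor s links back to b and
            # splits its score among all of its original predecessors.
            flow = sum(rank[s] / in_degree[s] for s in successors[b])
            new_rank[b] = (1 - damping) / n + damping * (flow + dangling / n)
        rank = new_rank
    return rank


def characteristic_path(successors, entry, rank):
    """Greedy walk from the entry block (an assumed path rule)."""
    path, seen, current = [entry], {entry}, entry
    while True:
        candidates = [s for s in successors[current] if s not in seen]
        if not candidates:
            return path
        current = max(candidates, key=lambda s: rank[s])
        path.append(current)
        seen.add(current)


successors = {
    "entry": ["a", "b"],
    "a": ["exit"],
    "b": ["c"],
    "c": ["exit"],
    "exit": [],
}
rank = block_rank(successors)
path = characteristic_path(successors, "entry", rank)
```

On this graph the walk prefers the longer branch through `b` and `c`, because reversing the edges lets score accumulate in blocks that have more code reachable from them.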
Database-based detection of copied code using fingerprint comparison
A method involving:
- Receiving multiple compiled computer binaries.
- Generating fingerprints for functions in each binary and storing them in a database.
- Receiving a query function, generating its fingerprint, and comparing it against stored fingerprints to determine if the query contains code copied from any stored function using the path and block rank approach.
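The store-then-query flow might look like the sketch below. Everything here is an assumption made for illustration: the `FingerprintDB` class, the in-memory list standing in for a database, the 0.8 threshold, and the binary and function names. The similarity score comes from Python's `difflib.SequenceMatcher`, a generic sequence matcher swapped in as a stand-in, not the comparison the patent claims.

```python
from difflib import SequenceMatcher


class FingerprintDB:
    """In-memory stand-in for a fingerprint database (hypothetical)."""

    def __init__(self):
        self._rows = []  # (binary name, function name, fingerprint)

    def add(self, binary, function, fingerprint):
        self._rows.append((binary, function, tuple(fingerprint)))

    def query(self, fingerprint, threshold=0.8):
        """Return (score, binary, function) hits, best match first."""
        query_fp = tuple(fingerprint)
        hits = []
        for binary, function, stored in self._rows:
            # Generic similarity ratio in [0, 1]; an assumed metric.
            score = SequenceMatcher(None, query_fp, stored).ratio()
            if score >= threshold:
                hits.append((score, binary, function))
        return sorted(hits, reverse=True)


db = FingerprintDB()
db.add("libfoo.so", "parse_header", [(4, 0, 2), (3, 1, 1), (2, 1, 1), (1, 2, 0)])
db.add("libbar.so", "checksum", [(10, 0, 1), (6, 2, 2), (1, 1, 0)])

# Query fingerprint: the first stored function plus one extra block.
hits = db.query([(4, 0, 2), (3, 1, 1), (5, 1, 1), (2, 1, 1), (1, 2, 0)])
```

Only `parse_header` is reported here: four of its tuples reappear in order in the query, while `checksum` shares none.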
Fingerprint comparison using ordered 3-tuple lists and maximum match pathway
A method where the fingerprint of a function is represented as an ordered list of 3-tuples (number of instructions, in-degree, out-degree) for blocks along a characteristic path, and comparison between functions is performed by:
- Generating a two-dimensional array of all possible 3-tuple pair combinations from two function fingerprints.
- Identifying a maximum match pathway through this array for similarity assessment.
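One plausible reading of this comparison is sketched below. The summary does not define how pairs are scored or what makes a pathway valid, so the sketch assumes that a pair matches only when the two 3-tuples are equal, and that the pathway must move forward in both fingerprints. Under those assumptions the problem reduces to a longest-common-subsequence dynamic program. The function names are hypothetical.

```python
def match_matrix(fp_a, fp_b):
    """2D array over all tuple pairs: 1 where the 3-tuples are equal."""
    return [[1 if ta == tb else 0 for tb in fp_b] for ta in fp_a]


def maximum_match_pathway(fp_a, fp_b):
    """Return the (row, column) cells of a forward-only pathway through
    the match matrix that visits the most matching cells."""
    grid = match_matrix(fp_a, fp_b)
    rows, cols = len(fp_a), len(fp_b)
    # best[i][j] = most matches achievable using fp_a[:i] and fp_b[:j].
    best = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            if grid[i][j]:
                best[i + 1][j + 1] = best[i][j] + 1
            else:
                best[i + 1][j + 1] = max(best[i][j + 1], best[i + 1][j])

    # Walk back from the bottom-right corner to recover the pathway.
    path, i, j = [], rows, cols
    while i > 0 and j > 0:
        if grid[i - 1][j - 1]:
            path.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i - 1][j] >= best[i][j - 1]:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path


fp_a = [(4, 0, 2), (3, 1, 1), (2, 1, 1), (1, 2, 0)]
# Same function with one extra block, such as a compiler might insert.
fp_b = [(4, 0, 2), (7, 1, 1), (3, 1, 1), (2, 1, 1), (1, 2, 0)]

pathway = maximum_match_pathway(fp_a, fp_b)
similarity = len(pathway) / max(len(fp_a), len(fp_b))
```

The pathway skips the extra block in the second fingerprint and still aligns all four original blocks. Normalizing by the longer fingerprint's length is one possible similarity measure, also an assumption.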
The independent claims collectively cover generating unique fingerprints for functions in compiled binaries using ordered paths and block rank scores, storing and comparing such fingerprints in a database, and comparing ordered 3-tuple lists to detect copied code, including when the binaries were built with different compilers or when only part of the code was copied.
Stated Advantages
Allows identification of copied code even when only a portion of code, such as a function, is copied rather than the entire software.
Operates on compiled binaries, eliminating the need for source code access.
Can identify copied code even if the code is compiled with different compilers and contains compiler-added blocks not present in both binaries.
Reduces computational load required for fingerprinting and comparison, improving the performance of the computer executing these methods.
Provides a cost- and time-effective approach for discovering, remediating, and enforcing intellectual property rights related to software code.
Documented Applications
Detecting instances of software theft where functions or code portions are copied from one compiled binary into another, including across differently compiled binaries.
Automated collection, fingerprinting, and comparison of binaries for intellectual property enforcement and deterrence against code theft.
Providing a service for software developers or entities to check if their code has been copied into third-party software using a fingerprint database.