Authorship technologies
Inventors
Juola, Patrick • Overly, James Orlo • Noecker, John Isaac • Ryan, Michael • Gray, Christine
Assignees
Duquesne University of the Holy Spirit
Publication Number
US-11605055-B2
Publication Date
2023-03-14
Expiration Date
2032-05-04
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
Novel distractorless authorship verification technology optionally combines with novel algorithms to solve authorship attribution as to an open set of candidates, such as, without limitation, by analyzing the voting of a “mixture of experts” and outputting the result to a user. The analysis uses the statistic z = (pi - pj) / sqrt(pi + pj - (pi - pj)^2 / n): if z is larger than a first predetermined threshold, then author j cannot be the correct author; or if z is smaller than a second predetermined threshold, then author i cannot be the correct author; or if no author garners significantly more votes than all other contenders, then none of the named authors is the author of the document in question. The technology is applied in a number of novel applications. Personality profiling and authorship attribution may also be used to verify user identity to a computer.
Core Innovation
The invention provides novel computer-based methods for authorship verification and attribution, including both distractorless and improved mixture-of-experts approaches. The invention eliminates the need for distractor sets by verifying authorship using only the candidate author's known prior writing samples for comparison, or alternatively, applies improved mathematical analysis to voting results from multiple automated attribution systems across an open set of candidate authors.
The invention addresses a known limitation in authorship attribution: in open-set scenarios, existing tools have failed to reliably conclude 'none of the above' or to verify authorship when only a single candidate is named. Prior art relied heavily on distractor sets and was inadequate for open-set verification, leaving an unmet need for reliable computer-based solutions.
The core technique involves assembling a 'mixture of experts' by deploying at least two, preferably many more, independent authorship attribution systems to analyze both the text in question and a distractor set (when used). Votes are tallied and subjected to a specific statistical analysis, using z-tests and defined thresholds, to determine if one author is identified, or if 'none of the above' is the correct conclusion. The method allows customization of statistical sensitivity and is applicable not only to attribution but also to author profiling.
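The vote-tallying step can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Expert` type, the `tally_votes` function, and the toy lambda experts are all hypothetical stand-ins for real attribution systems.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# Hypothetical type: an "expert" maps a query text to the name of the
# author it votes for. Real attribution systems would be far richer.
Expert = Callable[[str], str]


def tally_votes(experts: Sequence[Expert], query_text: str) -> Dict[str, float]:
    """Return each candidate's share of the votes cast by the experts."""
    if len(experts) < 2:
        raise ValueError("a mixture of experts needs at least two experts")
    votes = Counter(expert(query_text) for expert in experts)
    n = len(experts)
    return {author: count / n for author, count in votes.items()}


# Toy stand-ins for four independent attribution systems (assumption).
experts = [lambda t: "alice", lambda t: "alice", lambda t: "bob", lambda t: "alice"]
proportions = tally_votes(experts, "some questioned text")
```

The resulting vote proportions are the inputs that the statistical analysis then compares pairwise.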
Additionally, the distractorless approach analyzes similarity exclusively between the candidate author's own texts and the questioned document, utilizing empirically determined thresholds to reliably verify authorship. This flexibility extends to profiling tasks, such as detecting demographic or psychological characteristics, thus broadening the technology's utility.
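A rough sketch of the distractorless idea follows. The summary does not say which features or similarity measure are used, so the character trigrams, the cosine similarity, and the function names here are assumptions chosen only for illustration. The threshold is a caller-supplied value standing in for an empirically determined one.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (an assumed feature choice)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def verify_authorship(
    known_texts: Iterable[str], questioned_text: str, threshold: float
) -> Tuple[bool, float]:
    """Compare the questioned text only against the candidate's own writing.

    No distractor authors are involved; the decision rests on whether the
    similarity score reaches the supplied threshold.
    """
    profile: Counter = Counter()
    for text in known_texts:
        profile.update(char_ngrams(text))
    score = cosine_similarity(profile, char_ngrams(questioned_text))
    return score >= threshold, score
```

In practice the threshold would be calibrated on texts of known authorship; the sketch leaves that calibration to the caller.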
Claims Coverage
The patent contains one independent claim describing the principal inventive system with several key inventive features.
System for authorship attribution analysis using a mixture of experts with statistical thresholding
The system creates a data compilation of two or more separate automated authorship attribution software-containing systems and associates that compilation with a vote compiler. It compiles an open distractor set comprising texts authored by a pool of potential authors and generates, in electronic form, a query text for which authorship is to be determined. The system deploys the data compilation, distractor set, and vote compiler in cooperation with the query text to identify the author by executing an algorithm that:
Algorithm for determining authorship by comparative vote proportions and thresholds
- Calculates the proportion of votes for author i (pi) and for another author j (pj) across n experts.
- Computes the statistic z = (pi - pj) / sqrt(pi + pj - (pi - pj)^2 / n).
- Excludes author j if z is larger than a predetermined first threshold and the difference between pi and pj is significant.
- Excludes author i if z is smaller than a second threshold (the negative of the first threshold) and the difference is significant.
- Determines 'none of the above' if no author garners significantly more votes than all others.
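The decision rule above can be sketched in code under stated assumptions. The function names are hypothetical, and the threshold is a caller-supplied placeholder because the summary gives no numeric value. The flat notation leaves the grouping under the square root open to interpretation, so the sketch follows the formula as written by default and offers a hypothetical `divide_radicand_by_n` flag for the reading in which the whole radicand is divided by n. Crossing the threshold is treated here as the significance test.

```python
import math
from typing import Dict, Optional


def z_statistic(p_i: float, p_j: float, n: int,
                divide_radicand_by_n: bool = False) -> float:
    """z for the difference in vote proportions between authors i and j."""
    diff = p_i - p_j
    if divide_radicand_by_n:
        # Alternative grouping (assumption): whole radicand divided by n.
        radicand = (p_i + p_j - diff ** 2) / n
    else:
        # Grouping as the formula is literally written in the text.
        radicand = p_i + p_j - diff ** 2 / n
    if radicand <= 0:
        # Degenerate case: avoid dividing by zero.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(radicand)


def compare_pair(p_i: float, p_j: float, n: int, threshold: float) -> str:
    """Apply the two exclusion rules to one pair of authors."""
    z = z_statistic(p_i, p_j, n)
    if z > threshold:
        return "exclude j"
    if z < -threshold:  # second threshold is the negative of the first
        return "exclude i"
    return "undecided"


def attribute(proportions: Dict[str, float], n: int,
              threshold: float) -> Optional[str]:
    """Return the author who beats every rival, or None for 'none of the above'."""
    if len(proportions) < 2:
        raise ValueError("need at least two candidate authors to compare")
    for author, p_i in proportions.items():
        rivals = [p_j for other, p_j in proportions.items() if other != author]
        if all(z_statistic(p_i, p_j, n) > threshold for p_j in rivals):
            return author
    return None
```

For example, `attribute({"alice": 0.75, "bob": 0.25}, n=4, threshold=0.4)` compares the two vote shares and returns an author only when that author's z exceeds the threshold against every rival; a tie between candidates yields `None`.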
Overall, the claims cover a system and method for open-set authorship attribution using multiple automated attribution tools, a vote compiler, and statistically defined thresholds to reliably identify the author or determine if the correct author is absent.
Stated Advantages
The invention enables reliable authorship attribution and verification in open set scenarios, including concluding 'none of the above' as an answer.
By employing a mixture of multiple expert systems, the approach cancels out errors or inaccuracies present in any single attribution method.
The distractorless method eliminates the need for assembling distractor author sets, reducing complexity and dependency on potentially arbitrary comparison selections.
The presented statistical methods provide adjustable error tolerances and confidence levels, allowing suitable application across forensic, security, academic, and clinical domains.
The technologies offer improved or comparable accuracy, precision, and recall compared to traditional closed-set and mixture-of-experts methods.
Documented Applications
Preventing transmission of spam e-mail by verifying the authorship of outgoing messages before allowing transmission.
Identifying mental health vulnerability as an adjunct diagnostic tool by comparing writing stylometrics to profiles associated with known mental health conditions.
Detection and prevention of plagiarism by analyzing authorship of submitted texts.
Complementary physical health assessment, such as evaluating professional athletes for concussion.
Detection or prevention of posting-while-intoxicated or otherwise impaired.
Detection or prevention of account sharing to ensure single-user accountability.
Target marketing by screening individuals' electronic media posts for market category evaluation.
Detection or confirmation of first language of the writer.
Detection or verification of age or age range based on stylometric analysis.
Pre-employment personality screening.
Detection of fraudulent applications for employment, licensure, or certification.
Compatibility screening for employment, dating, or other matching-type Internet sites.
Detection or prevention of workplace security incidents such as anger eruptions, workplace violence, espionage, sabotage, fraud, theft, security violations, and other policy violations.
Relationship enhancement for marriages, work teams, or other groups through language style analysis.
Personality profiling and verification for computer access security, such as replacing challenge question systems with personality profile-based identification.
Verification of user identity to a computer system by analyzing ongoing or challenge-based text or speech entry.