Authorship technologies
Inventors
Juola, Patrick • Overly, James Orlo • Noecker, JR., John Isaac • Ryan, Michael • Gray, Christine
Assignees
Duquesne University of the Holy Spirit
Publication Number
US-10657494-B2
Publication Date
2020-05-19
Expiration Date
2032-05-04
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
Novel distractorless authorship verification technology optionally combines with novel algorithms to solve authorship attribution as to an open set of candidates, such as without limitation by analyzing the voting of a “mixture of experts” and outputting the result to a user using the following, where $z = (p_i - p_j) / \sqrt{(p_i + p_j - (p_i - p_j)^2)/n}$: if z is larger than a first predetermined threshold then author j cannot be the correct author; or if z is smaller than a second predetermined threshold then author i cannot be the correct author; or if no author garners significantly more votes than all other contenders then none of the named authors is the author of a document in question. These techniques are used in a number of novel applications. Personality profiling and authorship attribution may also be used to verify user identity to a computer.
Core Innovation
The invention introduces novel computer-based methods for authorship verification and attribution, including a distractorless technology in which verification can be performed using only the candidate author's own prior writings as comparison data. This departs from traditional methods, which rely on distractor sets (writings from other potential authors), thereby streamlining and potentially increasing the reliability of the verification process.
Further, the invention enhances the use of existing authorship attribution tools that employ distractor sets by implementing a mathematical approach that allows the analysis of a pool of candidates in such a way that 'none of the above' can be the formal result. This is enabled by using a mixture of automated authorship attribution systems (preferably 30-100 or more), compiling their votes for each author, and quantitatively analyzing vote differentials through a z-test to assess statistical significance between candidates.
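The vote analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: it reads the statistic as the standard z-test for the difference between two vote proportions, z = (p_i - p_j) / sqrt((p_i + p_j - (p_i - p_j)^2) / n), takes p_i and p_j to be the vote shares of authors i and j among n votes, and uses a hypothetical cutoff of 1.645. The function names `pairwise_z` and `attribute` are invented for this example.

```python
import math
from collections import Counter


def pairwise_z(votes_i, votes_j, n):
    """z statistic for the difference between two vote shares out of n votes."""
    p_i, p_j = votes_i / n, votes_j / n
    variance = (p_i + p_j - (p_i - p_j) ** 2) / n
    if variance <= 0:
        # Degenerate cases: neither author has votes, or one has all of them.
        if votes_i == votes_j:
            return 0.0
        return math.copysign(math.inf, votes_i - votes_j)
    return (p_i - p_j) / math.sqrt(variance)


def attribute(votes, candidates, threshold=1.645):
    """Return the leading candidate, or None for 'none of the above'.

    `votes` holds one candidate label per attribution system.
    `threshold` is an assumed cutoff; in practice it would be chosen
    for the error tolerance the application requires.
    """
    n = len(votes)
    counts = Counter(votes)
    leader = max(candidates, key=lambda c: counts[c])
    for other in candidates:
        if other == leader:
            continue
        # The leader must beat every other contender significantly.
        if pairwise_z(counts[leader], counts[other], n) <= threshold:
            return None
    return leader


clear_votes = ["A"] * 40 + ["B"] * 10 + ["C"] * 10
close_votes = ["A"] * 22 + ["B"] * 20 + ["C"] * 18
print(attribute(clear_votes, ["A", "B", "C"]))
print(attribute(close_votes, ["A", "B", "C"]))
```

With 60 hypothetical voters, a 40/10/10 split gives the leader a large z against each rival, while a 22/20/18 split does not, so the second call reports no author.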
The methods can be employed in both yes/no (single candidate) verification as well as open set attribution tasks, and the mathematical thresholds can be adjusted according to the context or required error tolerance. The technology is also extended to enable author profiling and secure user verification for computer access, by comparing a user's responses or writing against stored models based solely on their own data.
Claims Coverage
There are two independent inventive features described in the claims of this patent.
Computer-based distractorless authorship verification using only candidate's own writings
The method increases the efficiency of a computer in verifying authorship of a document by comparing stylometrics from the document for which verification is sought solely with prior known writings of the candidate author, without using writings of any other authors.
Steps include compiling a set of training data written by the candidate author, extracting linguistic/token features to create a feature vector, extracting a feature set from the document in question, selecting a distance function, and assigning one or two empirically determined thresholds to evaluate author similarity for verification.
Output is delivered to a user based on these results, and the benefits are achieved by omitting any distractor set in the comparison process.
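The steps above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for the example rather than something the claims prescribe: character trigram frequencies stand in for the linguistic/token features, cosine distance stands in for the selected distance function, and the two cutoffs `accept_below` and `reject_above` are placeholder values that would have to be determined empirically.

```python
import math
from collections import Counter


def char_ngram_profile(text, n=3):
    """Relative frequencies of character n-grams (an assumed feature choice)."""
    text = " ".join(text.lower().split())
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse feature vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def verify(known_texts, questioned_text, accept_below=0.3, reject_above=0.6):
    """Compare a questioned text only against the candidate's own writings.

    No distractor authors are consulted. The thresholds are hypothetical.
    """
    known = char_ngram_profile(" ".join(known_texts))
    questioned = char_ngram_profile(questioned_text)
    d = cosine_distance(known, questioned)
    if d <= accept_below:
        return "same author", d
    if d >= reject_above:
        return "different author", d
    return "inconclusive", d


sample = "the quick brown fox jumps over the lazy dog"
print(verify([sample], sample)[0])
print(verify(["aaaa aaaa"], "zzzz zzzz")[0])
```

Using two cutoffs leaves a middle band where the sketch declines to decide; a single threshold would correspond to setting both values equal.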
Computer-based distractorless authorship verification outputting a probability/confidence function
The method increases computer efficiency in authorship verification by comparing stylometrics of the questioned writing to those of only the candidate's known writings.
The steps are similar to those above, but instead of fixed thresholds, a monotonically decreasing function is selected such that the function output provides the probability or confidence that the writing was authored by the candidate.
This confidence value is rendered as output to a user, and the method's benefits arise from using only the candidate author's writings, not those of multiple candidates for comparison.
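One way to picture the confidence output is a logistic curve that falls as the distance between the questioned writing and the candidate's known writings grows. The sketch below is an assumption for illustration only: the claims require a monotonically decreasing function but do not name one, and the function name and the `midpoint` and `steepness` parameters are hypothetical values that would need to be fitted to real data.

```python
import math


def confidence_from_distance(distance, midpoint=0.4, steepness=12.0):
    """Map a stylometric distance to a confidence in (0, 1).

    The logistic form is an assumed choice; any monotonically
    decreasing function of the distance would fit the description.
    """
    return 1.0 / (1.0 + math.exp(steepness * (distance - midpoint)))


for d in (0.1, 0.4, 0.8):
    print(d, round(confidence_from_distance(d), 3))
```

Small distances map to confidence near 1, the midpoint maps to exactly 0.5, and large distances map toward 0.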
The claims cover methods of computer-based authorship verification that use only the writings of the candidate author, encompassing both threshold-based and probabilistic/confidence-based output schemes, and explicitly exclude the use of distractor author sets.
Stated Advantages
Eliminates dependence on distractor sets, allowing verification using only the candidate author's prior writings.
Reduces or solves problems of reliability and accuracy in authorship verification, especially in open set and single-candidate contexts.
Provides statistically robust outcomes, including 'none of the above,' via a mathematical analysis of results from multiple attribution systems.
Yields better accuracy and reliability compared to traditional closed-set and mixture-of-experts verification methods.
Enables new and improved applications, such as secure user identification for computer access, e-mail spam prevention, and adjunct diagnostic profiling.
Documented Applications
Personality profiling and using authorship attribution to verify user identity to a computer.
Computer access security by verifying identity without relying on traditional challenge questions, using personality or authorship profiling.
Ongoing or sporadic monitoring of keyboard or speech input to verify user identity for security purposes.
E-mail spam prevention by stopping imposter-authored e-mails at the source, before transmission.
Adjunct medical mental health diagnostics by profiling text for indicators of mental health conditions.
Plagiarism detection and prevention.
Complementary physical health assessment, such as evaluating NFL players for concussion.
Detection or prevention of posting-while-intoxicated or otherwise impaired.
Detection or prevention of account sharing.
Target marketing through analysis of social media posts for market categorization.
Detection or confirmation of first language or verification of age or age range.
Pre-employment personality screening.
Detection of fraudulent applications for employment, licensure, or certification.
Compatibility screening for employment, dating, or other matching-based internet services.
Detection or prevention of workplace security incidents including anger eruptions, violence, espionage, sabotage, fraud, theft, security violations, criminal conduct, or work policy violations.
Initiating or maintaining personal relationships by matching or teaching compatible language styles for dating, marriage counseling, or team building.