Document ranking systems and methods

Inventors

Fletcher, David C. • Wagers, Doug R.

Assignees

Softek Illuminate Inc

Publication Number

US-8180783-B1

Publication Date

2012-05-15

Expiration Date

Abstract

Systems and methods are provided for ranking document data retrieved from a data source in response to a search request. A ranking system retrieves document data from documents in the data source that each includes at least one key term that matches a search term in the search request. For each document, a term frequency value is calculated based on a number of occurrences of the key term in the document. Prefix and suffix term rules are used to determine whether a particular occurrence of the key term in a particular document should be included in determining a term weight value for that particular occurrence of the key term. A relevancy ranking value is determined for each document based on the corresponding term frequency and term weight values. The document data is displayed according to each document's corresponding relevancy ranking value.
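
The ranking flow in the abstract can be illustrated with a minimal sketch. The abstract does not list specific prefix or suffix terms, say how far from the key term they may sit, or state how term frequency and term weight are combined, so `PREFIX_TERMS`, `SUFFIX_TERMS`, the adjacent-word check, and the product used as the relevancy value are all assumptions made for illustration only.

```python
import re

# Hypothetical rule sets; the source does not list specific prefix or suffix terms.
PREFIX_TERMS = {"no", "without", "denies"}
SUFFIX_TERMS = {"absent", "negative"}


def score_document(text, key_term):
    """Return (term frequency, term weight) for one key term in one document."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    key = key_term.lower()
    frequency = 0
    weight = 0.0
    for i, word in enumerate(words):
        if word != key:
            continue
        frequency += 1
        # Assumed rule: an occurrence is excluded from the weight when a prefix
        # term directly precedes it or a suffix term directly follows it.
        prefix_hit = i > 0 and words[i - 1] in PREFIX_TERMS
        suffix_hit = i + 1 < len(words) and words[i + 1] in SUFFIX_TERMS
        if not (prefix_hit or suffix_hit):
            weight += 1.0
    return frequency, weight


def rank_documents(documents, key_term):
    """Rank documents containing the key term, highest relevancy first."""
    scored = []
    for doc_id, text in documents.items():
        frequency, weight = score_document(text, key_term)
        if frequency:
            # Assumed combination: relevancy = frequency * weight.
            scored.append((doc_id, frequency * weight))
    return sorted(scored, key=lambda item: item[1], reverse=True)


docs = {
    "a": "Patient has fever. Fever persists.",
    "b": "No fever reported.",
    "c": "No cough.",
}
ranking = rank_documents(docs, "fever")
```

In this sketch, document `c` is left out because it never mentions the key term, and document `b` ranks below `a` because its only occurrence is preceded by a prefix term.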

Core Innovation

A ranking system processes a plurality of documents retrieved from a data source to generate corresponding tokens for each word in document data and to assign context-based tags to those tokens. The system retrieves at least one negation term and at least one negation rule from memory, compares the negation term according to the negation rule to other terms within a selected proximity of each word, and assigns a negative tag when the negation term matches within the selected proximity and a positive tag otherwise.
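
A rough sketch of this tagging step follows. The source does not give the negation terms, the form of a negation rule, or the size of the selected proximity, so the `NEGATION_TERMS` set, the "negation term appears within `proximity` words before the token" rule, and the `Token` class are hypothetical stand-ins.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    index: int
    word: str
    tag: str


# Hypothetical negation terms; the source does not enumerate them.
NEGATION_TERMS = frozenset({"no", "not", "without"})


def tag_tokens(text, negation_terms=NEGATION_TERMS, proximity=2):
    """Generate one token per word and assign a context tag to each.

    Assumed negation rule: a token is tagged "negative" when a negation term
    appears among the `proximity` words immediately before it, and "positive"
    otherwise.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    tokens = []
    for i, word in enumerate(words):
        window = words[max(0, i - proximity):i]
        negated = any(w in negation_terms for w in window)
        tokens.append(Token(i, word, "negative" if negated else "positive"))
    return tokens


tokens = tag_tokens("No evidence of fracture but swelling present", proximity=3)
tags = [t.tag for t in tokens]
```

With a proximity of three words, "evidence", "of", and "fracture" are tagged negative, while "swelling" and "present" fall outside the window and stay positive. The negation term itself is tagged positive here because nothing precedes it; a real rule set might treat it differently.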

The processed document data, including document content, tokens, and tags assigned to each of the tokens, is stored in a data store. In response to a search request including document identification data, the system identifies the processed document data in the data store that corresponds to the document identification data and searches the processed document data for tokens assigned the positive context tag.

The system then generates processed document data for display that includes visual indicators for each word where the corresponding token is assigned the positive context tag. The display reflects the positive-context tagging results derived from the negation terms and negation rules.
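
The store, search, and display steps might look roughly like the sketch below. The in-memory `DATA_STORE` dictionary, the `(word, tag)` tuple layout, and the asterisk used as the visual indicator are assumptions; the source does not specify a storage backend or what the indicator looks like.

```python
# Hypothetical in-memory data store keyed by document identification data.
DATA_STORE = {}


def store_document(doc_id, content, tokens):
    """Store document content together with its (word, tag) tokens."""
    DATA_STORE[doc_id] = {"content": content, "tokens": list(tokens)}


def find_positive_tokens(doc_id):
    """Return the words whose tokens carry the positive context tag."""
    record = DATA_STORE[doc_id]
    return [word for word, tag in record["tokens"] if tag == "positive"]


def render_for_display(doc_id, marker="*"):
    """Wrap each positive-context word in a marker as a stand-in visual indicator."""
    record = DATA_STORE[doc_id]
    return " ".join(
        f"{marker}{word}{marker}" if tag == "positive" else word
        for word, tag in record["tokens"]
    )


store_document(
    "doc-1",
    "swelling present without fracture",
    [
        ("swelling", "positive"),
        ("present", "positive"),
        ("without", "positive"),
        ("fracture", "negative"),
    ],
)
display_text = render_for_display("doc-1")
```

A search request carrying `"doc-1"` as its document identification data would then return text in which every positive-context word is marked and "fracture" is left plain.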

The system also performs analytics that evaluate predictive document data against the corresponding result document data. It identifies tokens in the predictive and result documents that correspond to the same particular word and identifies the tag assigned to each of those tokens. Based on how the tags align across those tokens, the system assigns an accuracy rating to the predictive document data and stores the accuracy rating in memory.

Claims Coverage

The independent claims cover four main inventive features: context-aware token tagging using negation terms and negation rules within a selected proximity, storing processed document data for search and display using a positive context tag, predictive-versus-result analytics that assign an accuracy rating from token-tag agreement, and implementation as a non-transitory computer-readable medium including display and analytics functionality.

Token generation and negation rule tagging within selected proximity

Generate a corresponding token for each word in document data included in at least one of a plurality of documents; retrieve at least one negation term and at least one negation rule from a memory; compare the negation term according to the negation rule to other terms within a selected proximity of each word; assign a negative tag when the negation term matches and a positive tag otherwise.

Storing processed document data with tokens and tags

Store processed document data for the at least one document in a data store, the processed document data comprising document content, tokens, and the tags assigned to each of the tokens.

Search and display using positive context tag visual indicators

Receive a search request including document identification data; identify the processed document data in the data store that corresponds to the document identification data; search the processed document data for tokens assigned the positive context tag; generate the processed document data for display that includes visual indicators for each word where the corresponding token is assigned the positive context tag.

Predictive-versus-result token tag analytics with accuracy rating

Receive another search request comprising at least one of the document identification data and document preparer identification data; identify predictive document data and corresponding result documents data in the data store; identify corresponding tokens in the predictive document data and the corresponding result documents data that correspond to the same particular word; identify the tags assigned to each of the identified corresponding tokens; assign an accuracy rating based on the tags, with a first value when both tokens are assigned the positive context tag and a second value when one token is assigned the positive context tag and another token is assigned the negative tag; store the accuracy rating in memory.
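
The rating logic in this claim element can be sketched as follows. The claim names only a first value (both tokens positive) and a second value (one positive, one negative); the numeric defaults, the choice to skip token pairs that are both negative, and the averaging into a single rating are assumptions added for illustration.

```python
def accuracy_rating(predictive_tokens, result_tokens,
                    first_value=1.0, second_value=0.0):
    """Rate predictive document data against result document data.

    Both inputs are lists of (word, tag) pairs. Tokens are matched by word;
    if a word repeats in the result document, the last tag wins, which is a
    simplification made for this sketch.
    """
    result_tags = {word: tag for word, tag in result_tokens}
    values = []
    for word, tag in predictive_tokens:
        other = result_tags.get(word)
        if other is None:
            continue  # no corresponding token in the result document
        pair = {tag, other}
        if pair == {"positive"}:
            values.append(first_value)   # both tokens positive
        elif pair == {"positive", "negative"}:
            values.append(second_value)  # one positive, one negative
        # Both-negative pairs are not addressed by the claim text; skipped here.
    # Assumed aggregation: mean of the per-token values.
    return sum(values) / len(values) if values else None


predictive = [("fracture", "positive"), ("swelling", "positive"), ("effusion", "negative")]
result = [("fracture", "negative"), ("swelling", "positive"), ("effusion", "negative")]
rating = accuracy_rating(predictive, result)
```

Here "swelling" agrees (first value), "fracture" disagrees (second value), and "effusion" is skipped, so the averaged rating is 0.5. Storing that rating against document preparer identification data would be a separate step not shown here.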

Non-transitory computer-readable medium for token tagging, searching, and display

A non-transitory computer-readable medium encoded with instructions executable by at least one processor to generate tokens; retrieve negation terms and negation rules; compare according to the negation rule within a selected proximity; assign negative and positive tags; store processed document data; receive a search request including document identification data; identify corresponding processed document data; search for tokens assigned the positive context tag; and generate display output with visual indicators for each word where the corresponding token is assigned the positive context tag.

Non-transitory computer-readable medium including predictive analytics accuracy rating

A non-transitory computer-readable medium encoded with instructions executable by at least one processor to generate tokens; retrieve negation terms; compare within a selected proximity; assign negative and positive tags; store processed document data; receive search requests; identify predictive document data and corresponding result documents data; identify corresponding tokens and their tags; assign an accuracy rating with first and second values based on positive-versus-mixed tag assignments; and store the accuracy rating in memory.

Across the independent claims, the core coverage is a ranking system and method that generate word-level tokens, apply negation terms and negation rules within a selected proximity to assign positive and negative context tags, store processed document data for search, and generate display output using visual indicators for words whose tokens are assigned the positive context tag. Additional claim coverage includes analytics that evaluate predictive documents against result documents by aligning corresponding tokens for the same particular word and assigning an accuracy rating based on tag agreement patterns.

Stated Advantages

Enables searching of processed document data for tokens assigned the positive context tag.

Generates display including visual indicators for each word where the corresponding token is assigned the positive context tag.

Enables assigning an accuracy rating to predictive document data based on tags assigned to corresponding tokens in predictive and result documents.

Stores accuracy ratings for predictive documents in memory.

Documented Applications

Document ranking and search results display where positive-context words are visually indicated in response to a search request including document identification data.

Analytics that compare predictive documents with result documents by token-tag agreement and assign accuracy ratings associated with document preparers based on document preparer identification data.
