Automated program synthesis from natural language for domain specific computing applications

Inventors

Hsiao, Michael S.

Assignees

Virginia Tech Intellectual Properties Inc; Virginia Polytechnic Institute and State University

Publication Number

US-10843080-B2

Publication Date

2020-11-24

Expiration Date

2037-02-23


Abstract

Disclosed are various embodiments for automated program synthesis from a natural language for domain specific computing applications. In one embodiment, a natural language processor may be configured to parse words from a sentence of text formed in a natural language, such as English, following a grammatical structure for the natural language. The words may be compared to a dictionary to identify a token. The text formed in the natural language may be converted to an intermediate format of programming code in a programming language, such as C, where the intermediate format includes the token. The token may invoke a function or a routine of a library written in the programming language. The intermediate format may be compiled into executable program code to generate an application, such as a video game, for execution.

Core Innovation

The invention provides a system and method for automated program synthesis from natural language text for domain-specific computing applications. A natural language processor parses sentences written in a natural language, such as English, following the grammatical structure of that language. The system compares the words to a domain-specific dictionary to identify tokens, which are linked to functions or routines in a programming language library. The text is then converted into an intermediate programming code format and compiled into executable code to generate an application.
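The flow described above (words, then tokens, then intermediate code) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the dictionary contents, the token names such as `game_jump`, and the header `game_lib.h` are all hypothetical, and the sketch stops at emitting C source text rather than compiling it.

```python
import re

# Hypothetical domain dictionary: word -> token naming a C library routine.
DICTIONARY = {
    "jump": "game_jump",
    "move": "game_move",
    "explode": "game_explode",
}

def tokenize(sentence):
    """Split a sentence into lowercase words and keep the tokens of known words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [DICTIONARY[w] for w in words if w in DICTIONARY]

def to_intermediate_c(sentences):
    """Emit C source text in which each token becomes a library call."""
    # "game_lib.h" is an assumed name for the domain library header.
    lines = ['#include "game_lib.h"', "", "int main(void) {"]
    for sentence in sentences:
        for token in tokenize(sentence):
            lines.append(f"    {token}();")
    lines += ["    return 0;", "}"]
    return "\n".join(lines)

print(to_intermediate_c(["The rabbit can jump.", "Rocks explode."]))
```

In a fuller system the emitted text would be handed to a C compiler and linked against the domain library; the sketch leaves that step out because the source does not specify a toolchain.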

The problem addressed by the invention lies in the ambiguity and nuance of natural language, which makes direct natural language programming challenging. Traditional approaches require users to learn the syntax and semantics of programming languages, which can limit productivity and hinder accessibility, especially for those lacking programming background. This system seeks to enable users to write application logic using natural language, which is programmatically translated into executable code.

The system incorporates techniques such as token identification, synonym, verb, and pronoun resolution, and fuzzy grammar matching to overcome the complexity of natural language. Words not found in the dictionary are analyzed contextually and may be added to the dictionary or a temporary one if their meanings can be inferred. A certainty metric is generated for each parsed sentence, indicating the degree of programmatic understanding, and error messages are produced when sentences fall below a confidence threshold. Overall, the approach leverages domain-specific libraries and dynamic language models to facilitate automated application development from natural language input.
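As a rough illustration of synonym and pronoun resolution, the sketch below rewrites synonyms to a canonical dictionary word and replaces a pronoun with the most recently mentioned object. The tables and the "most recent object" rule are assumptions made for this example; the source does not describe how the patented system performs resolution, and the fuzzy grammar matching it mentions is not reproduced here.

```python
# Hypothetical synonym table mapping surface words to canonical dictionary words.
SYNONYMS = {"leap": "jump", "hop": "jump", "bunny": "rabbit"}
# Hypothetical set of known game objects.
OBJECTS = {"rabbit", "fox", "carrot"}
PRONOUNS = {"it", "they", "he", "she"}

def resolve(words):
    """Canonicalize synonyms, then replace pronouns with the last object seen."""
    resolved, last_object = [], None
    for word in words:
        word = SYNONYMS.get(word, word)
        if word in PRONOUNS and last_object is not None:
            word = last_object  # naive rule: the nearest preceding object wins
        elif word in OBJECTS:
            last_object = word
        resolved.append(word)
    return resolved

print(resolve("the bunny is hungry so it will hop".split()))
```

The nearest-preceding-object rule is deliberately naive: in a sentence with two objects before the pronoun it would pick the later one, which is not always the intended referent.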

Claims Coverage

The patent contains two principal independent claims, each outlining key inventive features of the system and method for automated program synthesis from natural language for domain-specific applications.

Automated parsing and tokenization of natural language text

The system identifies a plurality of words from at least one natural language sentence, compares these words to a dictionary, and determines for each word whether it has a corresponding entry.
- For words with dictionary entries, an associated token is identified.
- For words without entries, contextual analysis of antecedent and consequent clauses is used to determine meaning, and new tokens are associated and possibly stored for later use.
- Tokens invoke functions or routines in a predetermined programming language library.
- An error message is generated if a word is not found in the dictionary and its meaning cannot be determined.
- Ultimately, the natural language text is converted into an intermediate programming code format using the tokens, which is then compiled into executable program code.

Contextual determination and binding of new terms

When processing words not present in the dictionary, the system employs contextual analysis of sentence clauses (antecedent and consequent) to infer meanings.
- New words are not only given meaning but also bound to objects in the application, allowing those words to serve as variables that can be consulted during execution.
- The inferred new terms (second tokens) are stored for future reference in association with their derived meanings.
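One way to picture this contextual binding is sketched below. It splits a conditional sentence into antecedent and consequent clauses and, if an unknown word follows a known verb in the antecedent, treats that word as a new object and stores a token for it. The verb list, the `OBJ_` token naming, and the inference rule itself are illustrative assumptions and are not taken from the patent.

```python
import re

KNOWN_VERBS = {"touches", "eats", "increases"}           # hypothetical
dictionary = {"rabbit": "OBJ_RABBIT", "score": "VAR_SCORE"}  # hypothetical
temporary_dictionary = {}  # inferred terms, kept for later sentences
ARTICLES = {"a", "an", "the"}

def split_clauses(sentence):
    """Split 'when/if A, B' into (antecedent, consequent) word lists."""
    match = re.match(r"(?:when|if)\s+(.*?),\s*(.*)", sentence.lower().rstrip("."))
    if match is None:
        return None
    return match.group(1).split(), match.group(2).split()

def infer_unknown(sentence):
    """Bind an unknown word that follows a known verb as a new object token."""
    clauses = split_clauses(sentence)
    if clauses is None:
        return None
    antecedent, _consequent = clauses
    for i, word in enumerate(antecedent):
        if word in KNOWN_VERBS:
            # Skip articles; the next word is taken to be the verb's object.
            rest = [w for w in antecedent[i + 1:] if w not in ARTICLES]
            if rest and rest[0] not in dictionary:
                token = "OBJ_" + rest[0].upper()
                temporary_dictionary[rest[0]] = token
                return rest[0], token
    return None

print(infer_unknown("When the rabbit touches a glorp, the score increases."))
```

Storing the inferred term in a separate temporary dictionary keeps guesses apart from the curated domain dictionary, so a wrong inference can be discarded without touching the curated entries.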

Certainty metric generation and error handling during program synthesis

The system generates a certainty metric for each sentence, indicating the degree to which the sentence's meaning was programmatically understood.
- If the certainty metric falls below a predefined error threshold, the system generates and provides an error message to the user.
- Errors must be resolved before the natural language can be successfully converted to executable code.
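The threshold check can be illustrated with a simple stand-in metric. The source does not say how the certainty metric is computed, so the sketch below assumes it is the fraction of words in a sentence that appear in a known vocabulary; the vocabulary, the threshold value of 0.75, and the message wording are likewise hypothetical.

```python
import re

ERROR_THRESHOLD = 0.75  # assumed value; the source does not give one
KNOWN_WORDS = {"the", "a", "rabbit", "fox", "can", "jump", "runs"}  # hypothetical

def certainty(sentence):
    """Return the fraction of words in the sentence found in KNOWN_WORDS."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    return sum(w in KNOWN_WORDS for w in words) / len(words)

def check(sentences):
    """Return one error message per sentence scoring below the threshold."""
    errors = []
    for sentence in sentences:
        score = certainty(sentence)
        if score < ERROR_THRESHOLD:
            errors.append(f"Could not understand ({score:.2f}): {sentence}")
    return errors

story = ["The rabbit can jump.", "The rabbit can frobnicate wildly."]
for message in check(story):
    print(message)
```

Under this assumed metric, conversion would proceed only when `check` returns an empty list, which mirrors the requirement that errors be resolved before code is generated.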

Domain-specific application support via libraries

The system uses libraries containing functions and routines corresponding to specific application domains.
- The dictionary, libraries, and operation of the natural language compiler are tailored to domains such as video games.
- The system can compile and execute applications using these domain-specific resources.

Flexible architecture supporting local and remote compilation

The claimed invention covers both remote and local implementations.
- The system may execute program instructions locally on a client device or remotely via a server.
- A user interface is generated to receive natural language text input, which is then processed as described.

The claims comprehensively cover a system and method for translating domain-specific natural language input into executable application code by means of dictionary-based tokenization, contextual analysis for unseen terms, certainty metric feedback, and the use of domain-tailored libraries in both local and remote computing environments.

Stated Advantages

Significantly increases productivity in software development by allowing programming in natural language.

Helps eliminate the inherent fear and frustration of learning a conventional computer programming language, thus making computing education more accessible.

Provides innovations in software design, execution, and understanding by enabling natural language program synthesis.

Allows users unfamiliar with formal programming languages to create applications, broadening accessibility to software creation.

Reduces the complexity of modeling grammar rules for programming languages by using fuzzy grammar matching and dynamic learning of new terms.

Documented Applications

Automated synthesis of executable video game applications from natural language descriptions.

Generation of video games where game logic is expressed as a story or description in a natural language.

Creation of applications for different domains, including, but not limited to, video games, web browsers, word processing applications, and social networking applications, using natural language input.
