System and method for hybrid kernel and user-space checkpointing using a character device

Inventors

Havemose, Allan

Assignees

Philips North America LLC


Publication Number

US-10621052-B1

Patent

Publication Date

2020-04-14

Expiration Date


Abstract

A system, method, and computer-readable medium for hybrid kernel-mode and user-mode checkpointing of multi-process applications using a character device. The computer-readable medium includes computer-executable instructions for execution by a processing system. A multi-process application runs on primary hosts and is checkpointed by a checkpointer comprising a kernel-mode checkpointer module and one or more user-space interceptors that provide barrier synchronization, a checkpointing thread, resource flushing, and an application virtualization space. Checkpoints may be written to storage, and the application may be restored from a stored checkpoint at a later time. Checkpointing is transparent to the application and requires no modification to the application, operating system, networking stack, or libraries. In an alternate embodiment, the kernel-mode checkpointer is built into the kernel.

Core Innovation

The invention provides a hybrid kernel-mode and user-space checkpointing approach for multi-process applications. A kernel module performs transparent checkpoint capture by capturing memory-page and kernel-state information during application execution, while user-space interceptors coordinate execution and checkpoint behavior through barrier synchronization and other controls.
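The barrier synchronization described above can be illustrated with a minimal user-space sketch. This is not the patented implementation: it uses Python threads as stand-ins for application processes, and the names `run_with_checkpoint`, `capture`, and `counters` are hypothetical. It only shows the general idea that state is captured while every participant is parked at a synchronization point.

```python
import threading


def run_with_checkpoint(num_workers=3, steps_before_barrier=5):
    """Run workers that pause at a barrier while a snapshot is taken."""
    counters = [0] * num_workers   # stand-in for application memory
    checkpoints = []               # snapshots captured at the barrier

    def capture():
        # Called by exactly one thread once all workers have arrived and
        # before any of them is released, so the state is quiescent.
        checkpoints.append(list(counters))

    barrier = threading.Barrier(num_workers, action=capture)

    def worker(idx):
        for _ in range(steps_before_barrier):
            counters[idx] += 1
        barrier.wait()             # synchronization point
        counters[idx] += 100       # execution resumes after the checkpoint

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return checkpoints, counters
```

Because the snapshot is taken inside the barrier's action, no worker can modify its counter while the copy is made, which is the property a checkpoint needs for consistency.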

A character-device checkpointer interface is used for checkpoint creation; its read function returns the memory pages used by the applications. The checkpointer creates checkpoints by reading those pages, and the captured kernel state and memory contents are written to storage as checkpoint data for later restore.

An Application Virtualization Space (AVS) virtualizes operating system constructs such as PIDs, TIDs, and resource identities across restore. During restore, the initial process recreates the process hierarchy and remaps the AVS resources so that checkpointed applications, including shared/global state and process relationships, are restored while maintaining transparent operation without modifying the application, OS, networking stack, or libraries.
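As a rough illustration of the identifier virtualization described above, the following sketch keeps a table from stable virtual PIDs to whatever real PIDs the operating system assigns. The class `VirtualizationSpace` and its methods are hypothetical names chosen for this example; the patent summary does not specify the data structures used by the AVS.

```python
class VirtualizationSpace:
    """Maps stable virtual PIDs to the real PIDs currently in use."""

    def __init__(self):
        self._virt_to_real = {}
        self._next_virtual = 1

    def register(self, real_pid):
        # Hand the application a virtual PID that survives restore.
        vpid = self._next_virtual
        self._next_virtual += 1
        self._virt_to_real[vpid] = real_pid
        return vpid

    def real(self, vpid):
        # Translate a virtual PID to the real PID for system calls.
        return self._virt_to_real[vpid]

    def remap(self, vpid, new_real_pid):
        # After restore the OS assigns new real PIDs; only the table
        # changes, so the application keeps seeing the same virtual PID.
        if vpid not in self._virt_to_real:
            raise KeyError(f"unknown virtual PID {vpid}")
        self._virt_to_real[vpid] = new_real_pid


avs = VirtualizationSpace()
vpid = avs.register(4711)   # real PID before the checkpoint (made up)
avs.remap(vpid, 9001)       # real PID of the recreated process (made up)
```

The application only ever holds `vpid`, so process relationships expressed in virtual identifiers remain valid even though every real identifier changed across restore.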

Claims Coverage

This patent contains four independent claims, centered on synchronization points that coordinate or pause application execution and a checkpointer that creates checkpoints by reading memory pages used by the applications, with a character-device implementation and specific read-function behaviors.

Synchronization point coordinating execution and checkpointer reading memory locations for checkpoints

The system includes one or more instructions comprising a synchronization point for coordinating execution of one or more applications at the synchronization point, and one or more instructions comprising a checkpointer configured to read one or more memory locations used by the one or more applications to create one or more checkpoints, wherein the checkpointer comprises instructions for a read function to include memory pages used by the one or more applications.

Checkpointer device read that forwards device pointer to next page after read

The system includes one or more instructions comprising a checkpointer device configured to read one or more memory locations used by the one or more applications to create one or more checkpoints, wherein the checkpointer device comprises instructions for a read function to include memory pages used by the one or more applications, and wherein the checkpointer device comprises instructions for the CPUs to forward a device pointer to a next page after a read.

Synchronization point for pausing execution with a character device checkpointer

The system includes one or more instructions comprising a synchronization point for pausing execution of the one or more applications at the synchronization point, and one or more instructions comprising a checkpointer device configured to read one or more memory pages and create one or more checkpoints by reading memory pages used by the one or more applications, wherein the checkpointer device is a character device, wherein the checkpointer character device comprises instructions for a read function to include memory pages used by the one or more applications, and wherein the checkpointer character device comprises instructions for the CPUs to forward the checkpointer character device pointer to a next page after a read.

Synchronization point pausing and triggering pause plus character-device checkpointer read

The system includes one or more instructions comprising a synchronization point for pausing execution of the one or more applications at the synchronization point and triggering the one or more applications to pause execution at the synchronization point, and a checkpointer character device comprising instructions for the CPUs for a read function to include memory pages used by the one or more applications, wherein the checkpointer character device comprises instructions for the CPUs to forward a device pointer to a next page after a read.

Across the independent claims, the core coverage is the combination of a synchronization point that coordinates and/or pauses application execution with a checkpointer that creates checkpoints by reading memory pages used by the applications. The checkpointer is implemented as, or accessed through, a character device whose read function forwards a device pointer to the next page after each read.
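The read behavior in these claims, where each read yields a page and the device pointer then moves to the next page, can be modeled in user space. The sketch below is an assumption-laden simulation, not kernel code: `PageDevice`, `write_checkpoint`, and the 4096-byte page size are hypothetical, and a real character device would implement this in its kernel read handler.

```python
import io

PAGE_SIZE = 4096  # assumed page size for this illustration


class PageDevice:
    """User-space model of a device that hands out memory page by page."""

    def __init__(self, memory, page_size=PAGE_SIZE):
        self._memory = memory
        self._page_size = page_size
        self._pos = 0  # device pointer, as a byte offset

    def read(self):
        # Return the page at the device pointer, then forward the
        # pointer so the next read starts at the following page.
        page = self._memory[self._pos:self._pos + self._page_size]
        self._pos += len(page)
        return page


def write_checkpoint(device, sink):
    """Read pages until the device is exhausted; return the page count."""
    pages = 0
    while True:
        page = device.read()
        if not page:
            break
        sink.write(page)
        pages += 1
    return pages


memory = bytes(range(256)) * 40   # 10240 bytes of sample "application memory"
sink = io.BytesIO()               # stands in for checkpoint storage
page_count = write_checkpoint(PageDevice(memory), sink)
```

Here the caller never tracks offsets itself; repeated calls to `read` walk the memory image because the device advances its own pointer, which mirrors the stream-like access a character device offers.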

Stated Advantages

Transparent operation without modifying the application, OS, networking stack, or libraries.

Documented Applications

Checkpointing and later restoring multi-process applications, including maintaining process hierarchy and shared/global state across restore.

Deployment architectures including primary and backup servers, with possible remote storage/network scenarios.
