Data driven computer user emulation
Inventors
Oesch, Timothy S. • Bridges, Robert A. • Nichols, Jeffrey A. • Smith, Jared M. • Verma, Kiren E. • Weber, Brian • Diallo, Oumar Souleymane
Assignees
Publication Number
US-12307273-B2
Publication Date
2025-05-20
Expiration Date
2041-08-10
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
Whether testing intrusion detection systems, conducting training exercises, or creating data sets to be used by the broader cybersecurity community, realistic user behavior is a desirable component of a cyber-range. Existing methods either rely on network level data or replay recorded user actions to approximate real users in a network. Probabilistic models can be fit to actual user data (sequences of application usage) collected from endpoints. Once trained to the user's behavioral data, these models can generate novel sequences of actions from the same distribution as the training data. These sequences of actions can be fed to emulator software via configuration files, which replicate those behaviors on end devices. The models are platform agnostic and can generate behavior data for any emulation software package. In some embodiments a latent variable is added to faithfully capture and leverage time-of-day trends.
Core Innovation
The invention provides a data-driven approach for emulating computer user behavior by collecting detailed user activity data and modeling it with a time-aware probabilistic model. This approach generates new, realistic sequences of user actions by learning from real user behavior attributes such as activity order, duration, and time of day. The generated behavior sequences can then be used to drive emulation software on computers or virtual machines, closely simulating actual user activity.
The problem addressed is the lack of realism in existing user emulation technologies. These commonly replay recorded behaviors, generate network traffic directly, or rely on manually constructed configuration files, so the simulated behavior is easily distinguished from that of real users. They also fail to capture the sequential and temporal aspects of user activity, such as time-of-day patterns and realistic orderings of user actions.
By leveraging time-aware probabilistic models, including Markov Chain, Hidden Markov Model, and Random Surfer architectures, the system is able to learn and replicate characteristics like application switching patterns, usage duration, and temporal trends throughout the day. The generated sequences are processed into configuration files compatible with various emulation platforms, allowing for high-fidelity simulations that can serve as digital twins of real users on any computing device.
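To make the idea of a time-aware model concrete, the following is a minimal sketch of a first-order Markov chain over application usage whose transitions are conditioned on a coarse time-of-day bucket. It is an illustration under stated assumptions, not the patented implementation: the log format, the two-bucket split in bucket(), and the names LOG, fit, next_app, and generate are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical activity log: (hour_of_day, application, duration_seconds).
LOG = [
    (9, "email", 300), (9, "browser", 600), (10, "editor", 1200),
    (10, "browser", 400), (11, "email", 200), (13, "browser", 900),
    (13, "editor", 1500), (14, "email", 250), (14, "browser", 500),
]

def bucket(hour):
    """Coarse time-of-day bucket (an assumed discretization)."""
    return "morning" if hour < 12 else "afternoon"

def fit(log):
    """Count app-to-app transitions per time bucket; collect durations per app."""
    transitions = defaultdict(lambda: defaultdict(int))
    durations = defaultdict(list)
    for hour, app, dur in log:
        durations[app].append(dur)
    for (hour, app, _), (_, nxt, _) in zip(log, log[1:]):
        transitions[(bucket(hour), app)][nxt] += 1
    return transitions, durations

def next_app(transitions, time_bucket, app, rng):
    """Sample the next app; back off to all buckets if this one has no data."""
    counts = transitions.get((time_bucket, app))
    if not counts:
        counts = defaultdict(int)
        for (_, source), targets in transitions.items():
            if source == app:
                for target, n in targets.items():
                    counts[target] += n
    if not counts:
        return None  # the app was never followed by anything in the log
    apps, weights = zip(*counts.items())
    return rng.choices(apps, weights=weights, k=1)[0]

def generate(transitions, durations, start_app, start_hour, steps, rng):
    """Sample a new sequence of (hour, app, duration); start_app must be in the log."""
    sequence, app, clock = [], start_app, start_hour * 3600
    for _ in range(steps):
        hour = (clock // 3600) % 24
        dur = rng.choice(durations[app])  # resample an observed duration
        sequence.append((hour, app, dur))
        clock += dur
        app = next_app(transitions, bucket(hour), app, rng)
        if app is None:
            break
    return sequence

transitions, durations = fit(LOG)
sample = generate(transitions, durations, "email", 9, 6, random.Random(0))
```

Resampling observed durations keeps the sketch short; a fuller model might fit a duration distribution per application, or use a hidden state as in the Hidden Markov Model variant mentioned above.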
Claims Coverage
There are four main inventive features described in the independent claims.
Time-aware probabilistic modeling of user behavior
Instructions receive computer-user activity data that captures attributes such as activity order, duration, and time of day, and model that data with a time-aware user behavior probabilistic model. The model then generates new user behaviors as sequences that are similar to the input data with respect to one or more of those attributes.
Emulation of user behavior on computer systems
The generated user behaviors are used to drive a computer system to emulate real user actions, providing instructions for a computer or system to act in accordance with the synthesized behavior sequences as determined by the probabilistic model.
Indistinguishable behavior distribution via refined modeling
Modeling and generation are refined so that the distribution of generated behaviors cannot be distinguished from that of a real user. This includes evaluating and updating the model so that the generated data matches the fidelity and characteristics of real user data.
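One simple way to quantify how closely generated behavior matches real behavior is to compare the empirical distributions of application usage. The sketch below uses total variation distance for this. The choice of metric is an assumption made for illustration, since the summary does not say which test the patent uses, and the function names and sample sequences are hypothetical.

```python
from collections import Counter

def app_distribution(sequence):
    """Empirical frequency of each application in a sequence of app names."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    total = sum(counts.values())
    return {app: n / total for app, n in counts.items()}

def total_variation(p, q):
    """Total variation distance: 0 means identical, 1 means disjoint support."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical real and generated application sequences.
real = ["email", "browser", "browser", "editor"]
generated = ["browser", "email", "editor", "browser"]

distance = total_variation(app_distribution(real), app_distribution(generated))
```

This only compares how often each application appears. A refinement loop along the lines described above would also need to compare ordering, durations, and time-of-day trends before treating the two distributions as matched.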
Production of configuration files and emulator control
User behavior sequences generated by the probabilistic model are output as configuration files, which are used to control user emulators on computer systems. The system provides instructions for running user emulators based on these files, including capabilities for logging emulator status, receiving activation or deactivation commands, and supporting out-of-band traffic for emulator control and status monitoring.
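As an illustration of the hand-off from model to emulator, the sketch below serializes a generated sequence into a JSON configuration. The schema (the user, actions, start_hour, application, and duration_seconds fields) and the to_config helper are hypothetical. A real emulator would define its own configuration format, and the generated sequence would be translated to match it.

```python
import json

def to_config(user_id, sequence):
    """Convert (hour, app, duration) tuples into an assumed emulator config."""
    return {
        "user": user_id,
        "actions": [
            {"start_hour": hour, "application": app, "duration_seconds": dur}
            for hour, app, dur in sequence
        ],
    }

# Hypothetical generated sequence of (hour, application, duration_seconds).
sequence = [(9, "email", 300), (9, "browser", 600)]

config_text = json.dumps(to_config("user-001", sequence), indent=2)
```

Keeping the model output in a neutral intermediate form like this is what lets the same generated behavior be rendered for different emulation packages.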
The independent claims collectively cover collecting and modeling detailed user activity data with a time-aware probabilistic model, generating realistic user behaviors, using these to emulate user actions on computer systems, ensuring high fidelity with real user data, and supporting emulator control and monitoring through configuration files and status management.
Stated Advantages
Enables generation of high-fidelity, realistic user behavior data that closely resembles real user actions, including temporal and sequential attributes.
Allows the creation of novel behavior sequences rather than replays of recorded sessions, so original data can be generated without limit.
Improves the realism and effectiveness of cyber testing, training, and technology evaluation by more accurately simulating real users within cyber environments.
Supports scalable deployment, allowing hundreds or thousands of emulators to be run simultaneously for large-scale experiments.
Provides extensibility, enabling new user actions to be added to the system with minimal effort.
Documented Applications
Generating realistic host and network data for testing cybersecurity tools such as User Behavior Analytics, Anomaly Detection, and Intrusion Detection Systems.
Creating cyber deception capabilities, including software-defined shadow networks that camouflage real assets with realistic emulated users.
Producing data sets for developing machine learning or artificial intelligence models that leverage data from user actions.
Enhancing realism in cyber exercises, including red-team events and operator training for network defense.
Impersonation of specific computer users for research, testing, or non-security IT tools and technologies.
Generating novel datasets to support research and industry needs in cybersecurity and network operations.