Hyperparameter tuning using visual analytics in a data science platform

Inventors

Convertino, Gregorio; Li, Tianyi; Most, Haley Allen; Wang, Wenbo; Tsai, Yi-Hsun; Zajonc, Michael Tristan; Lee Williams, Michael John

Assignees

Cloudera Inc

Publication Number

US-12248888-B2

Publication Date

2025-03-11

Expiration Date

2038-09-21


Abstract

Techniques are disclosed for facilitating the tuning of hyperparameter values during the development of machine learning (ML) models using visual analytics in a data science platform. In an example embodiment, a computer-implemented data science platform is configured to generate, and display to a user, interactive visualizations that dynamically change in response to user interaction. Using the introduced technique, a user can, for example, 1) tune hyperparameters through an iterative process using visual analytics to gain and use insights into how certain hyperparameters affect model performance and convergence, 2) leverage automation and recommendations along this process to optimize the tuning given available resources, 3) collaborate with peers, and 4) view costs associated with executing experiments during the tuning process.

Core Innovation

The invention disclosed relates to techniques for facilitating the tuning of hyperparameter values during the development of machine learning (ML) models using visual analytics in a data science platform. This technique enables a user to iteratively tune hyperparameters through interactive visualizations that dynamically update according to user interaction, and thereby gain insights into how certain hyperparameters affect model performance and convergence.

The technique also leverages automation and recommendations to optimize the tuning process given available computing resources, supports collaboration among users, and provides visibility into the costs associated with executing experiments. The visual analytics allow exploration of relationships between hyperparameters and performance metrics to make informed tuning decisions.

The problem addressed arises from the difficulty and inefficiency in existing hyperparameter tuning methods. Hyperparameter tuning is crucial for ML model performance but is often time-consuming, computationally expensive, and requires expert knowledge. Automated approaches like exhaustive grid search or random search are often impractical due to computational cost and complexity, while manual tuning remains error-prone, inefficient, and does not easily support reproducibility or collaboration.

Claims Coverage

The patent includes multiple independent claims covering methods, computer systems, and computer-readable media for facilitating hyperparameter tuning using interactive visualizations.

Interactive visualization with linked plot visualizations

A graphical user interface (GUI) displaying interactive visualizations in which multiple plot visualizations are linked, so that user interaction in one plot dynamically updates the others to highlight related data points, facilitating exploratory analysis of hyperparameters and performance metrics.
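The linking mechanism described above can be sketched as a shared selection state that every registered plot view reads from. This is a minimal illustration, not the patent's implementation; the class and plot names are hypothetical.

```python
# Sketch of linked plot visualizations: brushing points in one plot
# propagates a shared highlight set to all other plots.
class LinkedPlots:
    def __init__(self, experiment_ids):
        self.experiment_ids = set(experiment_ids)
        self.plots = []           # registered plot views (names only here)
        self.highlighted = set()  # shared selection state

    def register(self, plot_name):
        self.plots.append(plot_name)

    def select(self, ids):
        """User brushes points in one plot; all plots see the selection."""
        self.highlighted = self.experiment_ids & set(ids)
        return {plot: sorted(self.highlighted) for plot in self.plots}

linked = LinkedPlots(["exp1", "exp2", "exp3"])
linked.register("learning_rate_vs_accuracy")
linked.register("batch_size_vs_loss")
# "exp9" is not a known experiment, so it is silently ignored.
highlights = linked.select(["exp2", "exp3", "exp9"])
```

In a real GUI the `select` call would be wired to a brush or click event and the returned mapping would trigger redraws; here it simply returns the per-plot highlight lists.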

Iterative hyperparameter tuning with locking and adding hyperparameters

Receiving and locking selected hyperparameter values based on visualized experiment results, then selecting additional hyperparameters and generating new hyperparameter value sets by applying locked values and new values, followed by executing new experiments and training the ML model with these tuned values.
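The lock-and-extend step can be sketched as combining the locked values with every combination of the newly added hyperparameters. The function and parameter names are illustrative, not taken from the claims.

```python
# Sketch of iterative tuning: lock values chosen from earlier experiments,
# then expand the search over newly selected hyperparameters.
from itertools import product

def next_round(locked, new_params):
    """Combine locked hyperparameter values with every combination of
    the newly selected hyperparameters' candidate values."""
    names = list(new_params)
    configs = []
    for combo in product(*(new_params[n] for n in names)):
        config = dict(locked)          # locked values carry over unchanged
        config.update(zip(names, combo))
        configs.append(config)
    return configs

# Round 1 fixed the learning rate; round 2 explores dropout and batch size.
locked = {"learning_rate": 0.01}
configs = next_round(locked, {"dropout": [0.2, 0.5], "batch_size": [32, 64]})
```

Each returned configuration would then be submitted as a new experiment to train the ML model.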

Generating and executing experiments based on hyperparameter selections

Determining proposed sets of experiments according to selected hyperparameters and their values, including generating experiments for all or selected combinations, displaying interactive listings, and allowing users to select which experiments to run.
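A minimal sketch of proposing the full grid of experiments and letting the user run only a chosen subset; the helper names are assumptions for illustration.

```python
# Propose all combinations of the selected hyperparameter values, then
# let the user pick which proposed experiments to actually execute.
from itertools import product

def propose_experiments(param_values):
    """Generate the full grid of proposed experiments."""
    names = list(param_values)
    return [dict(zip(names, combo))
            for combo in product(*(param_values[n] for n in names))]

def select_experiments(proposed, chosen_indices):
    """User selects, e.g. from an interactive listing, which to run."""
    return [proposed[i] for i in chosen_indices]

proposed = propose_experiments({"lr": [0.1, 0.01], "layers": [2, 3]})
to_run = select_experiments(proposed, [0, 3])  # run only two of the four
```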

Redundancy detection and cost notification in experiment execution

Determining which proposed experiments have already been executed to notify users of redundancies, and calculating and displaying resource utilization, monetary, and time costs associated with executing proposed experiments, including cost comparisons across different computing resource options.
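The redundancy check and cost display can be sketched as set membership over already-executed configurations plus a simple cost model. The per-run duration and hourly rate below are illustrative assumptions, not figures from the patent.

```python
# Detect proposed experiments that were already executed, and estimate
# the time and monetary cost of the remaining ones.
def frozen(config):
    """Hashable key for a hyperparameter configuration."""
    return tuple(sorted(config.items()))

def split_redundant(proposed, already_run):
    done = {frozen(c) for c in already_run}
    new, redundant = [], []
    for c in proposed:
        (redundant if frozen(c) in done else new).append(c)
    return new, redundant

def estimate_cost(configs, minutes_per_run=30, dollars_per_hour=2.5):
    """Illustrative cost model: flat per-run duration, flat hourly rate."""
    total_minutes = len(configs) * minutes_per_run
    return {"time_minutes": total_minutes,
            "monetary_cost": round(total_minutes / 60 * dollars_per_hour, 2)}

already = [{"lr": 0.1}]
new, dup = split_redundant([{"lr": 0.1}, {"lr": 0.01}], already)
cost = estimate_cost(new)
```

Comparing `estimate_cost` outputs under different `dollars_per_hour` values mirrors the claimed cost comparison across computing resource options.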

Optimization recommendations for experiment sets under constraints

Determining constraints such as available computing resources or cost limits and providing recommendations to optimize the set of experiments to execute, maximizing utility within constraints.
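One simple way to realize such a recommendation is a greedy budget-constrained selection, ranking candidate experiments by utility per unit cost. This is a sketch under assumed scoring fields (`utility`, `cost`), not the patent's optimization method.

```python
# Greedy recommendation: pick experiments with the best utility-per-cost
# ratio until the budget (e.g. a monetary or resource limit) is exhausted.
def recommend_experiments(candidates, budget):
    ranked = sorted(candidates, key=lambda c: c["utility"] / c["cost"],
                    reverse=True)
    chosen, spent = [], 0.0
    for c in ranked:
        if spent + c["cost"] <= budget:
            chosen.append(c["name"])
            spent += c["cost"]
    return chosen, spent

candidates = [
    {"name": "A", "utility": 9, "cost": 3},
    {"name": "B", "utility": 4, "cost": 1},
    {"name": "C", "utility": 5, "cost": 4},
]
chosen, spent = recommend_experiments(candidates, budget=5)
```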

Execution using distributed computing clusters and versioned experiments

Executing each experiment in an isolated resource container within distributed computing clusters, and associating experiments with versioned hyperparameter inputs and performance metric outputs to support reproducibility and tracking.
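Versioning an experiment's hyperparameter inputs can be sketched as a deterministic content hash, so identical inputs always map to the same version tag regardless of key order. The hashing scheme here is an illustrative assumption.

```python
# Associate each experiment with a versioned record of its hyperparameter
# inputs and performance-metric outputs, for reproducibility and tracking.
import hashlib
import json

def version_id(hyperparameters):
    """Deterministic version tag derived from the hyperparameter set."""
    payload = json.dumps(hyperparameters, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

def record_experiment(hyperparameters, metrics):
    return {"version": version_id(hyperparameters),
            "inputs": hyperparameters,
            "outputs": metrics}

a = record_experiment({"lr": 0.01, "batch_size": 32}, {"accuracy": 0.94})
b = record_experiment({"batch_size": 32, "lr": 0.01}, {"accuracy": 0.94})
# Key order does not matter: identical inputs get the same version tag.
```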

User-guided function fitting for hyperparameter recommendation

Enabling users to interactively select data patterns via user-drawn lines or selections on visualizations, applying function fitting (e.g., linear or quadratic) to analyze results and generate recommendations for hyperparameter values that optimize performance metrics.
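The quadratic case of such function fitting can be sketched with an ordinary least-squares fit: solve the normal equations for y = c0 + c1·x + c2·x², then recommend the vertex of the fitted parabola as the hyperparameter value that optimizes the metric. The data points below are illustrative.

```python
# Least-squares quadratic fit (normal equations + Gaussian elimination),
# then recommend the parabola's vertex as the optimal hyperparameter value.
def fit_quadratic(xs, ys):
    """Fit y = c0 + c1*x + c2*x^2 by least squares."""
    S = [sum(x**k for x in xs) for k in range(5)]
    T = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]
    A = [[S[i + j] for j in range(3)] for i in range(3)]
    b = T[:]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * 3
    for r in (2, 1, 0):  # back-substitution
        coef[r] = (b[r] - sum(A[r][c] * coef[c]
                              for c in range(r + 1, 3))) / A[r][r]
    return coef

def recommend_value(xs, ys):
    """Vertex of the fitted parabola: the value predicted to maximize
    the performance metric (assumes the fit opens downward)."""
    c0, c1, c2 = fit_quadratic(xs, ys)
    return -c1 / (2 * c2)

# Metric peaks near x = 2 in this illustrative data.
best = recommend_value([0, 1, 2, 3, 4], [-3.0, 0.0, 1.0, 0.0, -3.0])
```

In the claimed workflow, the points fed to the fit would come from the user's drawn line or selection on the visualization rather than a hard-coded list.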

Selection of computing resource for experiment execution

Providing users options to select from multiple execution computer systems, including on-premises and cloud-based clusters, for executing hyperparameter tuning experiments.

Overall, the claims cover a comprehensive system and method for hyperparameter tuning of ML models through interactive, linked visual analytics enabling iterative exploration, user input, experiment selection, cost-awareness, and efficient execution in distributed computing environments.

Stated Advantages

Enables iterative hyperparameter tuning through interactive visual analytics, allowing users to gain insight into how hyperparameters affect model performance and convergence.

Supports automation and provides recommendations to optimize tuning considering available computing resources, enhancing efficiency.

Facilitates collaboration among peers during the hyperparameter tuning process.

Provides visibility into computation costs associated with executing tuning experiments, enabling cost-effective decision-making.

Allows selective execution of experiments by detecting redundancies and proposing optimized experiment subsets based on user constraints.

Documented Applications

Developing machine learning models such as convolutional neural networks for tasks including optical character recognition (OCR) of handwritten digits.

General application to various ML model types including neural networks, clustering models, decision trees, and random forest classifiers for solving real-world problems.
