Utilization-aware resource scheduling in a distributed computing cluster

Inventors

Kambatla, Karthik

Assignees

Cloudera Inc

Publication Number

US-12223349-B2

Publication Date

2025-02-11

Expiration Date

2037-05-15

Abstract

Embodiments are disclosed for a utilization-aware approach to cluster scheduling that addresses resource fragmentation and improves cluster utilization and job throughput. In some embodiments, a resource manager at a master node considers the actual usage of running tasks and schedules opportunistic work on underutilized worker nodes. The resource manager monitors resource usage on these nodes and preempts opportunistic containers if the resulting over-subscription becomes untenable. In doing so, the resource manager puts otherwise wasted resources to use while minimizing adverse effects on regularly scheduled tasks.

Core Innovation

The invention presents a utilization-aware resource scheduling technique called UBIS (utilization-based incremental scheduling) for distributed computing clusters. UBIS addresses resource fragmentation and underutilization by considering actual resource usage rather than solely user-specified requests when scheduling tasks. It opportunistically allocates second-tier containers using underutilized resources within already allocated first-tier containers or available unallocated resources, thereby improving cluster utilization and job throughput.

The problem being solved arises from the difficulty in accurately predicting resource requirements for tasks in distributed clusters. Users tend to over-allocate resources to avoid job failures, leading to severe resource fragmentation and underutilization. Existing cluster schedulers allocate resources exclusively to containers regardless of actual usage, causing significant inefficiency.

UBIS introduces a hierarchical scheduling model in which first-tier (regular) containers receive guaranteed resources and second-tier (opportunistic) containers are allocated from slack resources. Opportunistic containers can be preempted when resource utilization rises above a threshold, which minimizes adverse effects on regular tasks. UBIS controls the aggressiveness of scheduling and preemption through variable thresholds (T_alloc and T_preempt) and lets latency-sensitive jobs opt out. The method is compatible with Hadoop-based systems such as Apache YARN and extends to other resources and distributed systems.
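The role of the two thresholds can be illustrated with a minimal sketch. It assumes that utilization and both thresholds are expressed as fractions of node capacity and that the allocation threshold sits at or below the preemption threshold. The names `Action` and `decide`, and the choice to treat the band between the thresholds as a hold state, are illustrative assumptions rather than details taken from the patent.

```python
from enum import Enum


class Action(Enum):
    ALLOCATE = "allocate"  # utilization is low enough to add opportunistic work
    HOLD = "hold"          # between thresholds: neither add nor remove work
    PREEMPT = "preempt"    # utilization is too high: remove opportunistic work


def decide(utilization: float, t_alloc: float, t_preempt: float) -> Action:
    """Pick a scheduling action for one node.

    All three arguments are assumed to be fractions of node capacity (0..1).
    """
    if not 0.0 <= t_alloc <= t_preempt <= 1.0:
        raise ValueError("expected 0 <= t_alloc <= t_preempt <= 1")
    if utilization < t_alloc:
        return Action.ALLOCATE
    if utilization > t_preempt:
        return Action.PREEMPT
    return Action.HOLD
```

With `t_alloc=0.7` and `t_preempt=0.9`, a node at 50% utilization would be eligible for opportunistic containers, a node at 80% would be left alone, and a node at 95% would trigger preemption. Raising either threshold makes the scheduler more aggressive; lowering it makes the scheduler more conservative.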

Claims Coverage

The patent includes one independent method claim, one independent system claim, and one independent computer-readable medium claim, each covering utilization-aware scheduling of resource containers at a worker node in a distributed computing cluster.

Utilization-aware opportunistic third-tier container allocation

Allocating an opportunistic third-tier resource container at a worker node to process a third task based on actual computing resource utilization decreasing below a first threshold, where the container includes unutilized resource slack within first-tier and second-tier containers and unallocated resources at the node.
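The pool of resources that such an opportunistic container draws on can be sketched as a simple sum. The example below assumes memory measured in megabytes and a hypothetical `Container` record holding the allocated and actually used amounts; neither the record nor the function name comes from the patent.

```python
from dataclasses import dataclass


@dataclass
class Container:
    allocated_mb: int  # amount the scheduler reserved for the container
    used_mb: int       # amount the running task is actually using


def opportunistic_capacity(node_capacity_mb: int, containers: list) -> int:
    """Return slack inside allocated containers plus unallocated node resources."""
    # Slack: reserved but unused memory within each existing container.
    slack = sum(max(c.allocated_mb - c.used_mb, 0) for c in containers)
    # Unallocated: node memory that no container has reserved at all.
    unallocated = node_capacity_mb - sum(c.allocated_mb for c in containers)
    return slack + max(unallocated, 0)


node = [Container(allocated_mb=4096, used_mb=1024),
        Container(allocated_mb=8192, used_mb=6144)]
print(opportunistic_capacity(16384, node))  # prints 9216
```

In this example the two containers leave 3072 MB and 2048 MB of slack, and 4096 MB of the node is unallocated, so 9216 MB could in principle back opportunistic work. A real scheduler would hand out only part of this, since the allocation threshold limits how far the node may be over-subscribed.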

Preemptive deallocation of opportunistic containers

Deallocating the opportunistic third-tier resource container when actual resource utilization rises above a second threshold, which is set at an offset below the worker node's utilization capacity so that resources remain guaranteed to higher-tier containers. This includes assessing whether utilization falls between the two thresholds and referencing periodic heartbeat information.
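A hedged sketch of heartbeat-driven preemption follows. The `Heartbeat` and `OpportunisticContainer` records, the megabyte units, and the policy of preempting the most recently started containers first are all assumptions made for illustration; the source states only that preemption is triggered by a threshold set at an offset below node capacity and that heartbeat information is consulted.

```python
from dataclasses import dataclass


@dataclass
class OpportunisticContainer:
    container_id: str
    used_mb: int  # actual usage reported for this container


@dataclass
class Heartbeat:
    capacity_mb: int     # total node capacity
    used_mb: int         # actual usage across all containers on the node
    opportunistic: list  # OpportunisticContainer entries, oldest first


def select_preemptions(hb: Heartbeat, offset_mb: int) -> list:
    """Return IDs of opportunistic containers to preempt for this heartbeat."""
    threshold_mb = hb.capacity_mb - offset_mb  # threshold sits below capacity
    projected = hb.used_mb
    victims = []
    # Assumed policy: preempt the newest opportunistic containers first.
    for c in reversed(hb.opportunistic):
        if projected <= threshold_mb:
            break
        victims.append(c.container_id)
        projected -= c.used_mb
    return victims


hb = Heartbeat(
    capacity_mb=10000,
    used_mb=9800,
    opportunistic=[OpportunisticContainer("a", 500),
                   OpportunisticContainer("b", 400),
                   OpportunisticContainer("c", 600)],
)
print(select_preemptions(hb, offset_mb=1000))  # prints ['c', 'b']
```

Here the threshold is 9000 MB. Preempting `c` brings projected usage to 9200 MB, preempting `b` brings it to 8800 MB, and `a` survives. Regular containers are never candidates, which is what keeps their resources guaranteed.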

Tiered container hierarchy with resource slack consideration

Maintaining a multi-tier container hierarchy where opportunistic containers have lower priority; allocation and preemption decisions depend on summing resource slack within allocated containers and unallocated resources, with thresholds adjustable based on variable over-allocation and preemption parameters, possibly specific to worker nodes or resource types.
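Thresholds that vary by worker node or resource type could be represented as a small lookup table, as in the sketch below. The default values, the `worker-7` override, and the rule that allocation requires every resource to be below its allocation threshold while preemption is triggered by any resource exceeding its preemption threshold are assumptions for illustration, not details from the claims.

```python
# Hypothetical defaults: resource type -> (allocation threshold, preemption threshold),
# expressed as fractions of node capacity.
DEFAULTS = {"cpu": (0.70, 0.90), "memory": (0.60, 0.85)}

# Hypothetical per-node overrides for nodes that need more conservative settings.
NODE_OVERRIDES = {"worker-7": {"memory": (0.50, 0.80)}}


def thresholds_for(node_id: str) -> dict:
    """Merge the defaults with any overrides configured for this node."""
    return {**DEFAULTS, **NODE_OVERRIDES.get(node_id, {})}


def can_allocate(node_id: str, utilization: dict) -> bool:
    """Assumed rule: every resource must be below its allocation threshold."""
    t = thresholds_for(node_id)
    return all(utilization[r] < t[r][0] for r in t)


def must_preempt(node_id: str, utilization: dict) -> bool:
    """Assumed rule: any resource above its preemption threshold triggers preemption."""
    t = thresholds_for(node_id)
    return any(utilization[r] > t[r][1] for r in t)
```

Under these settings, a node at 50% CPU and 55% memory accepts opportunistic work by default but not on `worker-7`, where the memory allocation threshold is 50%.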

The independent claims cover a method, a system, and a machine-readable medium that schedule opportunistic resource containers from unused slack within allocated containers and from unallocated resources, and that manage preemption of those containers through thresholds to improve utilization while protecting the resources guaranteed to first-tier containers.

Stated Advantages

Improves cluster resource utilization and job throughput by harvesting slack resources within allocated containers.

Minimizes adverse effects on regularly scheduled tasks by preempting opportunistic containers upon reaching resource utilization thresholds.

Allows fine-grained control of over-allocation and preemption aggressiveness via variable thresholds adjustable per node, job, or resource type.

Supports dynamic scheduling that adapts to actual resource usage, reducing resource fragmentation and underutilization inherent with existing cluster schedulers.

Facilitates multi-tenant cluster fairness by allocating opportunistic containers only after regular containers are allocated, preserving predictable performance for priority workloads.

Documented Applications

Scheduling and managing computing resources in Hadoop-based distributed computing clusters, including implementations with Apache YARN.

Allocating and managing computing resources such as CPU, memory, disk storage, network bandwidth, I/O, and GPU in distributed computing environments.

Supporting batch-processing jobs with frameworks like MapReduce and Apache Spark, as well as real-time ad hoc query execution.

Optimizing large-scale data analysis workflows in enterprise distributed systems employing data warehouse software like Apache Hive and NoSQL stores like HBase.
