Automatic clustering for self-organizing grids
Inventors
Abu-Ghazaleh, Nael • Yang, Weishuai • Lewis, Michael
Assignees
National Science Foundation NSF • Research Foundation of the State University of New York
Publication Number
US-9602573-B1
Publication Date
2017-03-21
Expiration Date
2028-09-23
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
A cluster of nodes, comprising: a plurality of nodes, each having a security policy, and being associated task processing resources; a registration agent configured to register a node and issue a node certificate to the respective node; a communication network configured to communicate certificates to authorize access to computing resources, in accordance with the respective security policy; and a processor configured to automatically dynamically partition the plurality of nodes into subnets, based on at least a distance function of at least one node characteristic, each subnet designating a communication node for communicating control information and task data with other communication nodes, and to communicate control information between each node within the subnet and the communication node of the other subnets.
Core Innovation
The invention provides a system and method for automatic clustering of nodes within a self-organizing grid (SOG) based on a performance-related metric, primarily link delay between nodes. It addresses the challenge of efficiently organizing a large and dynamic set of computing resources into clusters or subnets to support distributed tasks, while minimizing communication latency and administrative overhead. The system dynamically partitions nodes into a branched hierarchy of subsets, where each subset has a designated super-node responsible for communication control and coordination within the subset and with others.
The invention solves the problem found in prior grid computing and peer-to-peer environments where large-scale, open grids have not yet emerged due to the high administrative overhead required to add and manage resources, and the lack of scalable and effective node clustering methods. Traditional approaches require global knowledge or exhaustive pairwise measurements, leading to impractical overhead for large-scale systems. This invention offers a scalable alternative by employing a Minimum-delay Dynamic Tree (MDTree) overlay structure that organizes nodes based on network proximity with limited delay measurements, enabling efficient on-demand cluster formation of variable size.
Further, the system includes mechanisms for resource monitoring and scheduling that consider dynamic states of nodes and communication conditions, enabling effective allocation of mutually compatible resources for parallel applications. It also integrates distributed authorization and security features where nodes register and obtain certificates allowing controlled access in accordance with security policies. The invention supports dynamic adjustments in the hierarchy as nodes join or leave, uses heuristic partitioning such as genetic algorithms for maintaining cluster quality, and promotes fault tolerance and self-organization through distributed control.
Claims Coverage
The patent includes four independent claims presenting methods and systems for dynamic partitioning and clustering of node devices in a network, each containing inventive features focusing on scalable clustering, communication within clusters, security management, and distributed control.
Dynamic hierarchical clustering of node devices based on node characteristics
Automatically and dynamically repartitioning a set of nodes into multiple subnets based on a distance function of one or more node device characteristics, such as pairwise communication latency, forming a branched hierarchy with designated communication nodes (super-nodes) controlling intra-subnet communication and inter-subnet control information exchange.
Creation and maintenance of Minimum-delay Dynamic Tree (MDTree) overlay
Building and maintaining a hierarchical tree overlay structure that organizes nodes by link delay, enabling efficient neighborhood splitting using genetic algorithms for effective bi-partitioning, and supporting dynamic join, leave, and repositioning of nodes within the hierarchy while a distributed task is in progress.
Resource management and secure access control in self-organizing grids
Registering nodes via registration agents issuing certificates, communicating certificates for authorization of resource access in accordance with security policies, and employing distributed attribute repositories and role-based authorization mechanisms with reusable access tokens for scalable and autonomous resource access control.
Scalable cluster formation and task allocation based on hierarchical control
Designating preferred node sets for task allocation dependent on the hierarchical clustering and task requirements, forwarding scheduling requests through pruned branches of the overlay, controlling query and update overhead, and accommodating dynamic resource availability without interrupting task accomplishment.
The claims collectively cover a distributed clustering method and system that dynamically partitions nodes into hierarchical subnets based on communication metrics, supports scalable and efficient cluster formation, integrates security and registration processes, and enables effective resource allocation and scheduling in self-organizing computational grids.
Stated Advantages
Scalable automatic clustering of nodes with low overhead for large and dynamic self-organizing grids.
Efficient identification and formation of variable size clusters with minimal average link delay between nodes, close to optimal solutions.
Dynamic maintenance of cluster hierarchy accommodating node join and departure without interrupting ongoing distributed tasks.
Distributed and hierarchical resource information management supporting efficient range and multi-attribute queries.
Integrated security with distributed authorization using registration agents, certificates, role-based policies, and reusable access tokens.
Fault tolerance and self-organizing capabilities ensuring robust grid operation despite dynamic changes.
Documented Applications
Supporting large-scale parallel processing applications requiring effective scheduling and allocation of multiple interdependent resources with consideration of communication costs.
Facilitating self-organizing grids composed of diverse resources ranging from large permanent clusters to intermittently available individual machines and embedded devices.
Providing scalable overlay topologies and cluster formation for wide-area distributed computing systems and computational grids.
Enabling distributed task execution where dynamic membership and resource availability changes are tolerated at the cluster and task level.
Serving as a basis for group scheduling frameworks that allocate a set of closely related nodes for parallel jobs with co-scheduling requirements.
Interested in licensing this patent?