Abstract: Based on the analysis and comparison of existing grid resource management models, a concrete model HRMM based on hierarchical structure is proposed. The resource management is divided into job parallel analysis, global resource allocation, local resource allocation and Four levels of local resource management, and corresponding optimization strategies and algorithms are designed for each level. The maximum computational complexity of the model for resource management is O(n2)~O(n3), which is an optimized and effective grid resource management model.
Computational Grid is an important parallel distributed computing technology that has emerged in recent years. One of its key technologies is to manage resources in the grid. The resources in the grid have wide-area distribution, heterogeneous and dynamic characteristics, making grid resource management complicated. There is currently no model that can handle all of the grid application requirements. At present, the grid resource management model is mainly divided into three types: hierarchical model, abstract owner model and economic/market model. The Globus project team has an important say in grid protocol development, and many important companies including IBM, Microsoft, Sun, Compaq, SGI, and NEC have announced their support for the Globus Toolkit. Therefore, the layered model adopted by Globus represents the development trend of grid resource management.
Based on the design idea of ​​Globus hierarchical model, this paper proposes an optimized HRMM (Hierarchical Resource Management Model) and gives the corresponding resource management algorithm. In order to improve efficiency, the data structures and interfaces provided by Globus Toolkit 2.4 are used in the main modules of HRMM.
1 Overall structure of HRMM
The design idea of ​​HRMM is to dynamically receive job requests from users and assign qualified computing resources to the jobs, and provide online feedback on resource information throughout the calculation process to accept online control of users. The architecture of HRMM is shown in Figure 1. The resource management tasks of the computing grid are divided into four levels: job parallel analysis, global resource allocation, local resource allocation, and local resource management.
As can be seen from Figure 1, the user submits a job request to the HRMM via a GUI (Graphical User Interface), the job parallel analyzer receives the user's job request, and then divides the tasks in the job into several task groups according to the maximum degree of parallelism, and submits them to the global resource allocation. Device. For each task in the multitasking group, the global resource allocator searches multiple static clusters for the cluster that meets the requirements, and the candidate cluster group is submitted to the local resource allocator. The local resource allocator reads information about each cluster in the candidate cluster group in the dynamic resource library, and assigns the corresponding task to the most qualified cluster. The cluster then applies the local resource manager to perform the task. On the whole, the local resource manager sends static resource update information to the static resource repository at regular intervals. In addition, the dynamic resource library reads the update information from the local resource manager before the local resource allocator reads the dynamic resource library.
In this layered model, on the one hand, the user-submitted jobs can be executed with the maximum degree of parallelism, thus effectively embodying the idea of ​​parallel computing; on the other hand, selecting multiple clusters to form a candidate cluster group, and then determining one of them The scheme of allocating resources improves the efficiency of resource allocation by comprehensively considering the static and dynamic requirements of the task and avoiding repeated query operations.
2 job parallel analyzer
As shown in FIG. 1, the user submits a job request to the job parallel analyzer via the GUI. This request includes information about multiple tasks contained in the job, dependencies between tasks, and computing resource requirements for each task. The job parallel analyzer analyzes the tasks and relationships in the job, divides the tasks in the job into different task groups according to the dependencies of each task, and submits each task group to the global resource allocator after proper description.
2.1 Topological representation of the job
A job consists of one or more tasks. The topology of the job is defined as a directed acyclic graph that satisfies the following conditions: the nodes of the graph correspond one-to-one with the tasks in the job; if task B directly depends on task A, there is a directed orientation from node A to node B. Side, said A is the direct precursor of B, B is the direct successor of A; if there is a directed path composed of multiple directed edges from A to B, then A is called the precursor of B, and B is the successor of A. .
Figure 2 shows the topology of a job. Let the job consist of 7 tasks labeled A~G and their relationship. As shown in Figure 2, task D needs to be started after tasks A and B are completed, and task G must be started after the task is completed and F is completed.
In order to improve the parallel execution efficiency of the job, it is necessary to pay attention to the depth of the task in the topology definition. Note that the direct precursor set of task T is Pd(T), and its depth d(T) is:
If Pd(T)=φ, then d(T)=1;
If Pd(T) ≠φ, then d(T)=max {d(R)}+1.
R∈Pd(T)
2.2 Maximum parallelism of the job
The parallel division of the job refers to a series of task groups formed after the split of a job, corresponding to each task, orderly and independent. A job may have one or more parallel partitioning schemes to form a parallel partitioning set corresponding to the job, denoted by Θ, I(Θ) is the number of task groups in the Θ. It is called the maximum degree of parallelism of the job, if: E∈Θ, and ξ∈Θ. I( ) ≤ I(ξ) divides multiple tasks in the job according to the corresponding depth to form a maximum degree of parallelism. As shown in the job in Figure 2, the maximum degree of parallelism is divided into: = {(A, B), (C, D, E), F, G}.
3 global resource allocator
After receiving the task group described by RSL, the global resource allocator analyzes and interprets immediately to obtain the static resource requirements of each task. The system searches the static resource library for multiple clusters that meet the conditions according to the resource requirements of each task, and submits the results to the local resource allocator.
3.1 Static Resource Library
The static resource library in the system uses an LDAP structure based on the Lightweight Directory Access Protocol. In the HRMM model, all static resources of the grid system establish corresponding directory entries in the DIT (Directory Information Tree) of the LDAP server, and describe various resource attributes by a combination of <attribute, value>. Selecting LDAP from a static repository can bring the following advantages in performance:
(1) LDAP is optimized for read operations, and the read efficiency can be improved in the case of frequent read operations.
(2) LDAP is a cross-platform protocol that can be used on any computer. Thereby increasing the adaptability of the system to the heterogeneous grid environment.
(3) The LDAP server supports a distributed structure. The static resource library can access a local or global LDAP server, and can be easily synchronized, that is, enhance the distribution of resource management.
3.2 Global Resource Allocation Algorithm
Based on the static requirements of each task in the task group, the global resource allocator searches the static repository for clusters that meet the requirements. When searching, first randomly select the starting position of the search, and then return the first N clusters that meet the task requirements for each task, form a candidate cluster group, and submit it to the local resource allocation after the description of the ClusterList data structure. The ClusterList is a generalized table structure used to describe the candidate cluster group, as shown in Figure 3. For any task, if only K is found ( 4 local resource allocator The local resource allocator searches the dynamic resource library for the dynamic information of the candidate cluster group, combines the dynamic information with the static information obtained from the global resource allocator, and performs comprehensive analysis, and finally assigns each task in the task group to The most suitable cluster. 4.1 Dynamic Resource Library The data in the dynamic resource library is described in XML, which brings the following advantages: (1) XML is optimized for update operations. Therefore, for a dynamic resource library that needs to be continuously updated, efficiency can be effectively improved. (2) XML and LDAP are tree structures in the storage structure, which can be easily converted to each other. Describe the data in XML, so that the dynamic resource library and the LDAP-based static resource library can be better coupled. (3) XML is platform-independent, and data represented in XML can be easily used by other programs. 4.2 Local Resource Allocation Strategy After the local resource allocator obtains the candidate cluster group ClusterList, the dynamic information of each candidate cluster is obtained from the dynamic resource library, and the dynamic information is added to the static information of the corresponding cluster, and then the static resource and the dynamic resource information are combined. Form cluster comprehensive resource information. Set the dynamic resource information of a cluster to h=[h1,...,hm]T, and the static resource information to t=[t1,...,td]T, where m and d are the number of fields described by dynamic and static resources, respectively. The cluster synthesis information is υ=[tThT]T=[υ1,...,υp]T, where P=m+d. As shown in FIG. 3, the comprehensive information of the clusters 2, 2 is represented as υ2.2. Similarly, the task static resource requirement and the dynamic resource are combined, and the dynamic resource requirement of one task is g=[g1,...,gm]T, and the static resource requirement is s=[s1,...,sd)T, then the comprehensive resource The demand is r=[sT gT]T=[r1,...,rp]T. The comprehensive resource requirement for task i is expressed as ri. When determining the allocation strategy, only the comprehensive resource requirements of the task and the comprehensive resource information of the cluster will be considered. First, in order for the task to be successfully completed, the selected cluster must meet both the static resource requirements and the dynamic resource requirements of the task, that is, meet the comprehensive resource requirements of the task: ∨i∈[1,n],∨j∈[1,p],Vi,f(i)[j]≥ri[j] Where n is the number of tasks in the task group, p is the dimension of the vector u/ and r, and f(i) is the sequence number of the finally selected cluster in the candidate cluster of task i (ie, the cluster linked list corresponding to Tasket in the ClusterList). Therefore, first delete all the clusters that do not meet the above conditions in the ClusterList, and note that there are still Ki candidates in the i-th task that meet the requirements of the comprehensive resource, where 1≤i≤n, 1≤Ki≤N. Finally, the local resource allocator wants to select the most appropriate one of the Ki candidate clusters for each task Taski. Considering the overall resource allocation efficiency of the computing grid, the following decision mechanisms are adopted when selecting the cluster: (1) The comprehensive resource information of the selected cluster should be as close as possible to the comprehensive resource requirements of the corresponding tasks, avoiding waste of resources, namely: (2) The total network delay between the selected cluster and the task submission node should be as small as possible, namely: Where tj is the delay of the cluster with the global identifier j; (3) HRMM specifies the upper limit of the computing resource occupancy for each user, namely: Where W is the upper limit of the user's occupied computing resources, and W>0. Taking into account the above three aspects, local resource allocation can be described as the following secondary planning problem: Where C is a weighting coefficient that can be changed, and C>0. Since f(i) is a discrete value and the range of values ​​is limited, the following optimization method is proposed to search for the approximate optimal solution by less calculation. If the candidate cluster group is ClusterList, the algorithm is as follows: STEP 1. Combine static and dynamic resource information into integrated resource information for each task and candidate cluster; STEP 2. Delete clusters in the ClusterList that do not meet the sum resource requirements; STEP 3. STEP 4. Sort each column of Cost in parallel, and rearrange the cluster linked list in the ClusterList in ascending order; STEP 5. in case STEP 6. ∨i∈[1,n], parallel calculation Cost*[i]:=‖vi,k-ri‖+C·TI,k, where k=aramin(‖vi,j‖<‖vi,1‖); STEP 7. ∨i∈[1,n], parallel calculation d(i]:= STEP 8. Set b:=argmin(d[j]) and delete the first k-1 cluster nodes in the cluster linked list of task b in the ClusterList; STEP 9. If satisfied STEP 10. ∨i∈[1,n] assigns the i-th task to the first cluster in the corresponding task cluster list in the ClusterList, and the algorithm ends. The algorithm finds the approximate optimal solution for resource allocation, and makes maximum use of the computing resources of the cluster where the resource management site is located, and parallelizes most of the calculations. If the number of nodes in the cluster where the resource management site is located is the household, the computational complexity of the algorithm on each node is O(n2n/P). 5 Analysis and summary The research group adopts a hierarchical model-based structure, divides resource management into four levels, and then optimizes the performance of the model at each level and proposes corresponding algorithms. In general, the maximum computational complexity of HRMM for resource management of a job does not exceed O(n3), which is an optimized and effective grid system resource management model. Al2O3 Aerospace Ceramics, Aerospace Ceramic Part, ceramic for aerospace Yixing Guangming Special Ceramics Co.,Ltd , https://www.yxgmtc.com Calculate the local loss of each cluster i, j Cost[i,j]:=‖vi,j-ri‖+C·TIj;
, then report that there is no solution that satisfies the condition, and the algorithm ends;
Then turn to STEPl0, otherwise go to STEP6;