Internet-Draft | Deterministic Networking | July 2024 |
Tang & Zhang | Expires 9 January 2025 | [Page] |
Massive data processing and complex algorithms in the industrial Internet demand large amounts of computing resources, while real-time control and production safety demand a reliable and deterministic network. This draft proposes a service-oriented task-processing framework that divides service execution into two stages: resource orchestration of task flows and transmission scheduling of packets. To obtain the optimal scheduling strategy, a constrained optimization problem is formulated that maximizes the transmission scheduling success rate while trading it off against load balancing and resource utilization. To improve network reliability, data packets are transmitted over a TSN-5G converged network architecture.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 9 January 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Centralized control and distributed management are adopted to connect geographically dispersed computing resources, including CPU, GPU, and storage resources, and to coordinate their scheduling. Computing power domains are connected in one of two ways: through a TSN system, or through the TSN-5G converged network. A deep reinforcement learning (DRL) algorithm is deployed on the central controller to make resource orchestration and transmission scheduling decisions.¶
In this draft, computing tasks are divided into three levels: service, task flow, and data packet. Task flows generated by different services are classified by their dominant resource requirement, so that each type of task flow can be matched with suitable resources. This on-demand adaptation supports efficient management and utilization of multi-dimensional resources. The decision variable of the resource orchestration part is the address of the destination computing domain of the task flow.¶
To cope with unexpected tasks while preserving system stability and user experience, the model statistically reserves a share of computing and storage resources for them; task flows use the remaining computing and storage resources.¶
This part is subject to two constraints: the reserved plus consumed resources must not exceed the resource capacity of the corresponding computing domain, and each task flow in a window is assigned to only one computing domain for processing. The objective function is derived from the load balancing and resource utilization of each computing domain.¶
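The two constraints can be read as a feasibility check on a candidate assignment. The following Python fragment is a minimal sketch, not the draft's model: the names `assign`, `demand`, `capacity`, and `reserved` are hypothetical, each domain is reduced to a single scalar resource, and every task flow in the window is assumed to be placed in exactly one domain.¶

```python
def is_feasible(assign, demand, capacity, reserved):
    """Check the two resource orchestration constraints.

    assign[i][j] is 1 if task flow i is placed in computing domain j, else 0.
    demand[i] is the resource demand of task flow i (one scalar, an assumption).
    capacity[j] and reserved[j] are the capacity and reserved amount of domain j.
    """
    # Each task flow in the window is assigned to exactly one domain.
    for row in assign:
        if any(x not in (0, 1) for x in row) or sum(row) != 1:
            return False
    # Reserved plus consumed resources must not exceed the domain capacity.
    for j in range(len(capacity)):
        consumed = sum(demand[i] * assign[i][j] for i in range(len(assign)))
        if reserved[j] + consumed > capacity[j]:
            return False
    return True
```

For example, with two domains of capacity 5 and a reservation of 1 in each, placing flows with demands 2 and 3 in different domains is feasible, while placing both in the same domain is not.¶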
In the transmission scheduling phase, a converged wireless and wired network framework supporting deterministic transmission is designed. The computing domain servers are connected through the 5G system and the TSN system, and deterministic transmission between computing domains is achieved through deterministic mechanisms and constraints. The TSN system connects to the 5G system through two interfaces, DS-TT and NW-TT.¶
The transmission scheduling part determines the transmission path and time slot of each task flow's packets. According to [IEEE802.1Qbv], a predictable, bounded delay can be provided by precisely controlling the forwarding queues of packets. To schedule time slot resources jointly across the TSN-5G converged network, the following three measures are adopted:¶
• Adopt mini time slots: A TSN time slot lasts hundreds of microseconds, whereas a 5G time slot is at the millisecond level. To synchronize the TSN system and the 5G system in time, the mini time slot is used as the time scheduling unit of the 5G network. A mini time slot is very short (for example, 100-500 microseconds), and its boundaries are aligned with the time synchronization period of the TSN network, which connects the two networks seamlessly.¶
• Uniform time slot length: The key to applying the Cyclic Queuing and Forwarding (CQF) mechanism to the TSN-5G converged network is a reasonable choice of time slot length. Packets sent by the upstream switch in the previous time slot must be received by the downstream switch in the next time slot. In the network architecture adopted in this draft, a transmission between two adjacent hops falls into one of three cases: TSN to TSN, TSN to 5G, and 5G to TSN. The time slot length must therefore be greater than the largest per-hop transmission time across these three cases and, at the same time, should be the greatest common divisor of all packet periods.¶
• Treat the 5GS as a logical TSN bridge: For packet transmission scheduling, the entire 5GS is regarded as one logical bridge. The forwarding delay inside the 5GS is less than or equal to the slot length T, and the DS-TT and NW-TT interfaces are regarded as the receiving and sending queues of CQF. The end-to-end delay that a packet may experience can therefore be obtained from the CQF delay formula.¶
Together, these three measures achieve unified, joint scheduling of time slot resources.¶
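The slot length rule and the resulting delay bound can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the draft's procedure: the function and parameter names are hypothetical, periods and per-hop times are integers in microseconds, and the delay bound uses the commonly cited CQF result that a path of h hops with slot length T has an end-to-end delay between (h-1)T and (h+1)T, with the 5GS counted as one hop.¶

```python
from functools import reduce
from math import gcd

def choose_slot_length_us(periods_us, hop_times_us):
    """Return the slot length in microseconds, or None if no valid slot exists.

    periods_us: packet periods of all task flows (integers, microseconds).
    hop_times_us: worst-case transmission times of the three hop cases
                  (TSN to TSN, TSN to 5G, 5G to TSN).
    """
    slot = reduce(gcd, periods_us)      # greatest common divisor of all periods
    if slot <= max(hop_times_us):       # slot must exceed the slowest hop
        return None
    return slot

def cqf_delay_bounds_us(hops, slot_us):
    """Commonly cited CQF end-to-end delay bounds: (h-1)*T and (h+1)*T."""
    return (hops - 1) * slot_us, (hops + 1) * slot_us
```

With periods of 1000, 1500, and 2000 microseconds and per-hop times of at most 450 microseconds, the sketch selects a 500-microsecond slot; over three hops this gives a delay between 1000 and 2000 microseconds.¶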
To achieve reliable transmission scheduling, the following constraints are designed:¶
• Data packet transmission: To ensure the smooth transmission of data packets, the total number of data packets in each queue in a time slot cannot exceed the maximum capacity of the queue.¶
• Two-scale interaction: The transmission scheduling phase must take the output of the resource orchestration phase into account, so that tasks are delivered to the correct servers for computation. To ensure that the packets of a task flow in the second stage are transmitted to the server selected by the resource orchestration decision in the first stage, a binary indicator variable is added to the problem model. For a task flow, this variable is 1 only when the second-stage packets are transmitted to the server chosen in the first stage; otherwise it is 0.¶
A task flow is scheduled successfully if its delay meets the requirement; otherwise, scheduling fails. The objective function of this part maximizes the scheduling success rate.¶
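The constraints and the success criterion of this part can be sketched in Python as follows. This is a hedged illustration: the data layout (one `(queue_id, slot)` pair per scheduled packet) and all function names are assumptions made for the example, not definitions from this draft.¶

```python
from collections import Counter

def queue_capacity_ok(schedule, queue_capacity):
    """schedule: list of (queue_id, slot) pairs, one per scheduled packet."""
    load = Counter(schedule)  # number of packets per (queue, slot)
    return all(n <= queue_capacity[q] for (q, _slot), n in load.items())

def destination_indicator(packet_destination, selected_server):
    """1 only if the packets reach the server chosen in the first stage."""
    return 1 if packet_destination == selected_server else 0

def success_rate(delays, requirements):
    """Fraction of task flows whose delay meets its requirement."""
    if not delays:
        return 0.0
    met = sum(1 for d, limit in zip(delays, requirements) if d <= limit)
    return met / len(delays)
```

A schedule is admissible only if `queue_capacity_ok` holds and the indicator equals 1 for every task flow; `success_rate` is then the quantity that the objective of this part maximizes.¶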
The overall objective function combines the objective functions of the two stages.¶
The global problem is a multi-objective optimization problem. It is decoupled into a resource orchestration problem on a large time scale and a transmission scheduling problem on a small time scale, and a two-layer constrained reinforcement learning algorithm is proposed to solve it.¶
The cross-domain resource orchestration subproblem maximizes overall resource utilization by optimizing the resource orchestration decisions.¶
This draft gives a resource orchestration procedure based on a greedy algorithm. A greedy algorithm is simple and efficient and can often give a relatively good approximate solution, especially when the number of tasks is large. The procedure is as follows:¶
• Sort and initialize: Sort the task flows by resource requirement and tolerance time, giving priority to task flows with high demand and low tolerance time.¶
• Iterate over all tasks: Starting with the highest-priority task, traverse all computing domains to find the best domain that meets the task's resource requirements.¶
• Improve resource utilization: Among the domains that meet the requirements, allocate resources in the domain with the lowest current resource utilization, which balances load and raises overall resource utilization.¶
The input of the subroutine is the available resources of all computing domains and the resource requirements of all computing tasks; the output is a two-dimensional 0-1 resource scheduling decision matrix DR. This subroutine improves resource utilization while achieving load balancing.¶
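The greedy procedure can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the draft's exact subroutine: each domain has a single scalar resource, demands, tolerance times, and capacities are positive, reserved resources count toward utilization, and the priority score with weight `w` is a hypothetical way of combining demand and tolerance time.¶

```python
def greedy_orchestrate(flows, domains, w=0.5):
    """Return the 0-1 decision matrix DR (rows: task flows, columns: domains).

    flows: list of (demand, tolerance_time) pairs, all positive.
    domains: list of (capacity, reserved) pairs, one scalar resource each.
    w: hypothetical weight between demand and urgency in the priority score.
    """
    if not flows:
        return []
    used = [reserved for _cap, reserved in domains]
    dr = [[0] * len(domains) for _ in flows]
    max_d = max(d for d, _t in flows)
    max_t = max(t for _d, t in flows)

    def priority(i):
        d, t = flows[i]
        # High demand and low tolerance time both raise the priority.
        return w * d / max_d + (1 - w) * (1 - t / max_t)

    for i in sorted(range(len(flows)), key=priority, reverse=True):
        demand = flows[i][0]
        # Domains that can still host this flow without exceeding capacity.
        feasible = [j for j, (cap, _r) in enumerate(domains)
                    if used[j] + demand <= cap]
        if not feasible:
            continue  # the flow stays unassigned in this window
        # Choose the feasible domain with the lowest current utilization.
        j = min(feasible, key=lambda k: used[k] / domains[k][0])
        dr[i][j] = 1
        used[j] += demand
    return dr
```

Each row of the returned matrix contains at most one 1, so the assignment constraint of the orchestration phase holds by construction; a row of zeros marks a flow that could not be placed.¶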
The deterministic transmission scheduling subproblem allocates time slot resources with the goal of maximizing the scheduling success rate. It is a Markov decision process (MDP), because the future packet transmission of a task flow depends only on the current remaining packet volume and remaining delay, not on the transmission history of earlier packets. Its three key elements are as follows:¶
• State: The central controller collects service information and remaining slot capacity information from the computing domain servers and TSN switches.¶
• Action: Based on the observed state, the agent makes real-time transmission scheduling decisions that determine the time slot in which each packet of a task flow is transmitted, so as to meet the overall service delay requirement.¶
• Reward: After taking action a in state s, the agent receives a reward that evaluates how good that action was.¶
In the MDP, the agent's goal is to find the optimal time slot allocation policy, that is, the one that maximizes the cumulative discounted reward.¶
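As a small illustration of these elements, the Python sketch below shows one possible state record and the cumulative discounted reward. The field names and the discount factor `gamma` are assumptions made for the example; this draft does not fix them.¶

```python
from typing import NamedTuple

class State(NamedTuple):
    remaining_packets: int      # packets of the task flow still to send
    remaining_delay_slots: int  # slots left before the delay requirement fails
    slot_capacity: tuple        # remaining capacity of each upcoming slot

def discounted_return(rewards, gamma=0.9):
    """Cumulative discounted reward: sum over t of gamma**t * rewards[t]."""
    total = 0.0
    for r in reversed(rewards):  # accumulate from the last step backwards
        total = r + gamma * total
    return total
```

An action in this sketch would simply be the index of the slot in `slot_capacity` chosen for the next packet.¶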
The two phases are coupled by constraints. The resource orchestration phase must consider the subsequent transmission scheduling phase so that tasks can be transmitted successfully within their delay constraints. Conversely, the transmission scheduling phase must consider the output of the resource orchestration phase so that each task is delivered to the correct server for computation. To realize two-stage closed-loop control, the following measures are designed:¶
• Greedy sorting based on task flow delay requirements: The optimization goal of the first stage is to achieve load balancing and improve resource utilization. To account at the same time for the impact of resource orchestration on the second-stage transmission delay, the sort considers both the tolerance delay and the resource demand of each task flow, with weight factors that set their relative importance.¶
• Introduction of a constraint variable: To ensure that the second-stage packets of a task flow are transmitted to the server selected by the first-stage resource orchestration decision, a binary indicator variable is added to the problem model. For a task flow, this variable is 1 only when the second-stage packets are transmitted to the server chosen in the first stage; otherwise it is 0.¶
• Feedback design of the reward function: The resource utilization achieved in the resource orchestration stage is included in the reward of the transmission scheduling algorithm, which captures the interaction between the two stages and realizes closed-loop control of task processing.¶
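The feedback measure can be illustrated with a one-function Python sketch. The additive form, the base values of +1 and -1, and the weight `beta` are all assumptions chosen for the example; the draft only states that first-stage resource utilization is included in the second-stage reward.¶

```python
def shaped_reward(scheduled_ok, utilization, beta=0.2):
    """Second-stage reward with first-stage feedback.

    scheduled_ok: True if the task flow met its delay requirement.
    utilization: resource utilization achieved by orchestration, in [0, 1].
    beta: hypothetical weight of the utilization term.
    """
    base = 1.0 if scheduled_ok else -1.0
    return base + beta * utilization
```

With this form, two schedules with the same scheduling outcome are ranked by the utilization that the orchestration stage achieved, which is one way to close the loop between the stages.¶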
This section will be described later.¶
This document should not affect the security of the Internet.¶
TBA¶