Which Queue Pair type to use?
When writing a new RDMA application (just as when writing a new application over sockets), one has to decide which QP type to work with.
In this post, I will describe in detail the characteristics of each transport type.
In RDMA, there are several QP types. Each can be described by a pair of letters, XY.
X can be:
Reliable: There is a guarantee that messages are delivered at most once, in order and without corruption.
Unreliable: There isn't any guarantee that the messages will be delivered or about the order of the packets.
In RDMA, every packet has a CRC, and corrupted packets are dropped (for any transport type). The reliability of a QP transport type refers to the reliability of the whole message.
Y can be:
Connected: one QP sends to and receives from exactly one other QP
Unconnected: one QP sends to and receives from any other QP
The following mechanisms are used in RDMA:
* CRC: The CRC field validates that packets weren't corrupted along the path.
* PSN: The Packet Sequence Number makes sure that packets are received in order. This helps detect missing and duplicated packets.
* Acknowledgement: (only in RC QP) Only after a message has been written successfully on the responder side is an ack packet sent back to the requester. If the requester doesn't receive an ack, it resends the message according to the QP's attributes. If no ack (or nack) arrives after all the retries, the QP reports an error (retry exceeded).
If there is any kind of error on the responder side (protection, resources, etc.), a nack is sent to the requester, which then reports that there is an error.
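To make the PSN mechanism above more concrete, here is a minimal Python sketch of how a receiver could classify an incoming packet against the PSN it expects next. It assumes the 24-bit PSN used by InfiniBand; the function name `classify_psn` and the simple "half of the PSN space" split between gaps and duplicates are illustrative simplifications for this post, not the exact rules a device implements.

```python
PSN_BITS = 24                 # InfiniBand PSNs are 24-bit counters
PSN_MOD = 1 << PSN_BITS
HALF_WINDOW = PSN_MOD // 2    # assumption of this sketch: split the space in half


def classify_psn(expected: int, received: int) -> str:
    """Classify a received PSN relative to the next expected PSN."""
    # Distance "ahead" of the expected PSN, with wrap-around at 2**24.
    ahead = (received - expected) % PSN_MOD
    if ahead == 0:
        return "in_order"
    if ahead < HALF_WINDOW:
        return "gap"          # at least one packet in between is missing
    return "duplicate"        # the PSN lies behind the expected one


if __name__ == "__main__":
    print(classify_psn(5, 5))
    print(classify_psn(5, 7))
    print(classify_psn(5, 3))
```

Because the comparison is done modulo 2^24, the check keeps working when the counter wraps around from 0xFFFFFF to 0.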
Reliable Connected (RC) QP
One RC QP is connected (i.e. sends and receives messages) to exactly one RC QP in a reliable way. It is guaranteed that messages are delivered from a requester to a responder at most once, in order and without corruption. The maximum supported message size is 2 GB (this value may be lower, depending on the attributes of the RDMA device). An RC QP supports Send operations (with or without immediate), RDMA Write operations (with or without immediate), RDMA Read operations and Atomic operations (depending on the RDMA device's level of support for atomic operations).
If a message is bigger than the path MTU, it is fragmented on the sending side and reassembled on the receiving side.
The requester considers a message operation complete once it gets an ack from the responder side that the message was read from or written to the responder's memory.
The responder considers a message operation complete once the message was read from or written to its (local) memory.
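As a rough illustration of the fragmentation described above, the following Python sketch computes how many packets a message of a given size turns into for a given path MTU. The function name `packets_for_message` is made up for this example, the list of MTU values is the set of InfiniBand MTU sizes, and the sketch assumes that a zero-length message still occupies one packet.

```python
VALID_PATH_MTUS = (256, 512, 1024, 2048, 4096)  # InfiniBand MTU sizes, in bytes
MAX_MESSAGE_BYTES = 2 ** 31                     # the 2 GB limit mentioned above


def packets_for_message(message_bytes: int, path_mtu: int) -> int:
    """Number of packets needed to carry one message (payload only)."""
    if path_mtu not in VALID_PATH_MTUS:
        raise ValueError(f"unsupported path MTU: {path_mtu}")
    if not 0 <= message_bytes <= MAX_MESSAGE_BYTES:
        raise ValueError(f"message size out of range: {message_bytes}")
    # Ceiling division; a zero-length message is assumed to take one packet.
    return max(1, -(-message_bytes // path_mtu))


if __name__ == "__main__":
    for size in (0, 4096, 4097, 2 ** 31):
        print(size, packets_for_message(size, 4096))
```

The application never sees these packets: it posts one message and gets one completion, no matter how many packets were used on the wire.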
Unreliable Connected (UC) QP
One UC QP is connected (i.e. sends and receives messages) to exactly one UC QP in an unreliable way. There isn't any guarantee that the messages will be received by the other side: corrupted or out-of-sequence packets are silently dropped. If a packet is dropped, the whole message that it belongs to is dropped. In this case, the responder doesn't stop, but continues to receive incoming packets. There isn't any guarantee about the packet ordering. The maximum supported message size is 2 GB (this value may be lower, depending on the attributes of the RDMA device). A UC QP supports Send operations (with or without immediate) and RDMA Write operations (with or without immediate).
If a message is bigger than the path MTU, it is fragmented on the sending side and reassembled on the receiving side.
The requester considers a message operation complete once the whole message was sent to the fabric.
The responder considers a message operation complete once it has received a complete message in the correct sequence and has written the data to its (local) memory.
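The drop-the-whole-message behavior of a UC responder can be sketched in a few lines of Python. This is a toy model only: packets are represented as hypothetical `(psn, position, payload)` tuples, where `position` is one of `"first"`, `"middle"`, `"last"` or `"only"`, and the resynchronization rule (wait for the start of the next message after a gap) is a simplification chosen for this example.

```python
PSN_MOD = 1 << 24  # InfiniBand PSNs are 24-bit counters


def uc_receive(packets):
    """Return the payloads of the messages that arrived complete and in sequence."""
    delivered = []
    expected = None   # next PSN in sequence; None while waiting to resync
    parts = None      # payloads of the message in progress, if any
    for psn, position, payload in packets:
        starts_message = position in ("first", "only")
        if psn != expected:
            # Gap (or initial state): silently drop the message in progress.
            parts = None
            if not starts_message:
                expected = None   # keep waiting for the start of a new message
                continue
        if starts_message:
            parts = [payload]
        elif parts is None:
            expected = None       # stray middle/last packet: ignore it
            continue
        else:
            parts.append(payload)
        expected = (psn + 1) % PSN_MOD
        if position in ("last", "only"):
            delivered.append(b"".join(parts))
            parts = None
    return delivered


if __name__ == "__main__":
    wire = [
        (10, "first", b"a"), (11, "last", b"b"),   # complete message
        (12, "first", b"c"), (14, "last", b"e"),   # PSN 13 was lost
        (15, "only", b"f"),                        # responder keeps going
    ]
    print(uc_receive(wire))
```

In the usage example, the second message is lost as a whole because one of its packets is missing, while the messages before and after it are delivered. Nothing is reported to the requester, which is exactly why reliability has to come from the application when UC is used.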
Unreliable Datagram (UD) QP
One UD QP can send messages to and receive messages from any other UD QP, as either unicast (one to one) or multicast (one to many), in an unreliable way. There isn't any guarantee that the messages will be received by the other side: corrupted or out-of-sequence packets are silently dropped. There isn't any guarantee about the packet ordering. The maximum supported message size is the maximum path MTU. A UD QP supports only Send operations.
The requester considers a message operation complete once the (one-packet) message was sent to the fabric.
The responder considers a message operation complete once it has received a complete message and has written the data to its (local) memory.
Choosing the right QP type
Choosing the right QP type is critical to the correctness and scalability of an application.
RC QP should be chosen if:
- Reliability by the fabric is needed
- The fabric isn't big, or it is big but not all nodes send traffic to the same node (one victim)
Possible uses for an RC QP: FTP over RDMA or a file system over RDMA.
UC QP should be chosen if:
- Reliability by the fabric isn't needed (i.e. reliability isn't important at all or it is taken care of by the application)
- The fabric isn't big, or it is big but not all nodes send traffic to the same node (one victim)
- Big messages (bigger than the path MTU) are sent
One possible use for a UC QP: video over RDMA.
UD QP should be chosen if:
- Reliability by the fabric isn't needed (i.e. reliability isn't important at all or it is taken care of by the application)
- The fabric is big and every node sends messages to any other node in the fabric. UD is one of the best solutions for scalability problems.
- Multicast messages are needed
One possible use for a UD QP: voice over RDMA.
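The criteria above can be condensed into a small decision helper. The Python sketch below is only one possible reading of those bullet lists: the function name `suggest_qp_type` and its parameters are invented for this example, and the order in which the checks are made is an assumption.

```python
def suggest_qp_type(fabric_reliability: bool,
                    multicast: bool,
                    large_all_to_all: bool,
                    max_message_bytes: int,
                    path_mtu: int = 4096) -> str:
    """Suggest a QP type ("RC", "UC" or "UD") from the criteria listed above."""
    if multicast:
        # Only UD supports multicast; any reliability is then up to the application.
        return "UD"
    if fabric_reliability:
        return "RC"
    if large_all_to_all:
        # Big fabric, every node talks to every node: UD scales best.
        # Messages bigger than the path MTU must be split by the application.
        return "UD"
    if max_message_bytes > path_mtu:
        return "UC"
    # Small messages, small fabric, no reliability needed: UC and UD both fit.
    # This sketch picks UD arbitrarily.
    return "UD"


if __name__ == "__main__":
    print(suggest_qp_type(True, False, False, 1 << 20))    # file transfer
    print(suggest_qp_type(False, False, False, 1 << 20))   # video frames
    print(suggest_qp_type(False, False, True, 512))        # voice, big fabric
```

A real application usually weighs more factors than these (memory consumed per QP, device limits, latency needs), so treat the result as a starting point rather than a rule.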
Summary
The following table describes the characteristics of each QP Transport Service Type:
Metric | UD | UC | RC |
---|---|---|---|
Opcode: SEND (with or without immediate) | Supported | Supported | Supported |
Opcode: RDMA Write (with or without immediate) | Not supported | Supported | Supported |
Opcode: RDMA Read | Not supported | Not supported | Supported |
Opcode: Atomic operations | Not supported | Not supported | Supported |
Reliability | No | No | Yes |
Connection type | Datagram (One to any/many) | Connected (one to one) | Connected (one to one) |
Maximum message size | Maximum path MTU | 2 GB | 2 GB |
Multicast | Supported | Not supported | Not supported |
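
For quick checks in code, the table above can be written down as a lookup structure. The following Python sketch simply mirrors the table; the names `QP_CAPS`, `supports` and `max_message_bytes` are made up for this example and are not part of any RDMA library.

```python
# One entry per QP transport type, mirroring the summary table.
QP_CAPS = {
    "UD": {"send": True, "rdma_write": False, "rdma_read": False,
           "atomic": False, "reliable": False, "multicast": True},
    "UC": {"send": True, "rdma_write": True, "rdma_read": False,
           "atomic": False, "reliable": False, "multicast": False},
    "RC": {"send": True, "rdma_write": True, "rdma_read": True,
           "atomic": True, "reliable": True, "multicast": False},
}

TWO_GB = 2 ** 31


def supports(qp_type: str, feature: str) -> bool:
    """True if the given QP type has the given feature according to the table."""
    return QP_CAPS[qp_type][feature]


def max_message_bytes(qp_type: str, path_mtu: int) -> int:
    """Upper bound on the message size; a device may support less than 2 GB."""
    return path_mtu if qp_type == "UD" else TWO_GB


if __name__ == "__main__":
    print(supports("UC", "rdma_read"))
    print(max_message_bytes("UD", 4096))
```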
Written by: Dotan Barak on June 1, 2013.on January 11, 2019.