【RDMA】ibv_query_qp()

Original: http://www.polarhome.com/service/man/AIX/libs/ofed/ofed_ibv_create_qp.htm

 

ibv_query_qp()


int ibv_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
                 enum ibv_qp_attr_mask attr_mask,
                 struct ibv_qp_init_attr *init_attr);

Description

ibv_query_qp() returns the attributes of a QP and their current values.

struct ibv_qp_attr describes the attributes of a QP.

struct ibv_qp_attr {
	enum ibv_qp_state	qp_state;
	enum ibv_qp_state	cur_qp_state;
	enum ibv_mtu		path_mtu;
	enum ibv_mig_state	path_mig_state;
	uint32_t		qkey;
	uint32_t		rq_psn;
	uint32_t		sq_psn;
	uint32_t		dest_qp_num;
	int			qp_access_flags;
	struct ibv_qp_cap	cap;
	struct ibv_ah_attr	ah_attr;
	struct ibv_ah_attr	alt_ah_attr;
	uint16_t		pkey_index;
	uint16_t		alt_pkey_index;
	uint8_t			en_sqd_async_notify;
	uint8_t			sq_draining;
	uint8_t			max_rd_atomic;
	uint8_t			max_dest_rd_atomic;
	uint8_t			min_rnr_timer;
	uint8_t			port_num;
	uint8_t			timeout;
	uint8_t			retry_cnt;
	uint8_t			rnr_retry;
	uint8_t			alt_port_num;
	uint8_t			alt_timeout;
};

Here is the full description of struct ibv_qp_attr:

qp_state The QP state. It can be one of the following enumerated values:

  • IBV_QPS_RESET - Reset state
  • IBV_QPS_INIT - Initialized state
  • IBV_QPS_RTR - Ready To Receive state
  • IBV_QPS_RTS - Ready To Send state
  • IBV_QPS_SQD - Send Queue Drain state
  • IBV_QPS_SQE - Send Queue Error state
  • IBV_QPS_ERR - Error state
cur_qp_state Assume that this is the current QP state. This is useful if the application knows that the actual QP state differs from the state assumed by the low-level driver. It can be one of the same enumerated values as qp_state. Not relevant for ibv_query_qp()

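path_mtu The path MTU (Maximum Transfer Unit), i.e. the maximum payload size of a packet that can be transferred in the path. It can be one of the following enumerated values:

  • IBV_MTU_256 - 256 bytes
  • IBV_MTU_512 - 512 bytes
  • IBV_MTU_1024 - 1024 bytes
  • IBV_MTU_2048 - 2048 bytes
  • IBV_MTU_4096 - 4096 bytes

For UC and RC QPs, the RDMA device automatically segments messages into packets of this size when needed.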
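The relation between the enumerated path MTU and the number of packets a message is split into can be sketched as follows. This is a minimal illustration, not library code: `demo_mtu`, `demo_mtu_bytes()` and `demo_packets_for_message()` are hypothetical names, and the enum is a local stand-in whose values 1..5 are assumed to mirror `enum ibv_mtu`.

```c
#include <stdint.h>

/* Local stand-in for enum ibv_mtu; the numeric values 1..5 are an
 * assumption mirroring <infiniband/verbs.h>. */
enum demo_mtu {
	DEMO_MTU_256  = 1,
	DEMO_MTU_512  = 2,
	DEMO_MTU_1024 = 3,
	DEMO_MTU_2048 = 4,
	DEMO_MTU_4096 = 5
};

/* Convert the enumerated MTU to bytes; returns 0 for an unknown value. */
uint32_t demo_mtu_bytes(enum demo_mtu mtu)
{
	if (mtu < DEMO_MTU_256 || mtu > DEMO_MTU_4096)
		return 0;
	return (uint32_t)128 << mtu;	/* 1 -> 256, ..., 5 -> 4096 */
}

/* Number of packets needed to carry a message of msg_len bytes.
 * A zero-length message still occupies one packet.
 * Returns 0 if the MTU value is unknown. */
uint64_t demo_packets_for_message(uint64_t msg_len, enum demo_mtu mtu)
{
	uint32_t bytes = demo_mtu_bytes(mtu);

	if (bytes == 0)
		return 0;
	if (msg_len == 0)
		return 1;
	return (msg_len + bytes - 1) / bytes;	/* round up */
}
```

For instance, a 10000-byte message over a path MTU of 4096 bytes would be carried in three packets under this model.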

path_mig_state The state of the QP's path migration state machine, if supported by the device (IBV_DEVICE_AUTO_PATH_MIG is set in dev_cap.device_cap_flags). It can be one of the following enumerated values:

  • IBV_MIG_MIGRATED - The state machine of path migration is Migrated, i.e. the initial state of migration was done
  • IBV_MIG_REARM - The state machine of path migration is Rearm, i.e. an attempt is made to coordinate with the remote RC QP to move both local and remote QPs to the Armed state
  • IBV_MIG_ARMED - The state machine of path migration is Armed, i.e. both local and remote QPs are ready to perform a path migration
qkey The Q_Key that incoming messages are checked against and possibly used as the outgoing Q_Key (if the MSB of the q_key in the Send Request is set). Relevant only for UD QPs
rq_psn A 24-bit value of the Packet Sequence Number of the received packets for RC and UC QPs
sq_psn A 24-bit value of the Packet Sequence Number of the sent packets for any QP
dest_qp_num A 24-bit value of the remote QP number of RC and UC QPs; when sending data, packets will be sent to this QP number and when receiving data, packets will be accepted only from this QP number
qp_access_flags Allowed access flags of the remote operations for incoming packets of RC and UC QPs. It is either 0 or the bitwise OR of one or more of the following flags:

 

  • IBV_ACCESS_REMOTE_WRITE - Allow incoming RDMA Writes on this QP
  • IBV_ACCESS_REMOTE_READ - Allow incoming RDMA Reads on this QP
  • IBV_ACCESS_REMOTE_ATOMIC - Allow incoming Atomic operations on this QP
cap Attributes of the number of Work Requests in the Queue Pair, as described in the table below
ah_attr Address vector of the primary path which describes the path information to the remote QP as described in the table below
alt_ah_attr Address vector of the alternate path which describes the path information to the remote QP as described in the table below. Can be used only if supported by the device (IBV_DEVICE_AUTO_PATH_MIG is set in dev_cap.device_cap_flags)
pkey_index Primary P_Key index. The value of the entry in the P_Key table that outgoing packets from this QP will be sent with and incoming packets to this QP will be verified within the Primary path
alt_pkey_index Alternate P_Key index. The value of the entry in the P_Key table that outgoing packets from this QP will be sent with and incoming packets to this QP will be verified within the Alternate path
en_sqd_async_notify If non-zero, generate the affiliated asynchronous event IBV_EVENT_SQ_DRAINED when the QP state becomes SQD.drained, i.e. the Send Queue is drained. Not relevant for ibv_query_qp()
sq_draining If set, indication that Send Queue draining is in progress. Relevant only when the QP is in the SQD state
max_rd_atomic The number of RDMA Reads & atomic operations outstanding at any time that can be handled by this QP as an initiator. Relevant only for RC QPs
max_dest_rd_atomic The number of RDMA Reads & atomic operations outstanding at any time that can be handled by this QP as a destination. Relevant only for RC QPs
min_rnr_timer Minimum RNR NAK Timer Field Value. When an incoming message to this QP should consume a Work Request from the Receive Queue, but no Work Request is outstanding on that Queue, the QP will send an RNR NAK packet to the initiator. It does not affect RNR NAKs sent for other reasons. Relevant only for RC QPs. The value can be one of the following numeric values, since those values aren't enumerated:

  • 0 - 655.36 milliseconds delay
  • 1 - 0.01 milliseconds delay
  • 2 - 0.02 milliseconds delay
  • 3 - 0.03 milliseconds delay
  • 4 - 0.04 milliseconds delay
  • 5 - 0.06 milliseconds delay
  • 6 - 0.08 milliseconds delay
  • 7 - 0.12 milliseconds delay
  • 8 - 0.16 milliseconds delay
  • 9 - 0.24 milliseconds delay
  • 10 - 0.32 milliseconds delay
  • 11 - 0.48 milliseconds delay
  • 12 - 0.64 milliseconds delay
  • 13 - 0.96 milliseconds delay
  • 14 - 1.28 milliseconds delay
  • 15 - 1.92 milliseconds delay
  • 16 - 2.56 milliseconds delay
  • 17 - 3.84 milliseconds delay
  • 18 - 5.12 milliseconds delay
  • 19 - 7.68 milliseconds delay
  • 20 - 10.24 milliseconds delay
  • 21 - 15.36 milliseconds delay
  • 22 - 20.48 milliseconds delay
  • 23 - 30.72 milliseconds delay
  • 24 - 40.96 milliseconds delay
  • 25 - 61.44 milliseconds delay
  • 26 - 81.92 milliseconds delay
  • 27 - 122.88 milliseconds delay
  • 28 - 163.84 milliseconds delay
  • 29 - 245.76 milliseconds delay
  • 30 - 327.68 milliseconds delay
  • 31 - 491.52 milliseconds delay
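Because the min_rnr_timer values are not enumerated, applications often keep a small lookup table. The sketch below is one possible way to do that; the names `demo_rnr_usec`, `demo_rnr_timer_usec()` and `demo_pick_min_rnr_timer()` are hypothetical, and the table assumes the InfiniBand RNR NAK timer encoding in which values 1..31 are in ascending order and 0 is the longest delay.

```c
#include <stdint.h>

/* RNR NAK timer delays in microseconds, indexed by the 5-bit encoding.
 * Encoding 0 is the longest delay (655.36 ms). */
static const uint32_t demo_rnr_usec[32] = {
	655360,     10,     20,     30,     40,     60,     80,    120,
	   160,    240,    320,    480,    640,    960,   1280,   1920,
	  2560,   3840,   5120,   7680,  10240,  15360,  20480,  30720,
	 40960,  61440,  81920, 122880, 163840, 245760, 327680, 491520
};

/* Delay in microseconds for a min_rnr_timer encoding (low 5 bits only). */
uint32_t demo_rnr_timer_usec(uint8_t encoding)
{
	return demo_rnr_usec[encoding & 0x1f];
}

/* Smallest encoding whose delay is at least want_usec.
 * Encodings 1..31 are scanned in ascending order; encoding 0 is the
 * maximum, so it is the fallback (requests above 655.36 ms are clamped). */
uint8_t demo_pick_min_rnr_timer(uint32_t want_usec)
{
	for (uint8_t enc = 1; enc < 32; enc++) {
		if (demo_rnr_usec[enc] >= want_usec)
			return enc;
	}
	return 0;
}
```

The returned encoding is the kind of value that would be placed in attr.min_rnr_timer when modifying the QP, and demo_rnr_timer_usec() can decode the value read back by ibv_query_qp().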

 

port_num Primary physical port number associated with this QP
timeout The minimum timeout that a QP waits for ACK/NACK from the remote QP before retransmitting the packet. The value zero is a special value which means waiting an infinite time for the ACK/NACK (useful for debugging). For any other value of timeout, the time calculation is: 4.096 * 2^timeout usec. For your convenience, here is the summary of each value and its timeout:

 

  • 0 - infinite
  • 1 - 8.192 usec (0.000008 sec)
  • 2 - 16.384 usec (0.000016 sec)
  • 3 - 32.768 usec (0.000032 sec)
  • 4 - 65.536 usec (0.000065 sec)
  • 5 - 131.072 usec (0.000131 sec)
  • 6 - 262.144 usec (0.000262 sec)
  • 7 - 524.288 usec (0.000524 sec)
  • 8 - 1048.576 usec (0.00104 sec)
  • 9 - 2097.152 usec (0.00209 sec)
  • 10 - 4194.304 usec (0.00419 sec)
  • 11 - 8388.608 usec (0.00838 sec)
  • 12 - 16777.22 usec (0.01677 sec)
  • 13 - 33554.43 usec (0.0335 sec)
  • 14 - 67108.86 usec (0.0671 sec)
  • 15 - 134217.7 usec (0.134 sec)
  • 16 - 268435.5 usec (0.268 sec)
  • 17 - 536870.9 usec (0.536 sec)
  • 18 - 1073742 usec (1.07 sec)
  • 19 - 2147484 usec (2.14 sec)
  • 20 - 4294967 usec (4.29 sec)
  • 21 - 8589935 usec (8.58 sec)
  • 22 - 17179869 usec (17.1 sec)
  • 23 - 34359738 usec (34.3 sec)
  • 24 - 68719477 usec (68.7 sec)
  • 25 - 137000000 usec (137 sec)
  • 26 - 275000000 usec (275 sec)
  • 27 - 550000000 usec (550 sec)
  • 28 - 1100000000 usec (1100 sec)
  • 29 - 2200000000 usec (2200 sec)
  • 30 - 4400000000 usec (4400 sec)
  • 31 - 8800000000 usec (8800 sec)

Relevant only to RC QPs
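The values in the list above follow 4.096 usec multiplied by 2 to the power of timeout, which can be computed exactly in integer nanoseconds. A minimal sketch follows; `demo_ack_timeout_ns()` is a hypothetical helper, and representing "infinite" as 0 is a choice made for this example only.

```c
#include <stdint.h>

/* Local ACK timeout in nanoseconds for a given attr.timeout value,
 * computed as 4.096 usec * 2^timeout. Returns 0 to represent
 * "infinite" (timeout == 0). Only the low 5 bits are used. */
uint64_t demo_ack_timeout_ns(uint8_t timeout)
{
	timeout &= 0x1f;
	if (timeout == 0)
		return 0;
	return (uint64_t)4096 << timeout;	/* 4.096 usec == 4096 ns */
}
```

With this helper, a timeout value of 14 yields 67108864 ns, matching the 67108.86 usec entry in the list.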

retry_cnt A 3-bit value of the total number of times that the QP will try to resend the packets before reporting an error because the remote side doesn't answer in the primary path
rnr_retry A 3-bit value of the total number of times that the QP will try to resend the packets when an RNR NACK was sent by the remote QP before reporting an error. The value 7 is special and specifies retrying an infinite number of times in case of RNR
alt_port_num Alternate physical port number associated with this QP
alt_timeout The minimum timeout that a QP waits for ACK/NACK from the remote QP before retransmitting the packet in the alternate path. It is encoded in the same way as timeout

A few caveats about some of the specific values in the QP attributes:

  • The value 0 in attr.timeout means waiting an infinite time for the ACK or NACK. This means that if any packet in a message is lost and no ACK or NACK is sent, no retry will ever occur and the QP will just stop sending data
  • The value 7 in attr.rnr_retry means retrying an infinite number of times to send the message when an RNR NACK is sent by the remote side

The struct ibv_qp_init_attr describes the attributes of the QP's Queues:

struct ibv_qp_init_attr {
	void		       *qp_context;
	struct ibv_cq	       *send_cq;
	struct ibv_cq	       *recv_cq;
	struct ibv_srq	       *srq;
	struct ibv_qp_cap	cap;
	enum ibv_qp_type	qp_type;
	int			sq_sig_all;
};

Here is the full description of struct ibv_qp_init_attr:

qp_context (optional) User defined value which will be available in qp->qp_context
send_cq A Completion Queue, that was returned from ibv_create_cq(), to be associated with the Send Queue
recv_cq A Completion Queue, that was returned from ibv_create_cq(), to be associated with the Receive Queue
srq (optional) A Shared Receive Queue, that was returned from ibv_create_srq(), that this Queue Pair will be associated with. Otherwise, NULL
cap Attributes of the Queue Pair size, as described in the table below. Upon a successful Queue Pair creation, this structure will hold the actual Queue Pair attributes
qp_type The requested Transport Service Type of this QP:

 

IBV_QPT_RC Reliable Connection
IBV_QPT_UC Unreliable Connection
IBV_QPT_UD Unreliable Datagram
sq_sig_all The Signaling level of Work Requests that will be posted to the Send Queue in this QP.

 

0 In every Work Request submitted to the Send Queue, the user must decide whether to generate a Work Completion for successful completions or not
otherwise All Work Requests that will be submitted to the Send Queue will always generate a Work Completion
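The effect of sq_sig_all can be summarized as a small predicate. This is only an illustration of the rule described above: `demo_generates_wc()` is a hypothetical name, `wr_signaled` stands for the IBV_SEND_SIGNALED flag being set in the Send Request, and the sketch assumes that unsuccessful requests always produce a Work Completion.

```c
#include <stdbool.h>

/* Decide whether a Send Request produces a Work Completion.
 * sq_sig_all  - value given in struct ibv_qp_init_attr
 * wr_signaled - whether the request itself asked for a completion
 *               (IBV_SEND_SIGNALED set in its send_flags)
 * succeeded   - whether the request completed successfully */
bool demo_generates_wc(int sq_sig_all, bool wr_signaled, bool succeeded)
{
	if (!succeeded)
		return true;	/* failures are always reported */
	return sq_sig_all != 0 || wr_signaled;
}
```

Reading init_attr.sq_sig_all back through ibv_query_qp() tells an application which of the two modes a QP was created with.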

struct ibv_qp_cap describes the size of the Queue Pair (for both Send and Receive Queues).

struct ibv_qp_cap {
	uint32_t		max_send_wr;
	uint32_t		max_recv_wr;
	uint32_t		max_send_sge;
	uint32_t		max_recv_sge;
	uint32_t		max_inline_data;
};

Here is the full description of struct ibv_qp_cap:

max_send_wr The maximum number of outstanding Work Requests that can be posted to the Send Queue in that Queue Pair. Value can be [1..dev_cap.max_qp_wr]
max_recv_wr The maximum number of outstanding Work Requests that can be posted to the Receive Queue in that Queue Pair. Value can be [1..dev_cap.max_qp_wr]. This value is ignored if the Queue Pair is associated with an SRQ
max_send_sge The maximum number of scatter/gather elements in any Work Request that can be posted to the Send Queue in that Queue Pair. Value can be [1..dev_cap.max_sge]
max_recv_sge The maximum number of scatter/gather elements in any Work Request that can be posted to the Receive Queue in that Queue Pair. Value can be [1..dev_cap.max_sge]. This value is ignored if the Queue Pair is associated with an SRQ
max_inline_data The maximum message size (in bytes) that can be posted inline to the Send Queue. 0, if no inline message is requested

struct ibv_ah_attr describes the Address Vector of the QP.

struct ibv_ah_attr {
	struct ibv_global_route	grh;
	uint16_t		dlid;
	uint8_t			sl;
	uint8_t			src_path_bits;
	uint8_t			static_rate;
	uint8_t			is_global;
	uint8_t			port_num;
};

Here is the full description of struct ibv_ah_attr:

grh Attributes of the Global Routing Headers (GRH), as described in the table below. This is useful when sending packets to another subnet
dlid If the destination is in the same subnet, the LID of the port to which the subnet delivers the packets. If the destination is in another subnet, the LID of the Router
sl 4 bits. The Service Level to be used
src_path_bits The Source Path Bits to use. This is useful when LMC is used in the port, i.e. each port covers a range of LIDs. The packets are sent with the port's base LID, bitwise ORed with the value of the source path bits. The value 0 indicates that the port's base LID is used
static_rate A value which limits the rate of packets that are sent to the subnet. This can be useful if the rate of the packet origin is higher than the rate of the destination
is_global If this value contains any value other than zero, then GRH information exists in this AH, thus the field grh is valid
port_num The local physical port that the packets will be sent from
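How src_path_bits combines with the port's base LID can be sketched as below. `demo_source_lid()` is a hypothetical helper; masking the path bits to the low LMC bits is an assumption of this example, reflecting that a port with a given LMC covers 2^LMC consecutive LIDs.

```c
#include <stdint.h>

/* Source LID placed in outgoing packets: the port's base LID ORed with
 * the source path bits. lmc is the port's LID Mask Control (0..7), so
 * only the low lmc bits of src_path_bits select a LID in the port's range. */
uint16_t demo_source_lid(uint16_t base_lid, uint8_t lmc, uint8_t src_path_bits)
{
	uint16_t mask = (uint16_t)((1u << (lmc & 0x7)) - 1);

	return (uint16_t)(base_lid | (src_path_bits & mask));
}
```

For example, with a base LID of 0x10 and an LMC of 2, source path bits of 3 give a source LID of 0x13, while source path bits of 0 leave the base LID unchanged.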

struct ibv_global_route describes the values to be used in the GRH of the packets that will be sent when using this AH.

struct ibv_global_route {
	union ibv_gid		dgid;
	uint32_t		flow_label;
	uint8_t			sgid_index;
	uint8_t			hop_limit;
	uint8_t			traffic_class;
};

Here is the full description of struct ibv_global_route:

dgid The GID that is used to identify the destination port of the packets
flow_label 20 bits. If this value is set to a non-zero value, it gives a hint to switches and routers with multiple outbound paths that this sequence of packets must be delivered in order, staying on the same path, so that they won't be reordered.
sgid_index An index in the port's GID table that will be used to identify the originator of the packet
hop_limit The number of hops (i.e. the number of routers) that the packet is permitted to take before being discarded. This ensures that a packet will not loop indefinitely between routers if a routing loop occurs. Each router decrements this value in the packet by one, and when this value reaches 0, the packet is discarded. Setting the value to 0 or 1 will ensure that the packet won't leave the local subnet.
traffic_class Using this value, the originator of the packets specifies the required delivery priority for handling them by the routers

attr_mask provides a hint to the low-level driver of the RDMA device which QP attributes should be queried. It is possible that extra attributes, which weren't requested, will be filled by ibv_query_qp(). It is either 0 or the bitwise OR of one or more of the following flags:

IBV_QP_STATE Fill the value in attr->qp_state
IBV_QP_CUR_STATE Fill the value in attr->cur_qp_state
IBV_QP_EN_SQD_ASYNC_NOTIFY Fill the value in attr->en_sqd_async_notify
IBV_QP_ACCESS_FLAGS Fill the value in attr->qp_access_flags
IBV_QP_PKEY_INDEX Fill the value in attr->pkey_index
IBV_QP_PORT Fill the value in attr->port_num
IBV_QP_QKEY Fill the value in attr->qkey
IBV_QP_AV Fill the value in attr->ah_attr
IBV_QP_PATH_MTU Fill the value in attr->path_mtu
IBV_QP_TIMEOUT Fill the value in attr->timeout
IBV_QP_RETRY_CNT Fill the value in attr->retry_cnt
IBV_QP_RNR_RETRY Fill the value in attr->rnr_retry
IBV_QP_RQ_PSN Fill the value in attr->rq_psn
IBV_QP_MAX_QP_RD_ATOMIC Fill the value in attr->max_rd_atomic
IBV_QP_ALT_PATH Fill the values in attr->alt_ah_attr, attr->alt_pkey_index, attr->alt_port_num and attr->alt_timeout
IBV_QP_MIN_RNR_TIMER Fill the value in attr->min_rnr_timer
IBV_QP_SQ_PSN Fill the value in attr->sq_psn
IBV_QP_MAX_DEST_RD_ATOMIC Fill the value in attr->max_dest_rd_atomic
IBV_QP_PATH_MIG_STATE Fill the value in attr->path_mig_state
IBV_QP_CAP Fill the value in attr->cap
IBV_QP_DEST_QPN Fill the value in attr->dest_qp_num

The following table specifies the valid attributes of a QP with service type IBV_QPT_UD in each state:

State Valid Attributes
RESET IBV_QP_STATE
INIT IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_QKEY
RTR IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_QKEY
RTS IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_QKEY, IBV_QP_SQ_PSN
SQD IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_QKEY, IBV_QP_SQ_PSN
SQE IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_QKEY, IBV_QP_SQ_PSN
ERR IBV_QP_STATE
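A table like the one above can be encoded as a function that returns the mask of trustworthy attributes for a given state. The sketch below does this for UD QPs only. Everything prefixed `UD_DEMO_` or `ud_demo_` is a hypothetical local stand-in; real code would use enum ibv_qp_state and the IBV_QP_* flags from <infiniband/verbs.h>, whose numeric values are not reproduced here.

```c
#include <stdint.h>

/* Local stand-in for enum ibv_qp_state. */
enum ud_demo_state {
	UD_DEMO_RESET, UD_DEMO_INIT, UD_DEMO_RTR, UD_DEMO_RTS,
	UD_DEMO_SQD, UD_DEMO_SQE, UD_DEMO_ERR
};

/* Local stand-ins for the IBV_QP_* attribute mask flags used by UD QPs. */
enum {
	UD_DEMO_QP_STATE      = 1 << 0,
	UD_DEMO_QP_PKEY_INDEX = 1 << 1,
	UD_DEMO_QP_PORT       = 1 << 2,
	UD_DEMO_QP_QKEY       = 1 << 3,
	UD_DEMO_QP_SQ_PSN     = 1 << 4
};

/* Mask of attributes that are valid for a UD QP in the given state. */
uint32_t ud_demo_valid_mask(enum ud_demo_state state)
{
	const uint32_t base = UD_DEMO_QP_STATE | UD_DEMO_QP_PKEY_INDEX |
			      UD_DEMO_QP_PORT | UD_DEMO_QP_QKEY;

	switch (state) {
	case UD_DEMO_INIT:
	case UD_DEMO_RTR:
		return base;
	case UD_DEMO_RTS:
	case UD_DEMO_SQD:
	case UD_DEMO_SQE:
		return base | UD_DEMO_QP_SQ_PSN;
	case UD_DEMO_RESET:
	case UD_DEMO_ERR:
	default:
		return UD_DEMO_QP_STATE;	/* only the state itself is valid */
	}
}
```

An application could AND this mask with the attributes it intends to read, and ignore any field that ibv_query_qp() filled in but that is not valid in the current state.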

The following table specifies the valid attributes of a QP with service type IBV_QPT_UC in each state:

State Valid Attributes
RESET IBV_QP_STATE
INIT IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_ACCESS_FLAGS
RTR IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_ACCESS_FLAGS, IBV_QP_AV, IBV_QP_PATH_MTU, IBV_QP_DEST_QPN, IBV_QP_RQ_PSN, IBV_QP_ALT_PATH
RTS IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_ACCESS_FLAGS, IBV_QP_AV, IBV_QP_PATH_MTU, IBV_QP_DEST_QPN, IBV_QP_RQ_PSN, IBV_QP_ALT_PATH, IBV_QP_SQ_PSN, IBV_QP_PATH_MIG_STATE
SQD IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_ACCESS_FLAGS, IBV_QP_AV, IBV_QP_PATH_MTU, IBV_QP_DEST_QPN, IBV_QP_RQ_PSN, IBV_QP_ALT_PATH, IBV_QP_SQ_PSN, IBV_QP_PATH_MIG_STATE
SQE IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_ACCESS_FLAGS, IBV_QP_AV, IBV_QP_PATH_MTU, IBV_QP_DEST_QPN, IBV_QP_RQ_PSN, IBV_QP_ALT_PATH, IBV_QP_SQ_PSN, IBV_QP_PATH_MIG_STATE
ERR IBV_QP_STATE

The following table specifies the valid attributes of a QP with service type IBV_QPT_RC in each state:

State Valid Attributes
RESET IBV_QP_STATE
INIT IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_ACCESS_FLAGS
RTR IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_ACCESS_FLAGS, IBV_QP_AV, IBV_QP_PATH_MTU, IBV_QP_DEST_QPN, IBV_QP_RQ_PSN, IBV_QP_MAX_DEST_RD_ATOMIC, IBV_QP_MIN_RNR_TIMER, IBV_QP_ALT_PATH
RTS IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_ACCESS_FLAGS, IBV_QP_AV, IBV_QP_PATH_MTU, IBV_QP_DEST_QPN, IBV_QP_RQ_PSN, IBV_QP_MAX_DEST_RD_ATOMIC, IBV_QP_MIN_RNR_TIMER, IBV_QP_ALT_PATH, IBV_QP_SQ_PSN, IBV_QP_TIMEOUT, IBV_QP_RETRY_CNT, IBV_QP_RNR_RETRY, IBV_QP_MAX_QP_RD_ATOMIC, IBV_QP_PATH_MIG_STATE
SQD IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_ACCESS_FLAGS, IBV_QP_AV, IBV_QP_PATH_MTU, IBV_QP_DEST_QPN, IBV_QP_RQ_PSN, IBV_QP_MAX_DEST_RD_ATOMIC, IBV_QP_MIN_RNR_TIMER, IBV_QP_ALT_PATH, IBV_QP_SQ_PSN, IBV_QP_TIMEOUT, IBV_QP_RETRY_CNT, IBV_QP_RNR_RETRY, IBV_QP_MAX_QP_RD_ATOMIC, IBV_QP_PATH_MIG_STATE
ERR IBV_QP_STATE

Parameters

Name Direction Description
qp in QP that was returned from ibv_create_qp()
attr out Will be filled with the current attributes of the QP
attr_mask in Mask of the QP attributes to be queried
init_attr out Will be filled with the QP's queues attributes

Return Values

Value Description
0 On success
errno On failure:
ENOMEM Not enough resources to complete this operation

Examples

Query a QP to get its state:

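struct ibv_qp *qp;	/* assumed to hold a QP returned from ibv_create_qp() */
struct ibv_qp_attr attr;
struct ibv_qp_init_attr init_attr;
 
/* ibv_query_qp() returns 0 on success; the state is then in attr.qp_state */
if (ibv_query_qp(qp, &attr,
		  IBV_QP_STATE, &init_attr)) {
	fprintf(stderr, "Failed to query QP state\n");
	return -1;
}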
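Once the state has been queried, it is often convenient to print it by name. The helper below is a minimal sketch for that purpose: `demo_qp_state_name()` and `enum demo_qps` are hypothetical, and the enum is a local stand-in that lists the states in the same order as the description of qp_state above. Real code would switch on the IBV_QPS_* constants directly.

```c
/* Local stand-in for enum ibv_qp_state, in the order
 * RESET, INIT, RTR, RTS, SQD, SQE, ERR. */
enum demo_qps {
	DEMO_QPS_RESET,
	DEMO_QPS_INIT,
	DEMO_QPS_RTR,
	DEMO_QPS_RTS,
	DEMO_QPS_SQD,
	DEMO_QPS_SQE,
	DEMO_QPS_ERR
};

/* Map a QP state to a short printable name. */
const char *demo_qp_state_name(enum demo_qps state)
{
	switch (state) {
	case DEMO_QPS_RESET: return "RESET";
	case DEMO_QPS_INIT:  return "INIT";
	case DEMO_QPS_RTR:   return "RTR";
	case DEMO_QPS_RTS:   return "RTS";
	case DEMO_QPS_SQD:   return "SQD";
	case DEMO_QPS_SQE:   return "SQE";
	case DEMO_QPS_ERR:   return "ERR";
	default:             return "UNKNOWN";
	}
}
```

The returned string can be passed to fprintf() or a logging call right after a successful ibv_query_qp().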

FAQs

Calling ibv_query_qp() every time I need a QP attribute takes time; can I cache some of the attributes?

Actually, yes. Most of the attributes are constant unless they were changed using ibv_modify_qp(). The following fields in the QP attribute structure may change: qp_state, path_mig_state, sq_draining, ah_attr, pkey_index, port_num, timeout.

Can I specify exactly which attributes will be filled by ibv_query_qp()?

No. The parameter attr_mask behaves as a hint, and the low-level driver of the RDMA device may (and most of the time, will) fill many more attributes than the ones that were requested in attr_mask.

Are all of the QP attributes valid?

No. The valid QP attributes depend on the QP state.

How do I get the QP state?

One should call ibv_query_qp() with, at least, the flag IBV_QP_STATE set.
