4 Data Structures
This section describes data structures used by NVM Express.
4.1 Submission Queue & Completion Queue Definition
Sections 4.1, 4.1.1 and 4.1.2 apply to NVMe over PCIe only. For NVMe over Fabrics, refer to sections 2.4, 2.4.1 and 2.4.2 in the NVMe over Fabrics 1.0 specification.
The submitter of entries to a queue uses the current Tail entry pointer to identify the next open queue slot. The submitter increments the Tail entry pointer after placing the new entry in the open queue slot. If the Tail entry pointer increment exceeds the queue size, the Tail entry pointer shall roll to zero. The submitter may continue to place entries in free queue slots as long as the Full queue condition is not met (refer to section 4.1.2).
Note: The submitter shall take queue wrap conditions into account.
The consumer of entries on a queue uses the current Head entry pointer to identify the slot containing the next entry to be consumed. The consumer increments the Head entry pointer after consuming the next entry from the queue. If the Head entry pointer increment exceeds the queue size, the Head entry pointer shall roll to zero. The consumer may continue to consume entries from the queue as long as the Empty queue condition is not met (refer to section 4.1.1).
Note: The consumer shall take queue wrap conditions into account.
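The wrap rule shared by the submitter and the consumer can be sketched in C as follows. This is only an illustration: the helper name nvme_ptr_advance and its parameters are hypothetical and are not defined by this specification. The queue size is held in a 32-bit value because a queue of 64K slots does not fit in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a name from the specification): advance a Head
 * or Tail entry pointer by one slot and roll it to zero when the increment
 * reaches the queue size. */
static inline uint32_t nvme_ptr_advance(uint32_t ptr, uint32_t queue_size)
{
    assert(queue_size >= 2 && ptr < queue_size);
    ptr++;
    return (ptr == queue_size) ? 0 : ptr;
}
```

The same helper serves both roles: the submitter applies it to the Tail entry pointer after placing an entry, and the consumer applies it to the Head entry pointer after consuming one.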
Creation and deletion of Submission Queues and their associated Completion Queues need to be ordered correctly by host software. Host software shall create the Completion Queue before creating any associated Submission Queue. Submission Queues may be created at any time after the associated Completion Queue is created. Host software shall delete all associated Submission Queues prior to deleting a Completion Queue. To abort all commands submitted to a Submission Queue, host software should issue a Delete I/O Submission Queue command for that queue (refer to section 7.4.3).
Host software writes the Submission Queue Tail Doorbell (refer to section 3.1.16) and the Completion Queue Head Doorbell (refer to section 3.1.17) to communicate new values of the corresponding entry pointers to the controller. If host software writes an invalid value to the Submission Queue Tail Doorbell or Completion Queue Head Doorbell register and an Asynchronous Event Request command is outstanding, then an asynchronous event is posted to the Admin Completion Queue with a status code of Invalid Doorbell Write Value. The associated queue should be deleted and recreated by host software. For a Submission Queue that experiences this error, the controller may complete previously consumed commands; no additional commands are consumed. This condition may be caused by host software attempting to add an entry to a full Submission Queue or remove an entry from an empty Completion Queue.
Host software checks Completion Queue entry Phase Tag (P) bits in memory to determine whether new Completion Queue entries have been posted. The Completion Queue Tail pointer is only used internally by the controller and is not visible to the host. The controller uses the SQ Head Pointer (SQHD) field in Completion Queue entries to communicate new values of the Submission Queue Head Pointer to the host. A new SQHD value indicates that Submission Queue entries have been consumed, but does not indicate either execution or completion of any command. Refer to section 4.6.
A Submission Queue entry is submitted to the controller when the host writes the associated Submission Queue Tail Doorbell with a new value that indicates that the Submission Queue Tail Pointer has moved to or past the slot in which that Submission Queue entry was placed. A Submission Queue Tail Doorbell write may indicate that one or more Submission Queue entries have been submitted.
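A minimal host-side sketch of the submission path described above, assuming a 64-byte Submission Queue entry and a doorbell that is reachable through a plain pointer. The struct sq type, its field names, and the functions sq_place and sq_ring are hypothetical names chosen for illustration. A real driver would also need a write barrier before the doorbell write so that the entry is visible to the controller first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SQE_BYTES 64u  /* assumed Submission Queue entry size */

/* Hypothetical host-side view of one Submission Queue. */
struct sq {
    uint8_t (*slots)[SQE_BYTES];      /* queue memory, `size` slots */
    uint32_t size;                    /* number of slots, at least 2 */
    uint32_t tail;                    /* Tail entry pointer, owned by the host */
    uint32_t head;                    /* last SQHD value reported by the controller */
    volatile uint32_t *tail_doorbell; /* Submission Queue Tail Doorbell */
};

/* Place one entry in the open slot and advance the Tail entry pointer.
 * Returns -1 without touching the queue if the Full condition is met. */
static int sq_place(struct sq *q, const void *cmd)
{
    assert(q->size >= 2);
    uint32_t next = (q->tail + 1u) % q->size;
    if (next == q->head)
        return -1;                    /* queue is Full */
    memcpy(q->slots[q->tail], cmd, SQE_BYTES);
    q->tail = next;
    return 0;
}

/* Submit every entry placed so far with a single doorbell write. */
static void sq_ring(struct sq *q)
{
    *q->tail_doorbell = q->tail;
}
```

Separating sq_place from sq_ring reflects the statement that one Submission Queue Tail Doorbell write may submit one or more entries: the host can place several entries and then ring once.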
A Submission Queue entry has been consumed by the controller when a Completion Queue entry is posted that indicates that the Submission Queue Head Pointer has moved past the slot in which that Submission Queue entry was placed. A Completion Queue entry may indicate that one or more Submission Queue entries have been consumed.
A Completion Queue entry is posted to the Completion Queue when the controller write of that Completion Queue entry to the next free Completion Queue slot inverts the Phase Tag (P) bit from its previous value in memory. The controller may generate an interrupt to the host to indicate that one or more Completion Queue entries have been posted.
A Completion Queue entry has been consumed by the host when the host writes the associated Completion Queue Head Doorbell with a new value that indicates that the Completion Queue Head Pointer has moved past the slot in which that Completion Queue entry was placed. A Completion Queue Head Doorbell write may indicate that one or more Completion Queue entries have been consumed.
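The host side of the completion path (check the Phase Tag, consume entries, then write the Completion Queue Head Doorbell once) can be sketched as below. The 16-byte entry layout follows the Completion Queue entry format of section 4.6 as understood here, on a little-endian host. The struct and function names are hypothetical, and the sketch assumes that the host zeroed the queue memory at creation so that the first pass of new entries carries P = 1.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 16-byte Completion Queue entry layout (little-endian host). */
struct cqe {
    uint32_t dw0;
    uint32_t dw1;
    uint16_t sq_head;  /* SQ Head Pointer (SQHD) */
    uint16_t sq_id;
    uint16_t cid;
    uint16_t status;   /* bit 0 is the Phase Tag (P) */
};

/* Hypothetical host-side view of one Completion Queue. */
struct cq {
    volatile struct cqe *slots;
    uint32_t size;
    uint32_t head;                    /* Head entry pointer, owned by the host */
    uint8_t phase;                    /* P value expected for newly posted entries */
    volatile uint32_t *head_doorbell; /* Completion Queue Head Doorbell */
};

/* Consume every newly posted entry, record the latest SQHD in *sq_head,
 * and write the doorbell once. Returns the number of entries consumed. */
static unsigned cq_drain(struct cq *q, uint32_t *sq_head)
{
    unsigned n = 0;
    assert(q->size >= 2);
    while ((q->slots[q->head].status & 1u) == q->phase) {
        *sq_head = q->slots[q->head].sq_head;
        if (++q->head == q->size) {
            q->head = 0;
            q->phase ^= 1u;           /* expected P inverts on every wrap */
        }
        n++;
    }
    if (n != 0)
        *q->head_doorbell = q->head;
    return n;
}
```

The value returned through sq_head only indicates that Submission Queue slots have been consumed and are free for reuse. It does not by itself indicate that any particular command has executed or completed.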
Once a Submission Queue or Completion Queue entry has been consumed, the slot in which it was placed is free and available for reuse. Altering a Submission Queue entry after that entry has been submitted but before that entry has been consumed results in undefined behavior. Altering a Completion Queue entry after that entry has been posted but before that entry has been consumed results in undefined behavior.
If there are no free slots in a Completion Queue, then the controller shall not post status to that Completion Queue until slots become available. In this case, the controller may stop processing additional Submission Queue entries associated with the affected Completion Queue until slots become available. The controller shall continue processing for other queues.
4.1.1 Empty Queue
The queue is Empty when the Head entry pointer equals the Tail entry pointer. Figure 8 defines the Empty Queue condition.
4.1.2 Full Queue
The queue is Full when the Head entry pointer equals one more than the Tail entry pointer. The number of entries in a queue when full is one less than the queue size. Figure 9 defines the Full Queue condition.
Note: Queue wrap conditions shall be taken into account when determining whether a queue is Full.
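The Empty and Full conditions, with the wrap case included, can be written as small predicates. The function names below are hypothetical and serve only as an illustration. The modulo operation is what accounts for wrap: with a queue size of 4, Head = 0 and Tail = 3 is Full because (3 + 1) mod 4 = 0.

```c
#include <assert.h>
#include <stdint.h>

/* Empty: the Head entry pointer equals the Tail entry pointer. */
static inline int queue_is_empty(uint32_t head, uint32_t tail)
{
    return head == tail;
}

/* Full: the Head entry pointer equals one more than the Tail entry
 * pointer, taking wrap into account. */
static inline int queue_is_full(uint32_t head, uint32_t tail, uint32_t size)
{
    assert(size >= 2);
    return head == (tail + 1u) % size;
}

/* Number of entries currently in the queue (at most size - 1). */
static inline uint32_t queue_used(uint32_t head, uint32_t tail, uint32_t size)
{
    assert(size >= 2 && head < size && tail < size);
    return (tail + size - head) % size;
}
```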
4.1.3 Queue Size
The Queue Size is indicated in a 16-bit 0’s based field that indicates the number of slots in the queue. The minimum size for a queue is two slots. The maximum size for either an I/O Submission Queue or an I/O Completion Queue is defined as 64K slots, limited by the maximum queue size supported by the controller that is reported in the CAP.MQES field. The maximum size for the Admin Submission and Admin Completion Queue is defined as 4K slots. One slot in each queue is not available for use due to Head and Tail entry pointer definition.
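Because the size field is 0's based, a field value of n describes a queue of n + 1 slots. The conversions can be sketched as below. The function names are hypothetical, and the sketch assumes that CAP.MQES occupies bits 15:00 of the 64-bit CAP register and is itself 0's based.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 16-bit 0's based size field to a slot count (1 to 65536). */
static inline uint32_t qsize_field_to_slots(uint16_t field)
{
    return (uint32_t)field + 1u;
}

/* Convert a slot count (2 to 65536) to the 16-bit 0's based field. */
static inline uint16_t qsize_slots_to_field(uint32_t slots)
{
    assert(slots >= 2u && slots <= 65536u);
    return (uint16_t)(slots - 1u);
}

/* Largest I/O queue, in slots, that the controller reports through
 * CAP.MQES (assumed here to be bits 15:00 of CAP). */
static inline uint32_t max_io_queue_slots(uint64_t cap)
{
    return qsize_field_to_slots((uint16_t)(cap & 0xFFFFu));
}

/* Entries a queue can hold at once: one slot is always unavailable
 * because of the Head and Tail entry pointer definition. */
static inline uint32_t queue_usable_entries(uint32_t slots)
{
    assert(slots >= 2u);
    return slots - 1u;
}
```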
4.1.4 Queue Identifier
Each queue is identified through a 16-bit ID value that is assigned to the queue when it is created.
4.1.5 Queue Priority
If the weighted round robin with urgent priority class arbitration mechanism is supported, then host software may assign a queue priority service class of Urgent, High, Medium or Low. If the weighted round robin with urgent priority class arbitration mechanism is not supported, then the priority setting is not used and is ignored by the controller.