Solutions to DMA Cache Coherency Problems

2023-10-14 20:12:16

/* Allocate the TX and RX descriptor rings of every queue from coherent DMA memory. */
static int macb_alloc_consistent(struct macb *bp)
{
    struct macb_queue *queue;
    unsigned int q;
    int size;

    for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
        size = TX_RING_BYTES(bp) + bp->tx_bd_rd_prefetch;
        queue->tx_ring = dma_alloc_coherent(&bp->pdev->dev, size,
                            &queue->tx_ring_dma,
                            GFP_KERNEL);
        if (!queue->tx_ring)
            goto out_err;
        netdev_dbg(bp->dev,
               "Allocated TX ring for queue %u of %d bytes at %08lx (mapped %p)\n",
               q, size, (unsigned long)queue->tx_ring_dma,
               queue->tx_ring);

        size = bp->tx_ring_size * sizeof(struct macb_tx_skb);
        queue->tx_skb = kmalloc(size, GFP_KERNEL);
        if (!queue->tx_skb)
            goto out_err;

        size = RX_RING_BYTES(bp) + bp->rx_bd_rd_prefetch;
        queue->rx_ring = dma_alloc_coherent(&bp->pdev->dev, size,
                         &queue->rx_ring_dma, GFP_KERNEL);
        if (!queue->rx_ring)
            goto out_err;
        netdev_dbg(bp->dev,
               "Allocated RX ring of %d bytes at %08lx (mapped %p)\n",
               size, (unsigned long)queue->rx_ring_dma, queue->rx_ring);
    }
    if (bp->macbgem_ops.mog_alloc_rx_buffers(bp))
        goto out_err;

    return 0;

out_err:
    macb_free_consistent(bp);
    return -ENOMEM;
}

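On ARM, dma_alloc_coherent clears both the C (Cacheable) and B (Bufferable) bits in the page table entries of the memory it returns.
dma_alloc_writecombine (renamed dma_alloc_wc in newer kernels) clears only the C (Cacheable) bit.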

The C bit controls whether the cache is used; the B bit controls whether the write buffer is used.

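As a result, memory allocated with dma_alloc_writecombine bypasses the cache but still uses the write buffer, while memory allocated with dma_alloc_coherent uses neither.

Meaning of the C and B bits:

C  B  Behavior
0  0  No cache, no write buffer. Every read and write goes to the bus, and the CPU waits for each memory access to complete.
0  1  No cache, write buffer enabled. Reads go straight to the bus. For writes, the CPU places the data in the write buffer and keeps running; the write buffer writes it out to memory.
1  0  Cached, write-through. Reads check for a cache hit first. Writes go directly to the write buffer, and on a cache hit the cache is updated as well.
1  1  Cached, write-back. Reads check for a cache hit first, and so do writes.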
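
The four C/B combinations can be modeled in a few lines of C. This is only a sketch for reasoning about the table: the enum and the function names are hypothetical and are not kernel APIs, and the mapping to the two allocation functions simply restates what the text above says.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; these are not kernel APIs. */
enum mem_policy {
    MEM_UNCACHED_UNBUFFERED, /* C=0 B=0: what the text attributes to dma_alloc_coherent */
    MEM_UNCACHED_BUFFERED,   /* C=0 B=1: what the text attributes to dma_alloc_writecombine */
    MEM_WRITE_THROUGH,       /* C=1 B=0 */
    MEM_WRITE_BACK           /* C=1 B=1 */
};

/* Map the two page-table bits to the policy listed in the table. */
enum mem_policy decode_cb(bool c, bool b)
{
    if (!c)
        return b ? MEM_UNCACHED_BUFFERED : MEM_UNCACHED_UNBUFFERED;
    return b ? MEM_WRITE_BACK : MEM_WRITE_THROUGH;
}

/* True when a CPU write may complete before the data reaches memory,
 * i.e. in every mode except C=0 B=0, where the CPU waits for the bus. */
bool write_is_posted(enum mem_policy p)
{
    assert(p <= MEM_WRITE_BACK);
    return p != MEM_UNCACHED_UNBUFFERED;
}
```

For example, decode_cb(false, true) yields MEM_UNCACHED_BUFFERED, the write-combining case, and write_is_posted() reports true for it because the write buffer absorbs the store.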

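In terms of efficiency, write-back is the fastest, followed by write-through, then write buffer only; uncached, unbuffered access (the coherent mapping) is the slowest.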

In fact, the write buffer is itself a very simple kind of cache. Here is why.

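DDR memory is read and written in bursts: each access actually moves one full burst across the bus, and the burst length usually equals the length of one cache line.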

Take a 32-byte cache line. Reading even a single byte still transfers 32 bytes, and the other 31 bytes are thrown away.
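
The arithmetic can be checked with a small helper. This is a simplified model, assuming every access is served by whole, aligned bursts; the function name bus_bytes is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes actually moved on the bus for one access of `len` bytes at `addr`,
 * assuming the bus only transfers whole, aligned bursts of `burst` bytes.
 * Simplified model; bus_bytes is a hypothetical helper. */
size_t bus_bytes(size_t addr, size_t len, size_t burst)
{
    assert(burst > 0 && len > 0);
    size_t first = addr / burst;             /* first burst touched */
    size_t last  = (addr + len - 1) / burst; /* last burst touched  */
    return (last - first + 1) * burst;
}
```

With a 32-byte burst, bus_bytes(0, 1, 32) is 32, so 31 of the transferred bytes are wasted. An access that straddles a burst boundary costs more: bus_bytes(30, 4, 32) is 64.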

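The write buffer works in units of a cache line: adjacent writes are collected and sent to memory as a single burst, so writes are far more efficient.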
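
A toy model makes the gain concrete. The sketch below assumes a one-entry write buffer with a 32-byte line that merges writes to the same line and writes the line out when a different line is touched. Real write buffers have several entries and their own drain policies, so the struct and function names here are purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LINE_BYTES 32u  /* assumed line size, as in the text */

/* Hypothetical one-entry write buffer: it holds at most one pending line. */
struct write_buffer {
    bool   pending;  /* a line is waiting to be written out */
    size_t line;     /* index of the pending line */
    size_t bursts;   /* bus bursts issued so far */
};

/* Write the pending line to memory as one burst. */
void wb_drain(struct write_buffer *wb)
{
    assert(wb != NULL);
    if (wb->pending) {
        wb->bursts++;
        wb->pending = false;
    }
}

/* Accept a one-byte CPU write; merge it if it hits the pending line. */
void wb_write(struct write_buffer *wb, size_t addr)
{
    assert(wb != NULL);
    size_t line = addr / LINE_BYTES;
    if (wb->pending && wb->line == line)
        return;          /* merged, no bus traffic yet */
    wb_drain(wb);        /* different line: write the old one out first */
    wb->line = line;
    wb->pending = true;
}

/* Bursts needed for `count` sequential byte writes starting at `start`. */
size_t sequential_write_bursts(size_t start, size_t count)
{
    struct write_buffer wb = { false, 0, 0 };
    for (size_t i = 0; i < count; i++)
        wb_write(&wb, start + i);
    wb_drain(&wb);       /* flush whatever is still pending */
    return wb.bursts;
}
```

In this model, sequential_write_bursts(0, 64) returns 2, whereas 64 byte writes sent to the bus one at a time would need 64 separate accesses.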
