ikwzm / udmabuf

User space mappable dma buffer device driver for Linux.

Buffer from UDMABUF and V4L2_MEMORY_USERPTR

h4tr3d opened this issue · comments

Hi!

I tried to pass a buffer taken from UDMABUF to V4L2 as a user buffer (as a workaround for this issue: https://forums.xilinx.com/t5/Embedded-Linux/V4l2-V4L2-MEMORY-USERPTR-contiguous-mapping-is-too-small-4096/td-p/825067) and got this error:

V4L2 ioctl error (22): Invalid argument

on VIDIOC_QBUF.

After a quick investigation, I got the following result:

Feb 27 02:58:18 X335002201 user.info kernel: [   70.931051] check_array_args: VIDIOC_QBUF
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931062] check_array_args: VIDIOC_QBUF
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931066] check_array_args: VIDIOC_QBUF
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931070] check_array_args: VIDIOC_QBUF
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931075] check_array_args: VIDIOC_QBUF
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931078] v4l_qbuf: check_fmt() ret=0
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931082] vb2_qbuf(): vb2_queue_or_prepare_buf() ret=0
Feb 27 02:58:18 X335002201 user.warn kernel: [   70.931083] vb2_core_qbuf()
Feb 27 02:58:18 X335002201 user.warn kernel: [   70.931098] get_vaddr_frames() flags=0x4fb
Feb 27 02:58:18 X335002201 user.warn kernel: [   70.931103] get_vaddr_frames(): follow_pfn fail: -22/-22
Feb 27 02:58:18 X335002201 user.warn kernel: [   70.931105] vb2_create_framevec(): ret=-22
Feb 27 02:58:18 X335002201 user.err kernel: [   70.931107] create framevec fail: -22
Feb 27 02:58:18 X335002201 user.warn kernel: [   70.931110] vb2_core_qbuf: __buf_prepare, ret=-22
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931112] v4l_qbuf: ops->vidioc_qbuf() ret=-22

Looking into the follow_pfn() failure shows that the nested call to __follow_pte_pmd() in mm/memory.c fails.

Is it possible to use buffers from UDMABUF as user buffers for V4L2?

Thank you for the issue.

I have never used udmabuf as a user buffer for V4L2.
Therefore, it is not clear whether udmabuf can be used as a user buffer for V4L2.
However, the idea of using udmabuf as a user buffer for V4L2 is very appealing.
I want to be able to use udmabuf as a V4L2 user buffer.
So I want to know more about this issue.

  1. What is the Linux kernel version?
  2. What is your V4L2 Video device?
  3. Can you disclose the source code of the user application that causes this problem?

I would appreciate your cooperation.

Hi, thanks for the answer!

Starting from the end:

  1. Source: https://github.com/h4tr3d/v4l2-capture-complex

  2. The V4L2 device is xilinx_video with an HDMI_RX capture pipeline: v-hdmi-rx-ss (xilinx_vphy, xilinx_hdmi_rx) -> v-pros-ss (xilinx_vpss_scaler) -> v-frmbuf-wr (xilinx-frmbuf). The pipeline is managed by the xilinx_video driver.

  3. The kernel is 4.19, shipped with PetaLinux 2019.2.

  4. Test platform: Xilinx ZCU104 board with Zynq UltraScale+.

The major problem with the Xilinx V4L2 interface: copying frames in MMAP mode is very slow because the buffers are mapped as uncached memory. USERPTR buffers, on the other hand, must be in contiguous memory, which is impossible to guarantee in a generic way with tools like malloc()/posix_memalign(). So I tried to allocate a contiguous memory block in CMA using UDMABUF and pass it to V4L2 as USERPTR.
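
For readers following along, here is a minimal sketch of how a contiguous region could be described as a USERPTR buffer before VIDIOC_QBUF. It assumes the single-planar capture API; a driver that uses the multi-planar API would take the pointer in `m.planes[i].m.userptr` instead. `make_userptr_buffer` is a hypothetical helper, not part of the linked application, and the ioctl call is only shown in a comment because it needs a real device.

```cpp
#include <linux/videodev2.h>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: describe an already-mapped region as a V4L2 USERPTR buffer.
// Assumption: single-planar capture queue (V4L2_BUF_TYPE_VIDEO_CAPTURE).
inline v4l2_buffer make_userptr_buffer(unsigned index, void* start, std::size_t length)
{
    v4l2_buffer buf;
    std::memset(&buf, 0, sizeof(buf));   // zero the request struct (ordinary stack memory)
    buf.type      = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory    = V4L2_MEMORY_USERPTR;
    buf.index     = index;
    buf.m.userptr = reinterpret_cast<unsigned long>(start);
    buf.length    = static_cast<std::uint32_t>(length);
    return buf;
}

// Usage with an open video device fd (not executed here):
//   v4l2_buffer buf = make_userptr_buffer(0, frame_ptr, frame_size);
//   if (ioctl(fd, VIDIOC_QBUF, &buf) < 0) { /* errno 22 is EINVAL */ }
```

The helper only fills in the request; whether the kernel accepts the pointer depends on how the pages behind it are mapped.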

The cause was found.
u-dma-buf employs on-demand paging, so immediately after mmap() no physical memory has been assigned to the virtual addresses yet.

Writing once to the memory after mmap() assigns physical memory to the u-dma-buf virtual addresses, as follows.

    explicit cma_frame_allocator(size_t buffers)
        : dmabuf("udmabuf@0x00")
    {
        assert(buffers <= 4);

        ptr = reinterpret_cast<uint8_t*>(dmabuf.get());

        buf[0] = ptr + 0 * single_buffer_size;
        buf[1] = ptr + 1 * single_buffer_size;
        buf[2] = ptr + 2 * single_buffer_size;
        buf[3] = ptr + 3 * single_buffer_size;
        // Write once to allocate physical memory to u-dma-buf virtual space.
        // Note: Do not use memset() for this.
        //       Because it does not work as expected.
        // The pointer is volatile so that an optimizing compiler cannot
        // replace this loop with a memset() call.
        {
            volatile uint64_t* word_ptr = reinterpret_cast<volatile uint64_t*>(ptr);
            size_t             words    = 4 * single_buffer_size / sizeof(uint64_t);
            for (size_t i = 0; i < words; i++) {
                word_ptr[i] = 0;
            }
        }
    }

Do not use memset() for this. memset() is very complex and will not produce the desired results.

I want to change u-dma-buf to do this automatically.
However, please allow time for any changes.
Until then, please clear the buffer after mmap, as in the source code above.

Thank you for your cooperation.

@ikwzm Wow! Great. I will try this on Monday on real hardware. Thanks a lot!

Great! It works!

addr: 0x000000006FE00000, size: 53084160
cma vaddr: 0x7f82525000
REQBUFS: caps=0
avg frame copy: 1608683 ns / 1.600 ms
avg frame copy: 1594989 ns / 1.583 ms
avg frame copy: 1606068 ns / 1.600 ms
avg frame copy: 1607162 ns / 1.600 ms
avg frame copy: 1611885 ns / 1.600 ms
avg frame copy: 1619447 ns / 1.617 ms
avg frame copy: 1599956 ns / 1.583 ms
avg frame copy: 1610549 ns / 1.600 ms
avg frame copy: 1610608 ns / 1.600 ms
avg frame copy: 1606981 ns / 1.600 ms
avg frame copy: 1590546 ns / 1.583 ms
avg frame copy: 1606250 ns / 1.600 ms
avg frame copy: 1611319 ns / 1.600 ms

That is 1.6 ms per frame copy versus 24 ms in MMAP mode, roughly 15 times faster.