Consolidate video.py and capture.py for local hardware acceleration
abrichr opened this issue · comments
Feature request
`capture/_macos.py` uses AVFoundation; `capture/_windows.py` uses `screen_recorder_sdk`, which uses the Media Foundation API. These are likely to be more performant than the `mss` library used in `record.py` and `video.py`, but `capture` currently does not support extracting time-aligned screenshots (while `video` does):
```
(openadapt-py3.10) abrichr@MacBook-Pro-4 OpenAdapt % ffprobe captures/2024-02-19-10-43-33.mov
ffprobe version 6.1.1 Copyright (c) 2007-2023 the FFmpeg developers
  built with Apple clang version 15.0.0 (clang-1500.1.0.2.5)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.1.1_3 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopenvino --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox
  libavutil      58. 29.100 / 58. 29.100
  libavcodec     60. 31.102 / 60. 31.102
  libavformat    60. 16.100 / 60. 16.100
  libavdevice    60.  3.100 / 60.  3.100
  libavfilter     9. 12.100 /  9. 12.100
  libswscale      7.  5.100 /  7.  5.100
  libswresample   4. 12.100 /  4. 12.100
  libpostproc    57.  3.100 / 57.  3.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7fd88a704b40] moov atom not found
captures/2024-02-19-10-43-33.mov: Invalid data found when processing input
```
This issue will be complete once we have modified these files to support saving video files recorded via `openadapt.capture` from which time-aligned screenshots can be extracted. I.e. we need to modify `openadapt.capture._macos.Capture` and `openadapt.capture._windows.Capture` to supply screenshots in memory instead of writing directly to a file (e.g. replacing `self.session.addOutput_(self.file_output)`).
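Once frames carry timestamps, the "time-aligned screenshots" part is a straightforward nearest-neighbor lookup: record a presentation timestamp per captured frame, then binary-search for the frame closest to an event timestamp. A minimal sketch (function and parameter names here are illustrative, not existing OpenAdapt APIs):

```python
from bisect import bisect_left


def nearest_frame_index(frame_timestamps: list[float], event_time: float) -> int:
    """Return the index of the frame whose timestamp is closest to event_time.

    frame_timestamps must be sorted ascending, as captured frames are.
    """
    i = bisect_left(frame_timestamps, event_time)
    if i == 0:
        return 0
    if i == len(frame_timestamps):
        return len(frame_timestamps) - 1
    before, after = frame_timestamps[i - 1], frame_timestamps[i]
    return i if after - event_time < event_time - before else i - 1
```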
Motivation
Local hardware acceleration -> maximum performance
Via ChatGPT:
To replace `self.session.addOutput_(self.file_output)` with a mechanism that calls a callback with a screenshot in your macOS capture implementation, you would typically use `AVCaptureVideoDataOutput` instead of `AVCaptureMovieFileOutput`. `AVCaptureVideoDataOutput` allows you to receive video frames as they are captured, which you can then process in a callback method.
Here's a conceptual outline of how to set this up:

1. Use `AVCaptureVideoDataOutput`: This class provides a way to capture video frames as they are produced by the capture session.
2. Set up a delegate for frame capture: Implement a delegate that conforms to the `AVCaptureVideoDataOutputSampleBufferDelegate` protocol. This delegate will receive callbacks with the video frames.
3. Implement the callback method: The delegate's callback method receives a `CMSampleBufferRef` that contains the frame data. You can then convert this sample buffer into a format suitable for your needs (e.g., a screenshot).
Step-by-Step Implementation

First, modify your `Capture` class to include an `AVCaptureVideoDataOutput` and set up the delegate:
```python
from Foundation import NSObject, NSLog
from libdispatch import dispatch_queue_create  # pyobjc-framework-libdispatch
import AVFoundation as AVF
from Quartz import CGMainDisplayID


class SampleBufferDelegate(NSObject):
    def captureOutput_didOutputSampleBuffer_fromConnection_(
        self, captureOutput, sampleBuffer, connection
    ):
        # Called with a CMSampleBufferRef `sampleBuffer` for every frame.
        # Convert it to a screenshot here and invoke the desired callback.
        NSLog("Received a frame")


class Capture:
    def __init__(self):
        # Initialize as before...
        self.videoDataOutput = None
        self.videoDataOutputQueue = None
        self.sampleBufferDelegate = None

    def start(self, audio: bool = False, camera: bool = False):
        # Setup as before...

        # Set up the video data output on a dedicated dispatch queue.
        # Note: dispatch_queue_create comes from libdispatch, not AVFoundation.
        self.videoDataOutput = AVF.AVCaptureVideoDataOutput.alloc().init()
        self.videoDataOutputQueue = dispatch_queue_create(b"videoDataOutputQueue", None)
        self.sampleBufferDelegate = SampleBufferDelegate.alloc().init()
        self.videoDataOutput.setSampleBufferDelegate_queue_(
            self.sampleBufferDelegate, self.videoDataOutputQueue
        )
        if self.session.canAddOutput_(self.videoDataOutput):
            self.session.addOutput_(self.videoDataOutput)
```
Notes:

- Conversion of `CMSampleBufferRef`: Converting a `CMSampleBufferRef` to a more usable format (like an image or numpy array) inside the `captureOutput_didOutputSampleBuffer_fromConnection_` method requires additional steps. This involves extracting the image buffer (`CVImageBufferRef`) and then converting it to your desired format, and may require additional libraries (like OpenCV) for image manipulation.
- Performance considerations: Processing video frames in real time can be CPU-intensive. Ensure your callback and conversion logic is optimized for performance. Depending on your requirements, you might want to drop frames or reduce the resolution to maintain performance.
- Threading: The callback method `captureOutput_didOutputSampleBuffer_fromConnection_` is called on the dispatch queue specified when setting the sample buffer delegate. Make sure any UI updates or intensive processing tasks are dispatched back to the main thread or handled appropriately to avoid blocking the capture queue.
This approach allows you to intercept video frames as they are captured, enabling you to process and use them as screenshots within your application.
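On the conversion note above: once you lock the pixel buffer and get its base address, the main subtlety is that CoreVideo pads each row to `bytesPerRow`, which can exceed `width * 4` for BGRA. The slicing itself is plain numpy and can be sketched independently of PyObjC (the function name is illustrative):

```python
import numpy as np


def bgra_buffer_to_array(
    buf: bytes, width: int, height: int, bytes_per_row: int
) -> np.ndarray:
    """Convert a row-padded BGRA pixel buffer (as obtained via
    CVPixelBufferGetBaseAddress) to a (height, width, 4) uint8 array,
    discarding the per-row padding bytes."""
    rows = np.frombuffer(buf, dtype=np.uint8).reshape(height, bytes_per_row)
    return rows[:, : width * 4].reshape(height, width, 4)
```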
@0dm thoughts? 🙏 😄
This could work. I will look into implementing this sometime this week.
Regarding this:

> Performance Considerations: Processing video frames in real-time can be CPU-intensive. Ensure your callback and conversion logic is optimized for performance. Depending on your requirements, you might want to drop frames or reduce the resolution to maintain performance.

See `max_cpu_percent` and related for an attempt to implement this: https://github.com/OpenAdaptAI/OpenAdapt/pull/569/files#diff-57d8577d1fb5faaf576a6f5663741c83e672378c13c91a1db036fb7a3f05e067R559
@Cody-DV for a Windows approach see:
https://github.com/OpenAdaptAI/OpenAdapt/blob/main/openadapt/capture/_windows.py
https://chat.openai.com/share/19cc37a0-750f-451a-95cf-acad27efb7b6
```python
import time

import cv2
import numpy as np
from screen_recorder_sdk import screen_recorder


def capture_frames(duration, fps):
    """Capture frames for a given duration and fps and encode them to a video.

    :param duration: Duration to capture video for in seconds
    :type duration: int
    :param fps: Frames per second
    :type fps: int
    """
    frame_interval = 1.0 / fps
    num_frames = int(duration * fps)

    # Initialize video capture parameters
    params = screen_recorder.RecorderParams()
    screen_recorder.init_resources(params)

    # Take a first screenshot to determine the resolution
    image = screen_recorder.get_screenshot()
    frame = np.array(image)
    height, width, layers = frame.shape
    size = (width, height)

    # Initialize a video writer using OpenCV.
    # FourCC is a 4-byte code specifying the video codec; 'mp4v' is
    # compatible with MP4 files. (The original snippet passed a GStreamer
    # pipeline string together with a FourCC, which is invalid; a plain
    # output path is used here instead.)
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    video_writer = cv2.VideoWriter("video.mp4", fourcc, fps, size)

    start_time = time.time()
    for _ in range(num_frames):
        image = screen_recorder.get_screenshot()
        frame = np.array(image)
        # Screenshots are RGB; OpenCV expects BGR
        video_writer.write(cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))
        time.sleep(frame_interval)

    video_writer.release()
    screen_recorder.free_resources()

    elapsed_time = time.time() - start_time
    print(f"Capturing completed in {elapsed_time:.2f} seconds.")


# Example usage
if __name__ == "__main__":
    duration = 5  # seconds
    fps = 10
    capture_frames(duration, fps)
```
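One caveat with the loop above: `time.sleep(frame_interval)` ignores the time spent grabbing and encoding each frame, so the effective frame rate drifts below the target. A common fix is to schedule absolute deadlines and sleep only for the remainder; a sketch (not OpenAdapt code, `capture_one` stands in for the screenshot-and-write step):

```python
import time


def run_paced(capture_one, duration: float, fps: float) -> int:
    """Call capture_one() at fixed wall-clock deadlines for `duration` seconds.

    Sleeps only for the time remaining until the next deadline, so slow
    iterations do not accumulate drift. Returns the number of frames captured.
    """
    interval = 1.0 / fps
    start = time.monotonic()
    deadline = start
    frames = 0
    while deadline < start + duration:
        capture_one()
        frames += 1
        deadline += interval
        remaining = deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
    return frames
```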
We can replace the cv2 writer with what we have in https://github.com/OpenAdaptAI/OpenAdapt/blob/main/openadapt/video.py