Bamboo is a python module for building production-quality research pipelines for processing video and stillframe images.
- Visually compare the results of face-detection or object-classification algorithms.
- Do the same comparision methodologically over a million images, storing the results in a structured database.
- Use the same pipeline to process from a disk directory, a video file, an RTSP camera stream, or JPEGs on-demand as they are loaded into an Amazon S3 bucket.
- Debug pipelines on your laptop in an easy-to-use single-threaded environment, then deploy them on a laptop or workstation with a multi-thread server, on a cluster of high-performance servers, or in a serverless cloud environment using Amazon Lambda.
- Provides a consistent, easy-to-program object-oriented abstraction layer that easily wraps existing computer vision systems including OpenCV, YOLOv8 and Amazon Rekognition.
scale
- the amount to expand a face area. scale=1.0 is default (no expansion)
Process and archive surveillance video to answer useful questions such as:
- Who was present on which days?
- How many people did we see on a day?
- Which vehicles entered our garage?
- When were people in the office?
Take data from a variety of streams, including:
- Google Nest cameras (captured from the Google Nest cloud using the API)
- Any on-prem camera (captured using either an RTSP stream or a sequence of JPEGs)
- Uploaded video
- Cell phones repurposed as surveillance cameras
- (ESP32-cam)[https://google.com/search?q=ESP32-CAM]
Plug-in architecture:
- It's clear that we will always want to be able to have a plug-in interface and be able to support multiple plugins at each step of the pipeline.
- We can have the plugins union, intersection, or vote.
- With two plugins, we can compare them against each other (for running experiments.)
Processing options:
- Single-threaded on local machine for debugging
- Multi-threaded on local machine for performance
- Lambda or GCF of Azure Functions
- Video change detection
- Object detection in a video
- Face recognition:
- Structured database
- Stores the result of the tagged video
- Video storage
- Can store frames or compressed video. Frames are higher quality; compressed video stores more. (Video is compressed as a series of I & D frames)
Initially we will prototype a number of small scripts to get an ideas of how this stuff works.
https://meraki.cisco.com/lib/pdf/meraki_datasheet_mv_sense.pdf https://documentation.meraki.com/General_Administration/Other_Topics/Cisco_Meraki_Dashboard_API
- Iterate through all of the jpegs that have been captured in chronological order.
- When a JPEG has significantly changed, copy it to the image store (local or s3) and run it through image processing.
- Store the results of the image processing in a scalable store as a JSON object.
- Store results by recognizer, so we can use several of them.
JPEGs: We're storing individual JPEGs in a directory hiearchy that is optimized to have 1000-5000 images per prefix (directory).
We anticipate that we'll have ~ 10-500 images per camera per day (local time or GMT is a config variables):
{root}/{camera}/{year:04-month:02}/{yearmonthday-hourminsec}.jpeg
Identifying which frames to process:
- Videos are chopped into frames. (Pretty standard; ffmpeg can do this.)
- Each frame that is the first in a sequence or significnatly different from the previous frame is tagged for processing.
- Optionally we will tag a window around the changed frames for tagging processing as well.
- Videos and frames are expunged after a retention time.
Current implementation:
- (ingest.py)[./ingest.py]
Processing frames:
- Each frame is represented by an object.
- Any number of processors can review a frame. We would likely have taggers for:
- Faces (input: frame; output: face regions)
- Objects (input: frame; output: objects)
- Face vectors (input: face regions; output: face vectors)
- Identities (input: face vectors; output: identities)
- Pipeline is constructed as a series of producer/consumer queues.
- Makes it easy to support multi-threaded and multi-processing environments.
- Ensemble processors are processors that have a single input, run multiple sub-processors, and have a single output.
- An ensemble processor automatically stores results that can be used for producing experimental reports.
- Python
- OpenCV
- DynamoDB for tag storage (develop with (DynamoDB Local)[https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.html])
- AWS Rekognition
- Open Source Face Recognition options
- OpenFace - https://github.com/TadasBaltrusaitis/OpenFace - v2.2.0 - July 13, 2019
- Open Source frameworks:
- Facenet - https://github.com/davidsandberg/facenet - (2018)
- face_recognition - https://github.com/ageitgey/face_recognition - "The world's simplest facial recognition api for Python and the command line" (2018)
- https://github.com/serengil/deepface (current)
- "A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python"
- pip install
- highly accurate detection
- Creates embeddings
- "Deepface is a hybrid face recognition package. It currently wraps many state-of-the-art face recognition models: VGG-Face , Google FaceNet, OpenFace, Facebook DeepFace, DeepID, ArcFace, Dlib, SFace and GhostFaceNet. The default configuration uses VGG-Face model."
- https://github.com/SthPhoenix/InsightFace-REST (current) - NVIDIA & TensorRT for optimized inference
Face clustering:
Do we want to have an abstract pipeline object?
- Input and output
- Annotation
- Easily do experiments with an object that specifies multiple other objects.
- Connect them together with YAML
- Designed for running in a functions-as-a-service
- Designed for storage with an object store like S3.
- https://universe.roboflow.com/ - "The world's largest collection of open source computer vision datasets and APIs." (Unfortunately, no consistent API).