Faster detection on ffmpeg videos

Question

Jefry-nolastname opened this issue 2 years ago · comments

I'm developing a CCTV face detection app that sends frames with good enough face detection quality (qTresh from the example) to my backend site, which has employee face recognition capabilities. Nothing's wrong with Pigo, and everything works as expected of this GODLY portable pure-Go face-detection pkg. But I'm sort of stuck with a latency problem due to the high FPS (number of frames) that the app needs to "detect" with Pigo. I tried decreasing the FPS of the video reading to 10 fps, but the detection still can't keep up with the video frames sent by the CCTV.

Please help me if you've got any tips or recommendations, as CLEARLY more experienced devs ^^.

I'm using Pigo 1.4.6.

Here's my code:

package main

import (
	"io"
	"io/ioutil"
	"log"
	"os"
	"time"

	pigo "github.com/esimov/pigo/core"
	"github.com/fogleman/gg"
	"github.com/unixpickle/ffmpego"
)

var f *os.File
var dc *gg.Context

func videoFileFacetest() {
	cascadeFile, err := ioutil.ReadFile("./cascade/facefinder")
	if err != nil {
		log.Fatalf("Error reading the cascade file: %v", err)
	}

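	vr, err := ffmpego.NewVideoReaderResampled("sample2.mp4", 10)
	if err != nil {
		log.Fatalf("Error opening the video: %v", err)
	}
	// vw, _ := ffmpego.NewVideoWriter("output.mp4", vr.VideoInfo().Width, vr.VideoInfo().Height, 10)
	i := 0
	for {
		i++
		frame, err := vr.ReadFrame()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatalf("Error reading a frame: %v", err)
		}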
		start := time.Now()
		pixels := pigo.RgbToGrayscale(frame)

		cols, rows := frame.Bounds().Max.X, frame.Bounds().Max.Y

		cParams := pigo.CascadeParams{
			MinSize:     20,
			MaxSize:     1000,
			ShiftFactor: 0.1,
			ScaleFactor: 1.1,

			ImageParams: pigo.ImageParams{
				Pixels: pixels,
				Rows:   rows,
				Cols:   cols,
				Dim:    cols,
			},
		}

		pigoFunc := pigo.NewPigo()
		classifier, err := pigoFunc.Unpack(cascadeFile)
		if err != nil {
			log.Fatalf("Error unpacking the cascade file: %s", err)
		}
		angle := 0.0 // cascade rotation angle. 0.0 is 0 radians and 1.0 is 2*pi radians

		// Run the classifier over the obtained leaf nodes and return the detection results.
		// The result contains quadruplets representing the row, column, scale and detection score.
		dets := classifier.RunCascade(cParams, angle)
		// // Calculate the intersection over union (IoU) of two clusters.

		// print(len(dets))
		// print("_")
		// println(len(dets2))

		var qTresh float32 = 6.8 // minimum detection score to keep
		goodQ := []pigo.Detection{}
		for i := range dets {
			if dets[i].Q > qTresh {
				goodQ = append(goodQ, dets[i])
			}
		}
		if len(goodQ) > 0 {
			dets2 := classifier.ClusterDetections(goodQ, 0.01)
			if len(dets2) > 0 {
				// dc = gg.NewContext(cols, rows)
				// dc.DrawImage(frame, 0, 0)

				for i := 0; i < len(dets2); i++ {
					if dets2[i].Q > qTresh {
						println("we got a winner here")
						// dc.DrawRectangle(
						// 	float64(dets2[i].Col-dets2[i].Scale/2),
						// 	float64(dets2[i].Row-dets2[i].Scale/2),
						// 	float64(dets2[i].Scale),
						// 	float64(dets2[i].Scale),
						// )

						// dc.SetLineWidth(20.0)
						// dc.SetStrokeStyle(gg.NewSolidPattern(color.RGBA{R: 255, G: 0, B: 0, A: 255}))
						// dc.Stroke()

						// vw.WriteFrame(dc.Image())
					}
				}
			}
		}
		elapsed := time.Since(start)
		log.Printf("Loop took %s", elapsed)
	}

	vr.Close()
	// vw.Close()
}

func main() {
	videoFileFacetest()
}

Here's the sample video:

https://www.pexels.com/video/people-going-in-and-out-of-the-royal-opera-house-1721303/

Endre Simo · Answer 1 · Fri Oct 07 2022 20:34:57 GMT+0800 (China Standard Time)

What you can do is increase MinSize to 50 or above to achieve a better frame rate, because this value defines the lower size threshold above which the face detector runs. I assume the chances are very low that you need to analyze videos where the faces are smaller than 50px. Also, if you already know the video size in advance, you can adapt MinSize and MaxSize accordingly. Another tip is to increase ScaleFactor by a few tenths, maybe to around 1.3 or 1.4. I changed MinSize to 80 while testing and got pretty decent latencies. There is no need to decrease the video frame rate, just adapt the cascade values.

Here are my settings:

cParams := pigo.CascadeParams{
	MinSize:     80,
	MaxSize:     1000,
	ShiftFactor: 0.1,
	ScaleFactor: 1.4,
	ImageParams: pigo.ImageParams{
		Pixels: pixels,
		Rows:   rows,
		Cols:   cols,
		Dim:    cols,
	},
}
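
To see why MinSize and ScaleFactor matter so much, it can help to count how many windows a sliding-window detector has to evaluate. The sketch below is a rough model, not Pigo's actual code: `countWindows` is a hypothetical helper, and it assumes square windows that grow by ScaleFactor from MinSize to MaxSize and move in steps of ShiftFactor times the window size. The 1920x1080 frame size is also an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// positions returns how many window placements of the given size fit along
// one axis of the given length when the window moves by step pixels.
func positions(length, size, step int) int {
	if length < size {
		return 0
	}
	return (length-size)/step + 1
}

// countWindows estimates the number of windows a generic sliding-window
// detector evaluates. It is an approximation for comparing settings,
// not a reimplementation of pigo.
func countWindows(rows, cols, minSize, maxSize int, shiftFactor, scaleFactor float64) int {
	total := 0
	for scale := minSize; scale <= maxSize; {
		step := int(math.Max(shiftFactor*float64(scale), 1))
		total += positions(rows, scale, step) * positions(cols, scale, step)

		next := int(float64(scale) * scaleFactor)
		if next <= scale {
			next = scale + 1 // guard against a scale that never grows
		}
		scale = next
	}
	return total
}

func main() {
	// Assumed frame size; the thread does not state the video resolution.
	const rows, cols = 1080, 1920

	original := countWindows(rows, cols, 20, 1000, 0.1, 1.1)
	tuned := countWindows(rows, cols, 80, 1000, 0.1, 1.4)

	fmt.Println("windows with MinSize 20, ScaleFactor 1.1:", original)
	fmt.Println("windows with MinSize 80, ScaleFactor 1.4:", tuned)
}
```

Small windows dominate the count because they use the smallest step, so raising MinSize removes the most expensive scales first, and a larger ScaleFactor reduces the number of scales visited in between.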

pj · Answer 2 · Fri Oct 07 2022 20:42:47 GMT+0800 (China Standard Time)

Direct him to the real-time examples you did with a webcam. They achieved a good frame rate.

Secondly, there is an optimisation you can make, which I regularly do: you can assume the face in the next frame is in approximately the same area.

So in frame 1, you detect the number of faces and their areas. Say 1 face was discovered at pos A.

In frame 2, you first crop the frame to pos A + 20%. You then look for faces in the sub-frame.

If you find the face, move to frame 3. If not, search for faces in entire frame 2.

You can dramatically increase performance this way.
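
A minimal sketch of the cropping step described above, assuming a row-major grayscale buffer like the one `pigo.RgbToGrayscale` produces in the code earlier in the thread. The `Detection` type is a local stand-in for the Row/Col/Scale fields, and `searchRegion`, `cropGray`, and `toFrame` are hypothetical helper names. The detector call and the fall-back to a full-frame search are left out.

```go
package main

import (
	"fmt"
	"image"
)

// Detection is a local stand-in holding the centre (Row, Col) and the
// side length (Scale) of a detected face. It is not the pigo type.
type Detection struct {
	Row, Col, Scale int
}

// searchRegion returns the square around det, grown by margin
// (0.2 means 20% larger) and clamped to a cols x rows frame.
func searchRegion(det Detection, margin float64, cols, rows int) image.Rectangle {
	half := int(float64(det.Scale) * (1 + margin) / 2)
	r := image.Rect(det.Col-half, det.Row-half, det.Col+half, det.Row+half)
	return r.Intersect(image.Rect(0, 0, cols, rows))
}

// cropGray copies region r out of a row-major grayscale buffer
// whose rows are cols pixels wide.
func cropGray(pixels []uint8, cols int, r image.Rectangle) []uint8 {
	out := make([]uint8, 0, r.Dx()*r.Dy())
	for y := r.Min.Y; y < r.Max.Y; y++ {
		out = append(out, pixels[y*cols+r.Min.X:y*cols+r.Max.X]...)
	}
	return out
}

// toFrame translates a detection found inside the cropped region
// back into full-frame coordinates.
func toFrame(det Detection, r image.Rectangle) Detection {
	det.Row += r.Min.Y
	det.Col += r.Min.X
	return det
}

func main() {
	const cols, rows = 640, 360
	pixels := make([]uint8, cols*rows) // placeholder frame

	prev := Detection{Row: 100, Col: 50, Scale: 120} // face found in the previous frame
	roi := searchRegion(prev, 0.2, cols, rows)
	sub := cropGray(pixels, cols, roi)

	// sub, roi.Dy() and roi.Dx() would be passed to the detector as the
	// pixels, rows and cols of the sub-frame.
	fmt.Println("search region:", roi, "pixels copied:", len(sub))
}
```

With several faces, one region per previous detection would be needed, and any detection found in a sub-frame has to be mapped back with something like `toFrame` before it is compared against the next frame.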

You must also make use of all your CPU cores.
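
One way to spread the work over all cores is a small worker pool, sketched below under some assumptions: `detectAll` is a hypothetical helper, frames are plain grayscale buffers, and `detect` stands in for whatever runs the detector on a single frame. The thread does not say whether one Pigo classifier can be shared between goroutines, so giving each worker its own classifier may be the safer choice.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// detectAll runs detect on every frame using one worker goroutine per
// CPU core and returns the per-frame results in frame order.
func detectAll(frames [][]uint8, detect func([]uint8) int) []int {
	results := make([]int, len(frames))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one worker,
				// so the writes never overlap.
				results[i] = detect(frames[i])
			}
		}()
	}

	for i := range frames {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Placeholder frames; in the real loop these would come from the video reader.
	frames := make([][]uint8, 32)
	for i := range frames {
		frames[i] = make([]uint8, 64*64)
	}

	// Stand-in detector that only reports the buffer size.
	counts := detectAll(frames, func(p []uint8) int { return len(p) })
	fmt.Println("frames processed:", len(counts))
}
```

For a live CCTV stream the frames would arrive one by one rather than as a slice, so the same pattern would read from a channel fed by the video reader instead.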

Endre Simo · Answer 3 · Sat Oct 08 2022 17:19:02 GMT+0800 (China Standard Time)

@pjebs that sounds like a really cool optimization approach, I hadn't thought about it. Do you have any examples of this approach available?

Jefry · Answer 4 · Mon Oct 10 2022 12:07:51 GMT+0800 (China Standard Time)

@esimov thank you so much. After I set the min/max based on the video resolution, the speed doubled/tripled. Thanks.

cols, rows := frame.Bounds().Max.X, frame.Bounds().Max.Y
cParams := pigo.CascadeParams{
	MinSize:     rows / 18,
	MaxSize:     rows,
	ShiftFactor: 0.1,
	ScaleFactor: 1.4,
	ImageParams: pigo.ImageParams{
		Pixels: pixels,
		Rows:   rows,
		Cols:   cols,
		Dim:    cols,
	},
}

@pjebs That's so Giga Brain of you. But that optimisation, to me at least, seems best suited to detecting 1 face per frame, and quite costly for performance if the frame is a bit crowded. But I'll try it anyway and see for myself. Thanks.

Endre Simo · Answer 5 · Mon Oct 10 2022 13:45:43 GMT+0800 (China Standard Time)

This is exactly why I was curious about @pjebs's suggestion: cropping the image on each frame is quite costly, definitely more computationally expensive than running detection on the whole image, considering also that you have to deal with multiple faces. Maybe with one or two faces this approach is affordable and gives a better performance boost.