esimov / pigo

Fast face detection, pupil/eyes localization and facial landmark points detection library in pure Go.

Detect rotated faces

fengweiqiang opened this issue · comments

This image contains 4 faces, so why does the -out file contain only 3 rectangles (with -iou 0.1 or -iou 0.2)?
[image: duogeren.jpg]

-iou 0.1:

[image: outduogeren.jpg]

-iou 0.2:

[image: outduogeren.jpg]

This issue fits into the todo list: right now the API cannot detect rotated faces. Please check the documentation. In this particular case the girl's face is rotated beyond the maximum threshold at which the algorithm can detect faces.

[image]

I used https://github.com/hybridgroup/gocv:

img := gocv.IMRead("./facelearn/duogeren.jpg", gocv.IMReadReducedColor2)

It calls the C library (OpenCV) under the hood and gets 4 rectangles. I don't know if this helps you.

type IMReadFlag int

const (
	// IMReadUnchanged returns the loaded image as is (with alpha channel,
	// otherwise it gets cropped).
	IMReadUnchanged IMReadFlag = -1

	// IMReadGrayScale always converts the image to a single channel
	// grayscale image.
	IMReadGrayScale = 0

	// IMReadColor always converts image to the 3 channel BGR color image.
	IMReadColor = 1

	// IMReadAnyDepth returns a 16-bit/32-bit image when the input has the
	// corresponding depth, otherwise converts it to 8-bit.
	IMReadAnyDepth = 2

	// IMReadAnyColor the image is read in any possible color format.
	IMReadAnyColor = 4

	// IMReadLoadGDAL uses the gdal driver for loading the image.
	IMReadLoadGDAL = 8

	// IMReadReducedGrayscale2 always converts image to the single channel grayscale image
	// and the image size reduced 1/2.
	IMReadReducedGrayscale2 = 16

	// IMReadReducedColor2 always converts image to the 3 channel BGR color image and the
	// image size reduced 1/2.
	IMReadReducedColor2 = 17

	// IMReadReducedGrayscale4 always converts image to the single channel grayscale image and
	// the image size reduced 1/4.
	IMReadReducedGrayscale4 = 32

	// IMReadReducedColor4 always converts image to the 3 channel BGR color image and
	// the image size reduced 1/4.
	IMReadReducedColor4 = 33

	// IMReadReducedGrayscale8 always converts the image to the single channel grayscale image and
	// the image size reduced 1/8.
	IMReadReducedGrayscale8 = 64

	// IMReadReducedColor8 always converts the image to the 3 channel BGR color image and the
	// image size reduced 1/8.
	IMReadReducedColor8 = 65

	// IMReadIgnoreOrientation do not rotate the image according to EXIF's orientation flag.
	IMReadIgnoreOrientation = 128

	// IMWriteJpegQuality is the quality from 0 to 100 for JPEG (the higher is the better). Default value is 95.
	IMWriteJpegQuality = 1

	// IMWriteJpegProgressive enables JPEG progressive feature, 0 or 1, default is False.
	IMWriteJpegProgressive = 2

	// IMWriteJpegOptimize enables JPEG optimization, 0 or 1, default is False.
	IMWriteJpegOptimize = 3

	// IMWriteJpegRstInterval is the JPEG restart interval, 0 - 65535, default is 0 - no restart.
	IMWriteJpegRstInterval = 4

	// IMWriteJpegLumaQuality separates luma quality level, 0 - 100, default is 0 - don't use.
	IMWriteJpegLumaQuality = 5

	// IMWriteJpegChromaQuality separates chroma quality level, 0 - 100, default is 0 - don't use.
	IMWriteJpegChromaQuality = 6

	// IMWritePngCompression is the compression level from 0 to 9 for PNG. A
	// higher value means a smaller size and longer compression time.
	// If specified, strategy is changed to IMWRITE_PNG_STRATEGY_DEFAULT (Z_DEFAULT_STRATEGY).
	// Default value is 1 (best speed setting).
	IMWritePngCompression = 16

	// IMWritePngStrategy is one of cv::IMWritePNGFlags, default is IMWRITE_PNG_STRATEGY_RLE.
	IMWritePngStrategy = 17

	// IMWritePngBilevel is the binary level PNG, 0 or 1, default is 0.
	IMWritePngBilevel = 18

	// IMWritePxmBinary for PPM, PGM, or PBM can be a binary format flag, 0 or 1. Default value is 1.
	IMWritePxmBinary = 32

	// IMWriteWebpQuality is the quality from 1 to 100 for WEBP (the higher is
	// the better). By default (without any parameter) and for quality above
	// 100 the lossless compression is used.
	IMWriteWebpQuality = 64

	// IMWritePamTupletype sets the TUPLETYPE field to the corresponding string
	// value that is defined for the format.
	IMWritePamTupletype = 128

	// IMWritePngStrategyDefault is the value to use for normal data.
	IMWritePngStrategyDefault = 0

	// IMWritePngStrategyFiltered is the value to use for data produced by a
	// filter (or predictor). Filtered data consists mostly of small values
	// with a somewhat random distribution. In this case, the compression
	// algorithm is tuned to compress them better.
	IMWritePngStrategyFiltered = 1

	// IMWritePngStrategyHuffmanOnly forces Huffman encoding only (no string match).
	IMWritePngStrategyHuffmanOnly = 2

	// IMWritePngStrategyRle is the value to use to limit match distances to
	// one (run-length encoding).
	IMWritePngStrategyRle = 3

	// IMWritePngStrategyFixed is the value to prevent the use of dynamic
	// Huffman codes, allowing for a simpler decoder for special applications.
	IMWritePngStrategyFixed = 4
)

// IMRead reads an image from a file into a Mat.
// The flags param is one of the IMReadFlag flags.
// If the image cannot be read (because of missing file, improper permissions,
// unsupported or invalid format), the function returns an empty Mat.
//
// For further details, please see:
// http://docs.opencv.org/master/d4/da8/group__imgcodecs.html#ga288b8b3da0892bd651fce07b3bbd3a56
//
func IMRead(name string, flags IMReadFlag) Mat {
	cName := C.CString(name)
	defer C.free(unsafe.Pointer(cName))

	return newMat(C.Image_IMRead(cName, C.int(flags)))
}
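
For context, here is a minimal, hedged sketch of what the full gocv detection path could look like. The comment above doesn't say which detector was used; this version assumes OpenCV's pretrained frontal Haar cascade (haarcascade_frontalface_default.xml, shipped with OpenCV's data files):

package main

import (
	"fmt"

	"gocv.io/x/gocv"
)

func main() {
	// Load the image at half resolution, as in the snippet above.
	img := gocv.IMRead("./facelearn/duogeren.jpg", gocv.IMReadReducedColor2)
	if img.Empty() {
		fmt.Println("cannot read image")
		return
	}
	defer img.Close()

	// The cascade file path is an assumption; OpenCV ships this file
	// among its data files.
	classifier := gocv.NewCascadeClassifier()
	defer classifier.Close()
	if !classifier.Load("haarcascade_frontalface_default.xml") {
		fmt.Println("cannot load cascade file")
		return
	}

	// DetectMultiScale returns one image.Rectangle per detected face.
	rects := classifier.DetectMultiScale(img)
	fmt.Printf("found %d faces: %v\n", len(rects), rects)
}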

Once the object rotation feature is implemented, this won't be an issue anymore.

With the new release of Pigo it is now possible to detect rotated faces, though with a small limitation: you have to provide a specific angle to match the faces against. Here is an example that detects the 4th face missed previously:

$ pigo -in ~/Desktop/49496258-66536f80-f8a0-11e8-965b-4bdfb7f14524.jpg -out ~/Desktop/output.jpg -cf data/facefinder -angle=0.5 -iou=0.2
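
For reference, roughly the same detection can be done through the Go API (github.com/esimov/pigo/core). This is a minimal sketch, not the canonical usage; the parameter values mirror the CLI flags above and the file paths are placeholders:

package main

import (
	"fmt"
	"image"
	_ "image/jpeg"
	"io/ioutil"
	"log"
	"os"

	pigo "github.com/esimov/pigo/core"
)

func main() {
	// Load the binary cascade file used by the -cf flag above.
	cascade, err := ioutil.ReadFile("data/facefinder")
	if err != nil {
		log.Fatal(err)
	}

	f, err := os.Open("input.jpg")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	src, _, err := image.Decode(f)
	if err != nil {
		log.Fatal(err)
	}

	// Pigo operates on 8-bit grayscale pixel data.
	pixels := pigo.RgbToGrayscale(src)
	cols, rows := src.Bounds().Dx(), src.Bounds().Dy()

	classifier, err := pigo.NewPigo().Unpack(cascade)
	if err != nil {
		log.Fatal(err)
	}

	cParams := pigo.CascadeParams{
		MinSize:     20,
		MaxSize:     1000,
		ShiftFactor: 0.1,
		ScaleFactor: 1.1,
		ImageParams: pigo.ImageParams{
			Pixels: pixels,
			Rows:   rows,
			Cols:   cols,
			Dim:    cols,
		},
	}

	// The second argument is the rotation angle, where 1.0 corresponds to
	// a full rotation (2*pi radians) per the README; 0.5 matches -angle=0.5.
	dets := classifier.RunCascade(cParams, 0.5)

	// Cluster overlapping detections with the same IoU threshold as -iou=0.2.
	dets = classifier.ClusterDetections(dets, 0.2)
	fmt.Printf("found %d faces\n", len(dets))
}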

But this means that to detect all the faces in an image, the same command has to be run with the angle parameter ranging from 0.0 to 1.0 (see the sketch below), which costs performance. Maybe in a future release I will focus on resolving this kind of limitation.
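
For illustration, a hedged sketch of that sweep through the Go API, reusing the classifier and cParams from the sketch above; every additional angle step runs the full cascade again, which is exactly the performance cost mentioned here:

// Sweep the rotation angle over the full range (1.0 = full rotation) and
// merge the detections; a coarse step like 0.1 keeps the extra cost bounded.
var all []pigo.Detection
for angle := 0.0; angle <= 1.0; angle += 0.1 {
	all = append(all, classifier.RunCascade(cParams, angle)...)
}

// Cluster the merged results so the same face found at several angles
// collapses into a single detection.
faces := classifier.ClusterDetections(all, 0.2)
fmt.Printf("found %d faces across all angles\n", len(faces))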

Hey Endre. Thanks for the library. I too am wondering about detecting faces that are in a profile view.

I tested both eunseo.jpg and jimin3.jpg from this dataset https://github.com/Kagami/go-face-testdata without specifying an angle, but pigo didn't find a face.

Hey Nathan, that's true. I will try to investigate and find a workaround for this kind of limitation. Anyway, I'm not really sure the detector can detect full profile faces, but it can detect some deviation from the frontal view.

@esimov any update on this?

Not yet; right now I'm working on WASM support. Afterwards I can check this issue.

How is it going?

@esimov Any idea whether it will be possible to support profile or semi-profile face images?

Any idea whether it will be possible to support profile or semi-profile face images?

Not yet!

I feel like this should be the top priority for now. Without rotation detection, detection quality on video streams is horrible even with the slightest movement of the head.

I have a few ideas in mind for how this issue could be resolved, but in the meantime any contribution is welcome ;).

I am wondering if we could just train a new cascade as explained here: https://github.com/nenadmarkus/pico/tree/master/gen/sample, using a dataset which includes faces at various degrees of profile/rotation? I am guessing there are some good datasets around at this point that include markers for these variations of faces.

Here, https://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework, it is explained:

"To make the task more manageable, the Viola–Jones algorithm only detects full view (no occlusion), frontal (no head-turning)..."

but further down:

"The 'frontal' requirement is non-negotiable, as there is no simple transformation on the image that can turn a face from a side view to a frontal view. However, one can train multiple Viola–Jones classifiers, one for each angle: one for frontal view, one for 3/4 view, one for profile view, a few more for the angles in-between them. Then one can at run time execute all these classifiers in parallel to detect faces at different view angles."
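
To illustrate that multi-classifier idea in Go: a hedged sketch that fans several pigo cascades out to goroutines and merges their detections, reusing cParams from the earlier sketch (and assuming imports of sync, io/ioutil, log, and github.com/esimov/pigo/core). The profile cascade path is hypothetical; pigo only ships the frontal data/facefinder, so such a cascade would first have to be trained, which is what the rest of this thread is about:

// runAll executes one pigo classifier per cascade file concurrently and
// merges the detections. "data/profileface" is a hypothetical cascade;
// only the frontal "data/facefinder" ships with pigo.
func runAll(cParams pigo.CascadeParams, paths []string) []pigo.Detection {
	var (
		mu  sync.Mutex
		all []pigo.Detection
		wg  sync.WaitGroup
	)
	for _, path := range paths {
		wg.Add(1)
		go func(path string) {
			defer wg.Done()
			data, err := ioutil.ReadFile(path)
			if err != nil {
				log.Println(err)
				return
			}
			clf, err := pigo.NewPigo().Unpack(data)
			if err != nil {
				log.Println(err)
				return
			}
			dets := clf.RunCascade(cParams, 0.0)
			mu.Lock()
			all = append(all, dets...)
			mu.Unlock()
		}(path)
	}
	wg.Wait()
	return all
}

// Usage: runAll(cParams, []string{"data/facefinder", "data/profileface"})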

The thing is, I don't really know if this kind of dataset is available on https://www.vision.caltech.edu/, which the training model refers to. The caltech_10k_webfaces dataset contains only frontal faces, not rotated ones. So in order to train the model on rotated faces we need a dataset with rotated faces, in the format appropriate for picolrn.

So I wrote an email to Nenad, the creator of the Pico library and algorithm, and he responded to some of my questions regarding this:

I was wondering if training it with more data that also includes faces in various degrees of profile might help improve detection when heads are turned?

Yes, definitely. I know about a few companies that have trained frontal+profile face cascades via Pico.

I notice in your code in the caltechfaces.py file you seem to be converting the eye data into bounding boxes, and many datasets already come with bounding boxes. I am wondering if this would be fairly straightforward...

If the dataset has bounding boxes, you can use that. But have in mind that the aspect ratio of the box always has to be the same in Pico. You can devise some kind of algorithm to make the conversion for you before training.
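
As a toy example of such a conversion, squaring each annotated box around its center gives every sample the same 1:1 aspect ratio; the 1:1 target is an assumption, since pico only requires that the ratio be constant across the training set:

package main

import (
	"fmt"
	"math"
)

// squarify converts an arbitrary (x, y, w, h) bounding box into a square
// box (center + side) with the same center, so all training samples share
// one aspect ratio, as pico requires.
func squarify(x, y, w, h float64) (cx, cy, side float64) {
	cx = x + w/2
	cy = y + h/2
	side = math.Max(w, h) // enlarge the shorter side to cover the box
	return cx, cy, side
}

func main() {
	cx, cy, side := squarify(10, 20, 60, 90)
	fmt.Println(cx, cy, side) // 40 65 90
}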

Yes, definitely. I know about a few companies that have trained frontal+profile face cascades via Pico.

The question is where they obtained the training data. Do you know, or did Nenad mention, what kind of training data they used?

I notice in your code in the caltechfaces.py file you seem to be converting the eye data into bounding boxes, and many datasets already come with bounding boxes. I am wondering if this would be fairly straightforward...

Why is this important to you?

There are lots of resources for this, here is a good place to start: https://www.face-rec.org/databases/ - in particular I think this might be a good dataset to use? http://www.cs.tau.ac.il/~wolf/ytfaces/

Why is this important to you?

Because it's important to understand the requirements we need to satisfy to train a new cascade. In this case, as long as the bounding boxes in the dataset we train with maintain the same aspect ratio, we can more or less "plug" them into picolrn the same way Nenad did, even skipping a few steps in caltechfaces.py (because we don't have to generate our own bounding boxes).

Thanks for the links. BTW, some of them are broken or outdated. Now, since you mentioned that you have discussed this with Nenad and he said a few companies have already trained with frontal+profile faces, the obvious question is: did he by any chance obtain such a cascade file?

I haven't received any update from you regarding my question. Do you know by any chance if such cascade files are available somewhere? Or should I contact Nenad personally?

No, they are not available as far as I understand: these are companies and they cannot share their intellectual property like that. These are things you would have to train yourself, but luckily Nenad has really great documentation and information on how to do that. All of the ingredients are available; you just have to put the pieces together.