This project is a very simple demonstration of an image DLP (Data Loss Prevention) system. It works on text content in images and on image identifiers (visual markers).
The first feature uses Tesseract OCR to locate words in the image. A simple algorithm then assembles the detected words into sentences, and Perl-compatible regular expressions are applied to the assembled text to find matches for the DLP rules.
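The project does not describe its sentence-assembly algorithm in detail, so the following is only a minimal sketch of one simple approach. It assumes each OCR word comes with a bounding box (Tesseract's TSV output provides left, top, width and height per word). Words whose vertical centers are close are grouped into the same line, and each line is read left to right. The `Word` class, `assemble_lines`, `assemble_text` and the `tolerance` parameter are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its bounding box (hypothetical structure)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def assemble_lines(words, tolerance=0.5):
    """Group words into lines by vertical center, then order each line left to right.

    A word joins an existing line when its vertical center is within
    `tolerance * height` of the center of that line's first word.
    """
    lines = []  # each entry is a list of Word objects
    for word in sorted(words, key=lambda w: (w.top, w.left)):
        center = word.top + word.height / 2
        for line in lines:
            ref = line[0]
            ref_center = ref.top + ref.height / 2
            if abs(center - ref_center) <= tolerance * ref.height:
                line.append(word)
                break
        else:
            # No existing line is close enough: start a new one.
            lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.left)) for line in lines]


def assemble_text(words):
    """Join all assembled lines into one string for rule matching."""
    return " ".join(assemble_lines(words))
```

Grouping by vertical center keeps the sketch tolerant of small baseline jitter; a real implementation could instead rely on Tesseract's own block, paragraph and line numbers.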
Assembled text: " AM Android is the world's most popular mobile platform. With Android you can use all the Google apps you know and love, plus there are more than 600,000 apps and games available on Google Play to keep you entertained, alongside millions of songs and books, and thousands of movies. Android devices are already smart, and will only get smarter, with new features you won't find on any other platform, letting you focus on what's important and putting you in control of your mobile experience."
Assembled text: " A . Teste <Hey, im leaking this secret info about Joshep jhonson, his secret id is abcd-1a2b3-uu77"
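To illustrate how a DLP rule could flag the second example, here is a minimal sketch that applies regular expressions to the assembled text. The rule names and patterns in `DLP_RULES` are hypothetical; the `secret_id` pattern is simply written to fit the format of the ID in the sample (`abcd-1a2b3-uu77`). Python's `re` module is not full PCRE, but it supports every construct used here.

```python
import re

# Hypothetical DLP rules: rule name -> compiled pattern.
DLP_RULES = {
    # Four letters, five alphanumerics, four alphanumerics, separated by hyphens.
    "secret_id": re.compile(r"\b[a-z]{4}-[0-9a-z]{5}-[0-9a-z]{4}\b", re.IGNORECASE),
    # Any standalone occurrence of the word "secret".
    "confidential_keyword": re.compile(r"\bsecret\b", re.IGNORECASE),
}


def scan(text, rules=DLP_RULES):
    """Return a (rule name, matched text) pair for every rule match in the text."""
    hits = []
    for name, pattern in rules.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


sample = (" A . Teste <Hey, im leaking this secret info about Joshep jhonson, "
          "his secret id is abcd-1a2b3-uu77")
for rule, found in scan(sample):
    print(rule, found)
```

Running `scan` on the second sample reports the ID once under `secret_id` and the word "secret" twice under `confidential_keyword`, while text like the first sample produces no hits for these rules.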
Note: text recognition is limited when dark text appears on a dark background.
The second feature searches for an image inside other images. This can be used to find documents marked with special markers, or documents that simply contain some common item.
The results below show detection of distorted, skewed, and rotated markers. All of this was done using an algorithm called SURF (Speeded-Up Robust Features).
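The project does not say how SURF keypoint matches are turned into a detection. A common approach, assumed here, is to match each marker descriptor to its nearest neighbor in the scene, keep only unambiguous matches (Lowe's ratio test), and report the marker as found when enough matches survive. The sketch below shows only that matching stage on toy 2-D vectors; real SURF descriptors are 64-dimensional (128 in extended mode), and a full pipeline would typically also verify geometric consistency, for example by estimating a homography with RANSAC. The function names, `ratio` and `min_matches` are hypothetical.

```python
import math


def ratio_test_matches(query, train, ratio=0.75):
    """Match each query descriptor to its nearest train descriptor.

    A match is kept only if the nearest distance is clearly smaller than the
    second-nearest one (Lowe's ratio test), which discards ambiguous matches.
    Returns a list of (query index, train index) pairs.
    """
    matches = []
    if len(train) < 2:
        return matches  # the ratio test needs at least two candidates
    for qi, q in enumerate(query):
        dists = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        (best, best_idx), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches


def marker_found(marker_desc, scene_desc, min_matches=3, ratio=0.75):
    """Declare the marker present when enough unambiguous matches exist."""
    return len(ratio_test_matches(marker_desc, scene_desc, ratio)) >= min_matches


# Toy descriptors: the scene contains slightly perturbed copies of the marker's.
marker = [(0, 0), (10, 0), (0, 10), (10, 10)]
scene = [(0.5, 0), (10, 0.5), (0, 9.5), (10.2, 10), (50, 50)]
print(ratio_test_matches(marker, scene))
print(marker_found(marker, scene))
```

The ratio test is what makes this robust to repetitive texture: a descriptor that sits equally close to two scene features, such as `(5, 5)` above, is rejected rather than matched arbitrarily.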