numbersprotocol / pyc2pa

Python implementation of C2PA: Coalition for Content Provenance and Authenticity.

Injection error

shc261392 opened this issue.

Image: https://drive.google.com/file/d/17DgQCF-TOUEk9LaroHYa9OPaGhXw6soQ/view?usp=sharing

Image sha256sum: 55146f1665fa84fe2a76d13772f7f83ea02a188cde68a047cb9acd2e28005d90

$ git checkout 72735b
$ mv <downloaded-image> dog.jpeg
$ cp dog.jpeg dog-thumbnail.jpeg
$ python3 utils/starling_multiple_injection.py dog.jpeg
Traceback (most recent call last):
  File "/Users/shc/numbers/github/starling-cai/utils/starling_multiple_injection.py", line 166, in <module>
    starling = Starling(photo_bytes,
  File "/Users/shc/numbers/github/starling-cai/cai/starling.py", line 74, in __init__
    self.app11_headers = get_app11_marker_segment_headers(self.raw_bytes)
  File "/Users/shc/numbers/github/starling-cai/cai/jumbf.py", line 219, in get_app11_marker_segment_headers
    header['tbox']   = data_bytes[offset + 16 : offset + 20].decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 3: unexpected end of data

Root Cause

The CAI module treats non-CAI data as CAI metadata and tries to parse it.

Analysis

Currently, the CAI module locates the CAI metadata (more precisely, the APP11 marker segments) only by searching for 0xFFEB, the APP11 marker.

[Screenshot from 2021-05-14 17-42-42]

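As a result, any occurrence of the bytes 0xFF 0xEB anywhere in the file is treated as the beginning of CAI metadata, even when it belongs to unrelated data. The parser then reads the following bytes as an APP11 header, which leads to errors such as the UnicodeDecodeError above.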
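The false positive can be reproduced with a small sketch. The helpers `find_app11_naive` and `read_tbox` are hypothetical, not functions from the CAI module; `read_tbox` only mirrors the `offset + 16 : offset + 20` slice shown in the traceback. The input is a synthetic JPEG whose Exif (APP1) payload happens to contain 0xFF 0xEB.

```python
import struct


# Hypothetical helper, not part of the CAI module: search-only detection.
def find_app11_naive(data: bytes) -> list[int]:
    """Return every offset of the byte pair 0xFF 0xEB, with no structural checks."""
    offsets = []
    start = 0
    while True:
        idx = data.find(b"\xff\xeb", start)
        if idx == -1:
            return offsets
        offsets.append(idx)
        start = idx + 1


# Hypothetical helper: decodes the 4 bytes an APP11 header would hold as TBox
# (the same slice as in the traceback).
def read_tbox(data: bytes, offset: int) -> str:
    return data[offset + 16 : offset + 20].decode("utf-8")


# Synthetic JPEG: SOI, one APP1 (Exif) segment whose payload contains 0xFF 0xEB, EOI.
payload = b"Exif\x00\x00" + b"\xff\xeb" + bytes(14) + b"abc\xed"
app1 = b"\xff\xe1" + struct.pack(">H", len(payload) + 2) + payload
jpeg = b"\xff\xd8" + app1 + b"\xff\xd9"

hits = find_app11_naive(jpeg)
print(hits)  # [12]: inside the APP1 payload, not a real APP11 segment
try:
    read_tbox(jpeg, hits[0])
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xed in position 3: unexpected end of data
```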

Solution

Method 1: Workaround (I will take this approach because of resource constraints)

Checking both the APP11 marker and the CI parameter is a quick workaround. It is only a workaround because it reduces, but does not eliminate, the probability of treating non-CAI data as CAI metadata.

[Screenshot from 2021-05-14 17-44-36]
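A minimal sketch of the workaround, assuming the JPEG XT / JUMBF APP11 layout used for CAI data: marker, 2-byte length Le, 2-byte CI = 0x4A50 (`JP`), 2-byte En, 4-byte Z, then the box's LBox and TBox. That layout is consistent with the TBox slice at `offset + 16` in the traceback. `find_app11_with_ci` is a hypothetical name, not the module's actual code. A stray 0xFF 0xEB that happens to be followed by `JP` four bytes later would still match, which is why this remains a workaround.

```python
import struct

APP11 = b"\xff\xeb"
CI = b"JP"  # 0x4A50, JPEG XT Common Identifier (assumed APP11 layout)


# Hypothetical helper, not part of the CAI module.
def find_app11_with_ci(data: bytes) -> list[int]:
    """Return offsets of 0xFF 0xEB whose CI field (after the 2-byte length) equals b"JP"."""
    offsets = []
    start = 0
    while True:
        idx = data.find(APP11, start)
        if idx == -1:
            return offsets
        # Assumed layout: marker(2) | Le(2) | CI(2) | En(2) | Z(4) | LBox(4) | TBox(4)
        if data[idx + 4 : idx + 6] == CI:
            offsets.append(idx)
        start = idx + 1


# Synthetic JPEG: an APP1 payload with a stray 0xFF 0xEB, then a real APP11 segment.
exif = b"Exif\x00\x00" + b"\xff\xeb" + bytes(14) + b"abc\xed"
app1 = b"\xff\xe1" + struct.pack(">H", len(exif) + 2) + exif
# En = 1, Z = 1, LBox = 8: an empty "jumb" box header, for illustration only.
app11_payload = CI + struct.pack(">HI", 1, 1) + struct.pack(">I", 8) + b"jumb"
app11 = APP11 + struct.pack(">H", len(app11_payload) + 2) + app11_payload
jpeg = b"\xff\xd8" + app1 + app11 + b"\xff\xd9"

print(find_app11_with_ci(jpeg))  # [32]: the stray match at offset 12 is rejected
```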

Method 2: Root cause solution

To the best of my knowledge, fixing this issue completely requires locating the start of every marker segment between SOI and DQT (by following each segment's length field) and parsing only the APP11 marker segments.

[Screenshot from 2021-05-14 17-49-15]
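A sketch of the root-cause approach: walk the marker segments from SOI, jump over each segment using its length field so that bytes inside a payload are never mistaken for a marker, and collect only the APP11 segments. `find_app11_segments` is a hypothetical name. Stopping at DQT follows the description above; the sketch also stops at SOS or EOI. It assumes every marker before that point carries a 2-byte length (as APPn and COM segments do) and does not handle standalone markers such as RSTn.

```python
import struct

DQT, SOS, EOI, APP11 = 0xDB, 0xDA, 0xD9, 0xEB


# Hypothetical helper, not part of the CAI module.
def find_app11_segments(data: bytes) -> list[tuple[int, bytes]]:
    """Walk marker segments from SOI; return (offset, payload) for each APP11 segment.

    Stops at DQT (as proposed in the issue), SOS, or EOI.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    segments = []
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker: skip it
            pos += 1
            continue
        if marker in (DQT, SOS, EOI):
            break
        if pos + 4 > len(data):
            raise ValueError(f"truncated segment header at offset {pos}")
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack_from(">H", data, pos + 2)
        if marker == APP11:
            segments.append((pos, data[pos + 4 : pos + 2 + length]))
        pos += 2 + length  # jump over the whole segment; its payload is never scanned
    return segments


# Synthetic JPEG: APP1 with a stray 0xFF 0xEB, one real APP11 segment, a DQT, EOI.
exif = b"Exif\x00\x00" + b"\xff\xeb" + bytes(14) + b"abc\xed"
app1 = b"\xff\xe1" + struct.pack(">H", len(exif) + 2) + exif
app11_payload = b"JP" + struct.pack(">HI", 1, 1) + struct.pack(">I", 8) + b"jumb"
app11 = b"\xff\xeb" + struct.pack(">H", len(app11_payload) + 2) + app11_payload
dqt = b"\xff\xdb" + struct.pack(">H", 67) + bytes(65)
jpeg = b"\xff\xd8" + app1 + app11 + dqt + b"\xff\xd9"

for offset, payload in find_app11_segments(jpeg):
    print(offset, payload[:2])  # 32 b'JP'
```

The stray 0xFF 0xEB inside the APP1 payload is never examined, because the walker jumps from the APP1 header straight to the next segment.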

Testing image information

$ file scott.jpg 
scott.jpg: JPEG image data, Exif standard: [TIFF image data, little-endian, direntries=12, height=3024, manufacturer=samsung, model=SM-N9810, orientation=upper-right, xresolution=210, yresolution=218, resolutionunit=2, software=N9810ZSU1ATI4, datetime=2020:10:24 15:03:33, width=4032], baseline, precision 8, 4032x3024, components 3

The workaround appears to work (although with the known issue #15).

  • Raw photo
    • scott
  • Thumbnail (100x100)
    • scott-thumbnail
  • Multi-injection photo
    • scott-cai-cai-cai
$ sha256sum scott*
47e148074e9a3f658119c82e1a2e5aebb148a2a3864f6a1e4d1f58a4bd31a0ee  scott-cai-cai-cai.jpg
55146f1665fa84fe2a76d13772f7f83ea02a188cde68a047cb9acd2e28005d90  scott.jpg
bfd0c280dfa195a0e8468a0f0d1d6beecb652a70cf591c19187d0c5166cef6a8  scott-thumbnail.jpg