evanoberholster / imagemeta

Image Metadata (Exif and XMP) extraction for JPEG, HEIC, AVIF, TIFF and Camera Raw in golang. Focus is on providing features and improved performance.

Interesting work

soypat opened this issue · comments

I've written my own exif parser recently. Maybe we can consolidate work?
My library is quite simple- Keep in mind I learned about exif last week and all of it is quite new to me.

https://github.com/soypat/exif

Benchmarks show your library is around 2 times faster for parsing exif metadata.

BenchmarkThisPackage_SmallImage-12    	  156421	      7274 ns/op	    2512 B/op	      52 allocs/op
BenchmarkImageMeta_SmallImage-12      	  286161	      3742 ns/op	     921 B/op	       5 allocs/op

Hi! I just took a look at your library. It looks great, and I agree that collaborating on an exif library would be useful. This library focuses on coverage and performance.

I think Phil Harvey's exiftool is a great library and would be useful to have something similar in golang.

What ideas did you have?

So I primarily created my own library because I wanted to lazily load images from disk. I work with hyperspectral images which are gigabytes in size, and it'd be nice to implement the image.Image interface for them by caching segments of the image at a time, i.e. lazily loading them.

My pipeline basically is as follows:

  • Read tens or hundreds of raw spectral images from disk. These are all part of the same hyperspectral image (hypercube).
  • Combine the spectral data across all of them to generate result images of interest.

Right now the process is dog slow, taking several minutes to process a single result hyperspectral image. I think if the images are cached smartly the process can be sped up by a lot.

My problem is that I do not know how to generically process the images. There's an existing pipeline for it, but I do not understand how it was created, so I do not know how to process the images I have today based on their EXIF data, i.e. implement the image.Image interface for them.
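To make it concrete, a rough sketch of that kind of lazily loaded image.Image, assuming a hypothetical readTile helper that pulls one tile of pixel data off disk on demand (all names are illustrative):

// Sketch only (not from either library): an image.Image that caches tiles
// read from disk on demand. Assumes each tile keeps its bounds in
// full-image coordinates. Uses the standard image and image/color packages.
type lazyImage struct {
	rect     image.Rectangle
	tiles    map[image.Point]*image.Gray16 // tiles already read from disk
	readTile func(p image.Point) *image.Gray16
}

func (l *lazyImage) ColorModel() color.Model { return color.Gray16Model }
func (l *lazyImage) Bounds() image.Rectangle { return l.rect }

func (l *lazyImage) At(x, y int) color.Color {
	const tileSize = 256
	key := image.Point{X: x / tileSize, Y: y / tileSize}
	t, ok := l.tiles[key]
	if !ok {
		t = l.readTile(key) // touch the disk only when a tile is first needed
		l.tiles[key] = t
	}
	return t.At(x, y)
}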

So there are my reasons for building the library.

As for having an exiftool in golang - I'm all for it. Take what you need from my library!

What I'm particularly fond of in my library is that extracting all exif data and visualizing it is pretty easy and straightforward. I can't tell whether your library extracts only certain exif tags or all of them, judging by the exif2.Exif type.

I also think there's value in providing a simple implementation with no external dependencies which others can then build upon. As it stands, most other exif libraries have tons of imports and are immensely complex and hard to maintain (thinking of the go-exif library). I'd like to have a library that is simple, performant and easy to use.

My need is to process tens or hundreds of images quickly; many of these images are RAW images and don't need to be fully read from disk.

I like your usage of the image.Image interface; what filetype are your raw spectral images? I am not interested in reimplementing image types (e.g. JPEG, TIFF, CR2, PNG, NEF, etc.) but wanted to focus on exif and other metadata.

I agree that a "ReadSeeker interface" is more efficient than loading the entire image into memory. In regards to the exif2.Exif type, I agree that it is limited. @mholt brought this up previously in #49. The challenge that I have faced with using a map is that it significantly slows down processing. I believe most, if not all, commonly used fields can be represented in a more efficient way.
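To illustrate the trade-off with purely illustrative types (not imagemeta's actual ones): fixed struct fields stay allocation-free for the common tags, while a map keyed by tag ID can hold every tag at the cost of hashing and per-value allocations.

// Illustrative only, not imagemeta's actual types.
type exifCommon struct { // fixed fields: no per-tag allocation, no hashing
	Make, Model string
	ISO         uint32
	FNumber     float32
}

type exifAll map[uint16]interface{} // arbitrary tags: flexible, but every lookup hashes and values may allocate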

This library could improve by reducing dependencies. Some of the dependencies are for testing purposes, but a simpler, more straightforward library would be beneficial.

What interface methods would you suggest to keep it simple, performant, and easy to use?

@abrande, @matrixik, and @mholt, I would appreciate any insight you have.

The output of my images is TIFF, I believe; I'm not really sure. If I set the file extension to .tiff, my file browser opens them correctly. The file header starts with 0xff 0xd8 0xff 0xe1 0x63 0xc6 0x45 0x78.
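For what it's worth, a minimal sketch for sniffing those leading bytes against common signatures; 0xff 0xd8 is the JPEG SOI marker (with 0xff 0xe1 introducing the APP1/Exif segment), while TIFF files begin with "II*\0" or "MM\0*":

// Sketch only, using just the standard bytes and io packages.
func sniffFormat(r io.Reader) (string, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return "", err
	}
	switch {
	case hdr[0] == 0xFF && hdr[1] == 0xD8:
		return "jpeg", nil // SOI marker; Exif lives in the APP1 segment that follows
	case bytes.Equal(hdr[:], []byte("II*\x00")), bytes.Equal(hdr[:], []byte("MM\x00*")):
		return "tiff", nil // little- or big-endian TIFF header
	default:
		return "unknown", nil
	}
}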

As for my suggestions, I am no performance engineer - I have little knowledge of performant data structures like trees and such. I'd be happy with a library that is not blazingly fast, just fast enough and a pleasure to use.

I wonder if the way my API is designed would serve your use case. It does have more boilerplate, but I'm afraid you would not be able to get around that without sacrificing flexibility or performance:

	xResTag, err := decoder.GetTag(fp, 0, exifid.XResolution) // look up the XResolution tag in IFD 0 of the open image file fp
	if err != nil {
		panic("resolution tag not found")
	}
	fmt.Printf("the x resolution is %v", xResTag.Value())

There are a couple of things to note about this API:

  • It allows searching for the same tag in multiple IFDs. Some tags may exist in more than a single IFD.
  • "Future proof": as the EXIF spec grows, users will still be able to use the same API.
  • "Generic": tags contain an interface{} value, which hurts static typing but allows tags to hold any data type and makes it easy to print out the data.
  • Some form of tag types: there are Int, Float and Rational methods that return a static type or an error upon type mismatch. So there is some typing, albeit somewhat weak.
  • Somewhat performant: although not as fast as your library, it is "fast" to some extent. Since I am doing more work, it makes sense that this approach is slower (allocating slices for multiple IFDs, interface heap allocations, tag slices instead of static fields, two loops due to the lazy decode nature). That said, the decoder implementation is nearing completion or already finished; it could get faster, but probably not slower.
  • Safe: users can choose whether or not to decode tags depending on their size or ID. Users can avoid large tags to speed up parsing, or simply out of convenience.

I do love the idea of combining your libraries!

In my use case, decoding and processing images is way more expensive than reading metadata. So while I value performance, squeezing out every possible allocation (as awesome as that is) is not the top priority for me, since image processing in my pipeline is way slower anyway. That said, a library that streams the input rather than loading the whole thing into memory is a must at the very least, which is why I like your packages so far.

What is also important for my use case is the ability to read any and all metadata fields. I'm ok with any types, since the exif documentation describes what the underlying type is, and reflection like fmt.Sprintf("%v") can turn any value into a string. However, I'd still use strongly typed helper functions like Float64() etc. to get the values as a specific type efficiently, or an error if it's not that type, as well as helper functions for common values that need a lot of special logic, like timestamps and coordinates.
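Coordinates are a good example of where such a helper earns its keep, since the raw tags are three rationals plus a hemisphere reference. A hedged sketch (names are illustrative, not from either library):

// Hypothetical helper: converts an EXIF GPS coordinate stored as
// degrees/minutes/seconds plus an "N"/"S"/"E"/"W" reference into
// signed decimal degrees.
func gpsToDecimal(deg, min, sec float64, ref string) float64 {
	d := deg + min/60 + sec/3600
	if ref == "S" || ref == "W" {
		d = -d
	}
	return d
}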

A way to iterate over all the data and to access arbitrary fields is necessary for me.

Maybe it'd be possible to combine these features into a performance-focused library, with a note in the docs that using them may incur more allocations? I'm personally ok with that for my needs.

@mholt It does sound like my library might fulfill your requirements for exif parsing. That said, I'm not sure if it covers all of your needs. It'd be great to have an experience report to know if it's worth merging the libraries. Maybe there's a fatal flaw in how I've approached the problem that is only known on real use.

To clarify: I think we could better decide how to merge the libraries with more real-world usage feedback. As it stands I'm unsure how to go about it...

I haven't been able to get back to replying until now. Below are some suggestions for what the end user would work with. This can be done while retaining flexibility and performance. I think it would be important to agree on the best user-facing interface; the non-exported code can change, but it would be important to agree on a simple API that can remain constant.

Helpers for GPS values, dates, etc. would still need to be defined, as there are many types included in exif information.

Below are some suggestions.

// example
var sr io.ReadSeeker // any implementation of io.ReadSeeker
reader, err := exif.Decode(sr)
if err != nil {
	panic("exif decoder failed")
}
xResTag, err := reader.GetTag(0, exififd.XResolution) // GetTag(ifd Number, Tag ID)  Tag
if err != nil {
	panic("resolution tag not found")
}
fmt.Printf("the x resolution is %v", xResTag.Value()) // tag.Value() returns any

// Exif Tag
type Tag interface {
	Value() any             // returns tag value as specified by defaults
	Uint16() uint16         // referred to as Short
	Int16() int16           // referred to as SignedShort
	Uint32() uint32         // referred to as Long
	Int32() int32           // referred to as SignedLong
	Uint64() uint64         // referred to as Double
	Rational32() Rational32 // returns tag as Rational32
	Rational64() Rational64 // returns tag as Rational64
	String() string         // returns string value of tag
	Bytes() []byte          // returns original byte value of tag
	Valid() bool            // returns false if tag does not meet "valid parameters"
}

// Exif Rational
type Rational interface {
	Float32() float32
	Float64() float64
}

type Rational32 [2]uint32 // Similar to Exif Rational
type Rational64 [2]uint64 
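One way the value types could satisfy the Rational interface (sketch only):

func (r Rational32) Float32() float32 { return float32(r[0]) / float32(r[1]) }
func (r Rational32) Float64() float64 { return float64(r[0]) / float64(r[1]) }

func (r Rational64) Float32() float32 { return float32(r[0]) / float32(r[1]) }
func (r Rational64) Float64() float64 { return float64(r[0]) / float64(r[1]) }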

Thanks for the suggestions.
Addendum: This is similar to what @soypat mentioned above.

Hey! Thanks for getting back and for providing a comprehensive list of suggestions for the API.
Looking good overall. I wonder if we need so many methods on the Tag interface. For example: wouldn't an Int64 method encompass all of the integer types in the Exif format? Even if a tag is of type uint16, it could still be returned by the Int64 method. It would make things simpler, I feel.

Do try to test out my library and see if you get what I mean. Instead of having all the integer methods, there is a single Int method, a Rational method and a Float method, which reduces the number of methods a user needs to work with exif data. So maybe that interface can be reduced to the following.

// Exif Tag
type Tag interface {
   Int() (int64, error) // from 8-bit to 32-bit integers (I couldn't find examples of 64-bit integers in EXIF)
   Float() (float64, error) // double or single precision
   Rational() (Rational, error)
   Bytes() ([]byte, error) // can return either string or byte kinds
}
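As a standalone sketch of that widening idea (illustrative, not the actual decoder code), a single helper can promote every EXIF integer kind to int64 given the TIFF type code and the raw value bytes:

// TIFF/EXIF type codes: 1=BYTE, 3=SHORT, 4=LONG, 8=SSHORT, 9=SLONG.
// Uses the standard encoding/binary and fmt packages.
func widenInt(typeCode uint16, raw []byte, order binary.ByteOrder) (int64, error) {
	switch typeCode {
	case 1:
		return int64(raw[0]), nil
	case 3:
		return int64(order.Uint16(raw)), nil
	case 4:
		return int64(order.Uint32(raw)), nil
	case 8:
		return int64(int16(order.Uint16(raw))), nil
	case 9:
		return int64(int32(order.Uint32(raw))), nil
	}
	return 0, fmt.Errorf("type %d is not an integer kind", typeCode)
}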

See code relevant to the Tag interface here.

Moreover, I'd like you to see if the rational.Rational interface I have devised suits your needs. I tried to make everything as simple as possible and easy to use and understand!

(I'm following along, and hope to have more time to respond later this week -- but don't wait up for me in the meantime 😅 - discussion so far looks great)

It may take me some time to give an "experience report", as I'd need to refactor a significant part of my app to use the different API -- although I understand why that'd be helpful.

@evanoberholster Overall, I like that API -- and I do like @soypat's suggestion of fewer methods. For instance, I imagine float64 could be used for float32 values as well. However, I would add a String() method since I'd rather only deal with []byte if it's binary data.

The String() method is currently used to implement the fmt.Stringer interface so users can easily pretty print it. You can still convert the []byte to string after a call to Bytes(). Another solution would be to think of a fitting name for the method.

I'll also mention in passing that I saw your conversation today with Dustin over at go-exif and went ahead and tried my hand at implementing an exif data offset searcher.

The String() method is currently used to implement the fmt.Stringer interface so users can easily pretty print it.

Is that different than just getting the value as a string? 🤔 I feel like that's all I'd want a Stringer to do anyway...

You can still convert the []byte to string after a call to Bytes()

I know, it's just annoying 🙃 Not a big deal though if that's just how it has to be.

I'll also mention in passing that I saw your conversation today with Dustin over at go-exif and went ahead and tried my hand at implementing an exif data offset searcher.

Cool! That seems useful. Again, if it helps for information, my particular use case is to be able to load images, potentially very large ones, efficiently (streaming), and to extract as much of the metadata as possible. 👍

Is that different than just getting the value as a string? 🤔 I feel like that's all I'd want a Stringer to do anyway...

The issue here is that a Tag is defined by both its Exif ID and the data it contains, so I'd like to print out a human-readable representation of both fields. For the special case of wanting only the value's string representation, I've created a new Describe method which would act as the String() method you describe. It will return a meaningful textual representation for numbers that have a special meaning (Exif enums).
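To make that concrete, a rough sketch of what a Describe-style helper might return for an enum tag, using the Orientation tag's defined values (the function and map names are illustrative, not the library's actual code):

// Orientation values as defined by the EXIF specification.
var orientationNames = map[uint16]string{
	1: "Horizontal (normal)",
	2: "Mirror horizontal",
	3: "Rotate 180",
	4: "Mirror vertical",
	5: "Mirror horizontal and rotate 270 CW",
	6: "Rotate 90 CW",
	7: "Mirror horizontal and rotate 90 CW",
	8: "Rotate 270 CW",
}

func describeOrientation(v uint16) string {
	if s, ok := orientationNames[v]; ok {
		return s
	}
	return fmt.Sprintf("unknown orientation (%d)", v)
}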

efficiently (streaming),

Hmm - I wonder if the API would be comfy enough for this use case 🤔.

I imagine maybe the stream reader could be wrapped to implement the ReadAt method.
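A rough sketch of that wrapping, assuming access is serialized with a mutex (illustrative, not part of either library):

// readAtSeeker adapts an io.ReadSeeker to io.ReaderAt by seeking under a
// lock and restoring the previous offset afterwards. Uses io and sync.
type readAtSeeker struct {
	mu sync.Mutex
	rs io.ReadSeeker
}

func (r *readAtSeeker) ReadAt(p []byte, off int64) (int, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	prev, err := r.rs.Seek(0, io.SeekCurrent) // remember where the stream was
	if err != nil {
		return 0, err
	}
	if _, err := r.rs.Seek(off, io.SeekStart); err != nil {
		return 0, err
	}
	n, err := io.ReadFull(r.rs, p) // ReaderAt requires a non-nil error when n < len(p)
	if _, serr := r.rs.Seek(prev, io.SeekStart); serr != nil && err == nil {
		err = serr
	}
	return n, err
}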

The issue here is that a Tag is defined by both its Exif ID and the data it contains, so I'd like to print out a human-readable representation of both fields.

I see... is its output kind of like what fmt.Sprintf("%+v") does? If so, I'd probably prefer to use String() to get the data, but I can see the argument for printing the whole thing if you need that.

is its output kind of like what fmt.Sprintf("%+v") does?

Yes, more or less.

I'd probably prefer to use String() to get the data

Another issue I found is that if we want String to return the data, we'd also need to return an error in case the tag is malformed or the data cannot be converted to a string. Doing that would mean the Tag type could never implement the fmt.Stringer interface, which I'd prefer to avoid if at all possible, since Stringer makes pretty-printing tags really easy.