oduwsdl / hypercane

A toolkit for developing algorithms that sample mementos from a web archive collection.

Home Page: https://oduwsdl.github.io/hypercane

Update Image Report to Score Images From Metadata Higher

shawnmjones opened this issue

The current scores produced by hc report image-data are not as effective as they could be. Humans may have already supplied their desired striking images in the metadata of the web pages making up the collection.

Hypercane's existing image scoring function, rank_images in hypercane/report/imagedata.py, currently appends each image's properties to a list on lines 143-152:

imageranking.append(
    (
        score,
        pixelsize,
        colorcount,
        1 / ratio,
        noverN,
        image_urim
    )
)

Add another column on the left, as the first element of the tuple, containing a value of 1 or 0. If Hypercane discovers the image in the metadata, set this column to 1; otherwise, set it to 0. This way, when the sorting occurs on line 154, all images discovered in the metadata will occupy the highest ranks in the list, sorted among themselves by their MementoEmbed score.
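A minimal sketch of the proposed change, not the actual Hypercane code. The function name `rank_with_metadata_flag`, the `metadata_urims` set, and the dictionary keys are hypothetical stand-ins for whatever rank_images has in scope, and the descending sort is an assumption about what happens on line 154.

```python
def rank_with_metadata_flag(images, metadata_urims):
    """Rank images so that those named in page metadata always come first.

    images: iterable of dicts with the (hypothetical) keys used below.
    metadata_urims: set of image URI-Ms discovered in page metadata.
    """
    imageranking = []

    for img in images:
        # New leftmost column: 1 if the image was found in metadata, else 0.
        in_metadata = 1 if img["urim"] in metadata_urims else 0

        imageranking.append(
            (
                in_metadata,
                img["score"],
                img["pixelsize"],
                img["colorcount"],
                1 / img["ratio"],
                img["noverN"],
                img["urim"],
            )
        )

    # Tuples compare element by element, so the flag dominates the ordering
    # and the score only breaks ties among images sharing the same flag.
    # Assumption: the existing sort is descending (highest rank first).
    return sorted(imageranking, reverse=True)


# Made-up sample data: b.png has the lowest score but appears in metadata.
sample_images = [
    {"urim": "https://example.org/a.png", "score": 0.9, "pixelsize": 10000,
     "colorcount": 200, "ratio": 1.5, "noverN": 0.5},
    {"urim": "https://example.org/b.png", "score": 0.4, "pixelsize": 8000,
     "colorcount": 150, "ratio": 2.0, "noverN": 0.25},
    {"urim": "https://example.org/c.png", "score": 0.7, "pixelsize": 12000,
     "colorcount": 180, "ratio": 1.0, "noverN": 0.75},
]
sample_metadata = {"https://example.org/b.png"}

ranked = rank_with_metadata_flag(sample_images, sample_metadata)
ranked_urims = [row[-1] for row in ranked]
```

With this sample data, b.png ranks first despite its low score, followed by a.png and c.png in score order. Because the flag is just one more tuple element, the existing sort call should not need to change.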