Exiv2 / exiv2

Image metadata library and tools

Home Page: http://www.exiv2.org/

Problematic use of std::regex on Windows (MSYS2 UCRT64)

Wormnest opened this issue

Describe the bug

The use of std::regex in src/datasets.cpp in IptcKey::decomposeKey causes severe slowdowns (minutes) and huge memory use (gigabytes). This code is reached in GIMP's dev version, through the gexiv2 library (0.14.2), when GIMP is built on Windows under the UCRT64 profile of MSYS2.

To Reproduce

  1. On Windows, under MSYS2 UCRT64, install package mingw-w64-ucrt-x86_64-gimp3 (dev version 2.99.16) or self build current master.
  2. Start gimp-2.99 and open an image that has IPTC metadata; for testing I used, e.g., https://github.com/psd-tools/psd-tools/blob/main/tests/psd_files/16bit5x5.psd.
  3. Observe the very long wait; with a tool such as Process Explorer you can see huge amounts of memory being used.

Expected behavior

Expected: quick loading and a reasonable amount of memory used.

Desktop (please complete the following information):

  • OS and version: Windows 10 Home, 64-bit
  • Exiv2 version and source: mingw-w64-ucrt-x86_64-exiv2 0.28.1-1 from the MSYS2 repository
  • Compiler and version: whatever is used by default on up-to-date MSYS2 UCRT64 repository
  • Compilation mode and/or compiler flags: Release

Additional context

This issue is discussed in this GIMP MR to evaluate moving from MINGW64 to UCRT64 on Windows.

As I describe there, I added print statements to verify where the slowdown occurred, which is on the following line in decomposeKey:

static const std::regex re(R"((\w+)(\.\w+){2})");

A reply in that discussion suggests that std::regex on Windows is indeed buggy and is best avoided, citing related GCC and Inkscape issues.
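For reference, with std::regex_match the pattern above only checks that the key consists of exactly three dot-separated runs of word characters (for example Iptc.Application2.Keywords). If avoiding std::regex is the goal, that check is easy to write by hand. The sketch below is hypothetical (splitThreePartKey is not an Exiv2 function) and assumes \w means [A-Za-z0-9_], as it does in the default "C" locale.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical regex-free equivalent of std::regex_match(key, R"((\w+)(\.\w+){2})"):
// the key must be exactly three non-empty runs of [A-Za-z0-9_] separated by
// single dots. Returns the three parts on success, std::nullopt otherwise.
std::optional<std::array<std::string, 3>> splitThreePartKey(const std::string& key) {
    std::array<std::string, 3> parts;
    std::size_t start = 0;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        // The first two parts end at the next dot; the last part runs to the end.
        std::size_t end = key.size();
        if (i + 1 < parts.size()) {
            end = key.find('.', start);
            if (end == std::string::npos) return std::nullopt;
        }
        std::string part = key.substr(start, end - start);
        if (part.empty()) return std::nullopt;
        // A stray dot in the last part is rejected here as a non-word character.
        for (unsigned char c : part) {
            if (!std::isalnum(c) && c != '_') return std::nullopt;
        }
        parts[i] = part;
        start = end + 1;
    }
    return parts;
}
```

Because the slowdown was observed on the line that constructs the static std::regex, a hand-written check like this sidesteps regex construction entirely; whether that trade-off is wanted in Exiv2 is a decision for the maintainers.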

Mhm. This does seem like a bigger issue.

I wonder if we could try to restrict the regex grammar in use a bit (this looks like a sane thing to try), given the following:

"By default, if no grammar is specified, ECMAScript is assumed. Only one grammar may be specified." [1]

I wonder if that could/would/should change anything.

[1] https://learn.microsoft.com/en-us/cpp/standard-library/regular-expressions-cpp?view=msvc-170
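Passing the grammar explicitly would look roughly like the sketch below. Note that std::regex::ECMAScript is already the default grammar for std::regex, so this is only a way to make the experiment explicit, not an expected fix; isThreePartKey is a hypothetical wrapper, not Exiv2 code.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical wrapper around the pattern from IptcKey::decomposeKey, with the
// grammar flag spelled out. Since ECMAScript is the default, this regex should
// behave the same as the original declaration.
bool isThreePartKey(const std::string& key) {
    static const std::regex re(R"((\w+)(\.\w+){2})", std::regex::ECMAScript);
    return std::regex_match(key, re);
}
```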

Also, yet another interesting thread on the GCC side of things: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98723


We may fix the issue in GCC/libstdc++ then. Taking into account the return value / errno of the calls to strxfrm() should be enough.
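The general pattern of checking both the length returned by strxfrm() and errno before trusting the result can be sketched as follows. This is an illustration only, not libstdc++'s actual code; safeTransform and the 1 MiB cap are hypothetical, and the cap is an assumed safeguard so that a failing call reporting an absurd length cannot trigger a huge allocation.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: transform `in` with strxfrm() for collation, or return
// std::nullopt if the call reports an error or an implausible length.
std::optional<std::string> safeTransform(const std::string& in) {
    // Arbitrary upper bound (assumption), not taken from any library.
    constexpr std::size_t kMaxTransformed = std::size_t{1} << 20;

    errno = 0;
    // With n == 0 the destination may be a null pointer; the return value is
    // the length needed, excluding the terminating null character.
    std::size_t needed = std::strxfrm(nullptr, in.c_str(), 0);
    if (errno != 0 || needed >= kMaxTransformed) return std::nullopt;

    std::vector<char> buf(needed + 1);
    errno = 0;
    std::size_t written = std::strxfrm(buf.data(), in.c_str(), buf.size());
    // A result >= buf.size() means the output did not fit in the buffer.
    if (errno != 0 || written >= buf.size()) return std::nullopt;
    return std::string(buf.data(), written);
}
```

Comparing two transformed strings with std::strcmp gives the same ordering as std::strcoll on the originals, which is why collation-aware code, such as a regex traits class, may call strxfrm() at all.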

And now it gets stuck on a different image with IPTC DateCreated/TimeCreated tags; it looks like those also use regex.
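If the date parsing is indeed regex-based, a regex-free check is one possible workaround. The sketch below is hypothetical: parseIptcDate and SimpleDate are illustrative names, and accepting both YYYYMMDD (the IPTC IIM form) and YYYY-MM-DD is an assumption about what the parser needs to handle. Time values would need a similar hand-written check.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

struct SimpleDate {
    int year;
    int month;
    int day;
};

// Hypothetical regex-free date parser accepting "YYYYMMDD" or "YYYY-MM-DD".
// Only basic range checks are done; days per month are not validated.
std::optional<SimpleDate> parseIptcDate(const std::string& s) {
    std::string digits;
    if (s.size() == 8) {
        digits = s;
    } else if (s.size() == 10 && s[4] == '-' && s[7] == '-') {
        digits = s.substr(0, 4) + s.substr(5, 2) + s.substr(8, 2);
    } else {
        return std::nullopt;
    }
    for (unsigned char c : digits) {
        if (!std::isdigit(c)) return std::nullopt;
    }
    SimpleDate d{std::stoi(digits.substr(0, 4)), std::stoi(digits.substr(4, 2)),
                 std::stoi(digits.substr(6, 2))};
    if (d.month < 1 || d.month > 12 || d.day < 1 || d.day > 31) return std::nullopt;
    return d;
}
```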

Should this be reopened, should a new issue be opened, or should we wait for a fix in libstdc++?

Probably a new issue. It would be great to find out which regex causes the problems.
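One way to narrow this down is to time the construction of each candidate pattern in isolation on the affected toolchain. A minimal sketch, assuming the patterns are copied by hand from the Exiv2 sources (only the decomposeKey pattern quoted above is known here); constructionMicros and reportConstructionTimes are hypothetical helpers.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: time to construct a std::regex from `pattern`, in
// microseconds. Construction is where the reported slowdown was observed,
// so matching is not timed here.
long long constructionMicros(const std::string& pattern) {
    const auto start = std::chrono::steady_clock::now();
    const std::regex re(pattern);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Prints one timing line per pattern.
void reportConstructionTimes(const std::vector<std::string>& patterns) {
    for (const auto& p : patterns) {
        std::cout << p << " : " << constructionMicros(p) << " us\n";
    }
}
```

Calling reportConstructionTimes({R"((\w+)(\.\w+){2})"}) from a small test program built under both UCRT64 and MINGW64 should show whether a given pattern is the culprit; the DateCreated/TimeCreated patterns can be added to the list once located.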