americanexpress / jest-image-snapshot

✨ Jest matcher for image comparisons. Most commonly used for visual regression testing.

Advice for handling visual discrepancies between Intel and Apple Silicon images

ReDrUm opened this issue

Does anyone have advice on how best to handle the visual discrepancies between an image produced on an Intel machine vs an Apple Silicon machine for visual regression testing?

I'm encountering differences in how drop shadows are rendered, enough to trip up jest-image-snapshot unless I set the pixel tolerance excessively high.

My setup runs Chromium via Puppeteer in a multi-arch Docker image based on Debian 12. Shadow-rendering differences seem to be common.

Has anyone tackled this problem successfully yet?
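
Not a full answer, but two documented jest-image-snapshot options may help before resorting to a huge pixel tolerance: blur and comparisonMethod: 'ssim', both of which damp down per-pixel rendering noise like anti-aliasing and soft shadows. A minimal sketch (the threshold values are illustrative, not a recommendation):

expect(screenshot).toMatchImageSnapshot({
  // Gaussian blur radius (in px) applied to both images before comparison;
  // softens sub-pixel shadow and anti-aliasing differences
  blur: 2,
  // structural similarity is less sensitive to isolated pixel noise
  // than the default pixelmatch comparison
  comparisonMethod: 'ssim',
  failureThreshold: 0.01,
  failureThresholdType: 'percent',
});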

There are also discrepancies between macOS and Ubuntu.

The font weight rendered on Ubuntu is lighter compared to macOS. This causes the tests to fail, as I use macOS locally but CI runs on Ubuntu.

Here is a comparison. Above is from macOS, below is from Ubuntu:

[screenshot: font-weight comparison, macOS above, Ubuntu below]

Hi, any updates for this issue? I also encounter discrepancies between Intel and Apple Silicon images and still have no idea how to handle this.

For me, what I ended up doing was creating different sets of snapshots for each OS/platform.

I've just ended up setting a failureThreshold option. 😢

expect(screenshot).toMatchImageSnapshot({
  failureThreshold: 0.005,
  failureThresholdType: 'percent'
});

reference: The LogRocket Blog Article
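
If you settle on a global tolerance like this, it can be configured once in a Jest setup file via the library's configureToMatchImageSnapshot export instead of being repeated in every assertion:

// jest.setup.js
const { configureToMatchImageSnapshot } = require('jest-image-snapshot');

const toMatchImageSnapshot = configureToMatchImageSnapshot({
  failureThreshold: 0.005,
  failureThresholdType: 'percent',
});

expect.extend({ toMatchImageSnapshot });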

I run the tests in an Ubuntu container using Docker for deterministic results.

Dockerfile:

FROM mcr.microsoft.com/playwright:v1.35.1-jammy

# Set the work directory for the script that follows
WORKDIR /test

# Copy visual-testing package.json
COPY package.json ./

# Install dependencies
RUN yarn

# Copy current source directory
COPY . .
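
(The scripts below assume this image has already been built and tagged, presumably with something like:)

docker build -t visual-testing .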

My yarn scripts:

  "scripts": {
    "test": "yarn stop && docker run --name visual-testing --network host --add-host=host.docker.internal:host-gateway -v ${PWD}/baseline-snapshots:/test/baseline-snapshots -v ${PWD}/failure-diffs:/test/failure-diffs --rm visual-testing yarn container:wait-then-test",
    "container:run-test": "yarn test-storybook --stories-json --ci --url http://host.docker.internal:6006",
    "container:wait-then-test": "yarn container:wait-for-storybook && yarn container:run-test",
    "container:wait-for-storybook": "yarn wait-on -i 5000 -t 600000 http://host.docker.internal:6006"
  },
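
The yarn stop referenced in the test script isn't defined above; a hypothetical version that simply removes any lingering container:

  "scripts": {
    "stop": "docker rm -f visual-testing || true"
  }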

We've opted for this pragmatic approach:

if (process.platform === 'darwin') {
  expect(pngBuffer).toMatchImageSnapshot({
    failureThreshold: 0.00009,
    failureThresholdType: 'percent'
  });
} else {
  expect(pngBuffer).toMatchImageSnapshot();
}

Not ideal, but since CI isn't darwin, nothing should slip through undetected.

https://github.com/justland/just-web-react/actions/runs/5792587155/job/15699083890

Expected image to match or be a close match to snapshot but was 0.06825086805555555% different from snapshot (629 differing pixels).

I have a case where the snapshot generated locally on Ubuntu under WSL doesn't match the one generated in CI.

process.platform is linux in both cases... 🤷

For me, what I ended up doing was creating different sets of snapshots for each OS/platform.

How did you do that?

I do this:

customSnapshotsDir: `${process.cwd()}/__snapshots__/${process.platform}`
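
Worth noting: process.platform is 'darwin' on both Intel and Apple Silicon Macs, so if those two produce different renderings (as reported above), folding the CPU architecture into the path may help. A sketch:

// process.arch distinguishes 'arm64' (Apple Silicon) from 'x64' (Intel)
customSnapshotsDir: `${process.cwd()}/__snapshots__/${process.platform}-${process.arch}`

It won't separate WSL from an x64 Linux CI runner, though, since both report linux/x64; an explicit environment variable is one way to split those.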

The rendering discrepancy causes text layout to shift and wrap depending on whether I run the tests locally (Apple Silicon) or in an Ubuntu CI pipeline, to the point that the difference between images exceeds 30%. That makes a custom failure threshold unhelpful: set high enough to pass, it will no longer catch smaller changes, which IMO is the whole point of image snapshot testing.

I am attempting to run them locally in a docker container now, but the issue I run into is that it maxes out the CPU and causes tests to start timing out (even with a generous test timeout).
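
If the container is CPU-bound, capping Jest's parallelism (ideally matched to a --cpus limit on docker run) may be more effective than raising timeouts. A sketch with illustrative values:

// jest.config.js
module.exports = {
  // limit concurrent test workers so screenshot rendering isn't starved
  maxWorkers: 2,
  // per-test timeout in ms (Jest's default is 5000)
  testTimeout: 60000,
};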