cashapp / AccessibilitySnapshot

Easy regression testing for iOS accessibility

Text-based strategy

akaDuality opened this issue · comments

Hi! Thank you for the great tool!

I would like to create a text-based strategy that can be used to write non-fragile tests for VoiceOver. I described this idea in this discussion.

Do you think it would be possible to build it as part of your tool, on top of the core module? Or should I create a separate package that uses just the core part?

What kind of difficulties do you see?

Hey Mikhail! Great question, this has come up a number of times.

I don't have access to the repo you linked to, can you provide some more background on your motivation behind using a text-based strategy? By "non-fragile" tests I assume you're referring to the problems related to GPU differences? We're working on some alternate image comparison methods and image processing techniques that should hopefully mitigate those issues.

The biggest thing I find a text-based strategy misses is location verification, since you don't get the region highlighting that you get with the image. Of course, you can print out a description of the accessibilityFrame and accessibilityActivationPoint, but this is difficult to visualize and easy to regress if your view layout changes. Representing the accessibilityPath is even more difficult, and I've found that in recent iOS versions more UIKit views are defaulting to paths. Location matters both for highlighting to the user which element they're focused on and for interaction, since by default activating an element simulates a touch event at the accessibilityActivationPoint (unless you override accessibilityActivate()). So I worry that omitting it from the snapshot gives a false confidence that the accessibility is being fully regression tested when in reality a large component is missing.
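To make the interaction point concrete, here's a minimal UIKit sketch (not part of this framework, and using a hypothetical SliderCell) of the two mechanisms mentioned above: overriding accessibilityActivationPoint to redirect the simulated touch, and overriding accessibilityActivate() to handle activation directly.

```swift
import UIKit

// Hypothetical cell used only for illustration.
final class SliderCell: UITableViewCell {
    private let stepper = UIStepper()

    override var accessibilityActivationPoint: CGPoint {
        get {
            // Activation points are expressed in screen coordinates, so convert
            // the stepper's bounds before picking a point inside it. By default,
            // activating the element simulates a tap at this point.
            let screenFrame = UIAccessibility.convertToScreenCoordinates(stepper.bounds, in: stepper)
            return CGPoint(x: screenFrame.maxX - 20, y: screenFrame.midY)
        }
        set { super.accessibilityActivationPoint = newValue }
    }

    override func accessibilityActivate() -> Bool {
        // Alternatively, skip the simulated touch entirely and perform the
        // action directly. Returning true reports the activation as handled.
        stepper.value += stepper.stepValue
        stepper.sendActions(for: .valueChanged)
        return true
    }
}
```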

I've considered a hybrid solution where the snapshot is split between two files: an image with the region highlighting and a text file with the descriptions. My main concern with this is it makes code review a bit more difficult and the files could potentially get out of sync (which could be addressed by putting a hash of one file in the other, but I think there's still more developer friction with this).
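As a rough sketch of the hash idea (hypothetical file names and layout, not an existing feature of this project), the text half of the snapshot could record a digest of the image half so the pair can be checked for consistency:

```swift
import CryptoKit
import Foundation

// Sketch of a hypothetical two-file snapshot layout (reference.png + reference.txt)
// where the text file records a digest of the image so the pair can be validated.
func writeHybridSnapshot(imageData: Data, description: String, to directory: URL) throws {
    let imageURL = directory.appendingPathComponent("reference.png")
    let textURL = directory.appendingPathComponent("reference.txt")

    try imageData.write(to: imageURL)

    let digest = SHA256.hash(data: imageData)
        .map { String(format: "%02x", $0) }
        .joined()

    // Prepend the digest so a reviewer (or the test runner) can detect when the
    // image was regenerated without updating the text half, or vice versa.
    let contents = "image-sha256: \(digest)\n\n\(description)"
    try contents.write(to: textURL, atomically: true, encoding: .utf8)
}
```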

I'd love to hear more about your use case and motivations for using a text-based strategy.

I understand that there are several downsides compared to the screenshot-based strategy, but it's not a replacement, just another tool. They can take different approaches: screenshots are great for design-system components, while a text representation can snapshot the whole screen. The advantages of a text-based strategy compared to an image are smaller size in the repository and more stability (it doesn't fail when the UI's layout changes but the VoiceOver experience stays the same).
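For illustration, a whole-screen text snapshot could be produced from public UIKit accessibility APIs along these lines (just a sketch, not how AccessibilitySnapshot works, and it ignores accessibilityElements, containers, and sort order for brevity):

```swift
import UIKit

// Illustration only: walk the view hierarchy and record, per accessibility
// element, roughly what VoiceOver would read. A real implementation would also
// need to respect accessibilityElements, containers, and sort order.
func textSnapshot(of view: UIView, indent: String = "") -> String {
    if view.isAccessibilityElement {
        let label = view.accessibilityLabel ?? ""
        let value = view.accessibilityValue.map { ", \($0)" } ?? ""
        let traits = view.accessibilityTraits.contains(.button) ? " [button]" : ""
        let hint = view.accessibilityHint.map { ". \($0)" } ?? ""
        // Subviews of an accessibility element are hidden from VoiceOver,
        // so there is no need to recurse further here.
        return "\(indent)\(label)\(value)\(traits)\(hint)"
    }

    return view.subviews
        .map { textSnapshot(of: $0, indent: indent + "  ") }
        .filter { !$0.isEmpty }
        .joined(separator: "\n")
}
```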

I'm working on the VoiceOver Designer app, which lets you design and prototype the VoiceOver experience. I have an idea that a text-based snapshot could be generated from the app.

Interesting, thanks for the context. That project looks awesome!

doesn't fail when the UI's layout changes but the VoiceOver experience stays the same

In some ways this is a feature, not a bug. 🙂 Beyond the issues with locations getting out of sync that I mentioned earlier, layout changes might affect things like the expected order of iteration, so the system has no way of knowing whether your accessibility hierarchy didn't change with the layout because it's still correct or if they've now gotten out of sync and it's a regression.

I hear you on stability around minor changes (or simply GPU differences) though.

I have an idea that a text-based snapshot could be generated from the app.

Test-driven development is a really interesting consideration here. I definitely see the value in exporting the reference from your VoiceOver Designer app (or hand-writing it) and developing against it.

I'm working on some reformatting right now to include more information in the snapshot (there's still a lot missing right now around grouping, some traits, etc.). Let me do some experimentation and see how that information might be reflected in a text snapshot.

Thank you! I can help you with a text-based strategy in the future, or with other things that could be connected with my app. I see great potential for integration between the app and your library in the future.

Also, it would be great to have info about accessibility containers.

We explored text-based strategies a bit more internally and decided there's a lot of subtle information lost. Some of this is already in the thread above, but I'll summarize here:

  • The most obvious lost information is where the elements are highlighted on the screen. This is really important for a number of assistive technologies: for Switch Control, since no description of the element is read; and for drag navigation with VoiceOver, since having the region of the accessibility element cover the visuals is key to being able to find the element.

  • Activation points are another key visual component. In many cases it's important to match the accessibility element with a touch point in the visual experience. If the element's frame or activation point is off, you could have an element that appears correct from the description but doesn't actually perform any action (or even worse, performs the wrong action). This could very easily be missed in text form, but is much more obvious when you see an image.

  • There are also some relatively small visual components to the accessibility description, such as the image associated with a custom action (tracked by #99).

  • While the snapshots are often framed as "ensuring your accessibility hierarchy doesn't regress," it might be more accurate to frame them as "ensuring your accessibility hierarchy doesn't get out of sync with your visuals." If the visual and accessibility snapshots were completely separate, it would be easy to add both snapshots, then come back at a later date (or have a different engineer on the team come along), make a change that fails the visual snapshot test, update the reference image, and have no warning that you've caused the accessibility and visual experience to get out of sync. While the framework would technically be doing the correct thing here (your accessibility hierarchy hasn't changed, so it's correctly still passing the test), this isn't really protecting you from regressions in the way it appears to be.
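One practical way to limit that drift, sketched below assuming the iOSSnapshotTestCase-based SnapshotVerifyAccessibility API (CheckoutButton is a hypothetical view under test), is to assert the visual and accessibility snapshots in the same test so that updating one reference naturally prompts a look at the other:

```swift
import UIKit
import FBSnapshotTestCase
import AccessibilitySnapshot

final class CheckoutButtonSnapshotTests: FBSnapshotTestCase {
    func testAppearanceAndAccessibility() {
        let view = CheckoutButton()  // hypothetical view under test
        view.frame = CGRect(x: 0, y: 0, width: 320, height: 56)

        // The visual reference and the accessibility reference live side by side,
        // so a layout change that invalidates one is reviewed alongside the other.
        FBSnapshotVerifyView(view, identifier: "visual")
        SnapshotVerifyAccessibility(view, identifier: "accessibility")
    }
}
```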

For these reasons, I don't think a purely text-based snapshot strategy is the right approach for this framework. I'd love to find a way to integrate test-driven development with your VoiceOver Designer app, but I think it would need to be a different snapshot format - either image-based or a mix of image- and text-based snapshotting. Let's keep that discussion going separately, but I'm going to close out this ticket for now.