ua-parser / uap-core

The regex file necessary to build language ports of Browserscope's user agent parser.

Handling of empty string replacement values differs between implementations

longzheng opened this issue · comments

I'm not sure if this is the right place to ask this question since it affects multiple implementations, but I thought it may be since the "ua-parser specification" is defined as part of this repository.

Background

There are certain user-agents where it is not possible to derive or infer the device family, brand or model from the user-agent alone.

In our application, we want to explicitly distinguish between user-agents we recognise but whose device family we don't know (which we want to output as an empty string) and user-agents we don't recognise (which fall back to the "Other" value defined in the specification).

We've observed a difference between the uap-ref-impl JavaScript and uap-csharp C# implementations of ua-parser.

Even though the JavaScript implementation is described as the "reference" implementation, we're not sure whether this is a bug or the intention of the specification, since the specification is vague about how to handle a replacement value that is an empty string "".

The specification states:

In case that no replacement for a match is given, the first match defines the family and the model. If a *_replacement string is specified it shall overwrite or replace the match.

One could interpret "no replacement" to mean either

  • undefined or null
  • undefined or null or empty string

Repro

The regex YAML is defined as

device_parsers:
  - regex: '(radiocomandroid)'
    brand_replacement: ''
    device_replacement: ''
    model_replacement: ''

Testing with the user-agent string radiocomandroid:

The JavaScript implementation outputs

{
    "family": "radiocomandroid",
    "brand": null,
    "model": "radiocomandroid"
}

The C# implementation outputs

{
    "IsSpider": false,
    "Brand": "",
    "Family": "",
    "Model": ""
}

Side-by-side

                JavaScript          .NET
Device family   "radiocomandroid"   ""
Device brand    null                ""
Device model    "radiocomandroid"   ""

Question

The JavaScript implementation uses a conditional (ternary) operator, which treats an empty string "" as falsy and therefore falls back to the first match value.

https://github.com/ua-parser/uap-ref-impl/blob/f038cf5ddd7b9b52c724fe2b6c4d949d2ef8e6b8/lib/device.js#L24

The C# implementation uses a == null check, so an empty string "" is not treated as missing and is used as the final value.

https://github.com/ua-parser/uap-csharp/blob/master/UAParser/UAParser.cs#L523
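The difference between the two conditions can be reproduced in a few lines. This is a minimal sketch, not code taken from either implementation: `resolveTruthy` and `resolveNullCheck` are hypothetical names that only mimic the two checks described above.

```javascript
// Hypothetical helpers; neither is copied from uap-ref-impl or uap-csharp.

// Truthiness check: '' is falsy, so an empty replacement falls back to the match.
function resolveTruthy(replacement, firstMatch) {
  return replacement ? replacement : firstMatch;
}

// Null check: only null/undefined fall back, so '' is kept as the final value.
function resolveNullCheck(replacement, firstMatch) {
  return replacement == null ? firstMatch : replacement;
}

const match = 'radiocomandroid';
console.log(JSON.stringify(resolveTruthy('', match)));           // "radiocomandroid"
console.log(JSON.stringify(resolveNullCheck('', match)));        // ""
console.log(JSON.stringify(resolveNullCheck(undefined, match))); // "radiocomandroid"
```

Both helpers agree whenever the replacement is a non-empty string or is absent; they only diverge for the empty string.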

Which of these implementations is the intent of the specification? Is an empty string considered "no replacement"?

Hi @longzheng,

The good news is that the edge case you describe is not part of the ruleset in regexes.yaml, so we have not come across it so far.

Historically, the reference implementation came first, and the specification was created afterwards.

By "no replacement" we mean a missing "*_replacement" line. So we are not referring to an empty string, undefined, or null here.
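One way to read "a missing line" in code is to test for the presence of the key rather than the truthiness of its value. The sketch below is only an illustration under that assumption: the rule objects stand in for what a YAML loader might produce, and `hasReplacement` is a hypothetical helper, not part of any ua-parser implementation.

```javascript
// Hypothetical rule objects, shaped like entries a YAML loader might return.
const ruleWithEmptyValue = { regex: '(radiocomandroid)', device_replacement: '' };
const ruleWithoutLine = { regex: '(radiocomandroid)' };

// Assumption: only an absent key counts as "no replacement".
function hasReplacement(rule, key) {
  return Object.prototype.hasOwnProperty.call(rule, key);
}

console.log(hasReplacement(ruleWithEmptyValue, 'device_replacement')); // true
console.log(hasReplacement(ruleWithoutLine, 'device_replacement'));    // false
```

Under this reading the empty string is a present value, which is the case the reply leaves outside the ruleset.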

Since the intention of the project is to extract information via the different rules, I would not consider the use case you sketched relevant for us, thus avoiding any discussion of individual parser implementation details.

For us, the reference implementation, backed by the specification and the test set, is the relevant source for ensuring compatibility with the rule set amongst the different implementations.

As you are referring to a custom rule that is not part of our rule set, I would encourage you to rethink your approach with "empty strings". I believe you can achieve your intended result by using a custom marker string.

E.g.

device_parsers:
  - regex: '(radiocomandroid)'
    brand_replacement: 'DETECTED_BUT_UNKNOWN'
    device_replacement: 'DETECTED_BUT_UNKNOWN'
    model_replacement: 'DETECTED_BUT_UNKNOWN'
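
With a marker string in the rule, the application can translate the marker back to an empty string after parsing. The following is a minimal sketch of that post-processing step; `normalizeDevice` is a hypothetical helper, and the `family`/`brand`/`model` field names are assumed from the JavaScript output shown earlier in the thread.

```javascript
// Marker used in the custom rule above.
const UNKNOWN_MARKER = 'DETECTED_BUT_UNKNOWN';

// Hypothetical post-processing step: return a copy of the parsed device
// in which every field equal to the marker is replaced by ''.
function normalizeDevice(device) {
  const normalized = {};
  for (const [key, value] of Object.entries(device)) {
    normalized[key] = value === UNKNOWN_MARKER ? '' : value;
  }
  return normalized;
}

// Example input, shaped like a parser result for the custom rule.
const parsed = {
  family: 'DETECTED_BUT_UNKNOWN',
  brand: 'DETECTED_BUT_UNKNOWN',
  model: 'DETECTED_BUT_UNKNOWN',
};
console.log(normalizeDevice(parsed)); // { family: '', brand: '', model: '' }
```

Because the marker is a non-empty string, a truthiness check and a null check both treat it as a replacement value, so unrecognised user-agents still yield "Other" and pass through unchanged.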