Handling of empty string replacement values differs between implementations
longzheng opened this issue · comments
I'm not sure if this is the right place to ask this question since it affects multiple implementations, but I thought it may be since the "ua-parser specification" is defined as part of this repository.
Background
There are certain user-agents where it is not possible to derive or infer the device family, brand or model from the user-agent alone.
In our application, we want to explicitly distinguish between user-agents we recognise but whose device family we don't know (which we want to output as an empty string), and user-agents we don't recognise (which fall back to the "Other" value as defined in the specification).
We've observed a difference between the uap-ref-impl (JavaScript) and uap-csharp (C#) implementations of ua-parser.
Even though the JavaScript implementation is described as the "reference" implementation, we're not sure if this is a bug or the intention of the specification, since the specification is vague about how to handle a replacement value that is an empty string "".
The specification writes,
In case that no replacement for a match is given, the first match defines the family and the model. If a *_replacement string is specified it shall overwrite or replace the match.
One could interpret "no replacement" to mean either
- undefined or null
- undefined or null or empty string
Repro
The regex YAML is defined as
device_parsers:
  - regex: '(radiocomandroid)'
    brand_replacement: ''
    device_replacement: ''
    model_replacement: ''
Testing with the user-agent string of radiocomandroid
The JavaScript implementation outputs
{
  "family": "radiocomandroid",
  "brand": null,
  "model": "radiocomandroid"
}
The C# implementation outputs
{
  "IsSpider": false,
  "Brand": "",
  "Family": "",
  "Model": ""
}
Side-by-side
| | JavaScript | .NET |
|---|---|---|
| Device family | "radiocomandroid" | "" |
| Device brand | null | "" |
| Device model | "radiocomandroid" | "" |
Question
The JavaScript implementation uses a conditional (ternary) operator, which treats an empty string "" as falsy and therefore falls back to the first match value.
The C# implementation uses a == null condition, which keeps an empty string "" as the final value.
https://github.com/ua-parser/uap-csharp/blob/master/UAParser/UAParser.cs#L523
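The difference can be reproduced in isolation. The following is a minimal sketch, not the actual code of either library: the function names `resolveTruthy` and `resolveNullCheck` are hypothetical, and the C# behaviour is only emulated in JavaScript with a `== null` comparison. It covers the family and model fields; the brand field in the JavaScript output above falls back to null rather than to the first match.

```javascript
// Hypothetical helpers illustrating the two strategies; neither is copied
// from uap-ref-impl or uap-csharp.

// Truthiness check: an empty string is falsy, so the first match wins.
function resolveTruthy(replacement, firstMatch) {
  return replacement ? replacement : firstMatch;
}

// Null check: only null/undefined fall back; an empty string is kept.
function resolveNullCheck(replacement, firstMatch) {
  return replacement == null ? firstMatch : replacement;
}

console.log(JSON.stringify(resolveTruthy('', 'radiocomandroid')));    // "radiocomandroid"
console.log(JSON.stringify(resolveNullCheck('', 'radiocomandroid'))); // ""
```

Both helpers agree whenever the replacement is a non-empty string or is missing entirely; they only diverge on the empty string.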
Which of these implementations is the intent of the specification? Is an empty string considered "no replacement"?
Hi @longzheng,
The good news is that the edge case you describe is not part of the ruleset in regexes.yaml, so we have not come across it so far.
Historically, the reference implementation came first and the specification was created afterwards.
By "no replacement" we mean a missing "*_replacement" line. So we do not refer to an empty string, undefined, or null here.
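Read literally, that definition makes the question whether the key is present in the rule at all, not what value it holds. The following is a minimal sketch of one such reading, assuming the YAML rule has already been parsed into a plain object. `hasReplacement` and `resolveDevice` are hypothetical names, not part of any ua-parser implementation, and substitution of `$1`-style placeholders is omitted. How an explicitly empty value should then be treated is not stated in this thread; the sketch simply keeps it.

```javascript
// Hypothetical rules as they might look after parsing the YAML into objects.
const ruleWithEmpty = { regex: '(radiocomandroid)', device_replacement: '' };
const ruleWithout = { regex: '(radiocomandroid)' };

// "No replacement" is taken to mean that the key is absent from the rule.
function hasReplacement(rule, key) {
  return Object.prototype.hasOwnProperty.call(rule, key);
}

function resolveDevice(rule, userAgent) {
  const match = new RegExp(rule.regex).exec(userAgent);
  if (!match) return 'Other'; // fallback value named in the specification
  return hasReplacement(rule, 'device_replacement')
    ? rule.device_replacement // present, even if it is an empty string
    : match[1];               // absent: the first match defines the family
}

console.log(JSON.stringify(resolveDevice(ruleWithEmpty, 'radiocomandroid'))); // ""
console.log(JSON.stringify(resolveDevice(ruleWithout, 'radiocomandroid')));   // "radiocomandroid"
```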
As the intention of the project is to extract information using the different rules, I would not consider your sketched use case relevant for us, which also avoids any discussion of individual parser implementation details.
For us, the reference implementation, backed by the specification and the test set, is the relevant source for ensuring compatibility with the ruleset amongst the different implementations.
As you are referring to a custom rule which is not part of our ruleset, I would like to encourage you to rethink your approach with "empty strings". I believe you can achieve your intended result by using a custom marker string.
E.g.
device_parsers:
  - regex: '(radiocomandroid)'
    brand_replacement: 'DETECTED_BUT_UNKNOWN'
    device_replacement: 'DETECTED_BUT_UNKNOWN'
    model_replacement: 'DETECTED_BUT_UNKNOWN'
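With this approach the application can translate the marker back into an empty string after parsing. A minimal sketch, assuming the parser returns a device object with `family`, `brand`, and `model` fields as in the JavaScript output shown earlier; `normalizeDevice` is a hypothetical helper and not part of ua-parser:

```javascript
const MARKER = 'DETECTED_BUT_UNKNOWN';

// Hypothetical post-processing step applied to a parsed device result:
// every field equal to the marker becomes an empty string.
function normalizeDevice(device) {
  const out = {};
  for (const [key, value] of Object.entries(device)) {
    out[key] = value === MARKER ? '' : value;
  }
  return out;
}

const parsed = { family: MARKER, brand: MARKER, model: MARKER };
console.log(normalizeDevice(parsed)); // { family: '', brand: '', model: '' }
```

Unrecognised user-agents still come back as "Other" and pass through unchanged, so the two cases stay distinguishable in every implementation.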