Escape brackets/periods/backslashes/quotes in input rank IDs and sample metadata fields, and in any "inputs" to plot JSONs

fedarko opened this issue 6 years ago.

Apparently Vega treats these characters specially in field names. See this page for context.

This is causing a problem with the rank names in the Byrd data example: trying to switch to any rank other than "Intercept" brings up an error.

I guess we have to apply this not only to each column but to every string that's passed to Vega: every feature/sample ID, augmented feature ID, sample metadata field, and probably more. Sheesh.

Ideally, we should have tests that verify that our measures to keep Vega from misinterpreting these strings actually work (#2).

Note: you can escape these either with a ton of backslashes or by enclosing the field names in square brackets. The latter sounds easier.
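A minimal Python sketch of both options, assuming we escape on the Python side before field names are written into the plot JSON. `backslash_escape` and `bracket_quote` are hypothetical helper names, and the set of special characters is based on my reading of Vega's field-path syntax (periods and brackets are path syntax, a backslash makes the next character literal, and quotes delimit a name inside brackets). That reading is worth double-checking against the Vega-Lite `field` docs.

```python
# Hypothetical helpers (not part of Vega or Vega-Lite): two ways to write a
# field name so Vega's field-path parser reads it as one literal name.

# Characters backslash_escape handles; escaping the quotes is just a precaution.
SPECIAL_CHARS = "\\.[]'\""


def backslash_escape(name: str) -> str:
    """Prefix each special character with a backslash, e.g. T.B -> T\\.B."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)


def bracket_quote(name: str) -> str:
    """Wrap the name as ['...'].

    Inside the quotes, periods and brackets are literal, but backslashes and
    single quotes still need escaping so they don't end the quoted name early.
    """
    inner = name.replace("\\", "\\\\").replace("'", "\\'")
    return "['" + inner + "']"


rank_id = "C(Timepoint, Treatment('F'))[T.B]"
print(backslash_escape(rank_id))  # C(Timepoint, Treatment(\'F\'))\[T\.B\]
print(bracket_quote(rank_id))     # ['C(Timepoint, Treatment(\'F\'))[T.B]']
```

The bracket form only has to care about backslashes and single quotes, which lines up with the hunch that it's the easier route; it's still worth testing both against real plots.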

Note: related to vega/vega-lite#4965

Note: we should also ensure that field names are escaped in JS via something like vega.stringValue() when they're passed into the plot JSONs (e.g. when setting the encoding field of the sample plot's color, or the encoding field of the rank plot's y-axis).

Marcus Fedarko · Answer 1 · Thu Mar 07 2019 11:43:38 GMT+0800 (China Standard Time)

So I think that due to our use of json.dump(), we shouldn't have to worry about most of these aside from the Vega-Lite-specific ones (periods and brackets). But again, it's still a good idea to be sure.
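A quick Python check of what the JSON layer does and doesn't cover, using the standard library's `json.dumps`/`json.loads` (the string forms of `json.dump`/`json.load`). The nested `encoding`/`y`/`field` dict is just a minimal Vega-Lite-shaped example for illustration.

```python
import json

# A rank ID shaped like the patsy-generated ones in the Byrd data example.
rank_id = "C(Timepoint, Treatment('F'))[T.B]"

# json.dumps escapes double quotes and backslashes so the JSON text is valid...
print(json.dumps('a"b\\c'))  # "a\"b\\c"

# ...but parsing the JSON (json.loads here, JSON.parse on the JS side) undoes
# that escaping, so the field string that reaches Vega-Lite is the original
# one, with its periods and brackets intact.
spec_text = json.dumps({"encoding": {"y": {"field": rank_id}}})
field = json.loads(spec_text)["encoding"]["y"]["field"]
print(field == rank_id)  # True
```

So json.dump keeps the JSON itself well-formed, but whatever characters are in an ID still arrive at Vega-Lite's field parser unchanged.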

Marcus Fedarko · Answer 2 · Thu Mar 07 2019 12:18:13 GMT+0800 (China Standard Time)

If we want to be 100% safe, we'll need to escape all of the following:

Rank IDs
Feature IDs
Sample IDs
Sample Metadata IDs
Feature Metadata IDs

In practice, I'm not sure that this is necessary for feature metadata IDs, feature IDs, or sample IDs, since I've used periods in these IDs before without issue. I think json.dump takes care of those; the main issue seems to be with fields that end up being used as an axis/encoding/etc. field in Vega/Vega-Lite (e.g. ranks).

It's still worth adding lots of test cases to verify that this all works as intended.
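One possible shape for those tests, as a plain-Python sketch. Running the real Vega parser would need a JS test harness, so `split_path_simplified` below is a hypothetical, deliberately simplified stand-in for Vega's field-path parsing, not the real thing. The property being checked is that every awkward ID, once backslash-escaped, parses back to exactly one segment equal to the original.

```python
# Characters to backslash-escape (quotes included as a precaution).
SPECIAL_CHARS = "\\.[]'\""


def backslash_escape(name: str) -> str:
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)


def split_path_simplified(path: str) -> list:
    """Hypothetical, simplified stand-in for Vega's field-path parsing.

    A backslash makes the next character literal, and an unescaped '.', '['
    or ']' ends the current segment. Quoted bracket access (foo['bar']) is
    not modeled, so this is only a rough approximation of the real parser.
    """
    segments, current, i = [], "", 0
    while i < len(path):
        ch = path[i]
        if ch == "\\" and i + 1 < len(path):
            current += path[i + 1]  # escaped character is taken literally
            i += 2
            continue
        if ch in ".[]":
            if current:
                segments.append(current)
            current = ""
        else:
            current += ch
        i += 1
    if current:
        segments.append(current)
    return segments


AWKWARD_IDS = [
    "Intercept",
    "C(Timepoint, Treatment('F'))[T.B]",
    "sample.1",
    'has"double"quotes',
    "back\\slash",
    "trailing.",
]

# Without escaping, a period splits the ID into several path segments.
assert split_path_simplified("sample.1") == ["sample", "1"]

# With escaping, every ID should come back as exactly one, unchanged segment.
for raw in AWKWARD_IDS:
    assert split_path_simplified(backslash_escape(raw)) == [raw], raw
print("all escaping round-trip checks passed")
```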

Marcus Fedarko · Answer 3 · Thu Mar 07 2019 12:45:12 GMT+0800 (China Standard Time)

ahsdfiusdoifjsdfioj

So it looks like even if you properly escape a rank ID for the axis/encoding stuff, you still need to use the non-escaped ID in the underlying dataset???? Bluhg.
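A minimal Python sketch of what that means for a rank plot spec, assuming the plot JSON is built as a Python dict: the data rows keep the raw rank ID as their key, and only the `field` in the encoding gets escaped. `escape_vl_field` and `rank_plot_spec` are hypothetical names, and the spec is trimmed to the parts that matter here.

```python
import json


def escape_vl_field(name: str) -> str:
    # Backslash-escape the characters Vega treats as field-path syntax
    # (quotes included as a precaution).
    return "".join("\\" + ch if ch in "\\.[]'\"" else ch for ch in name)


def rank_plot_spec(ranks: dict, rank_id: str) -> dict:
    """Build a trimmed-down Vega-Lite bar chart of one rank column."""
    # The dataset uses the raw, unescaped rank ID as the key...
    rows = [{"Feature ID": fid, rank_id: value} for fid, value in ranks.items()]
    return {
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            "x": {"field": "Feature ID", "type": "nominal"},
            # ...while the encoding refers to it through the escaped form,
            # which Vega parses back into the raw key.
            "y": {"field": escape_vl_field(rank_id), "type": "quantitative"},
        },
    }


spec = rank_plot_spec({"F1": 1.5, "F2": -0.3}, "C(Timepoint, Treatment('F'))[T.B]")
print(json.dumps(spec["encoding"]["y"]))
```

Escaping is a property of how the field is referenced, not of the data: escaping the dataset keys too would make the escaped field point at a key that doesn't exist.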

Marcus Fedarko · Answer 4 · Thu Mar 07 2019 12:57:26 GMT+0800 (China Standard Time)

@mortonjt small question: is preserving the patsy formulas in rank IDs (e.g. C(Timepoint, Treatment('F'))[T.B] in the Byrd data) helpful when looking at the ranks? It looks like periods, brackets, and quotes all cause problems when you pass them into Vega-Lite as field IDs.

I've implemented a basic solution that converts periods to colons and square brackets to parentheses (along with filtering out quotes and backslashes). This takes care of the problem for now, but if you think it's worth it I can come back to this later (probably after exams are over) and add back in support for some of these weird characters.
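A sketch of that kind of conversion in Python using `str.maketrans`/`str.translate`. `sanitize_rank_id` is a hypothetical name, and this is just one way to express the mapping described above, not necessarily the code that was actually committed.

```python
# Map periods -> colons and square brackets -> parentheses; mapping to None
# deletes the character, which drops quotes and backslashes entirely.
_SANITIZE_TABLE = str.maketrans({
    ".": ":",
    "[": "(",
    "]": ")",
    "'": None,
    '"': None,
    "\\": None,
})


def sanitize_rank_id(rank_id: str) -> str:
    """Rewrite a rank ID so it contains none of . [ ] ' " or backslash."""
    return rank_id.translate(_SANITIZE_TABLE)


print(sanitize_rank_id("C(Timepoint, Treatment('F'))[T.B]"))
# C(Timepoint, Treatment(F))(T:B)
```

The mapping isn't reversible: `a.b` and `a:b` both become `a:b`, so two distinct IDs can collapse into one.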

Marcus Fedarko · Answer 5 · Thu Mar 07 2019 15:10:15 GMT+0800 (China Standard Time)

Note to self: if we go with the solution of filtering out/converting certain special characters in IDs, ensure that the IDs are still unique afterwards.
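A possible way to do that check in Python, assuming the period/bracket/quote/backslash mapping described in the earlier comment and that the original IDs are themselves unique: group the originals by their sanitized form and fail loudly on any collision. `sanitize` and `sanitize_all_unique` are hypothetical helper names.

```python
from collections import defaultdict

_TABLE = str.maketrans({".": ":", "[": "(", "]": ")", "'": None, '"': None, "\\": None})


def sanitize(identifier: str) -> str:
    """Apply the period/bracket/quote/backslash conversion to one ID."""
    return identifier.translate(_TABLE)


def sanitize_all_unique(ids):
    """Return {original: sanitized}, raising ValueError if two original IDs
    end up with the same sanitized form. Assumes the input IDs are unique."""
    by_sanitized = defaultdict(list)
    for original in ids:
        by_sanitized[sanitize(original)].append(original)
    collisions = {s: origs for s, origs in by_sanitized.items() if len(origs) > 1}
    if collisions:
        raise ValueError(f"IDs collide after sanitizing: {collisions}")
    return {origs[0]: s for s, origs in by_sanitized.items()}


print(sanitize_all_unique(["Intercept", "C(Timepoint, Treatment('F'))[T.B]"]))
```

Raising rather than silently renaming keeps the failure visible; a fallback such as appending a suffix would also work, but would be a separate design decision.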

Jamie Morton · Answer 6 · Thu Mar 07 2019 22:04:33 GMT+0800 (China Standard Time)

The special characters don't matter, but the covariates (i.e. Timepoint), treatment (i.e. F), and control (i.e. B) are all important.
