Category column options do not match
irenewestra opened this issue
Instance: iucn.akvolumen.org
Dataset: TOF_2021
I want to group the options of the question "Participants were mainly from these sectors (option)" into a new Category column, where options that are barely used are clubbed together into "Other". All other options (e.g. 'Academia') I want to keep as they are. However, when I create this column, a lot of values end up in Other that shouldn't be there (e.g. 'Academia').
Check: https://iucn.akvolumen.org/s/a1lTfjeaX44
As you'll see, the option 'NGO' is (partially) clubbed under 'Other', even though there is already a separate category for 'NGO'.
The problem seems to be related to using a subbucket column of option type in bar charts.
The visualisation is not the issue; it was meant to show the issue. The issue is that in the derived Category column Participants, a lot of options are grouped into 'Other' that shouldn't be there. There is a separate category for 'Academia', so how is it possible that not all values == 'Academia' remain 'Academia', but some become 'Other'?
I spoke with Irene today to understand the issue better. Here are the steps she took and what she expected:
- She is working with an OPTION column where more than one option (variable) can be selected, so each cell holds one or more options.
- Visualising the data works fine, but many options are selected too rarely to be worth showing individually, so she wants to group them under one category.
- To do that she uses the Category derived transformation, which is meant exactly for such cases.
- When she opens the column in the Category transformation, the individual options are listed correctly.
- She defines the new categories for these options: some keep their original value and the rest are grouped into one new category called Other.
- She expects the newly derived column to show the new categories matching the original values, and, if a cell had more than one option, the new cell to contain more than one category as well (each mapped by the new category rules).
- She also expects this newly derived OPTION column, with more than one category per cell, to behave in visualisations the same way as any other OPTION column.
Example
- Options available: apple, banana, mango, strawberry, blueberry, blackberry
- I want my new column to use apple, banana and mango, but group the others under 'berries'
See this fake example below showing the original column and the new category values:
original | new |
---|---|
apple, banana, strawberry | apple, banana, berries |
blueberry | berries |
blueberry, blackberry | berries, berries |
banana, blueberry, blackberry | banana, berries, berries |
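A minimal TypeScript sketch of the per-option mapping the table describes, assuming a multi-option cell is a single string with options separated by ', ' (as in the fake example; Lumen's real storage format may differ). The function `recategorise`, the `categoryFor` rules and the 'Other' fallback are hypothetical. Each option is looked up on its own, so order and repeated categories come out exactly as in the 'new' column.

```typescript
// Hypothetical rules from the fruit example: original option -> new category.
const categoryFor = new Map<string, string>([
  ["apple", "apple"],
  ["banana", "banana"],
  ["mango", "mango"],
  ["strawberry", "berries"],
  ["blueberry", "berries"],
  ["blackberry", "berries"],
]);

// Map every option in a multi-option cell on its own, keeping order and
// repeated categories. Assumes options are separated by ", ".
function recategorise(
  cell: string,
  mapping: Map<string, string>,
  fallback = "Other",
  separator = ", "
): string {
  return cell
    .split(separator)
    .map((option) => option.trim())
    .map((option) => mapping.get(option) ?? fallback)
    .join(separator);
}

const originals = [
  "apple, banana, strawberry",
  "blueberry",
  "blueberry, blackberry",
  "banana, blueberry, blackberry",
];
for (const cell of originals) {
  console.log(`${cell} -> ${recategorise(cell, categoryFor)}`);
}
```

Running it prints each original cell next to its mapped value, e.g. `blueberry, blackberry -> berries, berries`.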
Today we discussed this issue in a call. Here are the notes:
- Juan tried to resolve it yesterday, but the functionality actually works correctly, technically. It takes the value of each cell as a single string and maps that whole string to a new category. So if the original string was 'blueberry, blackberry', Lumen turns it into 'Uncategorised' (or whatever the user has defined).
- The Category transformation cannot output an OPTION column.
- Despite the implementation working correctly technically, it is confusing to users.
- We decided not to allow OPTION columns to be selected for Category column transformations, to remove the possible confusion. BUT I just realised that this would stop users whose OPTION columns are single select (one value per cell) from grouping values when needed. So we will not change how Category columns are implemented.
- We will see if we can support Irene's case with a Derive JS transformation that she can adapt to different columns, and by adding the option to change a column's data type to OPTION.
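To make the call notes concrete, here is a hedged TypeScript sketch contrasting a whole-cell lookup, which is how the notes describe the Category transformation behaving, with a per-option lookup. The rule table and both function names are hypothetical, and splitting on ', ' is an assumption taken from the fruit example.

```typescript
// Hypothetical category rules keyed by single option values.
const rules = new Map<string, string>([
  ["apple", "apple"],
  ["banana", "banana"],
  ["blueberry", "berries"],
  ["blackberry", "berries"],
]);

// Whole-cell lookup: the complete cell text is treated as one key.
function wholeCellCategory(cell: string, fallback = "Uncategorised"): string {
  return rules.get(cell) ?? fallback;
}

// Per-option lookup: split the cell into options first, then map each one.
// Assumes options are separated by ", ".
function perOptionCategories(cell: string, fallback = "Uncategorised"): string[] {
  return cell.split(", ").map((option) => rules.get(option.trim()) ?? fallback);
}

console.log(wholeCellCategory("banana")); // banana (single-select cells work)
console.log(wholeCellCategory("blueberry, blackberry")); // Uncategorised (no rule has this exact key)
console.log(perOptionCategories("blueberry, blackberry")); // [ 'berries', 'berries' ]
```

A single-select cell such as 'banana' gives the same result either way, which is why keeping Category support for OPTION columns still helps single-select users.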
Closing per this comment: #3114 (comment)
@tangrammer I am trying to understand the status of the tasks we set for this issue. Can you help me?
We said that you would create a Derived JS formula that Irene can use to transform her data, bundling the values she is not specifically interested in into an 'other' category. This is completed, right?
Then we said we would add the option to change the column type from TEXT to OPTION, so she can still visualise this newly created column using Lumen's visualisation magic for OPTION columns. Did we do this part as well?
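The Derived JS formula itself is not reproduced in this thread. As a rough, hypothetical sketch of the same idea, the TypeScript below builds a reusable function from a list of options to keep, so it can be adapted to different columns; every other option becomes 'Other'. It assumes ', '-separated cell values and does not show how a Lumen derived-column formula references a column. `makeBundler` and the sector names other than 'Academia' and 'NGO' are illustrative only.

```typescript
// Hypothetical helper (not a Lumen API): keep the listed options, bundle the rest.
// Assumes multi-option cells are plain strings separated by ", ".
function makeBundler(
  keep: string[],
  otherLabel = "Other",
  dedupe = true,
  separator = ", "
): (cell: string | null) => string | null {
  const kept = new Set(keep);
  return (cell) => {
    // Leave empty or missing cells untouched.
    if (cell === null || cell.trim() === "") return cell;
    const labels: string[] = [];
    for (const raw of cell.split(separator)) {
      const option = raw.trim();
      const label = kept.has(option) ? option : otherLabel;
      // Optionally skip a label that is already in this cell.
      if (dedupe && labels.indexOf(label) !== -1) continue;
      labels.push(label);
    }
    return labels.join(separator);
  };
}

// Illustrative keep-list; only 'Academia' and 'NGO' come from the issue.
const bundleSectors = makeBundler(["Academia", "NGO"]);
console.log(bundleSectors("Academia, NGO, Private sector, Media")); // Academia, NGO, Other
console.log(bundleSectors("Media")); // Other
```

Unlike the fruit example, this version drops repeated labels by default, so a cell never reads 'Other, Other'; passing `dedupe = false` keeps them.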
I understand now how this works:
From Irene:
With the JS code provided by Juan, I can categorise multiple values and indicate "The new derived column is OPTION". However, Lumen doesn't recognise the current column ('Type of contribution' in this dataset: https://iucn.akvolumen.org/dataset/60a28b40-6d5d-4fc2-b3cc-7848800eb5b8) as an OPTION column (yet), so I also cannot 're-categorise' its values.
I will create a separate issue for this request