Negative local segregation values in the decomposition into racial groups
kaisarea opened this issue
Hello,
I have the following 'dataset' called local_data (trying to create a reproducible example here):
# A tibble: 14 x 3
SCHOOLID group count
<chr> <chr> <dbl>
1 100005_870 WHITE 669
2 100005_870 BLACK 12
3 100005_870 HISP 80
4 100005_870 AIAN 0
5 100005_870 ASIAN 2
6 100005_870 PACIFIC 16
7 100005_870 TR 25
8 100005_871 WHITE 703
9 100005_871 BLACK 12
10 100005_871 HISP 47
11 100005_871 AIAN 0
12 100005_871 ASIAN 2
13 100005_871 PACIFIC 0
14 100005_871 TR 27
Then I run:
mutual_local(local_data, "SCHOOLID", "group", weight = "count", wide = TRUE)
I get the following output:
group ls p
1: ASIAN 5.2951875 0.002507837
2: BLACK 3.5034280 0.015047022
3: HISP 1.8714444 0.079623824
4: PACIFIC 4.6020403 0.010031348
5: TR 2.7309779 0.032601881
6: WHITE -0.5422359 0.860188088
My question is: how should one interpret negative values from the mutual_local() function? I have even had cases where all components were negative (I can try to create a reproducible example for that too if needed). What is the interpretation of zero, positive, and negative values here?
Hi, thanks for the issue. The local segregation scores can't be negative, so you found a bug. The problem is that your variable is named "group", and the package doesn't deal well with that. If you use "race", for instance, the problem goes away:
library(tibble)
library(segregation)
options(scipen=5)
local_data = tribble(~SCHOOLID, ~race, ~count,
"100005_870", "WHITE", 669,
"100005_870", "BLACK", 12,
"100005_870", "HISP", 80,
"100005_870", "AIAN", 0,
"100005_870", "ASIAN", 2,
"100005_870", "PACIFIC", 16,
"100005_870", "TR", 25,
"100005_871", "WHITE", 703,
"100005_871", "BLACK", 12,
"100005_871", "HISP", 47,
"100005_871", "AIAN", 0,
"100005_871", "ASIAN", 2,
"100005_871", "PACIFIC", 0,
"100005_871", "TR", 27)
(mutual_local(local_data, "SCHOOLID", "race", weight = "count", wide = TRUE))
#> race ls p
#> 1: ASIAN 0.00003321619 0.002507837
#> 2: BLACK 0.00003321619 0.015047022
#> 3: HISP 0.03206493691 0.079623824
#> 4: PACIFIC 0.68502974604 0.010031348
#> 5: TR 0.00108653019 0.032601881
#> 6: WHITE 0.00054228911 0.860188088
Created on 2021-10-24 by the reprex package (v2.0.1)
I'll try to fix that issue soon.
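For reference, the reason the scores cannot be negative is that each local segregation score is a Kullback-Leibler divergence between the distribution of one row's observations across the other variable and the overall distribution of that variable. The following is a minimal base-R sketch of that calculation for the data above. The helper `local_seg()` is hypothetical and written only for illustration; it is not the package's internal code.

```r
# Counts from the example above: rows are races, columns are schools.
counts <- matrix(
  c(669, 703,
     12,  12,
     80,  47,
      0,   0,
      2,   2,
     16,   0,
     25,  27),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("WHITE", "BLACK", "HISP", "AIAN", "ASIAN", "PACIFIC", "TR"),
    c("100005_870", "100005_871")
  )
)

# Hypothetical helper (not part of the segregation package):
# ls_u = sum_g p(g | u) * log(p(g | u) / p(g)), computed for every row u.
local_seg <- function(counts) {
  counts <- counts[rowSums(counts) > 0, , drop = FALSE]  # drop empty rows (AIAN)
  p_col  <- colSums(counts) / sum(counts)                # p(g): overall school shares
  p_cond <- counts / rowSums(counts)                     # p(g | u): school shares within each race
  ratio  <- sweep(p_cond, 2, p_col, "/")
  terms  <- ifelse(p_cond > 0, p_cond * log(ratio), 0)   # treat 0 * log(0) as 0
  data.frame(
    unit = rownames(counts),
    ls   = rowSums(terms),
    p    = rowSums(counts) / sum(counts),
    row.names = NULL
  )
}

res <- local_seg(counts)
print(res)
```

Under these assumptions the result should agree with the corrected output above (for example, PACIFIC at about 0.685). A score of zero means the row is distributed across schools exactly like the population as a whole, and larger values mean a stronger deviation from that overall distribution.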
It's working now, thank you!
Hello,
I have the same problem, but I can't resolve it by renaming the variables. In my case, the problem arises when I use the "se" argument and the function applies the bias correction.
Here is the code:
library(tidyverse)
library(segregation)
base <- tribble(~ID_s, ~PRI, ~SEC, ~SUP,
1, 4, 4, 6,
2, 27, 34, 36,
3, 9, 15, 15,
4, 21, 33, 38,
5, 15, 23, 19,
6, 6, 8, 6,
7, 7, 14, 18,
8, 6, 8, 12,
9, 23, 34, 45,
10, 9, 16, 19
)
base |>
pivot_longer(cols = PRI:SUP, names_to = "EDU",
values_to = "n") |>
mutual_local(group = "EDU", unit = "ID_s",
weight = "n", se = TRUE,
wide = TRUE) |>
select(ID_s, p, ls)
And this is my output:
ID_s p ls
1: 1 0.02539623 -0.072997966
2: 2 0.18315094 -0.002555072
3: 3 0.07269811 -0.024815143
4: 4 0.17362264 -0.010504312
5: 5 0.10986792 -0.004451141
6: 6 0.03701887 -0.019953732
7: 7 0.07281132 -0.010572342
8: 8 0.04958491 -0.036701383
9: 9 0.19315094 -0.004720493
10: 10 0.08269811 -0.017325337
The problem disappears when I set "se = FALSE".
Thank you!
Hi, yes, that can happen when your sample is small. Basically, this means that your ls scores are most likely exactly zero. I could probably just set them to 0 manually when this occurs, but I think reporting the negative values as they are is more transparent. This is just something that can happen with the combination of bootstrap and bias correction when the parameters are close to 0. Maybe it would be good to have a FAQ entry about this, though.
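To illustrate how a bias correction can push estimates below zero, here is a rough base-R sketch. It assumes a standard bootstrap bias correction (corrected = estimate - (mean of bootstrap estimates - estimate)) with multinomial resampling of the cell counts; the package's own procedure may differ in detail. The helper `ls_plugin()` and the choice of 200 replications are illustrative assumptions, not the package's code or defaults.

```r
set.seed(1)

# Hypothetical helper: plug-in local segregation score for each row of a count matrix.
ls_plugin <- function(counts) {
  p_col  <- colSums(counts) / sum(counts)
  p_cond <- counts / rowSums(counts)
  p_cond[!is.finite(p_cond)] <- 0                        # guard against empty resampled rows
  terms  <- ifelse(p_cond > 0, p_cond * log(sweep(p_cond, 2, p_col, "/")), 0)
  rowSums(terms)
}

# Counts from the example above: 10 rows (ID_s) by 3 columns (PRI, SEC, SUP).
counts <- matrix(
  c( 4,  4,  6,
    27, 34, 36,
     9, 15, 15,
    21, 33, 38,
    15, 23, 19,
     6,  8,  6,
     7, 14, 18,
     6,  8, 12,
    23, 34, 45,
     9, 16, 19),
  ncol = 3, byrow = TRUE
)

estimate <- ls_plugin(counts)

# Bootstrap: redraw the same number of observations from the observed cell shares.
n_boot <- 200
boot <- replicate(n_boot, {
  draw <- rmultinom(1, size = sum(counts), prob = as.vector(counts))
  ls_plugin(matrix(draw, nrow = nrow(counts)))
})

bias      <- rowMeans(boot) - estimate   # estimated upward bias of the plug-in score
corrected <- estimate - bias             # can drop below 0 when the true score is near 0
clamped   <- pmax(corrected, 0)          # manual truncation at zero, if preferred

print(data.frame(estimate, corrected, clamped))
```

Because the plug-in score is never negative, sampling noise alone tends to inflate it in small samples. Subtracting the estimated bias from a score that is already close to zero can therefore overshoot, and truncating with `pmax()` is one simple way to report such values as zero.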
Perfect. I did this manually but was not sure if it was correct.
Thank you for your response and your work with this package!