elbersb / segregation

R package to calculate entropy-based segregation indices, with a focus on the Mutual Information Index (M) and Theil’s Information Index (H)

Home Page: https://elbersb.com/segregation

Repository from Github https://github.com/elbersb/segregation
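For context, basic usage of the package looks roughly like this (a minimal sketch; schools00 is, to my knowledge, an example dataset bundled with the package, with one row per school-by-race cell and counts in column "n"):

library(segregation)

# total segregation (M and H) between schools by race,
# weighted by the student counts in column "n"
mutual_total(schools00, "race", "school", weight = "n")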

Negative local segregation values in the decomposition into racial groups

kaisarea opened this issue

Hello,
I have the following 'dataset' called local_data (trying to create a reproducible example here):

# A tibble: 14 x 3
   SCHOOLID   group   count
   <chr>      <chr>   <dbl>
 1 100005_870 WHITE     669
 2 100005_870 BLACK      12
 3 100005_870 HISP       80
 4 100005_870 AIAN        0
 5 100005_870 ASIAN       2
 6 100005_870 PACIFIC    16
 7 100005_870 TR         25
 8 100005_871 WHITE     703
 9 100005_871 BLACK      12
10 100005_871 HISP       47
11 100005_871 AIAN        0
12 100005_871 ASIAN       2
13 100005_871 PACIFIC     0
14 100005_871 TR         27

Then I run:

mutual_local(local_data, "SCHOOLID", "group", weight = "count", wide = TRUE)
I get the following output:

     group         ls           p
1:   ASIAN  5.2951875 0.002507837
2:   BLACK  3.5034280 0.015047022
3:    HISP  1.8714444 0.079623824
4: PACIFIC  4.6020403 0.010031348
5:      TR  2.7309779 0.032601881
6:   WHITE -0.5422359 0.860188088

My question is: how does one interpret negative values from the mutual_local() function? In one case I even had all components come out negative (I can try to create a reproducible example for that too if needed). What is the interpretation of zero, positive, and negative values here?

Hi, thanks for the issue. Local segregation scores can't be negative, so you've found a bug. The problem is that your variable is named "group", which the package doesn't handle well (presumably it collides with a name the package uses internally). If you use "race", for instance, the problem goes away:

library(tibble)
library(segregation)
options(scipen=5)

local_data = tribble(~SCHOOLID, ~race, ~count,
"100005_870", "WHITE",     669,
"100005_870", "BLACK",      12,
"100005_870", "HISP",       80,
"100005_870", "AIAN",        0,
"100005_870", "ASIAN",       2,
"100005_870", "PACIFIC",    16,
"100005_870", "TR",         25,
"100005_871", "WHITE",     703,
"100005_871", "BLACK",      12,
"100005_871", "HISP",       47,
"100005_871", "AIAN",        0,
"100005_871", "ASIAN",       2,
"100005_871", "PACIFIC",     0,
"100005_871", "TR",         27)

(mutual_local(local_data, "SCHOOLID", "race", weight = "count", wide = TRUE))
#>       race            ls           p
#> 1:   ASIAN 0.00003321619 0.002507837
#> 2:   BLACK 0.00003321619 0.015047022
#> 3:    HISP 0.03206493691 0.079623824
#> 4: PACIFIC 0.68502974604 0.010031348
#> 5:      TR 0.00108653019 0.032601881
#> 6:   WHITE 0.00054228911 0.860188088

Created on 2021-10-24 by the reprex package (v2.0.1)

I'll try to fix that issue soon.
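(Why can't the scores be negative? As I understand the package's definition, in the decomposition by groups the local segregation score of a group $g$ is a Kullback-Leibler divergence,

$$L_g = \sum_u p_{u|g} \log \frac{p_{u|g}}{p_u},$$

where $p_{u|g}$ is how group $g$ is distributed across units and $p_u$ is the overall unit distribution. A KL divergence is always non-negative, and it equals zero exactly when the group is spread across units in the same proportions as the population as a whole, so a negative ls can only come from a bug.)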

It's working now, thank you!

Hello,
I have the same problem, but I can't resolve it with the name change. In my case, the problem arises when I use the "se" argument and the function applies the bias correction.
Here is the code:

library(tidyverse)
library(segregation)
base  <- tribble(~ID_s, ~PRI, ~SEC, ~SUP,
                 1,     4,     4,     6,
                 2,    27,    34,    36,
                 3,     9,    15,    15,
                 4,    21,    33,    38,
                 5,    15,    23,    19,
                 6,     6,     8,     6,
                 7,     7,    14,    18,
                 8,     6,     8,    12,
                 9,    23,    34,    45,
                 10,    9,    16,    19
                 )
base |> 
  pivot_longer(cols = PRI:SUP, names_to = "EDU", 
               values_to = "n") |> 
  mutual_local(group = "EDU", unit = "ID_s",
               weight = "n", se = TRUE,
               wide = TRUE) |>
  select(ID_s, p, ls)

And this is my output:

    ID_s          p           ls
 1:    1 0.02539623 -0.072997966
 2:    2 0.18315094 -0.002555072
 3:    3 0.07269811 -0.024815143
 4:    4 0.17362264 -0.010504312
 5:    5 0.10986792 -0.004451141
 6:    6 0.03701887 -0.019953732
 7:    7 0.07281132 -0.010572342
 8:    8 0.04958491 -0.036701383
 9:    9 0.19315094 -0.004720493
10:   10 0.08269811 -0.017325337

The problem disappears when I set se = FALSE.

Thank you!

Hi, yes, that can happen when your sample is small. Basically, it means that your ls scores are most likely exactly zero. I could set them to 0 manually whenever this occurs, but I think leaving the raw values is more transparent. This is just something that can happen with the combination of bootstrap and bias correction when the parameters are close to 0. Maybe it would be good to have an FAQ entry about this, though.
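If you want to apply that zero-flooring yourself, a minimal sketch (assuming base from the code above is in scope, and res holds the wide, bias-corrected output of the mutual_local() call):

library(tidyverse)
library(segregation)

res <- base |>
  pivot_longer(cols = PRI:SUP, names_to = "EDU", values_to = "n") |>
  mutual_local(group = "EDU", unit = "ID_s",
               weight = "n", se = TRUE, wide = TRUE)

# floor the slightly negative bias-corrected local scores at zero
res |> mutate(ls = pmax(ls, 0)) |> select(ID_s, p, ls)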

Perfect. I did this manually but was not sure if it was correct.
Thank you for your response and your work with this package!