matthieugomez / statar

R package for data manipulation — inspired by Stata's API


String handling and truncation in tab()

gvelasq opened this issue

  1. Using the CO2 dataset included in the datasets package, tab() displays the values of the factor variables Type and Treatment as integers and truncates the column name Treatment to Treatmen. I'd suggest displaying the factors as strings here.
> library(statar)
> tail(CO2)
   Plant        Type Treatment conc uptake
79   Mc3 Mississippi   chilled  175   18.0
80   Mc3 Mississippi   chilled  250   17.9
81   Mc3 Mississippi   chilled  350   17.9
82   Mc3 Mississippi   chilled  500   17.9
83   Mc3 Mississippi   chilled  675   18.9
84   Mc3 Mississippi   chilled 1000   19.9
> tab(CO2, Type, Treatment)
 
    Type │ Treatmen │    Freq.  Percent     Cum.
─────────┼──────────┼────────────────────────────
       1 │        1 │       21    25.00    25.00
       1 │        2 │       21    25.00    50.00
-----------------------------------------------
       2 │        1 │       21    25.00    75.00
       2 │        2 │       21    25.00   100.00
  2. Using the sample tribble below, the values of the character variable stringvar are not truncated, but the column name of the integer variable numvar_longvarname is. I'd suggest right-justifying the strings and raising the maximum width for long column names (or, see below, wrapping the long column name to fit your maximum width, as Stata does):
> library(statar)
> df <-
+   tibble::tribble(
+                                  ~stringvar, ~numvar_longvarname,
+                         "Lorem ipsum dolor",                  1L,
+    "sit amet, consectetur adipiscing elit,",                  2L,
+                     "sed do eiusmod tempor",                  3L
+   )
> tab(df, stringvar)
 
                             stringvar │    Freq.  Percent     Cum.
───────────────────────────────────────┼────────────────────────────
Lorem ipsum dolor                      │        1    33.33    33.33
sed do eiusmod tempor                  │        1    33.33    66.67
sit amet, consectetur adipiscing elit, │        1    33.33   100.00
> tab(df, numvar_longvarname)
 
numvar_l │    Freq.  Percent     Cum.
─────────┼────────────────────────────
       1 │        1    33.33    33.33
       2 │        1    33.33    66.67
       3 │        1    33.33   100.00
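
To make the two display suggestions above concrete, here is a minimal base R sketch. It is not how tab() is implemented in statar; it only illustrates that a factor's labels can be recovered with as.character() and that strings can be right-justified by padding them to the longest label. The variable names (type_codes, type_labels, padded) are made up for this example.

```r
# Sketch only: illustrates the suggestions, not statar's internals.

# 1. A factor stores integer codes; as.character() recovers the labels.
type <- CO2$Type                     # CO2 ships with the datasets package
type_codes  <- as.integer(type)      # the kind of value shown in the table
type_labels <- as.character(type)    # the kind of value the issue asks for
print(tail(type_labels, 1))

# 2. Right-justify strings by padding each one to the longest label.
labels <- c("Lorem ipsum dolor",
            "sit amet, consectetur adipiscing elit,",
            "sed do eiusmod tempor")
padded <- formatC(labels, width = max(nchar(labels)))  # pads on the left
cat(padded, sep = "\n")
```

Padding to the longest label keeps the separator column aligned regardless of which row holds the widest string.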

For comparison, here is how Stata behaves (it seems to allow wrapping of varnames to a maximum of 3 rows):

. tab stringvar

                             stringvar |      Freq.     Percent        Cum.
---------------------------------------+-----------------------------------
                     Lorem ipsum dolor |          1       33.33       33.33
                 sed do eiusmod tempor |          1       33.33       66.67
sit amet, consectetur adipiscing elit, |          1       33.33      100.00
---------------------------------------+-----------------------------------
                                 Total |          3      100.00

. tab numvar_longvarname 

numvar_long |
    varname |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          1       33.33       33.33
          2 |          1       33.33       66.67
          3 |          1       33.33      100.00
------------+-----------------------------------
      Total |          3      100.00

. rename numvar_longvarname numvar_longvarname_longervarname

. ta numvar_longvarname_longervarname 

numvar_long |
varname_lon |
 gervarname |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          1       33.33       33.33
          2 |          1       33.33       66.67
          3 |          1       33.33      100.00
------------+-----------------------------------
      Total |          3      100.00
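
The wrapping shown in the Stata output above could be approximated in R along these lines. This is only a sketch: wrap_colname() is a hypothetical helper, not part of statar, and the width of 11 characters and the limit of 3 rows are inferred from the output above rather than from Stata's documentation. Dropping any text beyond the last row is likewise an assumption.

```r
# Hypothetical helper (not part of statar): split a column name into
# fixed-width chunks and right-justify each chunk, one chunk per header row.
wrap_colname <- function(name, width = 11L, max_rows = 3L) {
  # Short (or empty) names need no wrapping, only padding.
  if (nchar(name) <= width) return(formatC(name, width = width))
  starts <- seq(1L, nchar(name), by = width)
  chunks <- substring(name, starts, starts + width - 1L)
  # Assumption: anything beyond max_rows is simply dropped.
  chunks <- chunks[seq_len(min(length(chunks), max_rows))]
  formatC(chunks, width = width)  # pads on the left, i.e. right-justifies
}

cat(wrap_colname("numvar_longvarname"), sep = "\n")
# numvar_long
#     varname

cat(wrap_colname("numvar_longvarname_longervarname"), sep = "\n")
# numvar_long
# varname_lon
#  gervarname
```

Splitting at fixed character positions, rather than at underscores or word boundaries, is what reproduces the breaks seen in the two Stata headers above.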

Thanks a lot. Correction + tests in 80ac32a