NA is still converted to NaN even in a string column
indeedhat opened this issue
indeedhat commented
Issue
It doesn't seem to matter what type a column has in the DataFrame: NA is always converted to NaN.
Expected Behavior
NA would only be converted to NaN in int and float columns.
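The expected rule amounts to treating NA as missing only once the column type is known. Below is a minimal sketch over raw CSV records, assuming the header is the first record. ColumnType, its constants and normalizeNA are hypothetical names for illustration, not gota's API:

```go
package main

import "fmt"

// ColumnType is a hypothetical stand-in for gota's series.Type.
type ColumnType string

const (
	String ColumnType = "string"
	Int    ColumnType = "int"
	Float  ColumnType = "float"
)

// normalizeNA returns a copy of records in which "NA" cells are rewritten
// to "NaN" only in columns whose type is Int or Float. records[0] is
// assumed to be the header row, and types is keyed by header name.
func normalizeNA(records [][]string, types map[string]ColumnType) [][]string {
	if len(records) == 0 {
		return records
	}
	header := records[0]
	out := make([][]string, len(records))
	out[0] = append([]string(nil), header...)
	for i, row := range records[1:] {
		newRow := append([]string(nil), row...)
		for j := range newRow {
			if j >= len(header) {
				break
			}
			t := types[header[j]]
			if newRow[j] == "NA" && (t == Int || t == Float) {
				newRow[j] = "NaN"
			}
		}
		out[i+1] = newRow
	}
	return out
}

func main() {
	records := [][]string{
		{"Region", "Age", "Amount"},
		{"NA", "NA", "NA"},
		{"GB", "17", "18.2"},
	}
	// Region and Age are strings, as in the issue; Amount is numeric.
	types := map[string]ColumnType{"Region": String, "Age": String, "Amount": Float}
	fmt.Println(normalizeNA(records, types)) // prints [[Region Age Amount] [NA NA NaN] [GB 17 18.2]]
}
```

In this sketch, a column missing from the types map keeps its NA cells as text.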
Actual Behaviour
NA is converted to NaN regardless of the type
Example
Take this modified version of some data from the README:
package main

import (
	"fmt"
	"strings"

	"github.com/go-gota/gota/dataframe"
	"github.com/go-gota/gota/series"
)

// ExampleData has NA in the Region column (both unquoted and quoted)
// and in the Age column.
const ExampleData = `
Country,Region,Date,Age,Amount,Id
"United States",US,2012-02-01,50,112.1,01234
"United States",US,2012-02-01,32,321.31,54320
"United Kingdom",GB,2012-02-01,17,18.2,12345
"United States",NA,2012-02-01,32,321.31,54320
"United States","NA",2012-02-01,17,321.31,54320
"United Kingdom",GB,2012-02-01,NA,18.2,12345
"United States",NA,2012-02-01,32,321.31,54320
Spain,EU,2012-02-01,66,555.42,00241
`

func main() {
	// Force Age to be read as a string column instead of letting
	// ReadCSV infer its type.
	frame := dataframe.ReadCSV(
		strings.NewReader(ExampleData),
		dataframe.WithTypes(map[string]series.Type{
			"Age": series.String,
		}),
	)
	fmt.Println(frame)
}
This produces the following output:
[8x6] DataFrame
   Country        Region   Date       Age      Amount     Id
0: United States  US       2012-02-01 50       112.100000 1234
1: United States  US       2012-02-01 32       321.310000 54320
2: United Kingdom GB       2012-02-01 17       18.200000  12345
3: United States  NaN      2012-02-01 32       321.310000 54320
4: United States  NaN      2012-02-01 17       321.310000 54320
5: United Kingdom GB       2012-02-01 NaN      18.200000  12345
6: United States  NaN      2012-02-01 32       321.310000 54320
7: Spain          EU       2012-02-01 66       555.420000 241
   <string>       <string> <string>   <string> <float>    <int>
In both columns, Region (implicitly typed as string) and Age (where I have explicitly set the type to string), the string value NA is converted to NaN, even when it is quoted, as in row 4.
Paladin R. Liu commented
This problem is caused by the ReadCSV function converting the "NA" string to "NaN" before a data type is assigned to the column. It should be tagged as a bug.
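A quick way to see that the conversion is not done by CSV parsing itself is to read the same two Region values with Go's standard encoding/csv package. It returns the literal NA for both the unquoted and the quoted field, which is consistent with the NaN being introduced later, in ReadCSV's own processing. This is a minimal sketch; regionValues is a hypothetical helper, and it shows only what a plain CSV reader returns, not where in gota the replacement happens.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Two rows adapted from the issue's data: Region is NA unquoted, then "NA" quoted.
const data = `Country,Region
"United States",NA
"United States","NA"
`

// regionValues parses CSV text with encoding/csv and returns the second
// field of every row after the header.
func regionValues(text string) ([]string, error) {
	records, err := csv.NewReader(strings.NewReader(text)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) < 2 {
		return nil, nil
	}
	var out []string
	for _, rec := range records[1:] {
		out = append(out, rec[1])
	}
	return out, nil
}

func main() {
	vals, err := regionValues(data)
	if err != nil {
		panic(err)
	}
	// Both fields come back as the literal text NA, not NaN.
	fmt.Printf("%q\n", vals) // prints ["NA" "NA"]
}
```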
I'm trying to make a patch for this issue.
indeedhat commented
Much appreciated, I look forward to this being merged :)