haifengl / smile

Statistical Machine Intelligence & Learning Engine

Home Page: https://haifengl.github.io

What is an efficient way to fill null values in a DataFrame column with an arbitrary string?

jamalromero opened this issue

As we know, many data sources have missing values. After reading a data source (a CSV file, for example), is there a way to fill the missing entries in the DataFrame with an arbitrary value? For comparison, with a Python pandas DataFrame we can just call dataframe['some_column_name'].fillna('Missing').
Is that possible? Also, is there a forum or user group where we can post questions like this?
Thanks

There are several algorithms for handling missing values in the smile.feature.imputation package. SimpleImputer can be used to fill in a fixed value. I would suggest trying the more advanced algorithms in the package too.

For simplicity, I will add some methods like fillna to the Vector classes.

Feel free to ask questions by creating tickets.

I added DataFrame.fillna(), which applies to FloatVector and DoubleVector columns.
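For readers who want to see what a fill operation does to a column, here is a minimal sketch in plain Java. It does not use Smile and is not the library's implementation. The class FillNaSketch and the helpers fillNaN and fillNull are hypothetical names made up for illustration. It assumes missing numeric entries are stored as NaN and missing strings as null, which is the case the original question asks about.

```java
import java.util.Arrays;

// Hypothetical illustration only: plain arrays stand in for DataFrame columns.
public class FillNaSketch {

    /** Returns a copy of a numeric column with every NaN replaced by fill. */
    public static double[] fillNaN(double[] column, double fill) {
        double[] out = Arrays.copyOf(column, column.length);
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                out[i] = fill;
            }
        }
        return out;
    }

    /** Returns a copy of a string column with every null replaced by fill. */
    public static String[] fillNull(String[] column, String fill) {
        String[] out = Arrays.copyOf(column, column.length);
        for (int i = 0; i < out.length; i++) {
            if (out[i] == null) {
                out[i] = fill;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] price = {1.5, Double.NaN, 3.0};
        String[] city = {"Paris", null, "Lima"};

        System.out.println(Arrays.toString(fillNaN(price, 0.0)));      // [1.5, 0.0, 3.0]
        System.out.println(Arrays.toString(fillNull(city, "Missing"))); // [Paris, Missing, Lima]
    }
}
```

Both helpers return a new array rather than modifying the input, so the original column stays available for comparison. Whether the library's own fillna copies or mutates is not stated in this thread.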