Consider casting to float32 by default in TableVectorizer

Question

GaelVaroquaux opened this issue 7 months ago

Using float64 instead of float32 typically costs extra compute time and memory (each value takes 8 bytes instead of 4), and most users do not have this in mind.

We should add an option to the TableVectorizer to output float32. We should also consider whether this should be the default.
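To make the cost concrete, here is a minimal NumPy sketch of the trade-off. It does not use the TableVectorizer itself: the array `X64` is a hypothetical stand-in for a vectorizer's numeric output, and the shape and random values are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical feature matrix standing in for a vectorizer's numeric output.
rng = np.random.default_rng(0)
X64 = rng.random((1000, 50))      # NumPy produces float64 by default
X32 = X64.astype(np.float32)      # explicit downcast to float32

bytes64 = X64.nbytes              # 8 bytes per value
bytes32 = X32.nbytes              # 4 bytes per value

# Rounding error introduced by the downcast, for values in [0, 1).
max_abs_error = float(np.max(np.abs(X64 - X32.astype(np.float64))))

print(f"float64: {bytes64} bytes, float32: {bytes32} bytes")
print(f"max absolute rounding error: {max_abs_error:.2e}")
```

The memory footprint is halved, at the price of a small rounding error. Whether that loss of precision matters depends on the downstream estimator, which is part of why the default deserves discussion.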

Jérôme Dockès · Answer 1 · Tue Dec 19 2023 19:13:43 GMT+0800 (China Standard Time)

that also applies (maybe even more) to encoders, for example MinHash outputs float64

Gael Varoquaux · Answer 2 · Tue Dec 19 2023 23:31:57 GMT+0800 (China Standard Time)

> that also applies (maybe even more) to encoders, for example MinHash outputs float64

Absolutely! Thanks for raising this. Maybe we should start there.

Théo Jolivet · Answer 3 · Tue May 28 2024 23:32:12 GMT+0800 (China Standard Time)

Closing because this has been addressed in the postprocessing step of the TableVectorizer in #902