[Splink 4] Find new matches can be simplified by creating a new linker
RobinL opened this issue
This function is very complicated and relies on a lot of hacks to make it work.
I think it could be simplified by:
- creating a new linker in `link_only` mode with two input datasets
- computing the tf columns on the new records by joining them to `__splink__df_tf_with_concat`
- ensuring the linker treats this as a link-only job, with `__splink__df_tf_with_concat` as the left dataset and the new records as the right dataset
We can also get rid of this, since the tf columns on the new records can be obtained by joining to `__splink__df_tf_with_concat`.
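To illustrate the join idea, here is a minimal sketch using the standard-library `sqlite3` module rather than Splink itself. It assumes a single column `first_name`, a term-frequency column named `tf_first_name`, and a hypothetical `new_records` table; the helper `attach_tf_column` is made up for this example and is not part of Splink. Values that do not appear in the existing data come back with a null tf, which would still need handling.

```python
import sqlite3


def attach_tf_column(con, new_table, tf_table, col):
    """Look up tf_<col> for each new record from the existing tf table.

    Hypothetical helper for illustration only; not a Splink function.
    """
    sql = f"""
        SELECT n.unique_id, n.{col}, t.tf_{col}
        FROM {new_table} AS n
        LEFT JOIN (
            -- one row per distinct value, so new records are not duplicated
            SELECT DISTINCT {col}, tf_{col} FROM {tf_table}
        ) AS t
        ON n.{col} = t.{col}
        ORDER BY n.unique_id
    """
    return con.execute(sql).fetchall()


con = sqlite3.connect(":memory:")

# Stand-in for the existing records with tf columns already computed.
con.execute(
    "CREATE TABLE __splink__df_tf_with_concat "
    "(unique_id INTEGER, first_name TEXT, tf_first_name REAL)"
)
con.executemany(
    "INSERT INTO __splink__df_tf_with_concat VALUES (?, ?, ?)",
    [(1, "john", 0.5), (2, "john", 0.5), (3, "mary", 0.25), (4, "lucy", 0.25)],
)

# Assumed table of incoming records that have no tf columns yet.
con.execute("CREATE TABLE new_records (unique_id INTEGER, first_name TEXT)")
con.executemany(
    "INSERT INTO new_records VALUES (?, ?)",
    [(101, "john"), (102, "zoe")],
)

rows = attach_tf_column(
    con, "new_records", "__splink__df_tf_with_concat", "first_name"
)
for row in rows:
    print(row)
```

The `SELECT DISTINCT` subquery matters because the tf table holds one row per input record, so joining to it directly would fan out each new record once per matching existing record.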