Ensure against duplicate favorites in the database
O-I opened this issue · comments
The uniqueness validation on `tweet_id` was added after the `pluck_all` rake task was run in production. As a result, the last tweet saved from each iterative batch call to the Twitter API is also the first tweet saved in the subsequent call. Essentially, every 200th tweet plucked with `pluck_all` is duplicated. Note that it isn't precisely every 200th tweet, because of the chronological error I mention in issue #1. To fix:
- Write a rake task to remove duplicates from the current database
- Change line 12 of `faves.rake` to set `max_id` to one less than the current minimum. Something like this should do (but check):
options[:max_id] = faves.map(&:id).min - 1 unless faves.map(&:id).min.nil?
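
To make the off-by-one concrete, here is a minimal plain-Ruby sketch of the paging loop. It is not the code in `faves.rake`: `fetch_faves`, `pluck_all_ids`, `Tweet` and the `step_back` parameter are hypothetical names, and the fake endpoint only assumes what the Twitter API documents, namely that `max_id` is inclusive (it returns tweets with an ID less than or equal to it).

```ruby
# Hypothetical stand-in for the favorites endpoint: returns up to `count`
# tweets, newest first, with id <= max_id (max_id is inclusive).
Tweet = Struct.new(:id)

def fetch_faves(all_ids, count:, max_id: nil)
  ids = all_ids.sort.reverse
  ids = ids.select { |id| id <= max_id } if max_id
  ids.first(count).map { |id| Tweet.new(id) }
end

# Pages through the whole timeline and returns every id that would be saved.
# step_back: 0 reuses the current minimum id as max_id (the buggy behaviour);
# step_back: 1 uses one less than the current minimum (the proposed fix).
def pluck_all_ids(all_ids, count:, step_back:)
  saved = []
  options = {}
  loop do
    faves = fetch_faves(all_ids, count: count, **options)
    saved.concat(faves.map(&:id))
    break if faves.size < count # a short (or empty) batch means we are done
    options[:max_id] = faves.map(&:id).min - step_back
  end
  saved
end

ids = (1..10).to_a
buggy = pluck_all_ids(ids, count: 3, step_back: 0)
fixed = pluck_all_ids(ids, count: 3, step_back: 1)
puts "buggy: #{buggy.size} saved, #{buggy.uniq.size} unique"
puts "fixed: #{fixed.size} saved, #{fixed.uniq.size} unique"
```

With `step_back: 0`, the tweet at each batch boundary is saved twice; with `step_back: 1`, every tweet is saved exactly once. The batch size of 3 stands in for the real batch size of 200.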
Fixed. Implemented a `remove_dupluckates` rake task, amended `pluck_all`, and added a uniqueness constraint on `tweet_id` at the database level.
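
For anyone hitting the same problem, the cleanup step can be sketched roughly as follows. This is an illustration in plain Ruby, not the actual `remove_dupluckates` task: the `Fave` struct and `duplicate_rows` helper are hypothetical, and keeping the row with the lowest primary key is an assumed policy.

```ruby
# Hypothetical in-memory stand-in for a favorites row.
Fave = Struct.new(:id, :tweet_id)

# Returns the rows to delete: for each tweet_id, every row except the one
# with the lowest primary key (assumed here to be the row worth keeping).
def duplicate_rows(faves)
  faves.group_by(&:tweet_id).flat_map do |_tweet_id, rows|
    rows.sort_by(&:id).drop(1)
  end
end

faves = [
  Fave.new(1, 100),
  Fave.new(2, 200),
  Fave.new(3, 100), # duplicate of row 1
  Fave.new(4, 300),
  Fave.new(5, 200)  # duplicate of row 2
]
puts duplicate_rows(faves).map(&:id).sort.inspect
```

The duplicates have to be removed before the database-level constraint is added, since a unique index cannot be created while duplicate `tweet_id` values are still present.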