mholt / timeliner

Twitter: hitting rate limits - "429 Too Many Requests"

joonas-fi opened this issue

I'm running `$ timeliner -twitter-replies -twitter-retweets get-latest twitter/joonas_fi`:

```
2019/11/30 16:36:03 [ERROR][twitter/joonas_fi] Getting latest:
	getting items from service:
	processing tweet from API:
	processing tweet 123:
	making item from tweet that this tweet (123) is in reply to (456):
	making item from tweet that this tweet (456) is in reply to (789):
	making item from tweet that this tweet (789) is in reply to (AAA):
	getting tweet that this tweet (AAA) is in reply to (BBB):
	HTTP error: https://api.twitter.com/1.1/statuses/show.json?id=BBB&tweet_mode=extended:
	429 Too Many Requests
```

(anonymization and line breaks added by me)

Timeliner configures its Twitter client's rate limit ("with some leeway") to 5,900 requests per hour. Bursting is disabled for Twitter, so that works out to about 610 ms between requests.
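
For intuition, here's a quick sketch of that arithmetic (Timeliner's own ratelimit.go uses a token channel rather than golang.org/x/time/rate, so this is just an illustration of the numbers, not its actual code):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// 5,900 requests per hour with a burst of 1 (i.e. no bursting)
	// means one slot roughly every 610 ms.
	fmt.Println(time.Hour / 5900) // ~610ms

	limiter := rate.NewLimiter(rate.Limit(5900.0/3600.0), 1)
	for i := 0; i < 3; i++ {
		_ = limiter.Wait(context.Background()) // blocks until the next slot
		fmt.Println("request", i, "at", time.Now().Format("15:04:05.000"))
	}
}
```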

Wrong detours in my thought process

Are all proper requests rate limited?

As you can see, Timeliner is digging through some considerable reply chains. My first instinct was: are replies counted against the quota, or only my own tweets? Upon further digging, the rate limiter is implemented at the http.RoundTripper level of the HTTP client, so that's not the issue. Nice approach BTW, I might use that idea later in my own projects! 👍
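
For anyone else reading along, the pattern is roughly this (a minimal sketch of the idea, not Timeliner's exact code): wrap the HTTP client's transport so every outgoing request waits on the limiter before it goes out.

```go
package main

import (
	"net/http"

	"golang.org/x/time/rate"
)

// throttledTransport waits on a rate limiter before delegating to the
// underlying RoundTripper, so every request made through the client is
// paced, no matter which call site or endpoint produced it.
type throttledTransport struct {
	limiter *rate.Limiter
	next    http.RoundTripper
}

func (t *throttledTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if err := t.limiter.Wait(req.Context()); err != nil {
		return nil, err
	}
	return t.next.RoundTrip(req)
}

func newThrottledClient(reqsPerHour float64) *http.Client {
	return &http.Client{
		Transport: &throttledTransport{
			limiter: rate.NewLimiter(rate.Limit(reqsPerHour/3600), 1),
			next:    http.DefaultTransport,
		},
	}
}
```

Anything fetched outside this client escapes the pacing entirely, which is what the next paragraph is about.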

Upon making the ezhttp HTTP suggestion in my other PR, I remembered there's a plain resp, err := http.Get(mediaURL) call in twitter/twitter.go that bypasses the rate-limiting HTTP client. That call is used for fetching media, and some (or most) media are fetched from https://pbs.twimg.com/..., which is Twitter's domain. Are those counted against the quota? Probably not, because the quotas are tied to the user or the app (API key), and these plain requests to that domain presumably don't count against the API quotas.
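
Probably a non-issue for the quota, then, but if those downloads ever did need pacing, the fix would presumably just be to route them through the same rate-limited client. A hypothetical helper (downloadMedia and the client parameter are mine, not Timeliner's):

```go
package twitter

import (
	"fmt"
	"io"
	"net/http"
)

// downloadMedia fetches a media URL through the caller-supplied client,
// so it inherits whatever rate limiting that client's transport applies.
func downloadMedia(client *http.Client, mediaURL string) (io.ReadCloser, error) {
	resp, err := client.Get(mediaURL)
	if err != nil {
		return nil, fmt.Errorf("downloading media %s: %v", mediaURL, err)
	}
	if resp.StatusCode != http.StatusOK {
		resp.Body.Close()
		return nil, fmt.Errorf("downloading media %s: HTTP %d", mediaURL, resp.StatusCode)
	}
	return resp.Body, nil
}
```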

Does the ratelimiter work properly?

I had a hard time using the rate limiter standalone, so I just plopped a fmt.Printf(".") into RoundTrip() and watched the dots appear on the screen as Timeliner chugged along. The dots appeared at a calm pace, so the rate limiter is working.

What I think is the problem

twitter/api.go uses three API endpoints:

| Endpoint                     | User limit / 15 min | App limit / 15 min |
| ---------------------------- | ------------------- | ------------------ |
| /users/show.json             | 900                 | 900                |
| /statuses/show.json          | 900                 | 900                |
| /statuses/user_timeline.json | 900                 | 1500               |

Source for limits: https://developer.twitter.com/en/docs/basics/rate-limits

Timeliner's rate limit is shared across all of those endpoints. At 5,900 requests/hour, the shared budget works out to 1,475 requests per 15 minutes, so Timeliner could theoretically send all of those to /statuses/show.json, pushing it over that endpoint's limit of 900. Now, I don't know what the real ratios of the endpoint call rates are, but if we want to avoid going over the limit with the current "all endpoints share the same rate limit" design, the limit should be recalculated based on the 900 number.
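
One direction (beyond just recalculating the shared number from 900) would be a separate limiter per endpoint. A rough sketch of the idea, not a patch against Timeliner's actual ratelimit.go:

```go
package twitter

import (
	"net/http"

	"golang.org/x/time/rate"
)

// perEndpointTransport paces each API path with its own limiter and
// falls back to a conservative default for anything unlisted.
type perEndpointTransport struct {
	limiters map[string]*rate.Limiter // keyed by URL path
	fallback *rate.Limiter
	next     http.RoundTripper
}

func (t *perEndpointTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	lim, ok := t.limiters[req.URL.Path]
	if !ok {
		lim = t.fallback
	}
	if err := lim.Wait(req.Context()); err != nil {
		return nil, err
	}
	return t.next.RoundTrip(req)
}

// per15min turns a "requests per 15-minute window" quota into a limiter.
func per15min(n float64) *rate.Limiter {
	return rate.NewLimiter(rate.Limit(n/(15*60)), 1)
}

func newTwitterTransport() http.RoundTripper {
	return &perEndpointTransport{
		limiters: map[string]*rate.Limiter{
			"/1.1/users/show.json":             per15min(900),
			"/1.1/statuses/show.json":          per15min(900),
			"/1.1/statuses/user_timeline.json": per15min(900), // or 1500 with app-only auth
		},
		fallback: per15min(900),
		next:     http.DefaultTransport,
	}
}
```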

Another thought: is the 1,500 even correct for user_timeline.json?

This depends on the authorization model: whether Timeliner

a) uses only the app's credentials to read public data and doesn't get authorization from the user, or
b) gets authorization from the user and operates on behalf of the user (I don't remember seeing any authorization screen, but that might be because I had an API key lying around which I had authorized way back).

A few quotes:

Rate limiting of the standard API is primarily on a per-user basis — or more accurately described, per user access token.

When using application-only authentication, rate limits are determined globally for the entire application. If a method allows for 15 requests per rate limit window, then it allows you to make 15 requests per window — on behalf of your application. This limit is considered completely separately from per-user limits.

Source: https://developer.twitter.com/en/docs/basics/rate-limiting

If I understand correctly, Timeliner is not using application-only authentication, so shouldn't the limit be based on the 900 anyway?

I'm not sure of this. WDYT?

Workaround

This is not a serious issue, since after throttling I can wait a while and continue later.

Code suggestions

I don't know if you're interested in code suggestions, but a couple came to mind while kicking the tires:

  1. I came across timeliner.FakeCloser; just in case you're not aware, there's a stdlib implementation for that: https://godoc.org/io/ioutil#NopCloser (see the sketch after this list).
  2. ratelimit.go: you're chucking empty structs (struct{}) on the token channel. Usually, when a channel is only used for signalling, I've seen interface{} used so one can just chuck nils down the channel. I'm not sure if it's more performant, but I think it's more semantic. This might be subjective, though.
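
For suggestion 1, the swap would look something like this (just an illustration; on newer Go versions the same helper lives at io.NopCloser):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"io/ioutil"
)

func main() {
	buf := bytes.NewBufferString("tweet body")

	// Instead of a hand-rolled no-op closer like timeliner.FakeCloser,
	// the standard library already wraps any io.Reader for you:
	var rc io.ReadCloser = ioutil.NopCloser(buf)
	defer rc.Close() // Close() does nothing and returns nil

	data, _ := ioutil.ReadAll(rc)
	fmt.Println(string(data))
}
```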

Their docs were confusing; I took my best guess at the rate limit, and I guess I was a little high.

Thanks for the great issue and investigation btw! This was kind of a joy to read, despite the fact that it means there's a problem. 😅

Should we just lower the burst size to base it on the 900 count instead?

Thanks :)

Sorry for the slow response. I think lowering it to 900 is a good and easy improvement for now.

There might be more novel rate limiting approaches, but those would be more work than just improving things right now by lowering the number. :)
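
For what it's worth, basing the shared limit on the 900-per-15-minutes figure works out to roughly one request per second before leeway; a quick sanity check of the arithmetic:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Proposed: 900 requests per 15-minute window.
	fmt.Println(15 * time.Minute / 900) // 1s between requests

	// Current: 5,900 requests per hour.
	fmt.Println(time.Hour / 5900) // ~610ms between requests
}
```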