Rate limited too soon
mxmzb opened this issue Β· comments
I'm running the script locally like this:
yarn index example.com
❯ yarn index example.com
Processing site: sc-domain:example.com
Found 1189 URLs in 2 sitemaps
Batch 1 of 24 complete
Batch 2 of 24 complete
Batch 3 of 24 complete
Batch 4 of 24 complete
Batch 5 of 24 complete
Batch 6 of 24 complete
Batch 7 of 24 complete
Batch 8 of 24 complete
Batch 9 of 24 complete
Batch 10 of 24 complete
Batch 11 of 24 complete
Batch 12 of 24 complete
Batch 13 of 24 complete
Batch 14 of 24 complete
Batch 15 of 24 complete
Batch 16 of 24 complete
Batch 17 of 24 complete
Batch 18 of 24 complete
Batch 19 of 24 complete
Batch 20 of 24 complete
Batch 21 of 24 complete
Batch 22 of 24 complete
Batch 23 of 24 complete
Batch 24 of 24 complete
Done, here's the status of all 1189 pages:
• Submitted and indexed: 410 pages
• Crawled - currently not indexed: 151 pages
• Discovered - currently not indexed: 2 pages
• Page with redirect: 2 pages
• RateLimited: 506 pages
• Server error (5xx): 9 pages
• Alternate page with proper canonical tag: 1 pages
• Duplicate, Google chose different canonical than user: 108 pages
Found 659 pages that can be indexed.
[... list of urls]
Processing url: https://example.com/foo/bar
Indexing already requested previously. It may take a few days for Google to process it.
Processing url: https://example.com/foo/bar1
Rate limit exceeded, try again later.
The rate limit is hit after only around 100-120 URLs, and if I rerun the script it starts from the beginning and aborts on the rate limit again at around 100-120 URLs, so I'm never able to request indexing for the URLs that come later.
What am I doing wrong?
I think this was fixed by a recent PR, want to try again?
The cache isn't being written to while the URLs are being processed. When the rate limit is exceeded, the program exits with a cache full of "RateLimited" entries. Then when you run it again, it starts from the beginning and gets rate limited at the same place again.
I added this to index.ts:168 and it can now pick up where it left off:
statusPerUrl[url] = { status: Status.SubmittedAndIndexed, lastCheckedAt: new Date().toISOString() };
writeFileSync(cachePath, JSON.stringify(statusPerUrl, null, 2));
I'm not quite sure where you added this line. Did you replace something else with it, or simply add it?
Also, could this be implemented in the lib itself in a PR, so everyone else could benefit from it?
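For context, the suggested fix can be sketched as a small self-contained module. This is only an illustration of the idea, not the library's actual code: the names `Status`, `statusPerUrl`, `cachePath`, and the helper functions below are assumptions mirroring the snippet above, and `index.ts:168` may look different in the real source. The key point is that the cache file is flushed after every URL, so a run killed by the rate limit can resume from where it stopped instead of starting over.

```typescript
import { existsSync, readFileSync, writeFileSync } from "fs";

// Assumed shape of the library's status enum (illustrative, not the real one).
enum Status {
  SubmittedAndIndexed = "Submitted and indexed",
  RateLimited = "RateLimited",
}

type UrlStatus = { status: Status; lastCheckedAt: string };

// Hypothetical cache location; the real tool computes its own path.
const cachePath = "./cache.json";

// Load any previous run's results so processing can resume.
const statusPerUrl: Record<string, UrlStatus> = existsSync(cachePath)
  ? JSON.parse(readFileSync(cachePath, "utf8"))
  : {};

function recordStatus(url: string, status: Status): void {
  statusPerUrl[url] = { status, lastCheckedAt: new Date().toISOString() };
  // Flush after every URL so a rate-limit exit doesn't lose progress.
  writeFileSync(cachePath, JSON.stringify(statusPerUrl, null, 2));
}

// On a re-run, skip URLs that already succeeded and retry everything else,
// including the ones previously stamped "RateLimited".
function shouldProcess(url: string): boolean {
  return statusPerUrl[url]?.status !== Status.SubmittedAndIndexed;
}
```

With this in place, a second invocation only touches the URLs that weren't already submitted, so the per-day quota is spent on new pages rather than re-checking old ones.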