JustAnotherArchivist / snscrape

A social networking service scraper in Python

All Twitter scrapes are failing: `blocked (404)`

JustAnotherArchivist opened this issue · comments

With the exception of twitter-trends, all Twitter scrapes are failing since sometime in the past hour. This is likely connected to Twitter as a whole getting locked behind a login wall since earlier today. There is no known workaround at this time, and it's not known whether this will be fixable.

So sad :-(
My research project depends heavily on this library, and I pay tribute to your effort in maintaining it.

Twitter disabled its public web site today (2023-06-30) and now requires users to log in; prior to this date Twitter was publicly accessible. Would it be possible to automate the login as well by providing a username and password to snscrape, i.e. logging in to Twitter and simulating a logged-in session before calling the GraphQL API?

I do not think the developer would do this, as he has said that auth will never be added as a feature: see #270.
Let's see our great developer's solution; I hope it won't take long.

Before using this library, I had started doing manual scraping myself using Puppeteer, and I had automated the sign-in part (even through 2FA). The issue is that if you sign in frequently within a short period of time, you get blocked by Twitter and cannot sign in again for a certain amount of time. So I'm not sure what the ideal setup would be in this case...

Don't nuke this one as off-topic: A Twitter employee says it's temporary:

https://twitter.com/AqueelMiq/status/1674843555486134272
"this is a temporary restriction, we will re-enable logged out twitter access in the near future"

Can I use my personal OAuth key with snscrape for Twitter?

Elon talked about it too 💀 https://twitter.com/elonmusk/status/1674942336583757825

Musk referred to EXTREME scraping, indicating that scrapers may no longer be functional after the changes. Let's see how it plays out.

Can I edit the twitter.py module with my own bearer key or even an OAuth login key, locally on the computer where I installed the snscrape module, since the change would only affect my local copy? Thanks.

Hello,

This may or may not help. Here's a route to access Tweets without logging in (it contains a further iframe to platform.twitter.com):
https://cdn.embedly.com/widgets/media.html?type=text%2Fhtml&key=a19fcc184b9711e1b4764040d3dc5c07&schema=twitter&url=https://twitter.com/elonmusk/status/1674865731136020505

Would combining this with a pre-existing list of Tweets allow data scraping to continue? Alternatively, users could build the tweet list using a Google search, e.g. for Tesla tweets: "site:twitter.com/tesla/status", or via another cached list (e.g. the Wayback Machine: https://web.archive.org/web/*/https://twitter.com/tesla/status*).

If I'm off the mark, I apologise, but I thought I'd pass this on, on the off chance it may help, at least as a temporary measure.

Just a note to @JustAnotherArchivist - thank you for the hard work you have put into this library - it is very much appreciated

Ben
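A minimal sketch of the tweet-list idea above (my illustration, not code from the thread): given a list of status URLs harvested from a Google "site:" search or the Wayback Machine, extract the numeric tweet IDs for later lookup.

```python
import re

# Regex for the numeric status ID inside a Twitter status URL.
STATUS_ID_RE = re.compile(r"twitter\.com/[^/]+/status/(\d+)")

def extract_tweet_ids(urls):
    """Pull unique tweet IDs out of a list of status URLs, preserving order."""
    seen, ids = set(), []
    for url in urls:
        m = STATUS_ID_RE.search(url)
        if m and m.group(1) not in seen:
            seen.add(m.group(1))
            ids.append(m.group(1))
    return ids

urls = [
    "https://twitter.com/tesla/status/1652193613223436289",
    "https://web.archive.org/web/2023/https://twitter.com/tesla/status/1652193613223436289",
]
print(extract_tweet_ids(urls))  # ['1652193613223436289']
```

Note the Wayback URL still matches, since the original status URL is embedded in it, and duplicates collapse to one ID.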

URL: https://cdn.syndication.twimg.com/tweet-result

CODE:

import requests

# Public syndication endpoint that returns a single tweet as JSON.
url = "https://cdn.syndication.twimg.com/tweet-result"

querystring = {"id": "1652193613223436289", "lang": "en"}

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    "Origin": "https://platform.twitter.com",
    "Referer": "https://platform.twitter.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
}

# A plain GET is enough; the original Insomnia export also sent an empty
# request body, which is unnecessary on a GET.
response = requests.get(url, params=querystring, headers=headers)
response.raise_for_status()
print(response.text)

Generated by Insomnia
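The response above is JSON, so it can be reduced to a few fields. A hedged sketch: the key names used here (id_str, text, user.screen_name, created_at) are assumptions based on observed syndication responses, not a documented schema, and may change.

```python
import json

def summarize_tweet(payload: str) -> dict:
    """Reduce a tweet-result JSON payload to a few fields.

    The key names are assumptions from observed responses, not a
    documented schema; missing keys simply come back as None.
    """
    data = json.loads(payload)
    return {
        "id": data.get("id_str"),
        "text": data.get("text"),
        "user": (data.get("user") or {}).get("screen_name"),
        "created_at": data.get("created_at"),
    }

# Fabricated sample standing in for a live response.
sample = '{"id_str": "1652193613223436289", "text": "hello", "user": {"screen_name": "tesla"}}'
print(summarize_tweet(sample))
```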

This seems to be working; the problem might be the rate limit and stability. More tests are needed.

It does not allow you to see all the accounts followed by a user either; would there be a solution for that? It would help me.

https://twitter.com/elonmusk/status/1675187969420828672

😂

@ElonMusk
To address extreme levels of data scraping & system manipulation, we’ve applied the following temporary limits:

  • Verified accounts are limited to reading 6000 posts/day
  • Unverified accounts to 600 posts/day
  • New unverified accounts to 300/day

My IP was banned even though I was using a proxy that changes the IP dynamically. What options do we have now?

@JustAnotherArchivist Will the scrapers be working anytime soon? Also, I want to thank you for your hard work on these scrapers.

Hi guys, I'm new to GitHub and coding, but maybe this is helpful:

https://twitter.com/iam4x/status/1675194767854956546?s=20

This hasn't worked for a long time.

What about using Selenium first to log in, and after that using sntwitter to get the tweets?
The question here is how to link the Selenium session with sntwitter.

lol this seems to be working,
na never mind, besides it was fun for some minutes, it messes up the rest of the features so no lol after all

What about using Selenium first to log in, and after that using sntwitter to get the tweets? The question here is how to link the Selenium session with sntwitter.

The beauty of snscrape is that it doesn't require authentication. If we're going to have to start using login/auth and tools like Selenium, then it should be spun off into another project and not snscrape. Also, using any form of auth gives Twitter another way to ban mass collection, which is the use case for many users of snscrape.

Hi! :)) It works great! Is there perhaps any way to scrape repost and comment data as well? I need a mapping of tweet spread for my master's thesis, but what companies are doing lately with their APIs (like Twitter or Reddit) is terrible....

You are describing my situation. I need the comments for the same purpose; please let me know when you find a solution. My submission is in September.

What about using Selenium first to log in, and after that using sntwitter to get the tweets? The question here is how to link the Selenium session with sntwitter.

The beauty of snscrape is that it doesn't require authentication. If we're going to have to start using login/auth and tools like Selenium, then it should be spun off into another project and not snscrape. Also, using any form of auth gives Twitter another way to ban mass collection, which is the use case for many users of snscrape.

So you would rather have it completely stop working for all other use cases as well?

It would be great if snscrape added a new scraper, e.g. a TwitterProfileScraperSyn, that grabs tweet data from the still publicly available syndication profile feeds. The syndication feed shows 20 tweets, which is good enough for many applications.

Great!

Is there any other param I can put in the querystring besides the tweet id?
I want to get tweets from specific users, but can't find which params to use.

  • Auth will not be added, as has been mentioned at least twice now.
  • Yes, if the syndication feeds are the only remaining option, I will switch to that or add a separate scraper for them. The thing is that I don't want to have to (read: don't have time to) change everything again in two days when Elon has another one of his brilliant ideas, so I'm waiting for the dust to settle down a bit.
What about using Selenium first to log in, and after that using sntwitter to get the tweets? The question here is how to link the Selenium session with sntwitter.

The beauty of snscrape is that it doesn't require authentication. If we're going to have to start using login/auth and tools like Selenium, then it should be spun off into another project and not snscrape. Also, using any form of auth gives Twitter another way to ban mass collection, which is the use case for many users of snscrape.

So you would rather have it completely stop working for all other use cases as well?

Yes (for Twitter), and I expressed why, and so has JustAnotherArchivist: #issuecomment-1616774736 / #270

May I please ask how we can get a specific user's tweets from a start time to an end time right now? I'm really in a hurry and currently have no clues....

And this one seems to have no param for a screen name. Do we have other URLs?
https://cdn.syndication.twimg.com/tweet-result

Thank you for all your help, and great praise to the author @JustAnotherArchivist.

Broken by Musk.

I hope a solution will be found soon. I really need this lib for my final studies project; otherwise I could fail...

Does anyone know if someone's working on a snscrape fork that implements login/auth for Twitter?

Really appreciate your work, JustAnotherArchivist; thank you for all you do. Hoping Elon pulls back some of the restrictions and we can have snscrape working as before! Best wishes.

@pleblira This library uses the snscrape classes for User and Tweet and supports auth:
https://github.com/vladkens/twscrape

What's the URL to see the user profile? Sorry if it's a dumb question, but I could not find any reference on the net.

@nbrahmani You can try

https://syndication.twitter.com/srv/timeline-profile/screen-name/[username]

For example, https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk

User info will be stored inside the <script id="__NEXT_DATA__"> tag. The tag itself is server-side rendered, so you can use requests with BeautifulSoup (assuming Python) to extract the data you need. You can get the user profile and up to the 20 most recent tweets from that user.
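The extraction step described above can be sketched with only the standard library (re + json in place of BeautifulSoup). Since live responses can no longer be relied on, the sketch runs against a toy HTML snippet standing in for the server-rendered page:

```python
import json
import re

def parse_next_data(html: str) -> dict:
    """Extract and decode the JSON inside <script id="__NEXT_DATA__" ...>."""
    m = re.search(
        r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL
    )
    if not m:
        raise ValueError("__NEXT_DATA__ script tag not found")
    return json.loads(m.group(1))

# Toy page standing in for the timeline-profile HTML.
page = (
    '<html><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"headerProps": {"screenName": "elonmusk"}}}}'
    "</script></html>"
)
data = parse_next_data(page)
print(data["props"]["pageProps"]["headerProps"]["screenName"])  # elonmusk
```

The headerProps path here mirrors the JSON output quoted later in this thread; other keys in a real response are not guaranteed.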

Unfortunately, this endpoint has been dead for the last 16-17 hours.

@nickchen120235

I tried this, but I need the Twitter Blue status of a user, and this does not return that.

@nickchen120235

I tried this, but I need the Twitter Blue status of a user, and this does not return that.

There's a boolean, is_blue_verified or something similar, in the user key IIRC. Maybe that's what you need?

@nickchen120235
I tried this, but I need the Twitter Blue status of a user, and this does not return that.

There's a boolean, is_blue_verified or something similar, in the user key IIRC. Maybe that's what you need?

As far as I can see, it does not have that boolean. I get the following output:

{"props":{"pageProps":{"contextProvider":{"features":{},"scribeData":{"client_version":null,"dnt":false,"widget_id":"embed-0","widget_origin":"","widget_frame":"","widget_partner":"","widget_site_screen_name":"","widget_site_user_id":"","widget_creator_screen_name":"","widget_creator_user_id":"","widget_iframe_version":"bb06567:1687853948269","widget_data_source":"screen-name:elonmusk","session_id":""},"messengerContext":{"embedId":"embed-0"},"hasResults":true,"lang":"en","theme":"light"},"lang":"en","maxHeight":null,"showHeader":true,"hideBorder":false,"hideFooter":false,"hideScrollBar":false,"transparent":false,"timeline":{"entries":[]},"headerProps":{"screenName":"elonmusk"}},"__N_SSP":true},"page":"/timeline-profile/screen-name/[screenName]","query":{"screenName":"elonmusk"},"buildId":"vn5fUacsNpP-nIkFRlFf6","assetPrefix":"https://platform.twitter.com","isFallback":false,"gssp":true,"customServer":true}

@nbrahmani Sorry for the confusion 😓

As I mentioned earlier, this endpoint is dead, so it's no longer returning the correct response.

If it were working, the info you need would be in the user key in one of the entries.

Hello guys, hello @JustAnotherArchivist, any update on the issue?

AFAIK

  1. The login wall is still there.
  2. Single embedded tweets work, but embedded timelines don't. (You can try at https://publish.twitter.com)
  3. Authentication won't be implemented anyway.

Can we get the IDs of the posts generated by a specific profile? If a single embedded tweet is working, a for-loop through all the IDs would work in the interim. Thank you!
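The interim loop suggested above could start from URL construction like this. A sketch only: the endpoint, its parameters, and its availability are all subject to change, and the actual fetching (e.g. with requests, plus a polite delay between calls) is left out.

```python
from urllib.parse import urlencode

# Syndication endpoint from earlier in this thread.
BASE = "https://cdn.syndication.twimg.com/tweet-result"

def tweet_result_urls(tweet_ids, lang="en"):
    """Yield one syndication URL per tweet ID, ready for a fetch loop."""
    for tweet_id in tweet_ids:
        yield f"{BASE}?{urlencode({'id': tweet_id, 'lang': lang})}"

for url in tweet_result_urls(["1652193613223436289"]):
    print(url)
```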

What is this code for? @prasunshrestha

As of now it seems to be possible to view public tweets without logging in, and the Wayback Machine can save tweet pages again.
Current snscrape scraping methods still return 404, so it's likely that the API endpoints or something else have changed.

Can't confirm anything more than that for now.

Yes, it is a different endpoint which only returns the single requested tweet, no replies or the replied-to tweet.

Yes, it is a different endpoint which only returns the single requested tweet, no replies or the replied-to tweet.

Is it already implemented? If yes, which version should I update to?

No, my previous comment still applies.

The thing is that I don't want to have to (read: don't have time to) change everything again in two days when Elon has another one of his brilliant ideas, so I'm waiting for the dust to settle down a bit.

Yes, it is a different endpoint which only returns the single requested tweet, no replies or the replied-to tweet.

I would be happy if I could just get the content of the tweet. Please correct me if I am wrong, but the only way I see at the moment (without authentication) is through embedded tweets. As a result, if I could get the post IDs, I could get the content. Is that possible at the moment?

No, my previous comment still applies.

The thing is that I don't want to have to (read: don't have time to) change everything again in two days when Elon has another one of his brilliant ideas, so I'm waiting for the dust to settle down a bit.

Okay, thank you brother. I hope it won't take a long time; I really need this for my project.

Yes, it is a different endpoint which only returns the single requested tweet, no replies or the replied-to tweet.

Ah, indeed, I didn't notice there are no replies. As for the replied-to tweet, I see there's an in_reply_to_status_id_str field to get the replied-to tweet, quoted_status_id_str to get the quoted tweet, and conversation_id_str to get the conversation root tweet (not sure), so it could be solved with another request, I suppose, if needed. Yet, well, it might stay this way, or it might not. We can only observe for now.
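Assuming those fields appear in the payload, the follow-up-request idea can be sketched as a small helper that collects whichever related-tweet IDs a tweet carries. The field names come from the comment above; whether they are always populated is not guaranteed.

```python
def related_tweet_ids(tweet: dict) -> dict:
    """Map the related-tweet ID fields, if present, to readable labels."""
    fields = {
        "replied_to": "in_reply_to_status_id_str",
        "quoted": "quoted_status_id_str",
        "conversation_root": "conversation_id_str",
    }
    return {label: tweet[key] for label, key in fields.items() if tweet.get(key)}

# Fabricated example: a tweet replying to tweet "1".
tweet = {"id_str": "2", "in_reply_to_status_id_str": "1", "conversation_id_str": "1"}
print(related_tweet_ids(tweet))  # {'replied_to': '1', 'conversation_root': '1'}
```

Each returned ID could then be fed back into the single-tweet endpoint as a separate request.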

zedeus/nitter#919 (comment)

Maybe this could provide some help?

Disclaimer: this may be broken by the Tesla guy at any time, so proceed with caution.

While I was collecting data for my thesis, my Twitter developer account was suddenly closed. Right now I have little time left, and I desperately need my Twitter data. I couldn't get the scraper running here; is it my fault?

https://twitter.com/TitterDaily/status/1676624363787894784?s=20

👀

No, it does not work, unfortunately

Hi, thanks for the code. This works amazingly. However, instead of scraping a single tweet using the tweet id, I want to get multiple tweets based on a search string, with start and end dates and a limit on how many tweets to scrape. What changes do I have to make to the query string? I searched online but could not find documentation on this. I made an assignment for students at a university that depended on the snscrape library. Any guidance is appreciated.

Any update?

Hi, thanks for the code. This works amazingly. However, instead of scraping a single tweet using the tweet id, I want to get multiple tweets based on a search string, with start and end dates and a limit on how many tweets to scrape. What changes do I have to make to the query string? I searched online but could not find documentation on this. I made an assignment for students at a university that depended on the snscrape library. Any guidance is appreciated.

Yes please, same question. And is it possible to scrape within a period of dates?

AFAIK:

  1. As of now, Twitter allows accessing only the tweet content (not the associated replies) via the TweetResultByRestId GraphQL API, without logging in. That API requires you to have the tweet id beforehand.
  2. To access the "search by query" endpoint, you would either have to log in before running any scraping task or use the official Twitter API (the free tier does not allow read requests; the $100 Basic tier allows reading tweets with a monthly cap of 10,000).
  3. Logging in and querying the search/usertweet endpoints may result in your account getting banned. They are also rate-limited at 50 requests per 15 minutes per endpoint.

I'm not going to refer to any other tools that allow authentication support, but you'll find some online that build on the snscrape and twint models to support user authentication and GraphQL endpoint querying.

NOTE: Anybody attempting to do so should research and understand the risks and liabilities associated with scraping with an authenticated user.
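The 50-requests-per-15-minutes figure mentioned above can at least be respected client-side with a simple sliding-window limiter. A minimal sketch (generic code, not part of snscrape; the limit and window values are taken from the comment above and may change):

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds (sliding window)."""

    def __init__(self, limit: int = 50, window: float = 15 * 60.0):
        self.limit = limit
        self.window = window
        self._calls: deque = deque()  # timestamps of calls still inside the window

    def delay(self, now: float) -> float:
        """Seconds to wait before the next call is allowed at time `now`."""
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) < self.limit:
            return 0.0
        # The oldest call must age out of the window before we may proceed.
        return self.window - (now - self._calls[0])

    def record(self, now: float) -> None:
        self._calls.append(now)

    def acquire(self) -> None:
        """Block until a call is permitted, then record it."""
        now = time.monotonic()
        wait = self.delay(now)
        if wait > 0:
            time.sleep(wait)
            now += wait
        self.record(now)
```

Calling `acquire()` before each request blocks just long enough to stay under the cap for one endpoint.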

Hello,
This may or may not help. Here's a route to access Tweets without logging in (contains further iframe to platform.twitter.com): https://cdn.embedly.com/widgets/media.html?type=text%2Fhtml&key=a19fcc184b9711e1b4764040d3dc5c07&schema=twitter&url=https://twitter.com/elonmusk/status/1674865731136020505

When I use this code on a retweet, it ONLY returns the original tweeter's username in the JSON. Is there any way to return the retweeter's username?

For example, putting this Tweet ID into the code (1499332226177286144, a retweet by AaronBell4NUL) returns information about the original tweet (1499056854688841732, a tweet by NewsNBC). AaronBell4NUL is not reported in the JSON that's returned, even though I entered his Tweet ID. However, I'd like to be able to enter the retweet's Tweet ID and return AaronBell4NUL.

Here is what I have gathered about what snscrape twitter scraping may look like in the future, if anyone could confirm, deny, or add details that would be awesome.

There is some optimism that it is possible for twitter users to be queried by screen name and that their syndication feeds can be captured. These syndication feeds would consist of up to 20 tweets per user (no retweets or replies) and wouldn't be subject to rate limiting? These updates are pending until the dust on the Twitter changes settles a bit.

https://github.com/zedeus/nitter/pull/927/commits

Nitter has updated endpoints, could be useful for people to mess around with.

I'm curious if this is going to just be a never ending cycle of cat and mouse.

Current state of twitter sucks.

Hello @JustAnotherArchivist, any update or news about the issue?


Yet, keyword scraping is not possible, as far as I understand.


waiting for an answer guys

When will the API endpoints be free from the lockdown? Any idea?

Nitter keyword scraping is possible now. I just tried.

Yeah, I saw, it got implemented in zedeus/nitter@67203a4.

Looks like things have been fairly stable for a few days now. I'm not sure yet when I'll have time to implement the necessary changes, possibly on the weekend.

Thanks for your work @JustAnotherArchivist, you are really saving our lives.

Is there a way to donate for your work ? @JustAnotherArchivist

It seems that the Nitter keyword search shows results for the last 10 days only. Maybe a limit of the new endpoint? Hope it will be lifted.

zedeus/nitter#938


Hey @JustAnotherArchivist, any update on this? Thanks!

I literally have my final year project depending on this module. Please save us @JustAnotherArchivist

If you for some reason absolutely need to use snscrape for getting a single tweet, and you don't mind getting it by tweet ID purely in code, you can do it like this:

from snscrape.base import ScraperException
from snscrape.modules.twitter import (
    Tweet,
    _TwitterAPIType,
    TwitterTweetScraper,
    TwitterTweetScraperMode,
)


def get_items(self):
    variables = {
        "tweetId": str(self._tweetId),
        "includePromotedContent": True,
        "withCommunity": True,
        "withVoice": True,
        # !!! these fields may be deprecated
        # "with_rux_injections": False,
        # "withQuickPromoteEligibilityTweetFields": True,
        # "withBirdwatchNotes": False,
        # "withV2Timeline": True,
    }
    features = {
        "responsive_web_graphql_exclude_directive_enabled": True,
        "verified_phone_label_enabled": False,
        "creator_subscriptions_tweet_preview_api_enabled": False,
        "responsive_web_graphql_timeline_navigation_enabled": True,
        "responsive_web_graphql_skip_user_profile_image_extensions_enabled": False,
        "tweetypie_unmention_optimization_enabled": True,
        "responsive_web_edit_tweet_api_enabled": True,
        "graphql_is_translatable_rweb_tweet_is_translatable_enabled": True,
        "view_counts_everywhere_api_enabled": True,
        "longform_notetweets_consumption_enabled": True,
        "tweet_awards_web_tipping_enabled": False,
        "freedom_of_speech_not_reach_fetch_enabled": True,
        "standardized_nudges_misinfo": True,
        "tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled": False,
        "longform_notetweets_rich_text_read_enabled": True,
        "longform_notetweets_inline_media_enabled": False,
        "responsive_web_enhance_cards_enabled": False,
        "responsive_web_twitter_article_tweet_consumption_enabled": False,  # new?
        "responsive_web_media_download_video_enabled": True,  # new?
        # !!! these fields may be deprecated
        # "rweb_lists_timeline_redesign_enabled": False,
        # "vibe_api_enabled": True,
        # "interactive_text_enabled": True,
        # "blue_business_profile_image_shape_enabled": True,
        # "responsive_web_text_conversations_enabled": False,
    }
    fieldToggles = {
        "withArticleRichContentState": True,
        "withAuxiliaryUserLabels": True,
    }
    params = {
        "variables": variables,
        "features": features,
        "fieldToggles": fieldToggles,  # seems optional
    }
    url = "https://twitter.com/i/api/graphql/3HC_X_wzxnMmUBRIn3MWpQ/TweetResultByRestId"
    if self._mode is TwitterTweetScraperMode.SINGLE:
        obj = self._get_api_data(url, _TwitterAPIType.GRAPHQL, params=params)
        if not obj["data"]["tweetResult"]:
            return
        yield self._graphql_timeline_tweet_item_result_to_tweet(
            obj["data"]["tweetResult"]["result"], tweetId=self._tweetId
        )


# replace this method
TwitterTweetScraper.get_items = get_items


def get_using_snscrape(tweet_id: int) -> Tweet | None:
    print("Sending API request...")
    try:
        for tweet in TwitterTweetScraper(tweet_id).get_items():
            print("Response: %r." % tweet)
            return tweet
        print("No response from public API.")
    except ScraperException:
        print("Scraping failed.")

Just pass a tweet ID to the get_using_snscrape function and it will return a Tweet instance if there's anything to return, or None otherwise. Obviously, it is not the best way to do it, but it works at least. You can also adapt the results of #996 (comment) to your needs.


There is no way to get tweets by username with since/until date filtering, is there?

Hello, I'm still having this problem (CRITICAL:snscrape.base:Errors: blocked (404)). Do you have any solution?

It seems that Twitter has once again discontinued access to these APIs, like UserByRestId, UserTweets, ...

@JustAnotherArchivist if you are implementing snscrape using the GraphQL API as Nitter did, will snscrape also encounter the following issue with keyword search?:

#996 (comment)

Does anyone know if there is a forked snscrape library that uses Twitter login, or something similar to this what works for keyword search and can scrape historical data?

@leockl Almost certainly, yes.


Hello @JustAnotherArchivist, is a fix for the blocked (404) error going to be implemented soon?

Is there a library which uses twitter login to scrape?

Update: It seems profiles and the tweets on profiles are available to the public without login again.

@JustAnotherArchivist have you seen that? Do the endpoints work again?


FR????

Thanks for spotting! Yes, you can view a profile again, but it is very strange. When I google an account and try to access it, it does not work. When I click on a tweet shown in Google results and from there click on the account, the account's tweets are shown. But strangely, only tweets from before 2023, and not in chronological order. I hardly think this is a "stable" feature of Twitter, but perhaps a sign that they are currently tweaking some things regarding how one can view a profile without being signed in.


I do think it is related to the User-Agent. Twitter doesn't want to lose its search results on Google, since that could damage its influence.
In this case, maybe snscrape could change its User-Agent to Google's crawler to bypass the restrictions.


Ah yes, you're right. If you first go to a single tweet, then traverse to a profile, it will work. But the tweets shown also seem to be "top" tweets or something, not the latest. Certainly worth a look at the Twitter API endpoints to see what this is all about.

Your User Agent doesn't change like that. But it isn't based on the Referer header either (which would have behaviour like you described, different results from direct navigation vs from web search results). Rather, one of the API endpoints for profile timelines is accessible again, but the page URL still isn't. So if you already have the Twitter website open and click on a profile name, it just triggers that API request and works, but if you open a profile page directly (or refresh it after such an API load), you get the login wall.

Interesting development, yes. There are some complications with implementing this (since snscrape sometimes has to load the profile page to fetch a token), but that can be worked around. The results are very poor though, and the 'Replies' tab (which the twitter-profile scraper is/was using) as well as tweet threads (twitter-tweet with scroll or recurse mode) are still inaccessible.

Also, I've been too busy with life, so I haven't had time to implement any of the changes yet, and I can't currently give any ETA either.

Syndication for Twitter profile works again with the latest tweets:

https://syndication.twitter.com/srv/timeline-profile/screen-name/nypost?showReplies=true

@nerra0pos is there a Python library which works to scrape this Syndication for Twitter site?

Hello, does the profile scraper still work?


I don't think so, but you can wait for confirmation of my answer.

Hello @JustAnotherArchivist, is there still no solution or implementation for this error? Are there any updates?


Notes about the syndication timeline-profile endpoint above:

  • The returned data, once the page's JavaScript has fully loaded, is application/json embedded in the HTML, within <script id="__NEXT_DATA__" type="application/json">{…}</script>
  • Only loads about 20 tweets total.

While useless for pulling old data (without crawling through that awful, obfuscated JavaScript), new data could be pulled occasionally and consecutively for sources such as newspapers, publishers, etc.; no login is needed, nor any login-flow issue.
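Since the tweets sit inside that __NEXT_DATA__ script tag, they can be pulled out without executing any JavaScript by extracting and parsing the embedded JSON. A minimal sketch (the page would be fetched with an ordinary GET of the syndication URL above; the key layout inside the parsed dict is undocumented and not shown here):

```python
import json
import re

# The syndication page embeds its data as:
#   <script id="__NEXT_DATA__" type="application/json">{...}</script>
NEXT_DATA_RE = re.compile(
    r'<script id="__NEXT_DATA__" type="application/json">(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str) -> dict:
    """Return the parsed __NEXT_DATA__ JSON from a syndication profile page."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("no __NEXT_DATA__ script tag found")
    return json.loads(match.group(1))
```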

Does anyone know if the syndication streams are subject to rate limiting?


Sometimes, the tweets in syndication are trimmed.
Do you know of a working endpoint or page (without login) to get the full tweet, given the ID and the username?