JustAnotherArchivist / snscrape

A social networking service scraper in Python

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

i am getting 404 block

ihabpalamino opened this issue · comments

Describe the bug

Error retrieving https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline?variables=%7B%22rawQuery%22%3A%22%28from%3ANone%29%22%2C%22count%22%3A20%2C%22product%22%3A%22Latest%22%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%7D&features=%7B%22rweb_lists_timeline_redesign_enabled%22%3Afalse%2C%22blue_business_profile_image_shape_enabled%22%3Afalse%2C%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22creator_subscriptions_tweet_preview_api_enabled%22%3Afalse%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22vibe_api_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Afalse%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Afalse%2C%22interactive_text_enabled%22%3Atrue%2C%22responsive_web_text_conversations_enabled%22%3Afalse%2C%22longform_notetweets_rich_text_read_enabled%22%3Afalse%2C%22longform_notetweets_inline_media_enabled%22%3Afalse%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%2C%22responsive_web_twitter_blue_verified_badge_is_enabled%22%3Atrue%7D: blocked (404)
4 requests to https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline?variables=%7B%22rawQuery%22%3A%22%28from%3ANone%29%22%2C%22count%22%3A20%2C%22product%22%3A%22Latest%22%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%7D&features=%7B%22rweb_lists_timeline_redesign_enabled%22%3Afalse%2C%22blue_business_profile_image_shape_enabled%22%3Afalse%2C%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22creator_subscriptions_tweet_preview_api_enabled%22%3Afalse%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22vibe_api_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Afalse%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Afalse%2C%22interactive_text_enabled%22%3Atrue%2C%22responsive_web_text_conversations_enabled%22%3Afalse%2C%22longform_notetweets_rich_text_read_enabled%22%3Afalse%2C%22longform_notetweets_inline_media_enabled%22%3Afalse%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%2C%22responsive_web_twitter_blue_verified_badge_is_enabled%22%3Atrue%7D failed, giving up.
Errors: blocked (404), blocked (404), blocked (404), blocked (404)
127.0.0.1 - - [03/Jul/2023 13:05:51] "POST /scrape-tweets2 HTTP/1.1" 500 -
Traceback (most recent call last):
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\flask\app.py", line 2213, in call
return self.wsgi_app(environ, start_response)
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\flask\app.py", line 2193, in wsgi_app
response = self.handle_exception(e)
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\flask\app.py", line 2190, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\flask\app.py", line 1486, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\flask\app.py", line 1484, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\flask\app.py", line 1469, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "C:\Users\HP Probook\PycharmProjects\firstproject\TweetsSraper.py", line 29, in scrape_tweets
for i, tweet in enumerate(scraper.get_items()):
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\snscrape\modules\twitter.py", line 1699, in get_items
for obj in self._iter_api_data('https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline', _TwitterAPIType.GRAPHQL, params, paginationParams, cursor = self._cursor, instructionsPath = ['data', 'search_by_raw_query', 'search_timeline', 'timeline', 'instructions']):
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\snscrape\modules\twitter.py", line 867, in _iter_api_data
obj = self._get_api_data(endpoint, apiType, reqParams, instructionsPath = instructionsPath)
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\snscrape\modules\twitter.py", line 838, in _get_api_data
r = self._get(endpoint, params = params, headers = self._apiHeaders, responseOkCallback = functools.partial(self._check_api_response, apiType = apiType, instructionsPath = instructionsPath))
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\snscrape\base.py", line 272, in _get
return self._request('GET', *args, **kwargs)
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\snscrape\base.py", line 268, in _request
raise ScraperException(msg)
snscrape.base.ScraperException: 4 requests to https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline?variables=%7B%22rawQuery%22%3A%22%28from%3ANone%29%22%2C%22count%22%3A20%2C%22product%22%3A%22Latest%22%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%7D&features=%7B%22rweb_lists_timeline_redesign_enabled%22%3Afalse%2C%22blue_business_profile_image_shape_enabled%22%3Afalse%2C%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22creator_subscriptions_tweet_preview_api_enabled%22%3Afalse%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22vibe_api_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Afalse%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Afalse%2C%22interactive_text_enabled%22%3Atrue%2C%22responsive_web_text_conversations_enabled%22%3Afalse%2C%22longform_notetweets_rich_text_read_enabled%22%3Afalse%2C%22longform_notetweets_inline_media_enabled%22%3Afalse%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%2C%22responsive_web_twitter_blue_verified_badge_is_enabled%22%3Atrue%7D failed, giving up.

How to reproduce

it was working fine but not anymore

Expected behaviour

my code from datetime import datetime

from flask import Flask, request, jsonify

import snscrape.modules.twitter as sntwitter
import pandas as pd
import json
import re

app = Flask(name)

@app.route('/scrape-tweets2', methods=['POST'])
def scrape_tweets():
Username = request.form.get('username')
SINCE = request.form.get('since')
UNTIL = request.form.get('until')
PLATFORM_NAME = request.form.get('plateform')

if SINCE and UNTIL:
    since_date = datetime.strptime(SINCE, "%Y-%m-%d")
    until_date = datetime.strptime(UNTIL, "%Y-%m-%d")
    date_range = f" since:{since_date.strftime('%Y-%m-%d')} until:{until_date.strftime('%Y-%m-%d')}"
else:
    date_range = ""

scraper = sntwitter.TwitterSearchScraper(f"(from:{Username}){date_range}")
tweets = []
for i, tweet in enumerate(scraper.get_items()):
    if tweet.media is not None and any(mediatype == "video" for mediatype in tweet.media):
        view_count = tweet.viewCount
    else:
        view_count = "Not a video tweet"

    data = {
        "id_post": tweet.id,
        "Date": tweet.date.strftime("%Y-%m-%d"),
        "Heure": tweet.date.strftime("%H:%M:%S"),
        "content": tweet.content,
        "username": tweet.user.username,
        "likecount": tweet.likeCount,
        "shares": tweet.retweetCount,
        "comments": tweet.replyCount,
        "platformname": PLATFORM_NAME,
        "postUrl": tweet.url
    }
    tweets.append(data)
    if i > 800:
        break

tweet_df = pd.DataFrame(tweets, columns=["id_post", "Date", "Heure", "content", "username", "likecount", "shares",
                                         "comments", "platformname", "postUrl"])
tweet_df.to_csv('tweeter.csv', sep=";", encoding='utf-8', index=False)

tweet_json = tweet_df.to_json(orient='records', indent=4, force_ascii=False)

clean_insta_json = re.sub(r"[\x00-\x1F\x7F-\x9F]", "", tweet_json)
response = jsonify(json.loads(clean_insta_json))
response.headers['Content-Type'] = 'application/json'
return response

if name == 'main':
app.run(debug=True)

Screenshots and recordings

No response

Operating system

Windows 11

Python version: output of python3 --version

3.9.13

snscrape version: output of snscrape --version

snscrape-0.6.2.20230321.**dev39+**gc3b216c

Scraper

TwitterSearchScrapper

How are you using snscrape?

Module (import snscrape.modules.something in Python code)

Backtrace

No response

Log output

No response

Dump of locals

No response

Additional context

No response

same, my code was working fine the last 2 weeks. Been using the development version

pip3 install --upgrade git+https://github.com/JustAnotherArchivist/snscrape.git

but as of today, no kind of change is allowing me to get past the 404 block.

@op take a look at #996

Same here, one day I was using simple queries and the other I was getting blocked.
However, you can use Twitter API to tackle this problem but you ill reach a limit of tweets you can scrape.

My thoughts is that the owner of Twitter stopped unlimited queries in order not to let AI improve over his social network. (Many articles I read are saying that)

Same here. Has anybody managed to come up with a solution?