i am getting 404 block

Question

ihabpalamino opened this issue a year ago · comments

Describe the bug

How to reproduce

It was working fine, but it no longer works.

Expected behaviour

My code:

from datetime import datetime

from flask import Flask, request, jsonify

import snscrape.modules.twitter as sntwitter
import pandas as pd
import json
import re

app = Flask(__name__)

@app.route('/scrape-tweets2', methods=['POST'])
def scrape_tweets():
    Username = request.form.get('username')
    SINCE = request.form.get('since')
    UNTIL = request.form.get('until')
    PLATFORM_NAME = request.form.get('plateform')

    if SINCE and UNTIL:
        since_date = datetime.strptime(SINCE, "%Y-%m-%d")
        until_date = datetime.strptime(UNTIL, "%Y-%m-%d")
        date_range = f" since:{since_date.strftime('%Y-%m-%d')} until:{until_date.strftime('%Y-%m-%d')}"
    else:
        date_range = ""

    scraper = sntwitter.TwitterSearchScraper(f"(from:{Username}){date_range}")
    tweets = []
    for i, tweet in enumerate(scraper.get_items()):
        if tweet.media is not None and any(mediatype == "video" for mediatype in tweet.media):
            view_count = tweet.viewCount
        else:
            view_count = "Not a video tweet"

        data = {
            "id_post": tweet.id,
            "Date": tweet.date.strftime("%Y-%m-%d"),
            "Heure": tweet.date.strftime("%H:%M:%S"),
            "content": tweet.content,
            "username": tweet.user.username,
            "likecount": tweet.likeCount,
            "shares": tweet.retweetCount,
            "comments": tweet.replyCount,
            "platformname": PLATFORM_NAME,
            "postUrl": tweet.url
        }
        tweets.append(data)
        if i > 800:
            break

    tweet_df = pd.DataFrame(tweets, columns=["id_post", "Date", "Heure", "content", "username", "likecount", "shares",
                                             "comments", "platformname", "postUrl"])
    tweet_df.to_csv('tweeter.csv', sep=";", encoding='utf-8', index=False)

    tweet_json = tweet_df.to_json(orient='records', indent=4, force_ascii=False)

    clean_insta_json = re.sub(r"[\x00-\x1F\x7F-\x9F]", "", tweet_json)
    response = jsonify(json.loads(clean_insta_json))
    response.headers['Content-Type'] = 'application/json'
    return response

if __name__ == '__main__':
    app.run(debug=True)

Screenshots and recordings

No response

Operating system

Windows 11

Python version: output of `python3 --version`

3.9.13

snscrape version: output of `snscrape --version`

snscrape-0.6.2.20230321.dev39+gc3b216c

Scraper

TwitterSearchScraper

How are you using snscrape?

Module (import snscrape.modules.something in Python code)

Backtrace

No response

Log output

No response

Dump of locals

No response

Additional context

No response

Islam Elsayed · Answer 1 · Mon Jul 03 2023 20:25:42 GMT+0800 (China Standard Time)

Same here. My code worked fine for the last two weeks. I have been using the development version:

pip3 install --upgrade git+https://github.com/JustAnotherArchivist/snscrape.git

But as of today, nothing I change gets me past the 404 block.

@op take a look at #996

GermainBoitel · Answer 2 · Tue Jul 25 2023 18:07:21 GMT+0800 (China Standard Time)

Same here. One day my simple queries worked, and the next I was getting blocked.
You can use the Twitter API to work around this, but you will hit a limit on the number of tweets you can retrieve.

My guess is that Twitter's owner ended unlimited queries to keep AI models from being trained on the platform's data. (Many articles I have read say so.)

Elvin Zeynalli · Answer 3 · Mon Aug 07 2023 22:51:11 GMT+0800 (China Standard Time)

Same here. Has anybody managed to come up with a solution?
