RSS-Bridge / rss-bridge

The RSS feed for websites missing it

Home Page: https://rss-bridge.org/bridge01/

Why Instagram bridge does not work and possible solutions

em92 opened this issue · comments

Recently there have been a lot of new issues and comments about the Instagram bridge throwing 429 errors (#1863, #1885), and even the workaround from #1617 (comment) does not help.

First of all, I want to clarify that there is no maintainer of InstagramBridge. The recent commit 56b2c51 should have been pushed a long time ago. By maintainer I mean a person who at least fixes bugs, or explains why a certain bug is not fixable at the moment, and who comments on InstagramBridge PRs.

A 429 error means "Too Many Requests". It means that Instagram's servers are receiving a lot of requests from your server (not only from the RSS-Bridge instance). So InstagramBridge on a public and popular RSS-Bridge instance will probably throw this error.
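As a rough illustration of reacting to a 429, here is a minimal sketch that decides how long to back off. It relies on the standard HTTP `Retry-After` header; whether Instagram actually sends that header is not stated in this thread, so the one-hour default is an assumption, and `secondsToWait` is a hypothetical helper, not an RSS-Bridge function.

```php
<?php
// Hypothetical helper, not part of RSS-Bridge.
// Returns 0 for any status other than 429. For a 429 it returns the
// Retry-After value in seconds, or $default when the header is missing
// or unparsable. $now is injectable so the date form can be tested.
function secondsToWait(int $status, array $headers, int $default = 3600, ?int $now = null): int
{
    if ($status !== 429) {
        return 0;
    }
    $now = $now ?? time();
    foreach ($headers as $line) {
        if (stripos($line, 'retry-after:') !== 0) {
            continue;
        }
        $value = trim(substr($line, strlen('retry-after:')));
        // Form 1: a plain number of seconds.
        if (preg_match('/^\d+$/', $value) === 1) {
            return (int) $value;
        }
        // Form 2: an HTTP date.
        $date = strtotime($value);
        if ($date !== false) {
            return max(0, $date - $now);
        }
    }
    return $default;
}
```

The returned value could be used as a temporary cache timeout, so the instance stops hitting Instagram until the wait is over.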

Some people believe that if we somehow make InstagramBridge log in with an existing account, it won't show such errors. To test that in practice, the private credentials feature has to be implemented. There is an issue for that (#1170) and a draft PR (#1343). But @teromene won't continue working on that PR, so someone has to continue his work.

Possible solutions for users:

  • Deploy RSS-Bridge on your personal PC or laptop and use InstagramBridge from there.
  • Deploy RSS-Bridge on your VPS, make sure that only certain people use it and use InstagramBridge from there.
  • Use bibliogram https://sr.ht/~cadence/bibliogram/ instead of RSS-Bridge.

Note that deploying RSS-Bridge on shared hosting probably won't help, because other users on the same server would be making requests to Instagram too.

Deploy RSS-Bridge on your VPS, make sure that only certain people use it and use InstagramBridge from there.

This doesn't actually work. I've deployed my own instance with password protection and only 4 user accounts fetched by the bridge every hour, but the error is still there.

Same here: I'm the only one using the bridge on my own instance (at home, not a shared IP) and the problem persists.

I would like to investigate more, but the bridge is a hard piece of code, and I do not understand what USER_QUERY_HASH is, as I mentioned in #1864 (comment).
We surely need a developer who knows how the Facebook GraphQL API works.

The solution, as you said, is to have a private authentication method, and we should wait for #1170 and #1343.
A few years ago I tried to implement such a thing (for a MediapartBridge) but the work is outdated now.
More power to @teromene and you all!

For private accounts, take a look at https://github.com/dilame/instagram-private-api

429 error means "Too many requests".

I confirm. Switched my feeds to update once a day and no more error. Thanks for the tip.

@em92 Maybe I'm crazy or misreading things, but I'm not sure the GraphQL endpoints are going to work for login based feeds. In the API overview they are pretty clear that Facebook business accounts connected to an Instagram account are required and that business validation is needed.

I'm guessing this is why projects like Instagram Private API went the route of using the consumer API and adding in the extra calls to look like Android for good measure. Facebook even states in their documentation that: "The API cannot access Instagram consumer accounts (i.e., non-Business or non-Creator Instagram accounts). If you are building an app for consumer users, use the Instagram Basic Display API instead."

Perhaps GraphQL could work, but from a quick dive in that doesn't seem to be the case. It seems more like the Instagram Bridge would need to be converted over to the consumer API. However, I'm totally new to this, so I could be way off base.

Just pushed teromene's patch that adds the possibility to use private credentials. Example usage is given in the first message of PR #1343.

So if anyone wants to change InstagramBridge to use those credentials, feel free to do it. Just in case, post a message here like "I am going to do this", to make sure that no one else is making the same patch simultaneously.

@Fmstrat, your message could be useful to an InstagramBridge maintainer, but there is none at the moment.

If a maintainer comes along (or I eventually get time): after a bit more research, it looks like you can use the GraphQL endpoint "privately" within a user session with a CSRF token. This is how Instaloader handles it in their JSON call. They get a token, create the session, and then grab the JSON with that session.

Recreating that process should do the trick.
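To make the token, session, JSON sequence a bit more concrete, here is a minimal sketch of the cookie handling part only. It makes no network calls and does not reproduce Instaloader's code. The function names are hypothetical, and sending the token back in an `X-CSRFToken` header is an assumption about what the server expects.

```php
<?php
// Hypothetical helpers; names are not taken from RSS-Bridge or Instaloader.

// Collect name => value pairs from raw "Set-Cookie:" response header lines.
function collectCookies(array $responseHeaders): array
{
    $cookies = [];
    foreach ($responseHeaders as $line) {
        if (stripos($line, 'set-cookie:') !== 0) {
            continue;
        }
        // Keep only the "name=value" part before the first attribute.
        $pair = trim(explode(';', substr($line, strlen('set-cookie:')), 2)[0]);
        $parts = explode('=', $pair, 2);
        if (count($parts) === 2 && $parts[0] !== '') {
            $cookies[$parts[0]] = $parts[1];
        }
    }
    return $cookies;
}

// Build request header lines that replay the cookies and echo the CSRF token.
function sessionHeaders(array $cookies): array
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    $headers = ['cookie: ' . implode('; ', $pairs)];
    if (isset($cookies['csrftoken'])) {
        // Assumption: the token is sent back in an X-CSRFToken header.
        $headers[] = 'x-csrftoken: ' . $cookies['csrftoken'];
    }
    return $headers;
}
```

The idea would be to call `collectCookies()` on the response to the first request, pass `sessionHeaders()` with the login request, then repeat with the cookies returned by the login (which should include `sessionid`) for the JSON request.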

Well, that was easy. Initial PR is in: #1894

Actually, hold on that PR. I figured out if I modify the storage to the cookie text, then I can fix the username/id bug, too. ;)

If I understand what’s happening, another solution could be to throttle the requests (based on the number of feeds making requests to Instagram API) instead of falling back to logging in. Is that possible?

That way, people without an Instagram account would not be excluded.

@arkhi The problem is that this kind of delay needs to come from the reader, not the bridge, or the reader may time out.
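One way to get a throttling effect without making the reader wait is to stretch the cache timeout as the number of feeds grows, so that stale data is served instead of a delayed response. A minimal sketch, assuming a fixed hourly request budget: the budget value is a guess, since this thread does not say how many requests Instagram tolerates, and `refreshIntervalSeconds` is a hypothetical helper.

```php
<?php
// Hypothetical helper: the request budget is an assumption, the thread
// does not say how many requests Instagram tolerates.
function refreshIntervalSeconds(int $feedCount, int $requestsPerHour): int
{
    if ($feedCount < 1 || $requestsPerHour < 1) {
        throw new InvalidArgumentException('Both arguments must be positive.');
    }
    // Each feed may refresh once per interval, so all feeds together
    // stay within the hourly budget.
    return (int) ceil($feedCount * 3600 / $requestsPerHour);
}
```

For example, 50 feeds with a budget of 4 requests per hour gives an interval of 45000 seconds (12.5 hours) per feed.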

@em92 I'll be working today to do a more formal login similar to the style of Instaloader.

Alrighty, I feel like I'm 95% done, but can't seem to get the sessionid back in the post-login header. Maybe someone here can help out? @em92 if I can figure this out I can probably keep this maintained, too.

Here's the changes: Fmstrat@f625892

To test it out, you'll need to put your username/password in here: https://github.com/Fmstrat/rss-bridge/blob/private_insta/bridges/InstagramBridge.php#L141 (until I integrate it into the private feed option that was recently merged).

The problem is when calling the login, the sessionid variable never comes back. This could be as simple as a failed login and I'm just not seeing why. @em92 am I using the post options the way you would expect with json_encode?

In the linked file you can see on line 110 there is a sessionid cookie returned, which does not occur when I make the request in my code.

Request logs: instalogin.txt

Thanks!

Have there been any updates on this issue? Any hope of a solution?

@Aasemoon Not from my end, need some feedback on what seems wrong in my commit, first.

How about using Instaloader as a backend?

Just checked @JimDog546's solution with the session id (#1894 (comment)). For now it is not user friendly and you have to use your own instance. Here is a quick and dirty patch:

diff --git a/bridges/InstagramBridge.php b/bridges/InstagramBridge.php
index bf2999b..1a1c4ac 100644
--- a/bridges/InstagramBridge.php
+++ b/bridges/InstagramBridge.php
@@ -49,6 +49,8 @@ class InstagramBridge extends BridgeAbstract {
        const USER_QUERY_HASH = '58b6785bea111c67129decbe6a448951';
        const TAG_QUERY_HASH = '9b498c08113f1e09617a1703c22b2f32';
        const SHORTCODE_QUERY_HASH = '865589822932d1b43dfe312121dd353a';
+       const SESSIONID = '';
+       const CACHE_TIMEOUT = 43200; // 12 hours
 
        protected function getInstagramUserId($username) {
 
@@ -62,7 +64,8 @@ class InstagramBridge extends BridgeAbstract {
                $key = $cache->loadData();
 
                if($key == null) {
-                               $data = getContents(self::URI . 'web/search/topsearch/?query=' . $username);
+                               $header = array('cookie: sessionid=' . self::SESSIONID);
+                               $data = getContents(self::URI . 'web/search/topsearch/?query=' . $username, $header);
 
                                foreach(json_decode($data)->users as $user) {
                                        if(strtolower($user->user->username) === strtolower($username)) {
@@ -220,12 +223,13 @@ class InstagramBridge extends BridgeAbstract {
 
                        $userId = $this->getInstagramUserId($this->getInput('u'));
 
+                       $header = array('cookie: sessionid=' . self::SESSIONID);
                        $data = getContents(self::URI .
                                                                'graphql/query/?query_hash=' .
                                                                 self::USER_QUERY_HASH .
                                                                 '&variables={"id"%3A"' .
                                                                $userId .
-                                                               '"%2C"first"%3A10}');
+                                                               '"%2C"first"%3A10}', $header);
                        return json_decode($data);
 
                } elseif(!is_null($this->getInput('h'))) {

In this patch you should set your own SESSIONID. To get it:

  • log in to Instagram
  • open Chrome DevTools and go to the Network tab
  • type "instagram.com" in the filter input
  • click on any request
  • in the panel that opens, go to the Headers tab and scroll to "Request Headers"
  • find the cookie header
  • in its value, find the sessionid entry, for example sessionid=xxxxxxxxx;
  • that xxxxxxxxx is the value for SESSIONID
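If picking the value out of a long cookie string by eye is error-prone, the last two steps can be sketched in code. This is a minimal, hypothetical helper (not part of RSS-Bridge) that takes the copied cookie header value and returns the sessionid part.

```php
<?php
// Hypothetical helper, not part of RSS-Bridge.
// Returns the sessionid value from a copied "cookie" request header,
// or null if no non-empty sessionid entry is present.
function extractSessionId(string $cookieHeader): ?string
{
    // Tolerate a pasted "cookie:" prefix.
    $cookieHeader = preg_replace('/^\s*cookie:\s*/i', '', $cookieHeader);
    foreach (explode(';', $cookieHeader) as $pair) {
        $parts = explode('=', trim($pair), 2);
        if (count($parts) === 2 && $parts[0] === 'sessionid' && $parts[1] !== '') {
            return $parts[1];
        }
    }
    return null;
}
```

The value is returned exactly as copied (still percent-encoded if it was), which is the form the patch sends back in its cookie header.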

A PR that uses #1343 is welcome. Ideas about how to make this solution user friendly are welcome too.

[Edited, Apr 12, 2021. 23:19 YEKT - added CACHE_TIMEOUT]
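The patch above assembles the GraphQL URL by hand, escaping only the colon and the comma in the variables JSON. As an alternative sketch, the same JSON can be produced with `json_encode()` and escaped in full with `rawurlencode()`. The base URI and the helper name are assumptions for illustration; the query hash is the USER_QUERY_HASH value shown in the patch.

```php
<?php
// Assumptions: the base URI and the helper name are illustrative;
// the query hash is the USER_QUERY_HASH value shown in the patch.
function userFeedUrl(string $userId, int $first = 10): string
{
    $base = 'https://www.instagram.com/';
    $hash = '58b6785bea111c67129decbe6a448951';
    // Produces {"id":"<userId>","first":10}, the same JSON as the patch.
    $variables = json_encode(['id' => $userId, 'first' => $first]);
    // rawurlencode() escapes every reserved character, including the
    // braces and quotes that the patch leaves unescaped.
    return $base . 'graphql/query/?query_hash=' . $hash
        . '&variables=' . rawurlencode($variables);
}
```

The resulting string differs from the one in the patch only in how many characters are percent-encoded; whether Instagram treats the two forms identically has not been checked here.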

@JimDog546
FYI. I recently checked why my instance stopped returning feeds. The reason is that I had to accept new terms of use or something like that.

Hey, all!
Recently I added documentation on how to set up InstagramBridge for private usage. Could you please review it?
It probably works with 50 feeds and cache_timeout = 43200 (12 hours).
https://github.com/em92/rss-bridge/blob/doc-instagram-2022-01/doc/bridges/InstagramBridge.rst

Merged in master, new link: https://github.com/RSS-Bridge/rss-bridge/blob/master/doc/bridges/InstagramBridge.rst

This issue should stay open at least until the "content donoring" method is described. A prototype was implemented for my customer in my personal branch: https://github.com/em92/rss-bridge/tree/shi/contrib/InstagramBridge but I am not sure if it is usable for general cases.

@em92 Can you move it to the correct docs folder? Just create a new folder "Bridge specific" and add an 01_Instagram.md to it. You can see how it works with the others. It's automatic, you only need to create the file and folder.

Maybe Instagram.md, not 01_Instagram.md? I assume that without the "01_" prefix the pages will be sorted alphabetically (by bridge name).

Correct. And yes, alphabetically makes more sense in this case, so keep it as "Instagram.md".

Heads up (in case anyone else has the same experience): I set this up yesterday on a private Heroku instance, and woke up to find that Instagram had locked the account, thinking that I had been the victim of a phishing attack! They made me reset the password and I had to change the session_id, but otherwise it works okay now :-)

FYI, @tvqt, some time ago I configured session_id and ds_user_id on my public instance https://feed.eugenemolotov.ru.
I had approximately 400+ unique Instagram username queries and a 1 day cache timeout.

After 2 days I got a temporary ban, which lasted until I verified my phone number. Two days later the same happened: another temp ban. Two days after that, yet another temp ban. Finally, after another two days, I got a permaban.

For now, on my public instance, I am testing another method of fetching Instagram feeds: RSS-Bridge pushes a task to a queue; a browser with a userscript and a logged-in Instagram user picks up this task, fetches the data from Instagram and pushes it back to RSS-Bridge. When it is ready, I will make a PR. Here is the branch with those changes if someone is interested: https://github.com/em92/rss-bridge/commits/instagram-rabbitmq (never mind the "rabbitmq" in the branch name, it won't be used).
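To illustrate the flow described above, here is a minimal in-memory sketch of such a queue. It is not the code from the branch: the class and method names are hypothetical, and a real setup would need persistent storage shared between the bridge and the browser worker.

```php
<?php
// Hypothetical in-memory sketch; the real branch is not shown in this thread.
class JobQueue
{
    private array $pending = [];   // usernames waiting to be fetched
    private array $claimed = [];   // usernames handed to a worker
    private array $results = [];   // username => fetched data

    // Called by the bridge when it has no data for a username yet.
    public function push(string $username): void
    {
        if (!in_array($username, $this->pending, true)
            && !isset($this->claimed[$username])
            && !isset($this->results[$username])) {
            $this->pending[] = $username;
        }
    }

    // Called by the worker (the logged-in browser) to get the next task.
    public function claim(): ?string
    {
        $username = array_shift($this->pending);
        if ($username !== null) {
            $this->claimed[$username] = true;
        }
        return $username;
    }

    // Called by the worker to hand the fetched data back.
    public function complete(string $username, array $data): void
    {
        unset($this->claimed[$username]);
        $this->results[$username] = $data;
    }

    // Called by the bridge on later requests; null means "not ready yet".
    public function result(string $username): ?array
    {
        return $this->results[$username] ?? null;
    }
}
```

With this shape the bridge never contacts Instagram itself: the first request for a feed only enqueues a job, and posts appear once the worker has completed it.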

UPD: Nov 4, 2022. new branch https://github.com/em92/rss-bridge/tree/instagram-jq

@em92 thanks for your reply (and the work you have done!). I can't seem to get the instagram-rabbitmq branch you linked working on my own instance - getting it set up the same way as the original version, searching for a user's profile returns:

Uncaught Exception Error: Call to undefined method InstagramBridge::saveCachedValue() at bridges/InstagramBridge.php line 145
#0 index.php:7
#1 lib/RssBridge.php:15
#2 lib/RssBridge.php:59
#3 actions/DisplayAction.php:136
#4 bridges/InstagramBridge.php:154
#5 bridges/InstagramBridge.php:309
#6 bridges/InstagramBridge.php:145
Query string: action=display&bridge=InstagramBridge&context=Username&u=elonmusk&media_type=all&format=Html
Version: dev.2022-06-14
OS: Linux
PHP version: 8.1.10

Quite odd!

Your instance works, however, I can't seem to see the queuing message ("RSS-Bridge pushed job to retreive data. Meanwhile you can add feed link to your feed reader. Posts will appear when job is done") in the rabbitmq repository- is the instance running on a different version of the branch?

Call to undefined method InstagramBridge::saveCachedValue

There is a typo. Just pushed a fix to my branch.

Also you may need to add this in config.ini.php.

[JobQueue]
file = ./jobqueue.sqlite3

Hey, @deletosh!
No, it isn't. For now, I have no motivation to restructure and tidy up the code and write documentation.