RSS-Bridge / rss-bridge

The RSS feed for websites missing it

Home Page:https://rss-bridge.org/bridge01/

Problems with Facebook on public RSS-Bridge instances

em92 opened this issue

Because of the many recent "You must be logged in to view this page. This is not supported by RSS-Bridge" reports from Facebook users (#2041, comments from #2014, #2037), I investigated the problem more closely.

If I open "https://www.facebook.com/facebook/posts" from my home laptop, everything is fine: posts are returned.
If I open "https://www.facebook.com/facebook/posts" from my public instance (https://feed.eugenemolotov.ru), it returns a redirect to the login page.

Looks like FacebookBridge has the same problems as InstagramBridge (#1891), which breaks using FacebookBridge on public RSS-Bridge instances.

Possible solutions for users (the same as those mentioned for InstagramBridge):

  • Deploy RSS-Bridge on your personal PC or laptop and use FacebookBridge from there.
  • Deploy RSS-Bridge on your VPS, make sure that only certain people use it and use FacebookBridge from there.

Thank you very much for investigating. I'm using shared hosting for mine and it's folder protected so only I have access, but it probably just checks the IP and since it's shared hosting, it's heavily used. May have to pay for a private IP in that case.

I installed it on my vps (Infomaniak, Switzerland) and I am the only one using it. Unfortunately it doesn't work either.
I tried to visit a Facebook page with Firefox and it automatically redirects me to the login page.

I'm running my single-user RSS Bridge instance on Digital Ocean, and feeds which were giving me error 500 since April 1st just started working again. Let's see for how long...

Edit: stopped working two hours later.

I have had exactly this problem for 1 week. Over 100 feeds created through the Facebook main site bridge result in errors. I am using my personal laptop, so a public instance cannot be the reason.
Yesterday, 3 feeds (out of the 100+) delivered lots of previously missed articles; today, only 1 out of 100+ is working. It looks random and erratic.
I use them to deliver a daily news digest. For 1 week I have not been able to do it properly and have had to check all sources one by one, which is inefficient and time-consuming. What should I do?

Hi @em92 ,

If you remove the /posts part of the URL, the login page is not shown, even on a public instance.

E.g.:

https://www.facebook.com/facebook/posts (redirect to login page)

https://www.facebook.com/facebook (no redirect, page content shown).

Did you try that?

@tstanbur I have the same problem as @Noutladeesse. I don't understand how I can change https://www.facebook.com/facebook/posts to https://www.facebook.com/facebook; I have an RSS feed without facebook inside.

I have the same issue too!

I was just trying to help fix it, hopefully @em92 can (I think he's the author?)

@tstanbur Understood, my English is too poor. :)

@cborne: @tstanbur has the same problem as we do, even though his RSS feeds are not Facebook feeds. He is asking whether @em92 is the author and whether he can help us solve the problem (I'm translating!)

@Noutladeesse thanks, I understood it eventually. At first I did not understand what the Facebook URLs in the middle were for, but they are a suggested fix for @em92. English is not really like riding a bike: if you don't practice it, it doesn't come back on its own. :)

:-D
Yes, it is a suggested fix, but it does not work for feeds that have already been created.

Just another small "me too". I have been running RSS-Bridge on my personal VPS (as the only user) for a long time (~2 years) and I'm also affected by the issue. It started about 1-2 weeks ago, then it started working again on Monday, was OK for about 2 days, and has now stopped again.

It does seem like a Facebook action to block RSS-Bridge (probably with their silly reasoning that this would somehow make people go back to using their awful service…)

Here just one more ''me too''. I specifically signed up here on Github to ask a few things about the Facebook bridge. Until last week, I had been using a public host from Eugene Molotov to my full satisfaction for about a year (thanks a lot). I don't have any technical background, so it is sometimes difficult for me to be able to keep up with all the terms that come up with this topic here.

I wonder if the above and below option mentioned by em92 still works and how I could get it running on my own PC:

Deploy RSS-Bridge on your personal PC or laptop and use FacebookBridge from there.

I would be very happy if I could still use the Facebook Bridge in this way, but I am not sure if this still works and how to install it on my own PC. I have looked through github quite a bit, but unfortunately I can't figure it out myself, which is why I decided to sign up.

If users could confirm or deny that this feature still works, I would be happy with that. Then my next question would be how I can best put the bridge on my own PC or who I can ask for help or get information how to do so. I also think it would be a very good idea to start a donation fund to get a developer to maintain the facebook bridge and make also the instagram bridge work again. That way, we can all contribute to get our beloved feeds going again. Greetings from the Netherlands and thanks for your great work over the years!

Same here. Since April I got different errors in the same feeds, like:

"Facebook Bridge | Main Site was unable to receive or process the remote website's content!
Error message: You must be logged in to view this page. This is not supported by RSS-Bridge."

"Facebook Bridge | Main Site was unable to receive or process the remote website's content!
Error message: The requested resource cannot be found!"

"Facebook Bridge | Main Site was unable to receive or process the remote website's content!
Error message: Call to a member function children() on null
Query string: action=display&bridge=Facebook&u=hyperlitemountaingear&media_type=all&limit=1000&format=Atom
Version: dev.2020-11-10"
Latest version of RSS-Bridge…

I've been having these errors as well and I found that changing the cache_timeout parameter in FacebookBridge seems to reset the bridge, but it only works for a little while. I've tried 86400, 43200, 21600, 1, 0, and even eliminating the parameter. Somehow resetting the cache every time the bridge is called might be the solution to this problem?

Thank you for suggesting @Mthmgcn05
How do you reset the cache? (I am not an IT professional, only a user)

After more testing and thought: every time I redeployed, it worked for about five minutes, so the redeploy itself may have been what reset it.

It seems that adding the cookie "c_user=XXXX", where XXXX is my ID from the Facebook cookie, helped. I don't know how to add this only for the bridge, so I did it in contents.php for all requests, which is really bad, but... maybe it points the way to a better solution :-)

EDIT: False alarm, not working again...

@miwcz on my public instance I used c_user and xs values. Quick and dirty patch looks like this:

diff --git a/bridges/FacebookBridge.php b/bridges/FacebookBridge.php
index c03de4e..fafeabd 100644
--- a/bridges/FacebookBridge.php
+++ b/bridges/FacebookBridge.php
@@ -174,6 +174,8 @@ class FacebookBridge extends BridgeAbstract {
 		} else {
 			$header = array();
 		}
+		$header[] = 'Cookie: c_user=xxxx; xs=yyyy;';
+
 
 		$touchURI = str_replace(
 			'https://www.facebook',
@@ -560,11 +562,15 @@ EOD;
 				$header = array();
 			}
 
+			$header[] = 'Cookie: c_user=xxxx; xs=yyyy;';
+
+
 			$html = getSimpleHTMLDOM($this->getURI(), $header)
 				or returnServerError('No results for this query.');
 
 		}
 
 		// Handle captcha form?
 		$captcha = $html->find('div.captcha_interstitial', 0);
 

So far, so good.

I meant it is working on my instance at the moment.

@tstanbur

hopefully @em92 can (I think he's the author?)

I am not the author of this bridge. I maintain RSS-Bridge in general (reviewing pull requests, pinging bridge maintainers in issues) and the bridges for Pikabu and Vk.

Usually the maintainer of a bridge fixes its bugs, but we don't have a maintainer for the Facebook bridge. I have little time to fix bugs in bridges that I don't maintain.

I have 20+ Facebook feeds and this works only for the first 4-5 requests. It seems that Facebook blocks multiple requests after a short while.

Is there anyone who maintains Facebook bridge? @em92

It no longer works on my instance. Facebook disabled my account because it supposedly violates its community standards. It persuaded me to upload my photo (I did, a real photo of me) and now I am waiting for the review.

@Noutladeesse

Is there anyone who maintains Facebook bridge?

No.

I don't have any new information to add that other users haven't already discussed. I'm only here to say that it is happening to me too. I am running FB Bridge on Heroku and using Feedly to save the feeds. I started getting Bridge returned error 500! around the beginning of April.

Some feeds only get the error occasionally. Other feeds keep getting the error constantly, which makes those feeds useless.

Example error message:

Facebook Bridge | Main Site was unable to receive or process the remote website's content!
Error message: `You must be logged in to view this page. This is not supported by RSS-Bridge.`
Query string: `action=display&bridge=Facebook&context=User&u=[REDACTED]&media_type=all&limit=-1&format=Atom`
Version: `dev.2020-02-26`

    Press Return to check your input parameters
    Press F5 to retry
    Check if this issue was already reported on GitHub (give it a thumbs-up)
    Open a GitHub Issue if this error persists

teromene, logmanoriginal

Here is the end of my story, in which I tried to make FacebookBridge work on my public instance (https://feed.eugenemolotov.ru) using my account with a real phone number and the patch from #2047 (comment).

image

I didn't read their document titled "Community standards", but it looks like sharing posts via RSS does not comply with it.

It most likely boils down to "overusing API" or "harvesting data". Which is just a tad silly.

For me, the point of using RSS-Bridge (and RSS in general) is to avoid having a Facebook account and still be able to follow some websites that, for some twisted reason, are present only there...

So I think I may have possibly discovered the issue.

I have put rss-bridge on my own server on my computer and realized after putting in the switch for debugging that Facebook responds in two different ways in the url header:

https://www.facebook.com/login/?next=https%3A%2F%2Fwww.facebook.com%2FXXXXXXXXXXXXX%2Fposts

and

https://www.facebook.com/XXXXXXXXXXXXX/posts?_fb_noscript=1

where XXXXXXXXXXXXX is the page id and the first gives the error:

'You must be logged in to view this page. This is not supported by RSS-Bridge.'

The second response gives the posts as requested.

I'm no programmer, but if one could clean up the first response to make it look like the second and then continue onto the rest of the code, I think the bridge would work.
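The two responses described above can be told apart from the final URL alone. Here is a minimal Python sketch of that check, assuming a script has access to the URL Facebook ended up serving. The function names `is_login_redirect` and `original_target` are made up for illustration and are not part of RSS-Bridge. Note that detecting the redirect does not undo it: the login response does not contain the posts.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def is_login_redirect(final_url: str) -> bool:
    """Return True if the final URL points at Facebook's login page."""
    path = urlparse(final_url).path
    return path.rstrip("/") == "/login"


def original_target(final_url: str) -> Optional[str]:
    """Return the decoded 'next' parameter of a login redirect, if present."""
    query = parse_qs(urlparse(final_url).query)
    values = query.get("next")
    return values[0] if values else None


blocked = "https://www.facebook.com/login/?next=https%3A%2F%2Fwww.facebook.com%2FXXXXXXXXXXXXX%2Fposts"
allowed = "https://www.facebook.com/XXXXXXXXXXXXX/posts?_fb_noscript=1"

print(is_login_redirect(blocked), original_target(blocked))
print(is_login_redirect(allowed), original_target(allowed))
```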

@Mthmgcn05 I already mentioned this facebook behavior in first message of this issue.

Ha! It's been so long since I looked at the beginning, I forgot where we began.

Timeline RSS works fine; it has already been running for 1 week, refreshing every 6 minutes, making maximal use of the resource.

@10362227 Could you explain further? How do you get it to work without errors? Is it because you are running it on your home computer?

After redeploying changes to figure out what works, Heroku, which is what I use, gives RSS-Bridge a new IP address, and each time that works for about five minutes. I use Inoreader to request updates and this requests so many times that it gets blocked again. I think there needs to be a system of queuing requests within a certain time frame that doesn't get the IP address blocked. And this may fix other bridges if we can figure out how often is too often and how to queue requests. At https://developers.facebook.com/docs/graph-api/overview/rate-limiting/ there is a description of rate limiting of 200 calls per hour and that's using Facebook's own API. I'm sure if the rate of requests exceeds that for so long a time, an IP address gets blocked. That's one every 18 seconds.
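The "one every 18 seconds" figure is simply 3600 seconds divided by 200 calls. A minimal Python sketch of the queuing idea follows, assuming the Graph API limit is a usable proxy for what the scraped pages tolerate, which nothing in this thread confirms. The class name `MinIntervalLimiter` is hypothetical and this is not how RSS-Bridge works today.

```python
import time


class MinIntervalLimiter:
    """Space out requests so that at most `calls_per_hour` are made."""

    def __init__(self, calls_per_hour, clock=time.monotonic, sleep=time.sleep):
        self.interval = 3600.0 / calls_per_hour  # 200 per hour -> 18 seconds
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request has been made yet

    def wait(self):
        """Block until the next request is allowed; return the seconds waited."""
        now = self._clock()
        waited = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            waited = self._next_allowed - now
            self._sleep(waited)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return waited


limiter = MinIntervalLimiter(calls_per_hour=200)
print(limiter.interval)
```

Calling `limiter.wait()` before every outgoing request would serialize the feed reader's bursts into evenly spaced fetches. The clock and sleep functions are injectable only so the behaviour can be checked without real waiting.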

I wrote a simple script for myself that grabs the FB homepage timeline (https://www.facebook.com/?sk=h_chr), but you need an account first, and then you need to follow some people.

Hi,

I made the changes mentioned in #2047 (comment)

but nothing works, and an error occurs at line 588 because the array is empty ($html->find('#pagelet_timeline_main_column') returns an empty array).

$element = $html
                ->find('#pagelet_timeline_main_column')[0]
                ->children(0)
                ->children(0)
                ->next_sibling()
                ->children(0);

Any ideas why?

I also tried echoing $html just before that code and commenting out the returnServerError call.

So like this:

if($loginForm != null) {
    //returnServerError('You must be logged in to view this page. This is not$
}

echo $html;

$element = $html
                ->find('#pagelet_timeline_main_column')[0]
                ->children(0)
                ->children(0)
                ->next_sibling()
                ->children(0);

And I got a login page when going to the rss bridge html view:
https://mydomain/?action=display&bridge=Facebook&context=User&u=fondationtaraocean&media_type=all&limit=-1&format=Html

image

@floviolleau, I don't speak French, but I think you need to accept Facebook's new terms of usage

Yes, but I did an echo, so this page is rendered inside the RSS-Bridge HTML view and it is not going to work like this 😉. I just wanted to show that, for whatever reason, it is asking me to log in, and the problem is intermittent for me.

The issue is not with the cookies per se but with FB blocking crawling. I just ran a local instance on my machine: I was able to open the /posts endpoint in any browser, then launched RSS-Bridge and set it to crawl the page every couple of seconds. To no surprise, I got the error after a couple of minutes, and from then on it was blocking me constantly and nagging me to log in.

Logging in (and using a session cookie) won't work in the long term because FB will just block your account, as it did in @em92's case.

The only possibility I see is to try to extract posts from the main page (i.e. https://www.facebook.com/XXXXXXXXXXXXX/?_fb_noscript=1 instead of https://www.facebook.com/XXXXXXXXXXXXX/posts?_fb_noscript=1), but that has the following issues:

  • the main page is a mess of everything and only displays 2 posts on initial load
  • you have to scroll down to load the regular posts feed, and I'm not sure how to handle that here.

I'd say that the situation is quite dire. For me, I just looked for the interesting pages on different sites (twitter usually, as quite often they use same tool to push to various "social media") and if some weren't present elsewhere I nagged the admins to be present.

I keep my fingers crossed that maybe the EU will somehow force those platforms via legislation to be interoperable instead of darn walled-gardens :|

The only possibility I see is try to extract posts from main page

Tried on my host. It does redirect to login page.

Another idea is to implement the bridge using Facebook's API. For example using this: https://developers.facebook.com/docs/graph-api/reference/post

But for that, you need to find someone who will implement it. Previous maintainers of FacebookBridge lost motivation or time to maintain it. I have written some ideas about donating here #2063, but we need to discuss it first.

Edit: See #2047 (comment)

I don't think the Api would help.
https://developers.facebook.com/docs/apps/features-reference/page-public-content-access

"This permission or feature is only available with business verification. You may also need to sign additional contracts before your app can access data. Learn More."

So every bridge would have to get verification, as I don't think you can deploy the tokens of your account ;)

"This permission or feature is only available with business verification. You may also need to sign additional contracts before your app can access data. Learn More."

Found it here: https://developers.facebook.com/docs/apps/features-reference/page-public-content-access

Thanks for mentioning it @simarilius

This is a huge problem for many. I decided to use Python, and it works. Facebook is trying to prevent bots and scrapers. In Python you actually sign into your personal Facebook account using Selenium (with a headless browser), and from there you scrape whatever you need.

I entered my Facebook cookie into the bridge and that does work to get to the right page; however, the page uses the new design, and this bridge is not set up to gather information from the new design. So there are two possible solutions: either add some rate limiting and request caching so that the login screen doesn't appear and no account is needed, or rework the bridge to gather information from the new design, using the cookie method to bypass the login screen.

Hi,
I have been having the same issue for over a month, so I deployed RSS-Bridge with Docker (https://hub.docker.com/r/rssbridge/rss-bridge). I downloaded Docker (https://docs.docker.com/get-started/), copied the docker pull command for rssbridge, and now it's working perfectly.

Hope that helps

I have RSS-bridge on my own deployment (docker doesn't matter here much) and the issue happens from time to time still...

Anyone got a fix for the facebook Rss problem?

No, Facebook have intentionally made scraping much harder than before.

I did a little research and found an app called Feedbro. It definitely scrapes Facebook but doesn't convert the feed to RSS. You can post to your social media account from within the app. It's free, so it's worth a try: https://nodetics.com/feedbro/ Let me know what you think, or whether you have an alternative...

I haven't looked at Feedbro but it probably leverages the fact that the browser is logged into Facebook whereas rss-bridge is not.

Just checked... it doesn't need to be logged in, also will pull personal profiles into the reader...If only I could find a way to schedule posts it would be perfect as it doesn't spit out true rss feeds but will produce an opml.

Yesterday, I switched my Facebook bridge from public to private to see whether that would solve the error 500 problem. And it did solve it, within 24 hours! So maybe there is a solution in which we just limit the number of requests we send to Facebook:

  • by increasing the cache timeout,
  • by applying a per-second request rate limit at the reverse proxy level.

In my nginx configuration, I added the following lines to switch the Facebook bridge to private:

location ~ ^.+?\.php(/.*)?$ {

    error_page 420 = @fb_routes;

    if ($args ~ "bridge=Facebook") {
       return 420;
    }
.... normal configuration
}

  location @fb_routes {
     allow my ip;
     allow 192.168.1.1/24;
     allow my other ip;
     deny  all;
.... normal configuration
}

Maybe there is a way to add rate limiting in fb_routes, something like this: https://www.nginx.com/blog/rate-limiting-nginx/.

As suggested, I've increased the default cache timeout on PR #2149 which has helped on my instance

Hey, got a quick idea!
Facebook has an embed feature for pages (timelines).
Can't the bridge just scrape the embed code?

How to create one: https://developers.facebook.com/docs/plugins/page-plugin

Good idea @dhuschde , the Facebook page plugin must fetch content regardless of whether the browser is logged in, so it should be suitable for our purposes.

The actual content payload seems to exist within a url beginning as such:
https://www.facebook.com/platform/plugin/tab/renderer/?key=timeline

There are several seemingly dynamic parameters within the url so deconstructing the plugin to obtain the content payload url may not be trivial.

On a really quick check, I was able to extract the name of the Page on a shared Host. (Never worked with the original Bridge)

Preview

URL: https://www.facebook.com/plugins/page.php?href=tommysblogde&tabs=timeline&small_header=true&hide_cover=true&show_facepile=false
Selector: /html/body/div[1]/div/div/div[1]/div/div[1]/div/div/div/div[1]
Title: .//a/@title
Bridge: XPath

Though it doesn't seem to get IP-blocked with the embed...

But I didn't get any posts; I think this is because they are loaded late...

Maybe it does get blocked...
With a bridge that doesn't cache (or with another page), I can't get the name...

The Bridge:

<?php

class TestBridge extends XPathAbstract {
    const NAME = 'Test';
    const URI = 'https://test.co';
    const DESCRIPTION = 'Test';
    const MAINTAINER = 'dhuschde';
    const CACHE_TIMEOUT = 1;

    const FEED_SOURCE_URL = 'https://www.facebook.com/plugins/page.php?href=tommysblogde&tabs=timeline&small_header=true&hide_cover=true&show_facepile=false';
    const XPATH_EXPRESSION_ITEM = '/html/body/div[1]/div/div/div[1]/div/div[1]/div/div/div/div[1]';
    const XPATH_EXPRESSION_ITEM_TITLE = './/a/@title';
    const XPATH_EXPRESSION_ITEM_CONTENT = '';
    const XPATH_EXPRESSION_ITEM_URI = '';
    const XPATH_EXPRESSION_ITEM_AUTHOR = '';
    const XPATH_EXPRESSION_ITEM_TIMESTAMP = '';
    const XPATH_EXPRESSION_ITEM_ENCLOSURES = '';
    const SETTING_FIX_ENCODING = false;
}

Preview

By the way, the second URL should be loaded Incognito, if you are logged into Facebook it won't work.

Using XPath, I was again able to fetch the content of the site...
I used the last URL you sent...
However, sadly, this is the output:

for (;;);{"__ar":1,"payload":null,"jsmods":{"define":[["CurrentEnvironment",[],{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["UriNeedRawQuerySVConfig",[],{"uris":["dms.netmng.com","doubleclick.net","r.msn.com","watchit.sky.com","graphite.instagram.com","www.kfc.co.th","learn.pantheon.io","www.landmarkshops.in","www.ncl.com","s0.wp.com","www.tatacliq.com","bs.serving-sys.com","kohls.com","lazada.co.th","xg4ken.com","technopark.ru","officedepot.com.mx","bestbuy.com.mx","booking.com"]},3871],["CometAltpayJsSdkIframeAllowedDomains",[],{"allowed_domains":["https:\/\/live.adyen.com","https:\/\/integration-facebook.payu.in","https:\/\/facebook.payulatam.com","https:\/\/secure.payu.com","https:\/\/facebook.dlocal.com","https:\/\/buy2.boku.com"]},4920]],"require":[["ServerRedirect","redirectPageTo",[],["\/login\/?next=\u00252F",true,false]]]},"hsrp":{"hsdp":{"gkxData":{"708253":{"result":false,"hash":"AT5n4hBL3YTMnQWtQxU"},"1224637":{"result":false,"hash":"AT7JRluWxuwDm3XzLH8"}}},"hblp":{"sr_revision":1004078792,"consistency":{"rev":1004078792},"rsrcMap":{"Vr89bRr":{"type":"js","src":"https:\/\/www.facebook.com\/rsrc.php\/v3\/ya\/r\/OZcLupMIkEN.js?_nc_x=Ij3Wp8lg5Kz"}}}},"allResources":["Vr89bRr"],"lid":"6981838346867017480"}

The Data:
Bridge: XPath
URL: the one you sent
selector: *
description: *

Bridge as Code

<?php

class TestBridge extends XPathAbstract {
    const NAME = 'Test';
    const URI = 'https://test.co';
    const DESCRIPTION = 'FacebookTest';
    const MAINTAINER = 'dhuschde';
    const CACHE_TIMEOUT = 1;

    const FEED_SOURCE_URL = 'https://www.facebook.com/platform/plugin/tab/renderer/?key=timeline&config_json=%7B%22app_id%22%3A%22776730922422337%22%2C%22href%22%3A%22https%3A%2F%2Fwww.facebook.com%2Fbbcnews%2F%22%2C%22width%22%3A340%2C%22height%22%3A500%2C%22has_cta%22%3Atrue%2C%22has_small_header%22%3Afalse%2C%22has_adapt_container_width%22%3Atrue%2C%22has_cover%22%3Atrue%2C%22has_posts%22%3Afalse%2C%22tabs%22%3A%22timeline%22%2C%22can_personalize%22%3Afalse%2C%22is_xfbml%22%3Afalse%2C%22referer_uri%22%3A%22%22%7D&__a=1';
    const XPATH_EXPRESSION_ITEM = '*';
    const XPATH_EXPRESSION_ITEM_TITLE = '';
    const XPATH_EXPRESSION_ITEM_CONTENT = '*';
    const XPATH_EXPRESSION_ITEM_URI = '';
    const XPATH_EXPRESSION_ITEM_AUTHOR = '';
    const XPATH_EXPRESSION_ITEM_TIMESTAMP = '';
    const XPATH_EXPRESSION_ITEM_ENCLOSURES = '';
    const SETTING_FIX_ENCODING = false;
}

Preview

As you can see in the Content, it still has this Login stuff in the Code...

Which means, my Bridge got IP Blocked even with this Payload URL...
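The payload output pasted earlier in this comment is JSON behind a `for (;;);` guard, and the login redirect sits in its `jsmods.require` list as a `ServerRedirect` entry. A minimal Python sketch of detecting that case follows, assuming the response always has the shape seen in that single sample. The function names are hypothetical and the structure is inferred from one observed response, not from any documented Facebook format.

```python
import json

# Guard string that precedes the JSON body in the sample response.
GUARD_PREFIX = "for (;;);"


def parse_payload(body: str) -> dict:
    """Strip the leading guard, if present, and decode the JSON that follows."""
    text = body.strip()
    if text.startswith(GUARD_PREFIX):
        text = text[len(GUARD_PREFIX):]
    return json.loads(text)


def redirects_to_login(data: dict) -> bool:
    """Return True if the payload asks the client to go to the login page."""
    for entry in data.get("jsmods", {}).get("require", []):
        # Observed shape: ["ServerRedirect", "redirectPageTo", [], [target, ...]]
        if len(entry) >= 4 and entry[0] == "ServerRedirect":
            args = entry[3]
            if args and isinstance(args[0], str) and args[0].startswith("/login"):
                return True
    return False
```

A bridge could use a check like this to report "blocked" explicitly instead of trying to parse posts out of a response that contains none.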

@dhuschde if you load my URL in Chrome Incognito, can you see the post content?

I am able to see this in both Incognito and normal, as I don't have an Account:
image

But I'm not in the same network as my Bridge. This is hosted by one.com

Can you paste the full contents you see in pastebin?

Thanks, so the browser test works and the posts are visible, ready for scraping.

When I copy-paste the first URL into Chrome, I get this:
Screenshot 2021-07-06 at 21 06 21
And when I copy-paste the second URL, I get this:
Screenshot 2021-07-06 at 21 10 33

@Noutladeesse it will not work if you are logged into Facebook.

One other thing of interest, this command returns post contents on my home PC:

curl "https://www.facebook.com/platform/plugin/tab/renderer/?key=timeline&config_json=%7B%22app_id%22%3A%22776730922422337%22%2C%22href%22%3A%22https%3A%2F%2Fwww.facebook.com%2Fbbcnews%2F%22%2C%22width%22%3A340%2C%22height%22%3A500%2C%22has_cta%22%3Atrue%2C%22has_small_header%22%3Afalse%2C%22has_adapt_container_width%22%3Atrue%2C%22has_cover%22%3Atrue%2C%22has_posts%22%3Afalse%2C%22tabs%22%3A%22timeline%22%2C%22can_personalize%22%3Afalse%2C%22is_xfbml%22%3Afalse%2C%22referer_uri%22%3A%22%22%7D&__a=1"

But on two cloud VM's I tried, it did not return the posts, which could indicate some IP-level restrictions.

But on two cloud VM's I tried, it did not return the posts, which could indicate some IP-level restrictions.

That would make sense - Instagram rate-limits/blocks known cloud provider IPs: https://git.sr.ht/~cadence/bibliogram-docs/tree/master/docs/Instagram%20rate%20limits.md
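The long query string in the curl command above is a percent-encoded JSON object passed as `config_json`. A minimal Python sketch of building the same URL for an arbitrary page follows, which may make it easier to repeat the home-versus-cloud test with other pages. The parameter values, including the `app_id`, are copied from the URL in this thread; whether other values work is unknown, and the function names are made up for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

RENDERER_URL = "https://www.facebook.com/platform/plugin/tab/renderer/"


def build_renderer_url(page_href: str) -> str:
    """Build the plugin renderer URL for a page, mirroring the curl example."""
    config = {
        "app_id": "776730922422337",  # value taken from the thread's URL
        "href": page_href,
        "width": 340,
        "height": 500,
        "has_cta": True,
        "has_small_header": False,
        "has_adapt_container_width": True,
        "has_cover": True,
        "has_posts": False,
        "tabs": "timeline",
        "can_personalize": False,
        "is_xfbml": False,
        "referer_uri": "",
    }
    query = {
        "key": "timeline",
        # Compact separators keep the encoded JSON free of spaces.
        "config_json": json.dumps(config, separators=(",", ":")),
        "__a": "1",
    }
    return RENDERER_URL + "?" + urlencode(query)


def read_config(url: str) -> dict:
    """Decode the config_json parameter of a renderer URL back into a dict."""
    query = parse_qs(urlparse(url).query)
    return json.loads(query["config_json"][0])


print(build_renderer_url("https://www.facebook.com/bbcnews/"))
```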

It isn't blocked because of overuse from one network, but because the IP belongs to a known host... (at least it appears so)

Got redirected to the login page (or even completely blocked...)
image

But the payload still works...
image
Copied to pastebin

So maybe we can use the Tor Network, just like Bibliogram...

Even the embed got blocked...

Blocked embed

The current version of Facebook is working ok on my home IP, albeit with usage limits. I suppose the question then is whether the embed plugin url is more lenient on usage limits than what we have now.

@triatic

As I showed here, using the Tor Browser, the embed is as limited as the front page.
But the payload still worked, so the payload seems to be blocked only for known host IPs, which Bibliogram got around by using the Tor network, as far as I know...

#2047 (comment)
#2047 (comment)

Sometimes it doesn't work on the Tor network either...
So we need to recognize the login redirect code and then reload using another Tor circuit
image

(Maybe the Tor Network is also powered by some known Hosts?)
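The "recognize the redirect, then reload over a different route" idea can be sketched independently of Tor. In this minimal Python sketch the three callables are placeholders that the caller must supply: `fetch` performs the request, `looks_blocked` decides whether the body is a login redirect, and `rotate` switches to a new route (for example a new Tor circuit). None of them correspond to an existing RSS-Bridge or Tor API.

```python
from typing import Callable, Optional


def fetch_with_rotation(
    fetch: Callable[[], str],
    looks_blocked: Callable[[str], bool],
    rotate: Callable[[], None],
    max_attempts: int = 3,
) -> Optional[str]:
    """Retry a fetch, switching routes after each blocked response.

    Returns the first body that is not blocked, or None if every
    attempt was blocked.
    """
    for attempt in range(max_attempts):
        body = fetch()
        if not looks_blocked(body):
            return body
        # Only rotate if another attempt will actually follow.
        if attempt < max_attempts - 1:
            rotate()
    return None
```

Keeping `max_attempts` small matters here: every retry is another request to Facebook, so an aggressive retry loop could make the blocking worse.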

A new test shows me the same results, I got redirected to Login using facebook.com/bbcnews, but the Payload still worked...
tor-facebook-payload

The problem with switching to Tor is that the bridge could be blocked more often than a "good" IP address (such as the one I use at home).

Ok, but is there an equivalent of Bibliogram for Facebook?

Has anyone considered "Heavy artillery"? That is, selenium, an off-screen X server, and imitating a full-featured login session?

There have at least been some attempts to create one:

https://github.com/apurvmishra99/facebook-scraper-selenium

I cannot access any public facebook page that I tried without being logged into facebook. Did facebook make all pages private?

Did facebook make all pages private?

No, but they are subject to stringent rate-limiting and IP-blocking, and there isn't much we can do about that.

Oh damn. And I use a VPN and tried like 10 locations, so they are all blocked?! Then it means it is very easy to block a server hosting rss-bridge, since it has only a single IP.

Yes.

Yah...no surprise actually....thx for the info.

I use https://fetchrss.com/prices to get a few feeds once a day, there isn't much choice now.

Feedbro is an RSS Chrome extension that can scrape FB ("even personal profiles") but cannot generate RSS feeds, so it's a matter of checking manually. It's the best I have found...

It's pretty useful but it's not really portable. I prefer to have a centralised solution that I can rely on from my laptop or Android.

@dvikan Is the Facebook bridge still broken? Is it still without a maintainer? I cannot find the maintainers list anymore here on GitHub.

It was totally broken for me as soon as Facebook instigated an IP address block on cloud computing providers.

For me too.
I'd like to know if any maintainer has worked on this problem since then (due to the issue management done 10 days ago) and if it makes sense to re-test the bridge or will I face the same issues?

Same here, I initially solved it by hosting the bridge at home. Though a few months after that Facebook migrated the pages I was monitoring to the new design which requires javascript to load, which of course completely broke the bridge. So as the new Facebook site is being rolled out, fewer pages will function and at some point the bridge will be completely borked.

Facebook is one of the hardest to maintain and it also has a few key limiters, even if it's maintained. Public instances almost always run into some form of facebook detection, so it's barely possible to keep it working for the people who run it at home.

Also, because facebook is constantly evolving and doing rolling releases, one feed might work while the other is broken, because fb changed the site underneath.

Is there currently a way to rate-limit locally? That is, prevent my self-hosted bridge from making too many requests to a given site too quickly? I'd probably also need my feed reader to refresh feeds in a random order.

@mdemoss you can increase CACHE_TIMEOUT in the bridge code.
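To illustrate why a longer cache timeout reduces the number of requests reaching Facebook, here is a minimal Python sketch of a time-based cache. It is a generic illustration only: the class name `TimedCache` is hypothetical and this is not RSS-Bridge's actual cache implementation.

```python
import time


class TimedCache:
    """Serve a stored value until it is older than `timeout_seconds`."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self._clock = clock
        self._entries = {}  # key -> (time stored, value)

    def get(self, key, fetch):
        """Return the cached value for `key`, calling `fetch` only when stale."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.timeout:
            return hit[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

With a timeout of 3600 seconds, a feed reader polling the same feed every few minutes would trigger at most one upstream request per hour for that feed, no matter how often or in what order it refreshes.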