Problems with Facebook on public RSS-Bridge instances


em92 opened this issue 3 years ago

Due to many recent "You must be logged in to view this page. This is not supported by RSS-Bridge" issues coming from Facebook users (#2041, comments from #2014, #2037), I investigated those issues more closely.

If I open "https://www.facebook.com/facebook/posts" from my home laptop, everything is fine: posts are returned.
If I open "https://www.facebook.com/facebook/posts" from my public instance (https://feed.eugenemolotov.ru), it redirects to the login page.

It looks like FacebookBridge has the same problem as InstagramBridge (#1891), which makes FacebookBridge unusable on public RSS-Bridge instances.

Possible solutions for users (same as in the mentioned InstagramBridge issue):

Deploy RSS-Bridge on your personal PC or laptop and use FacebookBridge from there.
Deploy RSS-Bridge on your VPS, make sure that only certain people use it and use FacebookBridge from there.

triatic commented 3 years ago

Yes.

Georgi Ivanov · Answer 1 · Sun Apr 04 2021 20:53:05 GMT+0800 (China Standard Time)

Thank you very much for investigating. I'm using shared hosting for mine and it's folder-protected, so only I have access, but Facebook probably just checks the IP, and since it's shared hosting, that IP is heavily used. I may have to pay for a private IP in that case.

Herminien · Answer 2 · Mon Apr 05 2021 02:10:10 GMT+0800 (China Standard Time)

I installed it on my VPS (Infomaniak, Switzerland) and I am the only one using it. Unfortunately it doesn't work either.
I tried to visit a Facebook page with Firefox and it automatically redirected me to the login page.

arcctgx · Answer 3 · Mon Apr 05 2021 22:17:42 GMT+0800 (China Standard Time)

I'm running my single-user RSS Bridge instance on Digital Ocean, and feeds which were giving me error 500 since April 1st just started working again. Let's see for how long...

Edit: stopped working two hours later.

Noutladeesse · Answer 4 · Tue Apr 06 2021 21:36:33 GMT+0800 (China Standard Time)

I have been having exactly this problem for a week. Over 100 feeds created through the Facebook "Main Site" bridge result in errors. I am using my personal laptop, so this cannot be the reason.
Yesterday, 3 feeds (out of the 100+) delivered lots of previously missed articles; today, only 1 out of 100+ is working. It looks random and erratic.
I use them to deliver a daily news digest; for a week I haven't been able to do it properly and have had to check all sources one by one. It is inefficient and time-consuming. What should I do?

tstanbur · Answer 5 · Tue Apr 06 2021 23:38:40 GMT+0800 (China Standard Time)

If I open "https://www.facebook.com/facebook/posts" from my home laptop, everything is fine posts are returned.
If I open "https://www.facebook.com/facebook/posts" from my public instance (https://feed.eugenemolotov.ru), it will return redirect to login page.

Hi @em92 ,

If you remove the /posts part of the URL, you don't get the login page, even on a public instance.

eg

https://www.facebook.com/facebook/posts (redirect to login page)

https://www.facebook.com/facebook (no redirect, page content shown).

Did you try that?
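A minimal Python sketch of the URL change suggested above: strip a trailing `/posts` segment from a page URL. The helper name `strip_posts_suffix` is hypothetical and not part of RSS-Bridge; it only rewrites the URL and does not by itself guarantee that Facebook serves the page without a login redirect.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_posts_suffix(url: str) -> str:
    """Drop a trailing "/posts" path segment, keeping scheme, host and query."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if path.endswith("/posts"):
        path = path[: -len("/posts")]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(strip_posts_suffix("https://www.facebook.com/facebook/posts"))
# https://www.facebook.com/facebook
```

Query parameters such as `_fb_noscript=1` are kept, so the same helper works for the no-script variants of the URL.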

Deleted user · Answer 6 · Wed Apr 07 2021 19:58:22 GMT+0800 (China Standard Time)

@tstanbur I have the same problem as @Noutladeesse. I don't understand how I can change https://www.facebook.com/facebook/posts to https://www.facebook.com/facebook; my RSS feed URL doesn't contain a Facebook URL at all.

tstanbur · Answer 7 · Wed Apr 07 2021 20:05:52 GMT+0800 (China Standard Time)

@tstanbur I have the same problem than @Noutladeesse I dont understand how I can modify https://www.facebook.com/facebook/posts to https://www.facebook.com/facebook, I have an rss feed without facebook inside.

I have the same issue too!

I was just trying to help fix it, hopefully @em92 can (I think he's the author?)

Deleted user · Answer 8 · Wed Apr 07 2021 20:10:19 GMT+0800 (China Standard Time)

@tstanbur Understood, my English is too poor. :)

Noutladeesse · Answer 9 · Wed Apr 07 2021 22:50:27 GMT+0800 (China Standard Time)

@tstanbur understand, my english is too poor. :)

@cborne: @tstanbur has the same problem as us, although his RSS feeds are not Facebook feeds; he's asking whether @em92 is the author and whether he can help us solve the problem (I'm translating!)

Deleted user · Answer 10 · Wed Apr 07 2021 22:52:57 GMT+0800 (China Standard Time)

@Noutladeesse thanks, I eventually figured it out: at first I didn't understand what the Facebook URLs in the middle were doing, but it's a proposed fix for @em92. English isn't really like riding a bike: when you don't practice it, it doesn't come back on its own. :)

Noutladeesse · Answer 11 · Wed Apr 07 2021 23:01:14 GMT+0800 (China Standard Time)

@Noutladeesse thanks, I eventually figured it out: at first I didn't understand what the Facebook URLs in the middle were doing, but it's a proposed fix for @em92. English isn't really like riding a bike: when you don't practice it, it doesn't come back on its own. :)

:-D
Yes, it's a proposed fix, but it doesn't work for feeds that have already been created.

Wojtek · Answer 12 · Thu Apr 08 2021 13:32:00 GMT+0800 (China Standard Time)

Just another small "me too". I have been running RSS-Bridge on my personal VPS (as the only user) for a long while (~2 years) and I'm also affected by the issue. It started about 1-2 weeks ago, then it started working again on Monday and was OK for about 2 days, and now it has stopped again.

It does seem like a Facebook action to block RSS-Bridge (probably with their silly reasoning that this would somehow make people go back to using their awful service…)

RealDutchie · Answer 13 · Fri Apr 09 2021 02:01:06 GMT+0800 (China Standard Time)

Here's just one more "me too". I signed up on GitHub specifically to ask a few things about the Facebook bridge. Until last week, I had been using a public host from Eugene Molotov to my full satisfaction for about a year (thanks a lot). I don't have any technical background, so it is sometimes difficult for me to keep up with all the terms that come up in this topic.

I wonder whether the option mentioned above by em92 (quoted below) still works, and how I could get it running on my own PC:

Deploy RSS-Bridge on your personal PC or laptop and use FacebookBridge from there.

I would be very happy if I could still use the Facebook bridge this way, but I am not sure whether it still works or how to install it on my own PC. I have looked through GitHub quite a bit, but unfortunately I can't figure it out myself, which is why I decided to sign up.

If users could confirm or deny that this still works, I would be happy. My next question would then be how best to put the bridge on my own PC, or whom I can ask for help or information on how to do so. I also think it would be a very good idea to start a donation fund to get a developer to maintain the Facebook bridge and also make the Instagram bridge work again. That way, we can all contribute to getting our beloved feeds going again. Greetings from the Netherlands and thanks for your great work over the years!

hellmachine2000 · Answer 14 · Fri Apr 09 2021 06:11:12 GMT+0800 (China Standard Time)

Same here. Since April I have been getting different errors on the same feeds, like:

"Facebook Bridge | Main Site was unable to receive or process the remote website's content!
Error message: `You must be logged in to view this page. This is not supported by RSS-Bridge.`"

"Facebook Bridge | Main Site was unable to receive or process the remote website's content!
Error message: `The requested resource cannot be found!`"

"Facebook Bridge | Main Site was unable to receive or process the remote website's content!
Error message: `Call to a member function children() on null`
Query string: `action=display&bridge=Facebook&u=hyperlitemountaingear&media_type=all&limit=1000&format=Atom`
Version: `dev.2020-11-10`"
Latest version of RSS-Bridge…

Deleted user · Answer 15 · Fri Apr 09 2021 10:08:36 GMT+0800 (China Standard Time)

I've been having these errors as well and I found that changing the cache_timeout parameter in FacebookBridge seems to reset the bridge, but it only works for a little while. I've tried 86400, 43200, 21600, 1, 0, and even eliminating the parameter. Somehow resetting the cache every time the bridge is called might be the solution to this problem?

Noutladeesse · Answer 16 · Fri Apr 09 2021 15:26:50 GMT+0800 (China Standard Time)

I've been having these errors as well and I found that changing the cache_timeout parameter in FacebookBridge seems to reset the bridge, but it only works for a little while. I've tried 86400, 43200, 21600, 1, 0, and even eliminating the parameter. Somehow resetting the cache every time the bridge is called might be the solution to this problem?

Thank you for the suggestion, @Mthmgcn05.
How do you reset the cache? (I am not an IT professional, only a user.)

Deleted user · Answer 17 · Fri Apr 09 2021 20:00:26 GMT+0800 (China Standard Time)

After more testing and thought: it may be that every time I redeployed, it worked for five minutes, so the redeploy could have been what reset it.

miwcz · Answer 18 · Sat Apr 10 2021 05:40:29 GMT+0800 (China Standard Time)

It seems that adding the cookie "c_user=XXXX", where XXXX is my ID from the Facebook cookie, helped. I don't know how to add it only in the bridge, so I did it in contents.php for all requests, which is really bad, but... maybe it's a path to a better solution :-)

EDIT: False alarm, not working again...

Eugene Molotov · Answer 19 · Sat Apr 10 2021 16:21:44 GMT+0800 (China Standard Time)

@miwcz On my public instance I used the c_user and xs values. A quick-and-dirty patch looks like this:

diff --git a/bridges/FacebookBridge.php b/bridges/FacebookBridge.php
index c03de4e..fafeabd 100644
--- a/bridges/FacebookBridge.php
+++ b/bridges/FacebookBridge.php
@@ -174,6 +174,8 @@ class FacebookBridge extends BridgeAbstract {
 		} else {
 			$header = array();
 		}
+		$header[] = 'Cookie: c_user=xxxx; xs=yyyy;';
+
 
 		$touchURI = str_replace(
 			'https://www.facebook',
@@ -560,11 +562,15 @@ EOD;
 				$header = array();
 			}
 
+			$header[] = 'Cookie: c_user=xxxx; xs=yyyy;';
+
+
 			$html = getSimpleHTMLDOM($this->getURI(), $header)
 				or returnServerError('No results for this query.');
 
 		}
 
 		// Handle captcha form?
 		$captcha = $html->find('div.captcha_interstitial', 0);

So far, so good.
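For readers testing outside PHP, here is a rough Python equivalent of what the patch does: it attaches the same `Cookie` header (`c_user` and `xs`) to a request. `xxxx`/`yyyy` are placeholders exactly as in the patch, and `build_request` is a hypothetical helper. Keep in mind that sending a real account's session cookies ties all bridge traffic to that account.

```python
import urllib.request

# Placeholders, as in the patch: copy the real values from a logged-in browser.
C_USER = "xxxx"
XS = "yyyy"


def build_request(url: str, c_user: str = C_USER, xs: str = XS) -> urllib.request.Request:
    """Build a GET request carrying the same Cookie header the PHP patch adds."""
    cookie = f"c_user={c_user}; xs={xs}"
    return urllib.request.Request(url, headers={"Cookie": cookie})


req = build_request("https://www.facebook.com/facebook/posts")
print(req.get_header("Cookie"))  # c_user=xxxx; xs=yyyy
# urllib.request.urlopen(req) would actually send the request; not done here.
```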

Eugene Molotov · Answer 20 · Sat Apr 10 2021 16:23:44 GMT+0800 (China Standard Time)

So far, so good.

I meant it is working on my instance at the moment.

Eugene Molotov · Answer 21 · Sat Apr 10 2021 16:45:12 GMT+0800 (China Standard Time)

@tstanbur

hopefully @em92 can (I think he's the author?)

I am not the author of this bridge. I maintain RSS-Bridge in general (reviewing pull requests, pinging bridge maintainers in issues) and the bridges for Pikabu and Vk.

Usually the bridge's maintainer fixes bugs, but we don't have a maintainer for the Facebook bridge. I have little time to fix bugs in bridges that I don't maintain.

miwcz · Answer 22 · Sat Apr 10 2021 19:25:17 GMT+0800 (China Standard Time)

I have 20+ Facebook feeds and this works only for the first 4-5 requests. It seems that Facebook blocks multiple requests after a short while.

Noutladeesse · Answer 23 · Sun Apr 11 2021 03:58:02 GMT+0800 (China Standard Time)

@tstanbur

hopefully @em92 can (I think he's the author?)

I am not author of this bridge. I maintain RSS-Bridge in general (reviewing pull requests, pinging bridge maintainers in issues) and bridges for Pikabu and Vk.

Usually maintainer of the bridge does fix bugs, but we don't have maintainer for Facebook bridge. I have little time to fix bugs in bridges, that I don't maintain.

Is there anyone who maintains the Facebook bridge, @em92?

Eugene Molotov · Answer 24 · Sun Apr 11 2021 09:39:26 GMT+0800 (China Standard Time)

I meant it is working on my instance at the moment.

Now it doesn't. Facebook disabled my account because it supposedly violates its Community Standards. It made me upload a photo of myself (I did, a real photo of me) and now I am waiting for the review.

Eugene Molotov · Answer 25 · Sun Apr 11 2021 09:43:04 GMT+0800 (China Standard Time)

@Noutladeesse

Is there anyone who maintains Facebook bridge?

No.

pin-grid-array · Answer 26 · Mon Apr 12 2021 01:32:48 GMT+0800 (China Standard Time)

I don't have any new information to add that other users haven't already discussed; I'm only here to say that it is happening to me too. I am running the FB bridge on Heroku and using Feedly to save the feeds. I started getting "Bridge returned error 500!" around the beginning of April.

Some feeds only get the error occasionally. Other feeds keep getting the error constantly, which makes those feeds useless.

Example error message:

Facebook Bridge | Main Site was unable to receive or process the remote website's content!
Error message: `You must be logged in to view this page. This is not supported by RSS-Bridge.`
Query string: `action=display&bridge=Facebook&context=User&u=[REDACTED]&media_type=all&limit=-1&format=Atom`
Version: `dev.2020-02-26`


Eugene Molotov · Answer 27 · Mon Apr 12 2021 13:57:24 GMT+0800 (China Standard Time)

Here is the end of my story of trying to make FacebookBridge work on my public instance (https://feed.eugenemolotov.ru) using my account with a real phone number and the patch from #2047 (comment).

I didn't read their document titled "Community Standards", but it looks like sharing posts via RSS does not comply with it.

Wojtek · Answer 28 · Mon Apr 12 2021 15:39:34 GMT+0800 (China Standard Time)

It most likely boils down to "overusing the API" or "harvesting data", which is just a tad silly.

For me, the point of using RSS-Bridge (and RSS in general) is to avoid having a Facebook account and still be able to follow some websites that, for some twisted reason, are present only there...

Deleted user · Answer 29 · Mon Apr 12 2021 23:13:39 GMT+0800 (China Standard Time)

So I think I may have discovered the issue.

I set up RSS-Bridge on a server on my own computer and, after turning on debug mode, realized that Facebook responds with one of two different URLs in the header:

https://www.facebook.com/login/?next=https%3A%2F%2Fwww.facebook.com%2FXXXXXXXXXXXXX%2Fposts

and

https://www.facebook.com/XXXXXXXXXXXXX/posts?_fb_noscript=1

where XXXXXXXXXXXXX is the page id and the first gives the error:

'You must be logged in to view this page. This is not supported by RSS-Bridge.'

The second response gives the posts as requested.

I'm no programmer, but if one could clean up the first response to make it look like the second and then continue onto the rest of the code, I think the bridge would work.
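The two responses can be told apart from the final URL alone. A minimal sketch, assuming the two URL shapes quoted above are the only cases (the helper names are hypothetical). Note that the `next` parameter only reveals which page was requested; requesting that URL again would presumably just be redirected to the login page again, so this helps the bridge detect the block rather than bypass it.

```python
from urllib.parse import parse_qs, urlsplit


def classify_facebook_url(url: str) -> str:
    """Return "login" for a login redirect, "posts" for a posts page, else "other"."""
    path = urlsplit(url).path
    if path.startswith("/login"):
        return "login"
    if path.rstrip("/").endswith("/posts"):
        return "posts"
    return "other"


def login_target(url: str):
    """Return the decoded `next` parameter of a login URL, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("next")
    return values[0] if values else None


blocked = "https://www.facebook.com/login/?next=https%3A%2F%2Fwww.facebook.com%2FXXXXXXXXXXXXX%2Fposts"
ok = "https://www.facebook.com/XXXXXXXXXXXXX/posts?_fb_noscript=1"
print(classify_facebook_url(blocked), login_target(blocked))
# login https://www.facebook.com/XXXXXXXXXXXXX/posts
print(classify_facebook_url(ok))
# posts
```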

Eugene Molotov · Answer 30 · Tue Apr 13 2021 01:41:40 GMT+0800 (China Standard Time)

@Mthmgcn05 I already mentioned this Facebook behavior in the first message of this issue.

Deleted user · Answer 31 · Tue Apr 13 2021 01:46:58 GMT+0800 (China Standard Time)

Ha! It's been so long since I looked at the beginning, I forgot where we began.

10362227 · Answer 32 · Tue Apr 13 2021 11:37:54 GMT+0800 (China Standard Time)

The timeline RSS works fine: it has already been running for a week, refreshing every 6 minutes, i.e. using the resource to the maximum.

pin-grid-array · Answer 33 · Tue Apr 13 2021 16:52:06 GMT+0800 (China Standard Time)

timeline RSS works fine, it already run 1 week, refresh every 6 minutes. maximal use the resource

@10362227 Could you explain further? How do you get it to work without errors? Is it because you are running it on your home computer?

Deleted user · Answer 34 · Wed Apr 14 2021 01:05:29 GMT+0800 (China Standard Time)

After redeploying changes to figure out what works, I noticed that Heroku, which is what I use, gives RSS-Bridge a new IP address, and each time that works for about five minutes. I use Inoreader to request updates, and it requests so often that the IP gets blocked again. I think there needs to be a system for queuing requests within a certain time frame so that the IP address doesn't get blocked. This may also fix other bridges if we can figure out how often is too often and how to queue requests. https://developers.facebook.com/docs/graph-api/overview/rate-limiting/ describes a rate limit of 200 calls per hour, and that's for Facebook's own API. I'm sure that if the request rate exceeds that for too long, the IP address gets blocked. That's one request every 18 seconds (3600 s / 200).
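The arithmetic above (3600 s / 200 calls = 18 s per call) translates into a simple request queue. This is a minimal sketch, assuming that spacing requests evenly is enough to avoid the block; the 200-calls-per-hour figure comes from the Graph API documentation, and whether the same limit applies to scraping public pages is not confirmed. `RequestQueue` is hypothetical, not an RSS-Bridge class; the injectable `clock` makes the timing testable.

```python
import time
from collections import deque

# Figure quoted from Facebook's Graph API rate-limiting page; applying it to
# page scraping is an assumption.
CALLS_PER_HOUR = 200
MIN_INTERVAL = 3600 / CALLS_PER_HOUR  # 18.0 seconds between requests


class RequestQueue:
    """Hypothetical queue that releases at most one request per min_interval."""

    def __init__(self, min_interval=MIN_INTERVAL, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable so the timing can be tested
        self.pending = deque()
        self.last_sent = None

    def submit(self, url):
        """Queue a URL instead of fetching it immediately."""
        self.pending.append(url)

    def next_ready(self):
        """Return the next URL if it may be fetched now, otherwise None."""
        if not self.pending:
            return None
        now = self.clock()
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return None  # too soon: keep waiting
        self.last_sent = now
        return self.pending.popleft()
```

A worker would call `next_ready()` in a loop, sleeping briefly whenever it returns `None`, and perform the actual fetch for each URL it gets back.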

10362227 · Answer 35 · Wed Apr 14 2021 01:29:51 GMT+0800 (China Standard Time)

I wrote a simple script for myself that grabs the FB homepage timeline (https://www.facebook.com/?sk=h_chr). But you need an account first, and then you have to follow some people.

Florent V. · Answer 36 · Sun Apr 18 2021 05:39:03 GMT+0800 (China Standard Time)

Hi,

I made the changes mentioned in #2047 (comment)

but it still doesn't work: an error occurs at line 588 because the array is empty ($html->find('#pagelet_timeline_main_column') returns an empty array).

$element = $html
                ->find('#pagelet_timeline_main_column')[0]
                ->children(0)
                ->children(0)
                ->next_sibling()
                ->children(0);

Any ideas why?

I also tried echoing $html just before that and commenting out the returnServerError,

like this:

if($loginForm != null) {
    //returnServerError('You must be logged in to view this page. This is not supported by RSS-Bridge.');
}

echo $html;

$element = $html
                ->find('#pagelet_timeline_main_column')[0]
                ->children(0)
                ->children(0)
                ->next_sibling()
                ->children(0);

And I got a login page when opening the RSS-Bridge HTML view:
https://mydomain/?action=display&bridge=Facebook&context=User&u=fondationtaraocean&media_type=all&limit=-1&format=Html

Eugene Molotov · Answer 37 · Mon Apr 19 2021 20:05:32 GMT+0800 (China Standard Time)

@floviolleau, I don't speak French, but I think you need to accept Facebook's new terms of use.

Florent V. · Answer 38 · Mon Apr 19 2021 20:21:27 GMT+0800 (China Standard Time)

Yes, but I only echoed it, so this page is rendered inside the RSS-Bridge HTML view and accepting the terms there won't work 😉. I just wanted to show that, for whatever reason, it asks me to log in, and for me the problem is intermittent.

Wojtek · Answer 39 · Mon Apr 19 2021 20:38:03 GMT+0800 (China Standard Time)

The issue is not with the cookies per se but with FB blocking crawling. I just ran a local instance on my machine: I was able to open the /posts endpoint in any browser, then I launched RSS-Bridge and set it to crawl the page every couple of seconds, and, to no surprise, I got the error after a couple of minutes; after that it kept blocking me and nagging me to log in.

Logging in (and using a session cookie) won't work in the long term, because FB will just block your account, as it did in @em92's case.

The only possibility I see is to try to extract posts from the main page (i.e. https://www.facebook.com/XXXXXXXXXXXXX/?_fb_noscript=1 instead of https://www.facebook.com/XXXXXXXXXXXXX/posts?_fb_noscript=1), but that has the following issues:

main page is a mess of everything and only display 2 posts on initial load
you have to scroll down to load regular posts feed, but I'm not sure how to handle that here.

I'd say the situation is quite dire. Personally, I just looked for the interesting pages on other sites (usually Twitter, as they quite often use the same tool to push to various "social media"), and when some weren't present elsewhere, I nagged the admins to be present there.

I keep my fingers crossed that maybe the EU will somehow force those platforms via legislation to be interoperable instead of darn walled-gardens :|

Eugene Molotov · Answer 40 · Mon Apr 19 2021 20:54:13 GMT+0800 (China Standard Time)

The only possibility I see is try to extract posts from main page

Tried on my host. It also redirects to the login page.

Eugene Molotov · Answer 41 · Tue Apr 20 2021 00:43:09 GMT+0800 (China Standard Time)

~~Another idea is implementing bridge using Facebook's API. For example using this: https://developers.facebook.com/docs/graph-api/reference/post~~

But for that, you need to find someone who will implement it. Previous maintainers of FacebookBridge lost motivation or time to maintain it. I have written some ideas about donating here #2063, but we need to discuss it first.

Edit: See #2047 (comment)

Simarilius · Answer 42 · Tue Apr 20 2021 01:19:27 GMT+0800 (China Standard Time)

I don't think the Api would help.
https://developers.facebook.com/docs/apps/features-reference/page-public-content-access

"This permission or feature is only available with business verification. You may also need to sign additional contracts before your app can access data. Learn More."

So every bridge instance would need business verification, as I don't think you can distribute the tokens of your own account ;)

Eugene Molotov · Answer 43 · Tue Apr 20 2021 01:31:44 GMT+0800 (China Standard Time)

"This permission or feature is only available with business verification. You may also need to sign additional contracts before your app can access data. Learn More."

Found it here: https://developers.facebook.com/docs/apps/features-reference/page-public-content-access

Thanks for mentioning it @simarilius

Bull · Answer 44 · Thu Apr 22 2021 12:06:57 GMT+0800 (China Standard Time)

This is a huge problem for many. I've decided to use Python and it works. Facebook is trying to prevent bots and scrapers. With Python you actually sign into your personal Facebook account using Selenium (a headless browser) and scrape whatever you need from there.

Noutladeesse · Answer 45 · Thu Apr 22 2021 22:53:12 GMT+0800 (China Standard Time)

Thank you @bull for the suggestion. Can you elaborate a little? I am not a programmer, but is there anything I can do with Python to solve the bridge problem?

Deleted user · Answer 46 · Thu Apr 22 2021 23:18:53 GMT+0800 (China Standard Time)

I entered my Facebook cookie into the bridge and that does get to the right page; however, the page uses the new design, and this bridge is not configured to gather information from the new design. So there are two possible solutions: either configure rate limiting and request caching so that the login screen doesn't appear, with no account needed, or reconfigure the bridge to gather information from the new design, using the cookie method to bypass the login screen.

Frankytyrone · Answer 47 · Fri Apr 23 2021 14:28:54 GMT+0800 (China Standard Time)

Hi,
I had been having the same issue for over a month, so I deployed RSS-Bridge using the Docker image https://hub.docker.com/r/rssbridge/rss-bridge: I downloaded Docker, copied the rss-bridge docker pull command, and now it's working perfectly. Docker's getting-started guide: https://docs.docker.com/get-started/

Hope that helps

Wojtek · Answer 48 · Fri Apr 23 2021 18:12:58 GMT+0800 (China Standard Time)

I have RSS-Bridge on my own deployment (Docker doesn't matter much here) and the issue still happens from time to time...

Frankytyrone · Answer 49 · Mon Apr 26 2021 21:19:10 GMT+0800 (China Standard Time)

Has anyone got a fix for the Facebook RSS problem?

triatic · Answer 50 · Mon Apr 26 2021 23:40:39 GMT+0800 (China Standard Time)

Anyone got a fix for the facebook Rss problem?

No, Facebook have intentionally made scraping much harder than before.

Frankytyrone · Answer 51 · Tue Apr 27 2021 17:10:55 GMT+0800 (China Standard Time)

I did a little research and found an app called Feedbro that definitely scrapes Facebook but doesn't convert the feed to RSS. You can post to your social media accounts from within the app. It's free, so worth a try: https://nodetics.com/feedbro/ Let me know what you think, or if you have an alternative...

triatic · Answer 52 · Tue Apr 27 2021 19:52:33 GMT+0800 (China Standard Time)

I haven't looked at Feedbro but it probably leverages the fact that the browser is logged into Facebook whereas rss-bridge is not.

Frankytyrone · Answer 53 · Tue Apr 27 2021 20:34:37 GMT+0800 (China Standard Time)

Just checked... it doesn't need to be logged in, and it will also pull personal profiles into the reader... If only I could find a way to schedule posts, it would be perfect; it doesn't output true RSS feeds, but it will produce an OPML file.

la Bécasse · Answer 54 · Tue Jun 01 2021 00:36:55 GMT+0800 (China Standard Time)

Yesterday, I switched my Facebook bridge from public to private to see if it would solve this error 500 problem. And it did solve it, at least over the last 24 hours! So maybe there is a solution where we just limit the number of requests we send to Facebook:

by increasing the cache timeout,
by applying a rate limit of query per second at the reverse proxy level.

In my nginx configuration, I added the following to make the Facebook bridge private:

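location ~ ^.+?\.php(/.*)?$ {

    # Hand Facebook bridge requests over to the restricted @fb_routes location
    error_page 420 = @fb_routes;

    if ($args ~ "bridge=Facebook") {
        return 420;
    }
    # ... normal configuration ...
}

location @fb_routes {
    allow 203.0.113.10;    # placeholder: replace with your own IP
    allow 192.168.1.0/24;  # local network
    allow 198.51.100.20;   # placeholder: replace with your other IP
    deny  all;
    # ... normal configuration ...
}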

Maybe there is a way to add rate limiting in @fb_routes, something like this: https://www.nginx.com/blog/rate-limiting-nginx/.
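nginx's rate limiting, as described in the linked post, is based on the leaky-bucket algorithm. A minimal Python sketch of the idea, roughly corresponding to `limit_req` with a `burst` and `nodelay`: requests beyond the burst are rejected until the bucket drains. The class and the numbers are illustrative assumptions, not values tested against Facebook.

```python
import time


class LeakyBucket:
    """Hypothetical limiter: `rate` requests/second on average, plus `burst` extra."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.capacity = burst + 1   # one regular slot plus the burst allowance
        self.clock = clock
        self.level = 0.0
        self.last = clock()

    def allow(self):
        """Return True if a request may pass now, False if it should be rejected."""
        now = self.clock()
        # The bucket drains at `rate` units per second since the last check.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False


# Example: one Facebook request per minute on average, bursts of up to 3.
limiter = LeakyBucket(rate=1 / 60, burst=2)
```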

triatic · Answer 55 · Mon Jun 07 2021 06:47:39 GMT+0800 (China Standard Time)

As suggested, I've increased the default cache timeout on PR #2149 which has helped on my instance
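Why a longer cache timeout helps: within the timeout, every feed-reader poll is answered from the cache, so Facebook sees at most one request per feed per timeout period. A minimal sketch of that behaviour, using a hypothetical `TTLCache` rather than RSS-Bridge's actual cache implementation:

```python
import time


class TTLCache:
    """Hypothetical cache: reuse a fetched feed until it is `ttl` seconds old."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self.fetch = fetch      # callable(url) -> content, i.e. the request to Facebook
        self.ttl = ttl
        self.clock = clock
        self.store = {}         # url -> (fetched_at, content)

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]     # still fresh: no upstream request
        content = self.fetch(url)
        self.store[url] = (now, content)
        return content


upstream_calls = []


def fake_fetch(url):
    upstream_calls.append(url)
    return f"feed #{len(upstream_calls)}"


cache = TTLCache(fake_fetch, ttl=3600)
cache.get("https://www.facebook.com/facebook")
cache.get("https://www.facebook.com/facebook")
print(len(upstream_calls))  # 1
```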

D5k H3h · Answer 56 · Tue Jul 06 2021 06:46:27 GMT+0800 (China Standard Time)

Hey, I've got a quick idea!
Facebook has an embed feature for pages (timelines).
Couldn't the bridge just index the embed instead?

How to create one: https://developers.facebook.com/docs/plugins/page-plugin

D5k H3h · Answer 57 · Tue Jul 06 2021 07:09:36 GMT+0800 (China Standard Time)

The URL would look like this:

https://www.facebook.com/plugins/page.php?href=[page_slug]&tabs=timeline&small_header=true&hide_cover=true&show_facepile=false
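The embed URL can be generated for any page slug. A small sketch using Python's `urllib.parse.urlencode`; the parameter set is copied from the URL above, and `page_plugin_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

PLUGIN_BASE = "https://www.facebook.com/plugins/page.php"


def page_plugin_url(page: str) -> str:
    """Build the Page Plugin URL for a page slug or full page URL."""
    params = {
        "href": page,
        "tabs": "timeline",
        "small_header": "true",
        "hide_cover": "true",
        "show_facepile": "false",
    }
    # urlencode percent-encodes the values, so a full URL in href is escaped correctly.
    return PLUGIN_BASE + "?" + urlencode(params)


print(page_plugin_url("tommysblogde"))
# https://www.facebook.com/plugins/page.php?href=tommysblogde&tabs=timeline&small_header=true&hide_cover=true&show_facepile=false
```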

triatic · Answer 58 · Tue Jul 06 2021 07:35:15 GMT+0800 (China Standard Time)

Good idea @dhuschde , the Facebook page plugin must fetch content regardless of whether the browser is logged in, so it should be suitable for our purposes.

The actual content payload seems to come from a URL beginning like this:
https://www.facebook.com/platform/plugin/tab/renderer/?key=timeline

There are several seemingly dynamic parameters within the url so deconstructing the plugin to obtain the content payload url may not be trivial.

D5k H3h · Answer 59 · Tue Jul 06 2021 10:24:36 GMT+0800 (China Standard Time)

After a really quick test, I was able to extract the page's name on a shared host. (I have never worked with the original bridge.)


URL: https://www.facebook.com/plugins/page.php?href=tommysblogde&tabs=timeline&small_header=true&hide_cover=true&show_facepile=false
Selector: /html/body/div[1]/div/div/div[1]/div/div[1]/div/div/div/div[1]
Title: .//a/@title
Bridge: XPath

Though it doesn't seem to get IP-blocked with the embed...

But I didn't get any posts; I think this is because they are loaded later...

D5k H3h · Answer 60 · Tue Jul 06 2021 10:32:22 GMT+0800 (China Standard Time)

Maybe it does get blocked...
With a bridge that doesn't cache (or with another page), I can't get the name...

The Bridge:

<?php

class TestBridge extends XPathAbstract {
    const NAME = 'Test';
    const URI = 'https://test.co';
    const DESCRIPTION = 'Test';
    const MAINTAINER = 'dhuschde';
    const CACHE_TIMEOUT = 1;

    const FEED_SOURCE_URL = 'https://www.facebook.com/plugins/page.php?href=tommysblogde&tabs=timeline&small_header=true&hide_cover=true&show_facepile=false';
    const XPATH_EXPRESSION_ITEM = '/html/body/div[1]/div/div/div[1]/div/div[1]/div/div/div/div[1]';
    const XPATH_EXPRESSION_ITEM_TITLE = './/a/@title';
    const XPATH_EXPRESSION_ITEM_CONTENT = '';
    const XPATH_EXPRESSION_ITEM_URI = '';
    const XPATH_EXPRESSION_ITEM_AUTHOR = '';
    const XPATH_EXPRESSION_ITEM_TIMESTAMP = '';
    const XPATH_EXPRESSION_ITEM_ENCLOSURES = '';
    const SETTING_FIX_ENCODING = false;
}


triatic · Answer 61 · Tue Jul 06 2021 21:34:32 GMT+0800 (China Standard Time)

One example is here:

https://www.facebook.com/plugins/page.php?href=https%3A%2F%2Fwww.facebook.com%2Fbbcnews%2F&tabs=timeline&width=340&height=500&small_header=false&adapt_container_width=true&hide_cover=false&show_facepile=true&appId

The payload url is here:

https://www.facebook.com/platform/plugin/tab/renderer/?key=timeline&config_json=%7B%22app_id%22%3A%22776730922422337%22%2C%22href%22%3A%22https%3A%2F%2Fwww.facebook.com%2Fbbcnews%2F%22%2C%22width%22%3A340%2C%22height%22%3A500%2C%22has_cta%22%3Atrue%2C%22has_small_header%22%3Afalse%2C%22has_adapt_container_width%22%3Atrue%2C%22has_cover%22%3Atrue%2C%22has_posts%22%3Afalse%2C%22tabs%22%3A%22timeline%22%2C%22can_personalize%22%3Afalse%2C%22is_xfbml%22%3Afalse%2C%22referer_uri%22%3A%22%22%7D&fb_dtsg_ag&__user=0&__a=1&__dyn=7xeUmxa13xu1syUbAihwRwqo98nwgU5Gex-ewSwMwNw8OdwJwvHwdK4o4O0C82Vwb-q1ewcG0KEswaq1xwEwlU-0nSUS1vwqUcE7e2l2Utw6awZwaOfxW0D83mwkE5G0zE5W0HU1jo6iazo&__csr=&__req=1&__hs=18814.PHASED%3Aplugin_default_pkg.2.0.0.0&dpr=1&__ccg=EXCELLENT&__rev=1004078508&__s=%3A%3Ashrp5t&__hsi=6981805842735414406&__comet_req=0&__sp=1

We need to work out how the JavaScript arrives at the second URL from the code in the first.

triatic · Answer 62 · Tue Jul 06 2021 21:35:22 GMT+0800 (China Standard Time)

By the way, the second URL should be loaded in an Incognito window; if you are logged into Facebook, it won't work.

triatic · Answer 63 · Tue Jul 06 2021 21:47:08 GMT+0800 (China Standard Time)

I've minimised the second URL to just three parameters, and it still loads the payload:

https://www.facebook.com/platform/plugin/tab/renderer/?key=timeline&config_json=%7B%22app_id%22%3A%22776730922422337%22%2C%22href%22%3A%22https%3A%2F%2Fwww.facebook.com%2Fbbcnews%2F%22%2C%22width%22%3A340%2C%22height%22%3A500%2C%22has_cta%22%3Atrue%2C%22has_small_header%22%3Afalse%2C%22has_adapt_container_width%22%3Atrue%2C%22has_cover%22%3Atrue%2C%22has_posts%22%3Afalse%2C%22tabs%22%3A%22timeline%22%2C%22can_personalize%22%3Afalse%2C%22is_xfbml%22%3Afalse%2C%22referer_uri%22%3A%22%22%7D&__a=1

Simply changing "bbcnews" in that URL will fetch any page; we just need to parse the data. I think we are back in business!
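Since only the `href` inside `config_json` changes between pages, the minimised payload URL can be built from a dict. A sketch assuming the remaining fields (including the `app_id` copied from the example) can stay fixed, which has only been observed for the example above; `renderer_url` is a hypothetical helper:

```python
import json
from urllib.parse import urlencode

RENDERER = "https://www.facebook.com/platform/plugin/tab/renderer/"
APP_ID = "776730922422337"  # copied from the example URL; meaning not documented here


def renderer_url(page_url: str) -> str:
    """Build the three-parameter payload URL for a given page URL."""
    config = {
        "app_id": APP_ID,
        "href": page_url,
        "width": 340,
        "height": 500,
        "has_cta": True,
        "has_small_header": False,
        "has_adapt_container_width": True,
        "has_cover": True,
        "has_posts": False,
        "tabs": "timeline",
        "can_personalize": False,
        "is_xfbml": False,
        "referer_uri": "",
    }
    query = {
        "key": "timeline",
        # Compact JSON, matching the encoded config_json in the example URL
        "config_json": json.dumps(config, separators=(",", ":")),
        "__a": "1",
    }
    return RENDERER + "?" + urlencode(query)


url = renderer_url("https://www.facebook.com/bbcnews/")
```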

D5k H3h · Answer 64 · Tue Jul 06 2021 23:44:30 GMT+0800 (China Standard Time)

Using XPath, I was again able to fetch the content of the site...
I used the last URL you sent...
Sadly, however, this is the output:

for (;;);{"__ar":1,"payload":null,"jsmods":{"define":[["CurrentEnvironment",[],{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["UriNeedRawQuerySVConfig",[],{"uris":["dms.netmng.com","doubleclick.net","r.msn.com","watchit.sky.com","graphite.instagram.com","www.kfc.co.th","learn.pantheon.io","www.landmarkshops.in","www.ncl.com","s0.wp.com","www.tatacliq.com","bs.serving-sys.com","kohls.com","lazada.co.th","xg4ken.com","technopark.ru","officedepot.com.mx","bestbuy.com.mx","booking.com"]},3871],["CometAltpayJsSdkIframeAllowedDomains",[],{"allowed_domains":["https:\/\/live.adyen.com","https:\/\/integration-facebook.payu.in","https:\/\/facebook.payulatam.com","https:\/\/secure.payu.com","https:\/\/facebook.dlocal.com","https:\/\/buy2.boku.com"]},4920]],"require":[["ServerRedirect","redirectPageTo",[],["\/login\/?next=\u00252F",true,false]]]},"hsrp":{"hsdp":{"gkxData":{"708253":{"result":false,"hash":"AT5n4hBL3YTMnQWtQxU"},"1224637":{"result":false,"hash":"AT7JRluWxuwDm3XzLH8"}}},"hblp":{"sr_revision":1004078792,"consistency":{"rev":1004078792},"rsrcMap":{"Vr89bRr":{"type":"js","src":"https:\/\/www.facebook.com\/rsrc.php\/v3\/ya\/r\/OZcLupMIkEN.js?_nc_x=Ij3Wp8lg5Kz"}}}},"allResources":["Vr89bRr"],"lid":"6981838346867017480"}

The settings:
Bridge: XPath
URL: the one you sent
selector: *
description: *

Bridge as Code

<?php

class TestBridge extends XPathAbstract {
    const NAME = 'Test';
    const URI = 'https://test.co';
    const DESCRIPTION = 'FacebookTest';
    const MAINTAINER = 'dhuschde';
    const CACHE_TIMEOUT = 1; // seconds; effectively no caching while testing

    const FEED_SOURCE_URL = 'https://www.facebook.com/platform/plugin/tab/renderer/?key=timeline&config_json=%7B%22app_id%22%3A%22776730922422337%22%2C%22href%22%3A%22https%3A%2F%2Fwww.facebook.com%2Fbbcnews%2F%22%2C%22width%22%3A340%2C%22height%22%3A500%2C%22has_cta%22%3Atrue%2C%22has_small_header%22%3Afalse%2C%22has_adapt_container_width%22%3Atrue%2C%22has_cover%22%3Atrue%2C%22has_posts%22%3Afalse%2C%22tabs%22%3A%22timeline%22%2C%22can_personalize%22%3Afalse%2C%22is_xfbml%22%3Afalse%2C%22referer_uri%22%3A%22%22%7D&__a=1';
    const XPATH_EXPRESSION_ITEM = '*'; // match every element to dump the raw response
    const XPATH_EXPRESSION_ITEM_TITLE = '';
    const XPATH_EXPRESSION_ITEM_CONTENT = '*';
    const XPATH_EXPRESSION_ITEM_URI = '';
    const XPATH_EXPRESSION_ITEM_AUTHOR = '';
    const XPATH_EXPRESSION_ITEM_TIMESTAMP = '';
    const XPATH_EXPRESSION_ITEM_ENCLOSURES = '';
    const SETTING_FIX_ENCODING = false;
}


D5k H3h · Answer 65 · Tue Jul 06 2021 23:45:40 GMT+0800 (China Standard Time)

As you can see, the content still contains the login-redirect code, which means my bridge's IP got blocked even with this payload URL.
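A bridge could tell this case apart from a real timeline by looking for the `ServerRedirect` to `/login` inside the `jsmods.require` list of the response pasted above. The PHP 8 sketch below assumes that response shape stays the same. `isLoginRedirect` is a hypothetical helper, not part of RSS-Bridge, and the structure it checks is inferred only from that one pasted response.

```php
<?php

// Hypothetical check: is this renderer response Facebook's login redirect
// rather than timeline data?
function isLoginRedirect(string $body): bool
{
    // The pasted response starts with "for (;;);" (a common guard against the
    // JSON being executed as a script); strip it before decoding.
    $prefix = 'for (;;);';
    if (str_starts_with($body, $prefix)) {
        $body = substr($body, strlen($prefix));
    }

    $data = json_decode($body, true);
    if (!is_array($data)) {
        return false; // not JSON at all; let the caller decide what to do
    }

    // In the pasted output the redirect appears as
    // ["ServerRedirect","redirectPageTo",[],["/login/?next=...",true,false]]
    foreach ($data['jsmods']['require'] ?? [] as $call) {
        if (($call[0] ?? null) === 'ServerRedirect'
            && str_starts_with((string) ($call[3][0] ?? ''), '/login')) {
            return true;
        }
    }

    return false;
}
```

A bridge could run this on each response and show a clear "blocked by Facebook" error instead of an empty feed.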

triatic · Answer 66 · Wed Jul 07 2021 01:02:39 GMT+0800 (China Standard Time)

@dhuschde if you load my URL in Chrome Incognito, can you see the post content?

D5k H3h · Answer 67 · Wed Jul 07 2021 01:20:10 GMT+0800 (China Standard Time)

I can see it in both incognito and normal mode, as I don't have an account.

But I'm not on the same network as my bridge, which is hosted by one.com.

triatic · Answer 68 · Wed Jul 07 2021 01:28:01 GMT+0800 (China Standard Time)

Can you paste the full contents you see into pastebin?

D5k H3h · Answer 69 · Wed Jul 07 2021 02:05:30 GMT+0800 (China Standard Time)

From incognito:
https://bin.snopyta.org/?2c6a8a8bc912358d#4iEZPn65NTn9x8mMhqLQeCSfsbdyazVjWKBJd6tkJ3dH
From Bridge:
https://bin.snopyta.org/?1e08d688f8144c83#DdX2jfuA64729XnZWP73hLAM3Lt4rfLz714QwGqcFon5
From Normal (non-Incognito):
https://bin.snopyta.org/?5649c9cd08d81c0a#9XDm2WwKBbhMt7RemdDGgW7qPsnUTYs1eQZEjDdrR31t

I'm using Firefox on the newest Ubuntu version, if you need that too...

triatic · Answer 70 · Wed Jul 07 2021 02:11:08 GMT+0800 (China Standard Time)

Thanks, so the browser test works and the posts are visible, ready for scraping.

Noutladeesse · Answer 71 · Wed Jul 07 2021 02:12:33 GMT+0800 (China Standard Time)

When I copy-paste the first URL into Chrome, I get this

And when I copy-paste the second URL, I get this

triatic · Answer 72 · Wed Jul 07 2021 02:14:47 GMT+0800 (China Standard Time)

@Noutladeesse it will not work if you are logged into Facebook.

triatic · Answer 73 · Wed Jul 07 2021 02:54:55 GMT+0800 (China Standard Time)

One other thing of interest: this command returns post contents on my home PC:

curl "https://www.facebook.com/platform/plugin/tab/renderer/?key=timeline&config_json=%7B%22app_id%22%3A%22776730922422337%22%2C%22href%22%3A%22https%3A%2F%2Fwww.facebook.com%2Fbbcnews%2F%22%2C%22width%22%3A340%2C%22height%22%3A500%2C%22has_cta%22%3Atrue%2C%22has_small_header%22%3Afalse%2C%22has_adapt_container_width%22%3Atrue%2C%22has_cover%22%3Atrue%2C%22has_posts%22%3Afalse%2C%22tabs%22%3A%22timeline%22%2C%22can_personalize%22%3Afalse%2C%22is_xfbml%22%3Afalse%2C%22referer_uri%22%3A%22%22%7D&__a=1"

But on two cloud VMs I tried, it did not return the posts, which could indicate some IP-level restrictions.

bdutro · Answer 74 · Wed Jul 07 2021 03:05:53 GMT+0800 (China Standard Time)

But on two cloud VM's I tried, it did not return the posts, which could indicate some IP-level restrictions.

That would make sense - Instagram rate-limits/blocks known cloud provider IPs: https://git.sr.ht/~cadence/bibliogram-docs/tree/master/docs/Instagram%20rate%20limits.md

D5k H3h · Answer 75 · Wed Jul 07 2021 05:26:52 GMT+0800 (China Standard Time)

It doesn't get blocked because of overuse from one network, but because the requests come from known hosts (at least that's how it appears).

I got redirected to the login page (or even blocked completely).

But the payload still works...

Copied to pastebin

So maybe we can use the Tor Network, just like Bibliogram...

D5k H3h · Answer 76 · Wed Jul 07 2021 05:33:43 GMT+0800 (China Standard Time)

Even the embed got blocked...

triatic · Answer 77 · Wed Jul 07 2021 23:19:32 GMT+0800 (China Standard Time)

The current version of Facebook is working ok on my home IP, albeit with usage limits. I suppose the question then is whether the embed plugin url is more lenient on usage limits than what we have now.

D5k H3h · Answer 78 · Sun Jul 11 2021 23:08:59 GMT+0800 (China Standard Time)

@triatic
The current version of Facebook is working ok on my home IP, albeit with usage limits. I suppose the question then is whether the embed plugin url is more lenient on usage limits than what we have now.

As I showed here using the Tor Browser, the embed is as limited as the front page.
But the payload still worked, so the payload is only restricted for known host IPs. As far as I know, that is what Bibliogram worked around by using the Tor network.

#2047 (comment)
#2047 (comment)

D5k H3h · Answer 79 · Sun Jul 11 2021 23:14:37 GMT+0800 (China Standard Time)

Sometimes it doesn't work on the Tor network either.
So we need to recognize the login-redirect response and then retry over a new Tor circuit.

(Maybe some Tor nodes also run on known hosts?)
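The recognize-and-retry loop could look like the PHP 8 sketch below. `fetchWithRetry` is hypothetical. `$fetch` and `$newIdentity` are placeholders the caller supplies for whatever transport is used (for example, a Tor SOCKS proxy plus a request for a new circuit); nothing here talks to Tor itself. The blocked check is a crude string match on the login-redirect payload seen earlier in this thread, not a reliable detector.

```php
<?php

// Hypothetical sketch: retry a fetch while the response looks like the
// login-redirect payload. $fetch returns the response body; $newIdentity is
// called before each retry (e.g. to request a new Tor circuit).
function fetchWithRetry(callable $fetch, callable $newIdentity, int $maxAttempts = 3): ?string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $body = (string) $fetch();

        // Crude check based on the blocked response seen in this thread:
        // a ServerRedirect instruction pointing at the login page.
        $blocked = str_contains($body, '"ServerRedirect"')
            && str_contains($body, 'login');

        if (!$blocked) {
            return $body;
        }
        if ($attempt < $maxAttempts) {
            $newIdentity(); // switch exit/IP before trying again
        }
    }

    return null; // every attempt was redirected to the login page
}
```

Capping the attempts matters. Tor exits can be blocked too, so an unbounded loop could keep hitting Facebook indefinitely.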

D5k H3h · Answer 80 · Sun Jul 11 2021 23:22:31 GMT+0800 (China Standard Time)

A new test gives the same result: I got redirected to the login page using facebook.com/bbcnews, but the payload still worked.

triatic · Answer 81 · Mon Jul 12 2021 00:46:42 GMT+0800 (China Standard Time)

The problem with switching to Tor is that the bridge could be blocked more often than a "good" IP address (such as the one I use at home).

D5k H3h · Answer 82 · Mon Jul 12 2021 04:20:40 GMT+0800 (China Standard Time)

The thing is, if it's about known hosts, the bridge would either always work or never work. So maybe we give the instance host the option to turn it on and off, just like the Instagram bridge did with Bibliogram.

triatic · Answer 83 · Mon Jul 12 2021 18:09:04 GMT+0800 (China Standard Time)

Ok, but is there an equivalent of Bibliogram for Facebook?

Lockywolf · Answer 84 · Tue Oct 26 2021 16:48:45 GMT+0800 (China Standard Time)

Has anyone considered "heavy artillery"? That is, Selenium, an off-screen X server, and imitating a full-featured login session?

Lockywolf · Answer 85 · Wed Oct 27 2021 10:20:05 GMT+0800 (China Standard Time)

There have at least been some attempts to create one:

https://github.com/apurvmishra99/facebook-scraper-selenium

Tio · Answer 86 · Tue Nov 02 2021 22:58:50 GMT+0800 (China Standard Time)

I cannot access any public Facebook page I tried without being logged in to Facebook. Did Facebook make all pages private?

triatic · Answer 87 · Tue Nov 02 2021 23:03:07 GMT+0800 (China Standard Time)

Did facebook make all pages private?

No, but they are subject to stringent rate-limiting and IP-blocking, and there isn't much we can do about that.

Tio · Answer 88 · Tue Nov 02 2021 23:14:59 GMT+0800 (China Standard Time)

Oh damn. I use a VPN and tried about 10 locations, so they are all blocked?! Then it is very easy to block the server that hosts our RSS-Bridge, since it has a single IP.

Tio · Answer 89 · Tue Nov 02 2021 23:17:31 GMT+0800 (China Standard Time)

Yeah, no surprise actually. Thanks for the info.

triatic · Answer 90 · Tue Nov 02 2021 23:20:50 GMT+0800 (China Standard Time)

I use https://fetchrss.com/prices to get a few feeds once a day; there isn't much choice now.

Frankytyrone · Answer 91 · Tue Nov 02 2021 23:45:06 GMT+0800 (China Standard Time)

Feedbro is an RSS Chrome extension that can scrape FB ("even personal profiles") but cannot generate RSS feeds, so it's a matter of checking manually. It's the best I have found...

himatech · Answer 92 · Sun Jan 30 2022 21:13:09 GMT+0800 (China Standard Time)

Feedbro is a RSS chrome extension that can scrape FB "even personal profiles" but not generate RSS feeds so it's a matter of checking manually. It's the best there is that I have found...

It's pretty useful, but it's not really portable. I prefer a centralised solution that I can rely on from my laptop or Android phone.

INPoppoRTUNE · Answer 93 · Wed Apr 06 2022 18:52:25 GMT+0800 (China Standard Time)

@dvikan Is the Facebook bridge still broken? Is it still without a maintainer? I cannot find the maintainers list anymore here on GitHub.

triatic · Answer 94 · Wed Apr 06 2022 19:26:00 GMT+0800 (China Standard Time)

It was totally broken for me as soon as Facebook instigated an IP address block on cloud computing providers.

INPoppoRTUNE · Answer 95 · Wed Apr 06 2022 19:31:50 GMT+0800 (China Standard Time)

It was totally broken for me as soon as Facebook instigated an IP address block on cloud computing providers.

For me too.
I'd like to know whether any maintainer has worked on this problem since then (given the issue triage done 10 days ago), and whether it makes sense to re-test the bridge or I will face the same issues.

Magnus Walbeck · Answer 96 · Wed Apr 06 2022 19:55:05 GMT+0800 (China Standard Time)

Same here. I initially solved it by hosting the bridge at home, but a few months later Facebook migrated the pages I was monitoring to the new design, which requires JavaScript to load and of course completely broke the bridge. As the new Facebook site rolls out, fewer pages will work, and at some point the bridge will be completely borked.

Bocki · Answer 97 · Thu Apr 07 2022 03:01:59 GMT+0800 (China Standard Time)

Facebook is one of the hardest bridges to maintain, and even when it is maintained it has a few key limitations. Public instances almost always run into some form of Facebook detection, so it's barely possible to keep it working for the people who run it at home.

Also, because Facebook is constantly evolving and doing rolling releases, one feed might work while another is broken, because FB changed the site underneath it.

Matt DeMoss · Answer 98 · Tue Sep 13 2022 12:06:28 GMT+0800 (China Standard Time)

Is there currently a way to rate-limit locally? That is, prevent my self-hosted bridge from making too many requests to a given site too quickly? I'd probably also need my feed reader to refresh feeds in a random order.

triatic · Answer 99 · Tue Sep 13 2022 20:20:20 GMT+0800 (China Standard Time)

@mdemoss you can increase CACHE_TIMEOUT in the bridge code.
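Raising `CACHE_TIMEOUT` reduces how often each feed is re-fetched. For the local rate limiting asked about above, one option is a per-host throttle that sleeps until a minimum interval has passed since the last request to that host. Below is a minimal PHP sketch. `throttleHost` and its JSON state file are hypothetical and not an RSS-Bridge feature. The state is kept on disk because separate PHP requests normally share no memory, and `flock` serializes concurrent bridge runs.

```php
<?php

// Hypothetical helper, not part of RSS-Bridge: make sure at least
// $minIntervalSeconds pass between requests to the same host.
// Returns the number of seconds it slept.
function throttleHost(string $url, int $minIntervalSeconds, string $stateFile): int
{
    $host = (string) parse_url($url, PHP_URL_HOST);

    // 'c+' opens for read/write, creating the file if needed, without truncating.
    $fp = fopen($stateFile, 'c+');
    if ($fp === false) {
        throw new RuntimeException("Cannot open state file: $stateFile");
    }

    // Exclusive lock so concurrent bridge runs see each other's timestamps.
    flock($fp, LOCK_EX);
    try {
        $state = json_decode((string) stream_get_contents($fp), true);
        if (!is_array($state)) {
            $state = []; // empty or corrupt file
        }

        $last = (int) ($state[$host] ?? 0);
        $wait = max(0, $last + $minIntervalSeconds - time());
        if ($wait > 0) {
            sleep($wait); // lock is held, so other runs queue behind this one
        }

        // Record this request and rewrite the whole state file.
        $state[$host] = time();
        ftruncate($fp, 0);
        rewind($fp);
        fwrite($fp, json_encode($state));
        fflush($fp);
    } finally {
        flock($fp, LOCK_UN);
        fclose($fp);
    }

    return $wait;
}
```

A bridge would call something like `throttleHost($url, 30, sys_get_temp_dir() . '/fb-throttle.json')` just before each request. Because the lock is held while sleeping, concurrent feed refreshes queue up instead of all hitting Facebook at once. This also softens the effect of a feed reader refreshing everything in the same order.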