ptpb / pb

pb is a formerly-lightweight pastebin and url shortener

ptpb.pw permanent shutdown

buhman opened this issue

TL;DR: coin miners are assholes, ruining this great service for everyone

Due to continued write and egress abuse, ptpb.pw is now ~permanently shut down. The current pb implementation has always been a toy at best, and is unsuited for mitigating very real modern internet threats. Unchecked, current usage would have resulted in $3000+/mo GCP bills for network egress, grossly inappropriate for what has always been a charity project.

See the following github issues for more information:

In the future, ptpb.pw may be restored, but with an entirely new (possibly backwards incompatible) API/implementation. This is months away, optimistically.

That's a pity, but thank you very much for hosting this service while it lasted.

Pity indeed, ptpb was a great paste service.

Thanks for providing us with the service for so long!

RIP. Forgot how to use this so I went to the site. Very unfortunate to hear this.

RIP best paste service

I feel bad that this is being shut down. Is there any way for ptpb to become temporary storage, using a formula similar to the one 0x0.st uses to compute its "File Retention Period"? To quote it:

FILE RETENTION PERIOD
---------------------

retention = min_age + (-max_age + min_age) * pow((file_size / max_size - 1), 3)

   days
    365 |  \
        |   \
        |    \
        |     \
        |      \
        |       \
        |        ..
        |          \
  197.5 | ----------..-------------------------------------------
        |             ..
        |               \
        |                ..
        |                  ...
        |                     ..
        |                       ...
        |                          ....
        |                              ......
     30 |                                    ....................
          0                        256                        512
                                                              MiB
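The formula can be checked with a short sketch in Python (the language pb's Flask implementation used). The defaults below (30 and 365 days, 512 MiB) are read off the plot above; they are assumptions, not values taken from 0x0.st's code.

```python
def retention_days(file_size: float, max_size: float = 512.0,
                   min_age: float = 30.0, max_age: float = 365.0) -> float:
    """0x0.st-style retention curve.

    Sizes are in MiB and ages in days; the defaults are read off the
    plot above and are assumptions, not values taken from 0x0.st's code.
    """
    if not 0 <= file_size <= max_size:
        raise ValueError("file_size must be between 0 and max_size")
    # (file_size / max_size - 1) runs from -1 (empty file) to 0 (max size),
    # so its cube runs from -1 to 0 and retention from max_age down to min_age.
    return min_age + (-max_age + min_age) * (file_size / max_size - 1) ** 3
```

With these defaults, retention_days(0) is 365.0 and retention_days(512) is 30.0. The cubic falls steeply at first, so a 256 MiB file already gets only 71.875 days rather than the 197.5-day midpoint marked on the plot.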

Something similar, if not the same formula, could be used, which would help against people abusing this service. If there is anything I can do for it, please let me know. I'd be more than happy to help in any way possible.

Happy to discuss creative solutions.

Most cloud providers don't charge terribly much for storage or IO, so the write abuse alone was a mere annoyance (which is why ptpb.pw didn't shut down in December or even earlier).

I think 0x0's formula is interesting.

I've also considered some kind of "fingerprint"-based bucketing and rate-limiting. The same fingerprint buckets could be weighted into a garbage collector.
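A minimal sketch of per-fingerprint rate limiting, assuming a token bucket per fingerprint (a technique chosen here for illustration; the thread does not specify one). How a fingerprint is derived is left open, and `pressure()` is a hypothetical hook that a garbage collector could weight by.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Bucket:
    tokens: float
    updated: float


class FingerprintRateLimiter:
    """Token bucket per fingerprint (hypothetical design, not pb's code)."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets: Dict[str, Bucket] = {}

    def _bucket(self, fingerprint: str) -> Bucket:
        now = self.clock()
        bucket = self.buckets.setdefault(fingerprint, Bucket(self.capacity, now))
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - bucket.updated
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.refill_per_sec)
        bucket.updated = now
        return bucket

    def allow(self, fingerprint: str, cost: float = 1.0) -> bool:
        bucket = self._bucket(fingerprint)
        if bucket.tokens >= cost:
            bucket.tokens -= cost
            return True
        return False

    def pressure(self, fingerprint: str) -> float:
        """0.0 = idle bucket, 1.0 = fully drained; a GC could evict high-pressure pastes first."""
        if fingerprint not in self.buckets:
            return 0.0
        return 1.0 - self._bucket(fingerprint).tokens / self.capacity
```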

I'm also not against tuning a logistic regression model against every imaginable feature to compute an overall "abuse" factor, which could be consumed by both the rate limiter and the garbage collector.
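The "abuse factor" idea could look roughly like this: a logistic function over weighted features yields a score in (0, 1). The feature names, weights, and bias below are hypothetical placeholders; a real model would be fit to labelled traffic.

```python
import math
from typing import Mapping


def abuse_score(features: Mapping[str, float],
                weights: Mapping[str, float], bias: float) -> float:
    """Logistic-regression score in (0, 1); higher means more likely abuse."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    # Numerically stable sigmoid: avoids math.exp overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical features and weights, for illustration only.
WEIGHTS = {"gets_per_minute": 0.05, "looks_like_miner_config": 3.0}
BIAS = -4.0
```

A rate limiter could, for example, charge each request `1 + 9 * score` tokens, and a garbage collector could delete the highest-scoring pastes first.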

There are also possible privacy issues with this that I'm not sure how to handle (or even if people care about this). I think ideally the pastebin should have no knowledge of paste content via some at-rest encryption scheme, despite this possibly being one of the strongest model features: "does this look like a magnet link, mining pool file, etc."

If there is anything I can do for it

A distributed service might be inherently more resilient to abuse. I think there is a void here, in that existing distributed services focus more on being fully distributed, greatly sacrificing usability.

A split-view DNS based service might have a centralized coordinator that handles advertisement, membership, health/sanity validation, and space partitioning. There are also some harder questions, such as how resilient individual pastes should be to single-node failures, how pastes get read/write partitioned without proxying, etc. I think this might be a fun experiment at the very least.
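For the space-partitioning and single-node-failure questions, one option is rendezvous (highest-random-weight) hashing, sketched below. This is an assumed technique, not something settled in the thread: anyone who knows the node list computes the same placement for a paste ID without a proxy, and storing each paste on the top `replicas` nodes tolerates the loss of an individual node. Node names are hypothetical.

```python
import hashlib
from typing import List, Sequence


def _weight(node: str, paste_id: str) -> int:
    # Stable pseudo-random weight for the (node, paste) pair.
    digest = hashlib.sha256(f"{node}\x00{paste_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(paste_id: str, nodes: Sequence[str], replicas: int = 2) -> List[str]:
    """Return the nodes that should hold paste_id, highest weight first."""
    if not 0 < replicas <= len(nodes):
        raise ValueError("need 0 < replicas <= len(nodes)")
    return sorted(nodes, key=lambda node: _weight(node, paste_id), reverse=True)[:replicas]
```

When a node leaves, only pastes that had it among their top `replicas` nodes move; every other placement stays the same, which keeps casual deploy/undeploy of object servers cheap.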

The dream would be people interested in contributing resources could leisurely deploy/undeploy any number of object servers, and the process for doing this should be as easy as "copy+run this signed binary".

I thought I'd also point out I considered redeploying pb on a provider that doesn't charge for network egress, but I think the end result of ignoring the underlying problems will be poor performance and high latency, due to abusers saturating ptpb's (now more limited) networking. I think the shutdown is more appropriate than continuing to run a degraded service.

coiners truly are the scum of the earth.

I guess a pastebin over IPFS would work for distribution.

I've also considered some kind of "fingerprint"-based bucketing and rate-limiting. The same fingerprint buckets could be weighted into a garbage collector.

Yes, rate limiting is definitely a viable solution. But it would not take much for people who want to abuse the service to do so.

think ideally the pastebin should have no knowledge of paste content via some at-rest encryption scheme, despite this possibly being one of the strongest model features: "does this look like a magnet link, mining pool file, etc.."

Something like a "retention period" based on the type and size of the file could be done. For example, if the service does not delete files smaller than 10MB for over a year, people will abuse it as a "free photo backup service". To limit that, photos could be retained for only x number of days, after which they get deleted. If someone really needs a "free service" to host photos indefinitely, I guess it would only make sense for them to self-host it.
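A minimal sketch of type-based retention, assuming the type is sniffed from well-known magic bytes (PNG, JPEG, GIF) rather than trusted from the client. The 30- and 365-day values are hypothetical placeholders for the "x number of days" above.

```python
# Well-known file signatures ("magic bytes").
IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF
    b"GIF89a",
)


def looks_like_image(data: bytes) -> bool:
    return data.startswith(IMAGE_SIGNATURES)


def retention_for(data: bytes, default_days: int = 365, image_days: int = 30) -> int:
    """Hypothetical policy: images expire much sooner than other pastes."""
    return image_days if looks_like_image(data) else default_days
```

Like a size cap, this is easy to evade (prepending a single byte defeats the check), so it mainly discourages casual misuse.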

Speaking of encryption, that would be a good thing to do.

A distributed service might be inherently more resilient to abuse. I think there is a void here, in that existing distributed services focus more on being fully distributed, greatly sacrificing usability.

I think the question that arises is less about usability than about security. Even though decentralization does seem like a really cool thing, there are a lot of things in terms of encryption that you need to figure out. Something like a mega.nz approach with a file hash and a key would be really cool, but IMHO it would be over the top and take away the simplicity that we have right now in pb.

I thought I'd also point out I considered redeploying pb on a provider that doesn't charge for network egress, but I think the end result of ignoring the underlying problems will be poor performance and high latency

I'd like to point out that pb was legit the fastest pb (duh) that I've used. 0x0.st is way slower, but it does a lot of things in terms of file checking, and it also does not use GCP like pb did. What I mean is that a "free" service hosted by someone for the community is reason enough for me to use it. I wouldn't care how long it takes for a file to upload, or even that the service is slow. If I were to expect a free service to be the best service possible, that would be me just being a really stupid asshole for asking for everything for free. As said before, people who expect a free service to do a lot are free to host it themselves (or pay for it).

I respect your decision to shut down pb, but I would love to see it back up. Sorry for quoting so many of your messages, but that's just my 10c about where the service could go in the future.

0x0.st is way slower, but it does a lot of things in terms of file checking, and that it also does not use GCP like pb did.

As far as I can tell, the biggest reason for 0x0's slowness is Hetzner's routes outside of Europe. Within Europe, they've got decent routes and 0x0 is fast, but outside you're going to have some very congested paths over bargain-bin transit providers. So not really an issue with the software, but with the host.

I really, really hope that before it was shut down, you at least backed up the data on the site! 😱

Assuming you did, will you release an archive of all the public pastes so that people who uploaded stuff can dig through it?

@buhman do you happen to have a backup of all the pastes? There is some data I am looking for that has unfortunately dropped out of Google's web cache...

Weren't at least some of the pastes pseudo-anonymous? Meaning you could only access them if you knew the full hash? Releasing an archive of all the pastes would certainly break that pseudo-anonymity.

@rafasc yes, only public pastes should be made available if possible, as @buhman said recently:

I'm open to making a public-paste-only database dump available if someone is interested in analysis.

#240 (comment)

Thanks for the interest. In contrast to #240 (comment), I'm no longer willing to provide this archive. As of today there are zero copies of the ptpb.pw database, to the best of my knowledge.

Please restore this useful service by accepting only pastes within some size limit (say, 1MB or less) and automatically deleting pastes that are rarely read. Thanks for the effort though.

Deleting rarely read pastes fixes nothing when the problem was the opposite: the abusers were reading them often. Splitting data into 1MB fragments is not rocket science either.

I'm not suggesting splitting data into 1MB fragments; I'm suggesting not accepting pastes that exceed 1MB or some other threshold, or building some authorization to know your users.

@ADITYACODER007 +1 to what @mpan-pl said.

Please restore this useful service

I deeply appreciate this comment.

However, my goals are not exactly aligned with "short-term" restoration. Instead, they are, in decreasing order of importance:

  1. have fun
  2. learn
  3. possibly provide a useful service

My current under-defined plan is to write a new pastebin implementation in Scheme, based on the many lessons learned from pb, including:

  • flask sucks
  • mongo sucks
  • all web frameworks suck
  • everything sucks
  • etc..

To illustrate how far away this is, I'm currently writing buhman/route-mux, which is a (very incomplete) generic HTTP router/multiplexer inspired by the implementation details of julienschmidt/httprouter. This is the first component of many that will eventually form my anti-framework.
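For readers unfamiliar with httprouter-style routing, a simplified sketch follows: routes live in a tree keyed by path segment, with `:name` segments capturing parameters. This is a plain per-segment trie, not the compressed radix tree httprouter actually uses and not route-mux's design; static segments win over parameters, there is no backtracking, and the example routes are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Handler = Callable[[Dict[str, str]], str]


def _segments(path: str) -> List[str]:
    stripped = path.strip("/")
    return stripped.split("/") if stripped else []


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}            # static segments
        self.param: Optional[Tuple[str, "Node"]] = None  # ":name" segment
        self.handler: Optional[Handler] = None


class Router:
    def __init__(self) -> None:
        self.roots: Dict[str, Node] = {}  # one tree per HTTP method

    def add(self, method: str, path: str, handler: Handler) -> None:
        node = self.roots.setdefault(method, Node())
        for seg in _segments(path):
            if seg.startswith(":"):
                name = seg[1:]
                if node.param is None:
                    node.param = (name, Node())
                elif node.param[0] != name:
                    raise ValueError(f"conflicting parameter name in {path!r}")
                node = node.param[1]
            else:
                node = node.children.setdefault(seg, Node())
        node.handler = handler

    def lookup(self, method: str, path: str) -> Optional[Tuple[Handler, Dict[str, str]]]:
        node = self.roots.get(method)
        params: Dict[str, str] = {}
        for seg in _segments(path):
            if node is None:
                return None
            if seg in node.children:        # static match takes priority
                node = node.children[seg]
            elif node.param is not None:    # otherwise capture a parameter
                name, node = node.param
                params[name] = seg
            else:
                return None
        if node is None or node.handler is None:
            return None
        return node.handler, params
```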

What I'm really saying is: don't hold your breath, because I'm definitely not pragmatic when it comes to personal projects. (@silverp1 is under-credited for his role in getting the original pb off the ground to begin with)

Not Splitting data into 1MB fragments, I'm suggesting not to accept pastes that exceeds 1 MB or some threshold.. or build some authorization to know your users.

The problem is that the abuser can easily split their data into 1MB pieces and then equally effortlessly reassemble it. For the two specific cases that have led to the current situation:

  1. Using ptpb as a gratis backup service: just split before uploading, cat after downloading. Less tech-savvy people can just use the built-in split feature of 7-Zip or WinRAR to achieve that.
  2. Using ptpb as C&C (command and control): those people must have written a bit of code already, so writing something that splits and reassembles data is a minimal obstacle.
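To illustrate how small an obstacle a size cap is, splitting and reassembling takes only a few lines. The 1 MiB limit is the hypothetical threshold under discussion.

```python
from typing import List

LIMIT = 1024 * 1024  # hypothetical 1 MiB per-paste cap


def split(data: bytes, limit: int = LIMIT) -> List[bytes]:
    """Cut data into pieces that each fit under the cap."""
    return [data[i:i + limit] for i in range(0, len(data), limit)]


def reassemble(pieces: List[bytes]) -> bytes:
    """The `cat` step after downloading every piece."""
    return b"".join(pieces)
```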

It will also hit the intended use, because people upload screenshots or even photographs of their screens¹. This can easily go beyond 1MB.

Note that I am not dismissing the idea completely. But if it works, it will work the same way bicycle locks do: most of their effectiveness comes from thieves simply choosing unsecured bicycles, not from a typical lock being hard for them to remove. This is a perfectly valid security approach, but one has to keep in mind that it will stop working as soon as other services become unavailable or apply similar measures.


¹ Based only on my own observations on IRC channels.

Thanks for this awesome project.

It's a bummer that this had to be shut down. But I have a few short links that I need to access. They have mega links behind them, and I can't get to the original mega links. How can I go about getting the expanded links?

You'll have to get your warez/pr0n somewhere else, sorry bud.

Alternatively, it's possible ArchiveTeam's URL shortener project has scraped them.

Sounds good, thanks bud

Hello everybody. Sorry if I'm bothering you. I'm a former user of this repo's program. Recently, as my first Go practice project, I implemented this pastebin with the same or more features, including:

  • Abuse detection. We allow plain text only (no photos, no binaries), and uploads must be smaller than 2 megabytes.
  • reCAPTCHA support. If you are getting a lot of bot traffic, you could try my product.
  • Client tool. We built a powerful CLI client for you to use; you can also use curl, there's no difference. The tool even allows administrators to delete illegal snippets quickly. If you want to use it for private sharing, we can generate a random 6-digit access password for you, or you can define one yourself.
  • Data encryption at rest. We use ChaCha20-Poly1305 to store your snippets. Because abuse detection needs to see the content, we can't and won't implement any client-side encryption.
  • Original short link support. We only support short links, for convenience.
  • Auto-expiry. Previously, ptpb/pb needed an administrator to clear the database manually. We force an expiration time on all uploaded data. The program's maximum allowed expiration is 24h; you can set your own, but it cannot be larger than 24h. Note that MongoDB won't release the disk space it allocated even after documents are deleted; you can run "compact" on the database if you want.
  • Burn-after-read support. End users can optionally enable it, but the server's default expiration cannot be 0.
  • Syntax highlighting by default, using Google's prettify.js (the same version as stackoverflow.com). If you need the raw paste, simply add f=raw as a URL parameter in the request.
  • No extra cost. We don't want to serve anything in response to illegal/invalid requests: you just get a status code, with no HTML page sent by our server.
  • Web page upload.
  • Single binary. We don't need any other dependencies except MongoDB.
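The auto-expiry and burn-after-read rules above can be illustrated with a small in-memory sketch. It is hypothetical and not pb-go's code (which, per the description, stores data in MongoDB); it only shows the semantics: requested lifetimes are clamped to 24 hours, expired snippets are dropped on access, and burn-after-read snippets are deleted on first read.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

MAX_TTL_SECONDS = 24 * 60 * 60  # the 24-hour cap described above


@dataclass
class Snippet:
    body: bytes
    expires_at: float
    burn_after_read: bool


class ExpiringStore:
    """Hypothetical in-memory store; not pb-go's MongoDB-backed code."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self.items: Dict[str, Snippet] = {}

    def put(self, key: str, body: bytes, ttl_seconds: float,
            burn_after_read: bool = False) -> None:
        if ttl_seconds <= 0:
            raise ValueError("expiration must be positive")
        ttl = min(ttl_seconds, MAX_TTL_SECONDS)  # clamp to the 24h maximum
        self.items[key] = Snippet(body, self.clock() + ttl, burn_after_read)

    def get(self, key: str) -> Optional[bytes]:
        snippet = self.items.get(key)
        if snippet is None:
            return None
        if self.clock() >= snippet.expires_at:  # lazily drop expired data
            del self.items[key]
            return None
        if snippet.burn_after_read:             # first read deletes it
            del self.items[key]
        return snippet.body
```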

For security reasons, we require you to run our application behind a reverse proxy. This reverse proxy should support TLS and rate limiting. Personally, I use the Caddy web server v1 for this purpose; combining it with Cloudflare may make it more secure.

I host an instance at https://pbgo.top that you can use to try it out. If the cost is affordable, I will try my best to keep it running for the public. At this stage, the product is still experimental. Please help us find more bugs or file feature requests. Also, feel free to submit any issue.

If you find this project useful, please consider giving me a star. Thank you for reading this far, and for helping us test and optimize this project.

I am a bit confused on how this is related to pb at all, any more than any other "yet another pastebin".

No PHOTO, No BINARY

Actually, one of the main motivations for pb was that sprunge.us mangles paste content:

a@b:~$ echo "asdf" | sha256sum
d1bc8d3ba4afc7e109612cb73acbdddac052c93025aa1f82942edabb7deb82a1  -
a@b:~$ echo "asdf" | curl -F 'sprunge=<-' http://sprunge.us 
http://sprunge.us/9XiRHt
a@b:~$ curl -s http://sprunge.us/9XiRHt | sha256sum 
46b0e2df6b8a0af3d5293e3432af7e5457e328925dafa80c2a32c5bbfa8f0fa9  -

pb by contrast had a design goal of "bytes in, bytes out", explicitly to make it easy to share troubleshooting-related screenshots in #archlinux without the awfulness of bloated alternatives like imgur.
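The sha256 comparison in the transcript above generalizes to a simple "bytes in, bytes out" check. In this sketch `upload` and `download` are hypothetical callables standing in for whatever client a given pastebin needs; the check is only whether the hash of what comes back equals the hash of what went in.

```python
import hashlib
from typing import Callable


def bytes_in_bytes_out(data: bytes,
                       upload: Callable[[bytes], str],
                       download: Callable[[str], bytes]) -> bool:
    """True if the service returns exactly the bytes that were uploaded."""
    url = upload(data)
    return hashlib.sha256(download(url)).hexdigest() == hashlib.sha256(data).hexdigest()
```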

Also what the actual fuck?

a@b:~$ curl -X POST -F 'd=@-' https://pbgo.top/api/upload < bytes
Please go to https://pbgo.top/showVerify?id=aGgwMA to finish CAPTCHA.

(you guessed it, google's captchas)

a@b:~$ wc -c bytes 
1048576 bytes
a@b:~$ curl -s https://pbgo.top/hh00 | wc -c
1245940

Cool, so "pb-go" by default wraps all pastes in html, and loads a ton of bogus javascript, and mangles "email addresses" (which, apparently, my 1MB /dev/urandom contained at least one of).

In other words, "pb-go" is the complete polar opposite of ptpb/pb on the pastebin/usability spectrum.

Yes, the main motivation for this project is anti-abuse. By default we show prettified code in HTML; if you want raw code, just add ?f=raw as a URL parameter. The reCAPTCHA feature is optional and can be disabled by the administrator.

Tons of JS? I only use prettify.js (just one) so the code can be viewed quickly.

As this issue owner said, the public service went down due to abuse by coin miners uploading binaries. Maintenance costs us time, and server fees also have to be considered. If you want to share screenshots, I'd prefer something like Chevereto. You could also try another free image host: https://sm.ms.

The legal side also has to be considered: what about child porn or copyright photo? How should I react to that? My time is precious and should not be wasted by those assholes. I decided to stop problems before they arise. That's why I wrote it.

We will add another option later to let the administrator show raw snippets by default. I really appreciate your opinions and suggestions.

As this issue owner said

I am the issue owner. Captcha is absolutely not an acceptable solution. Your "API" is broken by design.

just one

Have you even looked at your own deployment? You have email-decode.min.js, and rocket-loader.min.js (better make sure your "just one" script loads quickly).

sm.ms

I don't expect you to understand why this suggestion is off-topic.

https://github.com/pb-go/pb-go/blob/ce11783949dd8c58c56f39ee512e53a68e2f0c9e/utils/dataenc.go#L43

That's not a nonce.

Brilliant idea. Thanks for pointing out the issue. Personally, I make sure the JS loads as quickly as I can manage, because the CDN works perfectly worldwide. Prettified code is better for most ordinary people.

My personal perspective is just to stop those assholes at all costs. I can understand why you said so, because I am also an Arch user. On the other hand, I am also a site administrator, and I need to take care of all that shit before it happens.

Thanks again.

BTW, the email-decode and rocket-loader JavaScript is injected by the Cloudflare CDN; my project doesn't use them.

My domain is using cloudflare services.

I don't expect you to understand what those spammers did to my previous site.

About rocket-loader: https://support.cloudflare.com/hc/en-us/articles/200168056-What-does-Rocket-Loader-do-

About email-decoder: https://support.cloudflare.com/hc/en-us/articles/200170016-What-is-Email-Address-Obfuscation-

Your project does use them, because the browser unconditionally loads and executes them when it loads your pages. By definition, the fact that they are not critical to the operation of your service makes them unnecessary bloat.

My domain is using cloudflare services.

This is actually multiple levels of bad. Not only can a user encounter your own captchas, but cloudflare will also inject their own captchas at random.

what about child porn or copyright photo?

IMO, the best approach is to remove your ability to read the content of all pastes entirely, likely using some mandatory end-to-end encryption scheme. As an operator, you can't moderate pastes that you have no ability to view the content of. This has end-user usability implications that might be unsolvable though.

Your project does use them

This instance uses them because I need to protect my machine. I don't expect you to understand what those spammers did to my previous site. This FOSS project (the software itself) doesn't use Cloudflare.

cloudflare will also inject their own captchas at random.

This is not my problem, just there's a previous spammer using the same IP or something like that. Yes, security and safety first.

remove your ability to read the content of all pastes entirely, likely using some mandatory end-to-end encryption scheme.

Then your server and storage fees just boom, as you experienced. You also have to spend your precious time and a lot of money maintaining the service for those spammers instead of the people who really need it. That's why your instance finally get permanently shutdown.

just there's a previous spammer using the same IP or something like that

I don't think you understand how arbitrary Cloudflare's captchas are. 10 seconds of googling will show you all kinds of nastiness legitimate users are exposed to.

That's why your instance finally get permanently shutdown.

I love how you're trying to explain to me why I decided to shut ptpb.pw down. Honestly I hated running pb, and #246 was the excuse I needed to no longer feel guilty about shutting it down.

This is my last comment on this issue:

I don't think you understand how arbitrary Cloudflare's captchas are. 10 seconds of googling will show you all kinds of nastiness legitimate users are exposed to.

I'm so sorry that you have experienced such horrible things. I have to say that I'm in a country, as you can guess, with the most serious censorship, and I use a private proxy to access the global Internet. That proxy is shared by at least 100 people and uses an IDC (data center) IP, because it's an expensive International Private Leased Circuit. Even so, I haven't encountered a captcha in several years.

was the excuse I needed to no longer feel guilty

Glad you think so. I maintain a free public service out of love, for the people who need it.

Would anyone be willing to explain what the goal of the abuse was? Did the miners use pastes to control hacked computers, or was there some other use?

I'm curious what the attack pattern was, because it's not obvious how a pastebin would help with mining. Since I never used pb when it existed, it's not clear to me why pb was affected while 0x0 seemingly isn't. Was the ability to edit older posts (without changing the url) an essential part of the abuse recipe?

The traffic was mostly (highly repetitive) "GET" requests--I didn't do much more detailed analysis than that. The paste bodies were monero miner configuration files.

Was the ability to edit older posts (without changing the url) an essential part of the abuse recipe?

This would make sense, though I don't recall specifically if this feature was in use (it would have been evident from the paste URL).