praw-dev / praw

PRAW, an acronym for "Python Reddit API Wrapper", is a Python package that allows for simple access to Reddit's API.

Home Page: http://praw.readthedocs.io/

AssertionError: Please file a bug report with PRAW

HenryBlackie opened this issue

Describe the Bug

I have been trying to iterate over the comments in a submission and an assertion error is raised on every attempt.

Desired Result

The script should be able to iterate over every available comment.

Relevant Logs

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    submission.comments.replace_more(limit=None)
  File "/home/henry/.local/lib/python3.8/site-packages/praw/util/deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
  File "/home/henry/.local/lib/python3.8/site-packages/praw/models/comment_forest.py", line 183, in replace_more
    new_comments = item.comments(update=False)
  File "/home/henry/.local/lib/python3.8/site-packages/praw/util/deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
  File "/home/henry/.local/lib/python3.8/site-packages/praw/models/reddit/more.py", line 70, in comments
    assert self.children, "Please file a bug report with PRAW."
AssertionError: Please file a bug report with PRAW.

Code to reproduce the bug

import praw

submission = reddit.submission('z1c9z')

print(submission.title)

comments = []
submission.comments.replace_more(limit=None)
for comment in submission.comments.list():
    comments.append(comment.id)

print(len(comments))

My code example does not include the Reddit() initialization to prevent credential leakage.

Yes

This code has previously worked as intended.

No

Operating System/Environment

Ubuntu 20.04.5 LTS on Windows 10 x86_64

Python Version

python3.8.10

PRAW Version

7.6.1

Prawcore Version

2.3.0

Anything else?

I've run this script against other submissions with a large number of comments and I get the same error each time.
Additional post IDs:

  • 2nbslo
  • 39bpam
  • 29qfnm

When I run the same script against a selection of small posts I do not get this error and the script executes as expected.

I'm also having this issue, and "This code has previously worked as intended": Yes

I'm also having this issue. It works for submissions with a small number of comments but fails on larger ones (~1400 comments) that previously worked with no errors.

Similarly, I can't fetch comments; this error appears constantly. Is there any workaround?

Literally just started digging into this like 4 minutes ago lol. I'll report back with my findings.

Long story short: at some point while building the forest, PRAW gets invalid/corrupted data from Reddit.

I'll investigate if this is in response to PRAW sending invalid/corrupted data to Reddit.

This is a new bug with Reddit. Reddit is returning a response that has the last child's ID truncated to 1 character. @bboe would you mind taking a look?

This issue has been brought to the attention of the platform team at Reddit where it would seem there was a recent code change impacting this old endpoint. No ETA on resolution at the moment.

So here's a summary of my findings: (TL;DR Reddit broke something in their /api/morecomments endpoint and there isn't much we can do about it at the moment)

When you call replace_more, PRAW starts replacing every MoreComments instance with its child comments, which can be either Comment or MoreComments instances. In posts with a lot of comments (1k+), the last item is basically the overflow for the rest of the comments (itself a MoreComments instance), and it can have thousands of children. Normally this isn't an issue: you request the comments for that instance from the /api/morecomments endpoint and get back more comments (a mixture of Comment and MoreComments instances), the last of which is another large MoreComments instance. This continues until all the instances are replaced or the limit passed to replace_more (the number of MoreComments instances it will replace, 32 by default) is reached. Side note: the reason so many people are seeing this is that PRAW starts with the biggest MoreComments first.
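The largest-first replacement described above can be sketched with a toy model. This is not PRAW's implementation: MoreStub, fetch, and this standalone replace_more are hypothetical names made up for illustration, and fetch stands in for the round trip to /api/morecomments.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(eq=False)
class MoreStub:
    """Hypothetical stand-in for a MoreComments placeholder (not a PRAW class)."""

    count: int
    children: list = field(default_factory=list)

    def __lt__(self, other):
        # Reverse the ordering so heapq (a min-heap) pops the largest count first.
        return self.count > other.count


def replace_more(forest, fetch, limit=32):
    """Replace up to `limit` placeholders in `forest`, largest first.

    `fetch(more)` is a hypothetical callable returning the items a placeholder
    expands to (plain comments and/or further placeholders). Pass limit=None
    to replace everything. Returns the placeholders left unreplaced.
    """
    heap = [item for item in forest if isinstance(item, MoreStub)]
    heapq.heapify(heap)
    remaining = limit
    while heap and (remaining is None or remaining > 0):
        more = heapq.heappop(heap)
        forest.remove(more)
        for item in fetch(more):
            forest.append(item)
            if isinstance(item, MoreStub):
                # Newly discovered placeholders compete with the existing ones.
                heapq.heappush(heap, item)
        if remaining is not None:
            remaining -= 1
    return heap
```

With a forest holding placeholders of count 2 and 5, where the count-5 placeholder expands to another of count 3, a limit of 2 expands 5 and then 3 and leaves the count-2 placeholder untouched. That ordering is why the oversized overflow placeholder is hit on the very first request.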

Now here is where it falls apart. The first time PRAW takes this last MoreComments and hits the /api/morecomments endpoint to fill it in, it gets the expected comments back; however, the last child of that response is another MoreComments instance (notice the count of 17524) that is corrupted/incomplete:

{
    "json": {
        "data": {
            "things": [
                ...,
                {
                    "data": {
                        "children": [],
                        "count": 17524,
                        "depth": 0,
                        "id": "s",
                        "name": "t1_s",
                        "parent_id": "t3_2nbslo"
                    },
                    "kind": "more"
                }
            ]
        },
        "errors": []
    }
}

The assertion that everyone is seeing is there to make sure there are children to actually fetch. With this incomplete MoreComments instance, it prevents PRAW from fetching the majority of the comments (basically, PRAW can only get the first and second pages of comments you can see on the site).
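A minimal sketch of spotting that corrupted entry in a raw response. It assumes the signature seen above (a "more" thing with no children but a non-zero count) is what marks the bad data. The function name find_truncated_more is hypothetical, and the second entry in the sample is an invented healthy placeholder added for contrast.

```python
def find_truncated_more(response):
    """Return the data of "more" things that report comments but list no children."""
    things = response["json"]["data"]["things"]
    return [
        thing["data"]
        for thing in things
        if thing.get("kind") == "more"
        and not thing["data"].get("children")
        and thing["data"].get("count", 0) > 0
    ]


sample = {
    "json": {
        "data": {
            "things": [
                # Hypothetical healthy placeholder, for contrast only.
                {
                    "kind": "more",
                    "data": {
                        "children": ["abc123"],
                        "count": 1,
                        "depth": 0,
                        "id": "abc123",
                        "name": "t1_abc123",
                        "parent_id": "t3_2nbslo",
                    },
                },
                # The truncated entry from the response shown above.
                {
                    "kind": "more",
                    "data": {
                        "children": [],
                        "count": 17524,
                        "depth": 0,
                        "id": "s",
                        "name": "t1_s",
                        "parent_id": "t3_2nbslo",
                    },
                },
            ]
        },
        "errors": [],
    }
}

for data in find_truncated_more(sample):
    print(data["name"], data["count"])
```

Only the entry with the one-character ID is flagged; the placeholder that actually lists a child is left alone.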

We have a few solutions to this:

  • Remove the assertion
    • This will result in an error later on because PRAW tries to fetch an empty set of comments and isn't written to handle that.
  • Remove the bad MoreComments instance
    • This will cause replace_more to replace only a small subset of MoreComments.

Neither of these solutions is good, and our best bet is that Reddit fixes this bug, and soon. I suspect Reddit made a change without considering the public API, because the website seems unaffected and it appears to request a different format (c1:t1_c60n8gi,t1_xxxxx,t1_xxxxx,) from the /api/morecomments endpoint.
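For anyone who needs partial results in the meantime, a caller-side stopgap might look like the sketch below. It is not a fix and makes two assumptions that this thread does not confirm: that the comment forest is left partially populated when the assertion fires, and that unresolved placeholders can be recognized by the class name MoreComments. The helper name collect_comment_ids is hypothetical; replace_more and list are the same calls used in the reproduction script.

```python
def collect_comment_ids(submission):
    """Return (ids, complete) for whatever comments could be loaded.

    `complete` is False when replace_more hit the assertion, meaning the
    result covers only part of the thread.
    """
    complete = True
    try:
        submission.comments.replace_more(limit=None)
    except AssertionError:
        # Raised when a MoreComments placeholder comes back with no children.
        complete = False
    ids = [
        item.id
        for item in submission.comments.list()
        # Assumption: skip placeholders that were never replaced.
        if type(item).__name__ != "MoreComments"
    ]
    return ids, complete
```

Called as ids, complete = collect_comment_ids(reddit.submission('2nbslo')), this would return only a fraction of the thread while the endpoint is broken, so the complete flag should be checked before relying on the count.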

It seems @bboe beat me to the punch on my comment but it just confirms my findings.

If you receive an update on the ETA, could you inform the thread? It'd be good to know in case this is long term.

We certainly will!

This appears to be fixed in my testing earlier today, though I couldn't test fully enough to confirm it 100%.