AssertionError: Please file a bug report with PRAW
HenryBlackie opened this issue · comments
Describe the Bug
I have been trying to iterate over the comments in a submission and an assertion error is raised on every attempt.
Desired Result
The script should be able to iterate over every available comment.
Relevant Logs
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    submission.comments.replace_more(limit=None)
  File "/home/henry/.local/lib/python3.8/site-packages/praw/util/deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
  File "/home/henry/.local/lib/python3.8/site-packages/praw/models/comment_forest.py", line 183, in replace_more
    new_comments = item.comments(update=False)
  File "/home/henry/.local/lib/python3.8/site-packages/praw/util/deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
  File "/home/henry/.local/lib/python3.8/site-packages/praw/models/reddit/more.py", line 70, in comments
    assert self.children, "Please file a bug report with PRAW."
AssertionError: Please file a bug report with PRAW.
Code to reproduce the bug
import praw

# `reddit` is a praw.Reddit instance; its initialization is omitted here.
submission = reddit.submission('z1c9z')
print(submission.title)

comments = []
submission.comments.replace_more(limit=None)
for comment in submission.comments.list():
    comments.append(comment.id)
print(len(comments))
My code example does not include the Reddit() initialization to prevent credential leakage.
Yes
This code has previously worked as intended.
No
Operating System/Environment
Ubuntu 20.04.5 LTS on Windows 10 x86_64
Python Version
python3.8.10
PRAW Version
7.6.1
Prawcore Version
2.3.0
Anything else?
I've run this script against other submissions with a large number of comments and I get the same error each time.
Additional post IDs:
- 2nbslo
- 39bpam
- 29qfnm
When I run the same script against a selection of small posts I do not get this error and the script executes as expected.
I'm also having this issue, and "This code has previously worked as intended": Yes
I'm also having this issue. It works for submissions with a small number of comments but fails on larger ones (~1400 comments), which previously worked with no errors.
Similarly, I can't get comments at all; this error appears every time. Is there any workaround?
Literally just started digging into this like 4 minutes ago lol. I'll report back with my findings.
Long story short: at some point while building the comment forest, PRAW gets invalid/corrupted data from Reddit.
I'll investigate if this is in response to PRAW sending invalid/corrupted data to Reddit.
This is a new bug with Reddit. Reddit is returning a response that has the last child's ID truncated to 1 character. @bboe would you mind taking a look?
This issue has been brought to the attention of the platform team at Reddit where it would seem there was a recent code change impacting this old endpoint. No ETA on resolution at the moment.
So here's a summary of my findings: (TL;DR Reddit broke something in their /api/morecomments endpoint and there isn't much we can do about it at the moment)
When you call replace_more, PRAW starts replacing each MoreComments instance with its child comments, which can be either Comment or MoreComments instances. In posts with many comments (1k+), the last item is essentially the overflow of the rest of the comments, which is itself a MoreComments instance. This last MoreComments instance can have thousands of children. Normally this isn't an issue: PRAW requests the comments for that instance from the /api/morecomments endpoint and gets back more comments (a mixture of Comment and MoreComments instances), the last of which is another large MoreComments instance. This continues until all the instances are replaced or the limit passed to replace_more (the number of MoreComments instances it will replace, 32 by default) is reached. Side note: the reason so many people are seeing this is that PRAW starts with the biggest MoreComments instance first.
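The ordering described above can be illustrated with a toy model. This is only a sketch and not PRAW's actual code: `FakeMore`, `replace_order`, and every name and count except 17524 are made up for illustration, and fetching is simulated by a pre-filled `reveals` list.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeMore:
    """Hypothetical stand-in for a MoreComments placeholder (not a PRAW class)."""

    name: str
    count: int
    # Placeholders that a simulated fetch of this one would reveal.
    reveals: List["FakeMore"] = field(default_factory=list)

    def __lt__(self, other: "FakeMore") -> bool:
        # Inverted comparison so that heapq pops the largest count first.
        return self.count > other.count


def replace_order(initial: List[FakeMore], limit: Optional[int] = 32) -> List[str]:
    """Return placeholder names in the order they would be expanded."""
    heap = list(initial)
    heapq.heapify(heap)
    remaining = limit
    order = []
    while heap and (remaining is None or remaining > 0):
        item = heapq.heappop(heap)
        order.append(item.name)
        if remaining is not None:
            remaining -= 1
        # Expanding a placeholder can reveal further placeholders.
        for child in item.reveals:
            heapq.heappush(heap, child)
    return order


# Illustrative data: each overflow placeholder reveals another large one.
overflow_2 = FakeMore("overflow-2", 17524)
overflow_1 = FakeMore("overflow-1", 18000, reveals=[FakeMore("reply-b", 3), overflow_2])
top_level = [FakeMore("reply-a", 5), overflow_1]

print(replace_order(top_level, limit=2))     # ['overflow-1', 'overflow-2']
print(replace_order(top_level, limit=None))  # all four, largest first
```

In this model the large overflow placeholders are always expanded before the small reply placeholders, which is why a single bad overflow placeholder is hit almost immediately.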
Now here is where it falls apart. The first time PRAW takes this last MoreComments instance and hits the /api/morecomments endpoint to fill it in, it gets the expected comments back. However, the last child of that response is another MoreComments instance that is corrupted/incomplete (notice the count of 17524, the empty children list, and the one-character id):
{
  "json": {
    "data": {
      "things": [
        ...,
        {
          "data": {
            "children": [],
            "count": 17524,
            "depth": 0,
            "id": "s",
            "name": "t1_s",
            "parent_id": "t3_2nbslo"
          },
          "kind": "more"
        }
      ]
    },
    "errors": []
  }
}
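For anyone inspecting raw responses, a minimal sketch of how such an entry could be spotted is below. It works on plain parsed JSON rather than PRAW objects; `find_corrupt_more` is a hypothetical helper, and treating "count above zero with no children" as the sign of corruption is an assumption based on the response above.

```python
import json

# Trimmed copy of the response above; the elided "things" are left out.
RESPONSE = json.loads("""
{
  "json": {
    "data": {
      "things": [
        {
          "data": {
            "children": [],
            "count": 17524,
            "depth": 0,
            "id": "s",
            "name": "t1_s",
            "parent_id": "t3_2nbslo"
          },
          "kind": "more"
        }
      ]
    },
    "errors": []
  }
}
""")


def find_corrupt_more(response: dict) -> list:
    """Return the data of "more" things that claim comments but list no children."""
    things = response["json"]["data"]["things"]
    return [
        thing["data"]
        for thing in things
        if thing.get("kind") == "more"
        and thing["data"].get("count", 0) > 0
        and not thing["data"].get("children")
    ]


for data in find_corrupt_more(RESPONSE):
    print(f"{data['name']}: claims {data['count']} comments but lists no children")
```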
The assertion that everyone is seeing is there to make sure there are children to actually fetch. This incomplete MoreComments instance therefore prevents PRAW from fetching the majority of the comments (basically, PRAW can only get the first and second pages of comments you can see on the site).
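To make the failure mode concrete, here is a small sketch of why an empty children list trips that guard. `FakeMoreComments` and `try_fetch` are hypothetical stand-ins, not PRAW's classes; only the assert line mirrors the one in the traceback.

```python
class FakeMoreComments:
    """Hypothetical stand-in; not PRAW's MoreComments class."""

    def __init__(self, children, count):
        self.children = children
        self.count = count

    def comments(self):
        # Same guard as the line in the traceback: an empty list is falsy,
        # so the assertion fails when there is nothing to fetch.
        assert self.children, "Please file a bug report with PRAW."
        return list(self.children)


def try_fetch(more):
    """Return the children, or a description of the assertion failure."""
    try:
        return more.comments()
    except AssertionError as exc:
        return f"AssertionError: {exc}"


print(try_fetch(FakeMoreComments(children=["c60n8gi"], count=1)))
print(try_fetch(FakeMoreComments(children=[], count=17524)))
```

The first call returns its single child ID; the second reproduces the message from the report, because the placeholder advertises 17524 comments but carries no IDs to request.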
We have a few solutions to this:
- Remove the assertion
  - This will result in an error later on, because PRAW tries to fetch an empty set of comments and it isn't written to handle that.
- Remove the bad MoreComments instance
  - This will cause replace_more to only replace a small subset of MoreComments.
Neither of these solutions is good, and our best bet is that Reddit will fix this bug, and soon. I suspect Reddit made a change without considering the public API, because the website seems unaffected by this and appears to use a different format (c1:t1_c60n8gi,t1_xxxxx,t1_xxxxx,) in its requests to the /api/morecomments endpoint.
It seems @bboe beat me to the punch on my comment but it just confirms my findings.
If you receive an update on the ETA, could you let the thread know? It'd be good to know in case this is long term.
We certainly will!
This appears to be fixed as of my testing earlier today, though I couldn't test thoroughly enough to confirm it 100%.