kevin1024 / vcrpy

Automatically mock your HTTP interactions to simplify and speed up testing

[CORE BUG][Verified Can Replicate] Wrong number of requests recorded under new_episodes mode

vickyliin opened this issue · comments

AFAIK, when allow_playback_repeats is False, the number of requests made should equal the number of entries in the cassette data.
But in new_episodes mode, when the same (matching) request is made several times, the cassette ends up with fewer entries than the number of requests.

This is the same problem as the one reported in #245.
I ported the reproduction script to httpx so that it's easier to count the requests:
https://colab.research.google.com/gist/vickyliin/cf06ea0dc0f7f2f8a1b5c30bb8f3909d/vcrpy-bug.ipynb

In the new_episodes context, the number of requests is 7, while the length of cassette.data is only 4.

The cause is that cassette.append doesn't set play_counts, just as #245 mentioned. When a new, matching request arrives, it is served from the entry appended just before it, because that entry still has a play count of 0. So the total length of cassette.data for the above script is 1 (old) + 6/2 (new episodes) = 4.
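To make the arithmetic above concrete, here is a minimal sketch of the behaviour. ToyCassette is a hypothetical stand-in written for this illustration, not the real vcr.cassette.Cassette, and it assumes every request matches every recorded entry:

```python
from collections import Counter


class ToyCassette:
    """Hypothetical, simplified stand-in for the real Cassette class."""

    def __init__(self, recorded=0):
        # Entries as if loaded from disk; none has been played yet.
        self.data = ["old"] * recorded
        self.play_counts = Counter()

    def request(self):
        # Replay the first matching entry that has not been played yet.
        for index in range(len(self.data)):
            if self.play_counts[index] == 0:
                self.play_counts[index] += 1
                return "played"
        # Nothing left to replay: record a new entry but, like append(),
        # leave its play count at 0.
        self.data.append("new")
        return "recorded"


cassette = ToyCassette(recorded=1)
outcomes = [cassette.request() for _ in range(7)]
print(outcomes.count("recorded"))  # 3
print(len(cassette.data))          # 4
```

Every second new request is served from the entry recorded just before it, so only 3 of the 6 new requests end up on the cassette.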

For now I use the following code as a hotfix:

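import vcr
from vcr.cassette import Cassette


class FixedCassette(Cassette):
    def append_and_play(self, request, response) -> None:
        # Append as usual, then mark the new entry as played so that the next
        # matching request is not served from it.
        prev_len_data = len(self.data)
        super().append(request, response)
        # append() may skip the pair (e.g. when a before_record hook drops it).
        is_appended = len(self.data) == prev_len_data + 1
        if is_appended:
            self.play_counts[prev_len_data] += 1

    def _load(self) -> None:
        super()._load()
        # Swap in the patched append only after loading, so that entries
        # loaded from disk keep a play count of 0.
        self.append = self.append_and_play
        self._load = None


context = vcr.use_cassette(...)
context.cls = FixedCassette
with context:
    run_requests()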

Related code:

Records are only replayed while their play_count is 0

vcrpy/vcr/cassette.py

Lines 266 to 274 in 69de388

def play_response(self, request):
    """
    Get the response corresponding to a request, but only if it
    hasn't been played back before, and mark it as played
    """
    for index, response in self._responses(request):
        if self.play_counts[index] == 0 or self.allow_playback_repeats:
            self.play_counts[index] += 1
            return response

append does not set play_count

vcrpy/vcr/cassette.py

Lines 234 to 247 in 69de388

def append(self, request, response):
    """Add a request, response pair to this cassette"""
    log.info("Appending request %s and response %s", request, response)
    request = self._before_record_request(request)
    if not request:
        return
    # Deepcopy is here because mutation of `response` will corrupt the
    # real response.
    response = copy.deepcopy(response)
    response = self._before_record_response(response)
    if response is None:
        return
    self.data.append((request, response))
    self.dirty = True

append is also used when loading the cassette, so we can't set play_count in append directly

vcrpy/vcr/cassette.py

Lines 342 to 348 in 69de388

def _load(self):
    try:
        requests, responses = self._persister.load_cassette(self._path, serializer=self._serializer)
        for request, response in zip(requests, responses):
            self.append(request, response)
            self.dirty = False
        self.rewound = True

In other modes the problem doesn't show up, because rewound is False (or the mode is ALL), so nothing is played back

vcrpy/vcr/cassette.py

Lines 262 to 264 in 69de388

def can_play_response_for(self, request):
    request = self._before_record_request(request)
    return request and request in self and self.record_mode != RecordMode.ALL and self.rewound

I'm trying to make a PR to fix this issue, but I found that the tests rely heavily on play_count to check whether a record was newly added or played back.

I'm wondering whether

  • I should change the play_count logic so that it is incremented whenever a record is served, whether it came from the loaded cassette or from a real request, and update all the related tests
  • or I should find a workaround
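As a rough illustration of the first option, the sketch below counts a freshly recorded entry as played, while entries loaded from disk start unplayed. ToyCassette and record_live are hypothetical names for this toy model only; nothing here is vcrpy's actual API, and every request is assumed to match every entry:

```python
from collections import Counter


class ToyCassette:
    """Hypothetical toy model, not the real vcr.cassette.Cassette."""

    def __init__(self):
        self.data = []
        self.play_counts = Counter()

    def append(self, entry):
        # Used when loading from disk: stored entries start unplayed.
        self.data.append(entry)

    def record_live(self, entry):
        # Hypothetical method: the live response has just been delivered to
        # the caller, so the new entry is counted as already played.
        self.append(entry)
        self.play_counts[len(self.data) - 1] += 1

    def request(self):
        for index in range(len(self.data)):
            if self.play_counts[index] == 0:
                self.play_counts[index] += 1
                return "played"
        self.record_live("new")
        return "recorded"


cassette = ToyCassette()
cassette.append("old")  # one entry loaded from disk
outcomes = [cassette.request() for _ in range(7)]
print(len(cassette.data))  # 7
```

In this model the cassette length equals the number of requests after a single run. The open question from above remains: any test that reads play_count to tell "recorded" from "played" would see different values.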

This bug is pretty annoying because the cassette keeps growing on every run until no repeated request needs recording any more. It takes about log(n) runs for the cassette to become stable.

I'm willing to help. Can anyone make a decision? Is there anything else I can do now?
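The growth across runs can be estimated with a small simulation. This is a sketch under simplifying assumptions (n identical requests per run, play counts reset on every load, entries replayed in order); cassette_sizes_per_run is a hypothetical helper, not part of vcrpy:

```python
def cassette_sizes_per_run(n_requests, recorded=1):
    """Return the cassette length after each run, stopping once a run adds nothing."""
    sizes = []
    while True:
        size_before = recorded
        played = 0  # entries replayed so far in this run
        for _ in range(n_requests):
            if played < recorded:
                played += 1    # an unplayed entry exists: replay it
            else:
                recorded += 1  # nothing unplayed: record, but leave it unplayed
        sizes.append(recorded)
        if recorded == size_before:
            return sizes


print(cassette_sizes_per_run(7))  # [4, 6, 7, 7]
```

Each run records about half of the requests that are still missing, which is where the roughly logarithmic number of runs comes from.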