mozilla / jira-bugzilla-integration

Slightly related to #645:

In step functions, we add responses from various services to an action's context object:

Line 58 in fe2f942

context = context.append_responses(jira_response)

Then, after the step has been executed, we log the responses we received, if any:

Lines 121 to 132 in fe2f942

    
           step_responses = context.responses_by_step[step.__name__] 
        
           if step_responses: 
        
               has_produced_request = True 
        
           for response in step_responses: 
        
               logger.debug( 
        
                   "Received %s", 
        
                   response, 
        
                   extra={ 
        
                       "response": response, 
        
                       **context.model_dump(), 
        
                   }, 
        
               )

Finally, we flatten the responses

jira-bugzilla-integration/jbi/runner.py

Lines 135 to 138 in fe2f942

    
           responses = list( 
        
               itertools.chain.from_iterable(context.responses_by_step.values()) 
        
           ) 
        
           return True, {"responses": responses}

and return them (or an error message) in the /bugzilla_webhook router handler:

jira-bugzilla-integration/jbi/router.py

Lines 83 to 87 in fe2f942

    
           try: 
        
               result = execute_action(webhook_request, actions) 
        
               return result 
        
           except IgnoreInvalidRequestError as exception: 
        
               return {"error": str(exception)}

Some thoughts here:

we only return this response to Bugzilla, and it only cares about the 200 status code we respond with. No one (machine nor human) reads the responses
we duplicate a lot of response information in the logs. When we run:

for response in step_responses:
    logger.debug(
        "Received %s",
        response,
        extra={
            "response": response,
            **context.model_dump(),
        },
    )

we'll log each response in a message line, but we'll also log each response len(step_responses) times in some structured log field, since we've already added the responses to the context object by this point.

I propose that we:

don't return anything from the /bugzilla_webhook endpoint (only a 200 response)
- follow this chain all the way down, and don't return anything from functions where we don't meaningfully use the return value
push response logging "closer" to where we receive the response (Step function, service, or somewhere else?). We'll be able to link log lines for an action pipeline as long as we correlate them with our rid field

push response logging "closer" to where we receive the response (Step function, service, or somewhere else?). We'll be able to link log lines for an action pipeline as long as we correlate them with our rid field

If we end up doing this, I don't know what the best pattern would be. Maybe both places? Like, log lines could look something like:

jbi.steps, DEBUG, "executing step `create_issue` for action abc" (extra has action_context, including rid)
jbi.services.jira, DEBUG, "creating issue for Bugzilla bug 123456" (extra has action_context, including rid, request body)
jbi.services.jira, DEBUG, "Issue JBI-123 created for Bugzilla bug 123456" (extra has action_context, including rid, response body)
jbi.steps, DEBUG, executing step `add_link_to_bugzilla` for action abc
...

don't return anything from the /bugzilla_webhook endpoint (only a 200 response)

follow this chain all the way down, and don't return anything from functions where we don't meaningfully use the return value

+1, as long as testing remains straightforward, sort of functional style

push response logging "closer" to where we receive the response (Step function, service,)
[...]
as long as we correlate them with our rid field

+1
Make sure we document well how to track a request (eg. examples of log explorer queries)

If we end up doing this, I don't know what the best pattern would be. Maybe both places?

I think we can log in both places, and use a different log level for example. For example, steps as INFO and services as DEBUG, or logs announcing sending of request as DEBUG and logs with response as INFO.

When I see the amount of logs per minute that the gke clusters produces, I don't want to feel bad having too much log. Unless it becomes an issue when troubleshooting, but I don't we're there yet.

Note that I think that last time I looked no DEBUG log was published in prod

	step_responses = context.responses_by_step[step.__name__]
	if step_responses:
	has_produced_request = True
	for response in step_responses:
	logger.debug(
	"Received %s",
	response,
	extra={
	"response": response,
	**context.model_dump(),
	},
	)

	responses = list(
	itertools.chain.from_iterable(context.responses_by_step.values())
	)
	return True, {"responses": responses}

	try:
	result = execute_action(webhook_request, actions)
	return result
	except IgnoreInvalidRequestError as exception:
	return {"error": str(exception)}

Don't return any data from the `/bugzilla_webhook` endpoint, move response logging to step or service methods