pmem / rpma

Remote Persistent Memory Access Library

FEAT: rpma_cq_wait() performance optimization

grom72 opened this issue · comments

Rationale

ibv_ack_cq_events() appears to be the main bottleneck in librpma when the completion event channel is used.
To avoid the problem, ibv_ack_cq_events() shall be called less frequently, acknowledging several events per call.
It is also wise to call it before ibv_get_cq_event(), because at that point we are more likely to still have some spare time before a new event becomes available via ibv_get_cq_event().
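To put a number on "less frequently", the following is a minimal sketch of the arithmetic only. The helper name ack_calls_needed() and the batch sizes are hypothetical and not part of librpma; it simply counts how many ibv_ack_cq_events() calls would be needed if events were acknowledged in batches of a given size.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a librpma function): number of ACK calls
 * needed to acknowledge `events` events in batches of `batch`.
 */
static unsigned int
ack_calls_needed(unsigned int events, unsigned int batch)
{
	assert(batch > 0);
	/* full batches, plus one final call for any remainder */
	return events / batch + (events % batch != 0);
}
```

Under this model, 1000 events cost 1000 calls with a batch of 1 but only 16 calls with a batch of 64.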

Description

struct rpma_cq shall be extended with a new field, unsigned int unack_cqe, which is set to 0 in rpma_cq_new():

	(*cq_ptr)->cq = cq;
	(*cq_ptr)->unack_cqe = 0;

unack_cqe shall be incremented every time ibv_get_cq_event() returns a valid event in rpma_cq_wait():

int
rpma_cq_wait(struct rpma_cq *cq)
{
...
	if (ibv_get_cq_event(cq->channel, &ev_cq, &ev_ctx))
		return RPMA_E_NO_COMPLETION;

	++cq->unack_cqe;

At a minimum, ibv_ack_cq_events() shall be called before the ibv_cq is destroyed inside rpma_cq_delete(), since ibv_destroy_cq() waits until all obtained completion events have been acknowledged:

	if (cq->unack_cqe)
		(void) ibv_ack_cq_events(cq->cq, cq->unack_cqe);

	errno = ibv_destroy_cq(cq->cq);

In addition, it must be called periodically as part of rpma_cq_wait() (note that the ibv_ack_cq_events() call is moved before ibv_get_cq_event()):

/*
 * cq.c -- librpma completion-queue-related implementations
 */

...
#define RPMA_MAX_UNACK_CQE UINT_MAX
...
int
rpma_cq_wait(struct rpma_cq *cq)
{
...
	/*
	 * ACK the CQ events collected so far in a single call.
	 */
	if (cq->unack_cqe >= RPMA_MAX_UNACK_CQE) {
		ibv_ack_cq_events(cq->cq, cq->unack_cqe);
		cq->unack_cqe = 0;
	}

	/* wait for the completion event */
	struct ibv_cq *ev_cq;	/* unused */
	void *ev_ctx;		/* unused */
	if (ibv_get_cq_event(cq->channel, &ev_cq, &ev_ctx))
		return RPMA_E_NO_COMPLETION;
...
}
...
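
The counting logic described above can be tried without RDMA hardware. The following is a minimal sketch under stated assumptions, not the librpma implementation: struct mock_cq, mock_ack_events(), mock_cq_init(), mock_cq_wait() and mock_cq_delete() are hypothetical stand-ins for struct rpma_cq, ibv_ack_cq_events(), rpma_cq_new(), rpma_cq_wait() and rpma_cq_delete(), and the threshold MOCK_MAX_UNACK_CQE is set to 4 purely for illustration (the proposal above uses UINT_MAX).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold; the proposal above uses UINT_MAX instead. */
#define MOCK_MAX_UNACK_CQE 4U

/* Stand-in for struct rpma_cq; only the counters matter here. */
struct mock_cq {
	unsigned int unack_cqe;	/* events obtained but not yet acknowledged */
	unsigned int ack_calls;	/* how many times the mock ACK was invoked */
	unsigned int acked;	/* total number of events acknowledged */
};

/* Stand-in for ibv_ack_cq_events(): records the call instead of doing it. */
static void
mock_ack_events(struct mock_cq *cq, unsigned int nevents)
{
	cq->ack_calls++;
	cq->acked += nevents;
}

/* Stand-in for rpma_cq_new(): start with nothing to acknowledge. */
static void
mock_cq_init(struct mock_cq *cq)
{
	assert(cq != NULL);
	cq->unack_cqe = 0;
	cq->ack_calls = 0;
	cq->acked = 0;
}

/* Stand-in for rpma_cq_wait(): ACK the backlog first, then take one event. */
static int
mock_cq_wait(struct mock_cq *cq)
{
	assert(cq != NULL);
	if (cq->unack_cqe >= MOCK_MAX_UNACK_CQE) {
		mock_ack_events(cq, cq->unack_cqe);
		cq->unack_cqe = 0;
	}
	/* the real code would block in ibv_get_cq_event() here */
	++cq->unack_cqe;
	return 0;
}

/* Stand-in for rpma_cq_delete(): flush whatever is still unacknowledged. */
static void
mock_cq_delete(struct mock_cq *cq)
{
	assert(cq != NULL);
	if (cq->unack_cqe) {
		mock_ack_events(cq, cq->unack_cqe);
		cq->unack_cqe = 0;
	}
}
```

With these mock values, ten calls to mock_cq_wait() followed by mock_cq_delete() result in three ACK calls (4 + 4 + 2 events) instead of ten, and every obtained event is acknowledged exactly once before the queue is destroyed.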