This deadlock occurred with the consumer's `commit` method and the
producer's `send` method for the same reason: calling
`lparallel.queue:pop-queue`[1][2] on an empty queue will block
indefinitely. For both consumers and producers, these queues are
filled by a call to `enqueue-payload`. However, because
`lparallel.queue:pop-queue` is called while holding the
`+address->queue-lock+` mutex and `enqueue-payload` must acquire that
same mutex, once `lparallel.queue:pop-queue` blocks on an empty queue,
`enqueue-payload` can never refill it and we'd end up in a deadlock.
Ending up with an empty queue during a call to
`lparallel.queue:pop-queue` is not a valid state. We reached it
because the mutex was not being acquired early enough by the consumer
and producer:
* In the consumer's case, it should have been acquired before the
call to `cl-rdkafka/ll:rd-kafka-commit-queue`.
* In the producer's case, it should have been acquired before the
call to `%send`.
For each consumer and producer, there are two queues that are supposed
to have the same size and corresponding elements at all times:
* An `rd-kafka-queue`, which is a pointer to an `rd_kafka_queue_t` C
struct. This `rd-kafka-queue` is filled by calls to
`cl-rdkafka/ll:rd-kafka-commit-queue` and `%send`.
* A `queue`, which is the result of `lparallel.queue:make-queue`.
This `queue` is filled by calls to `enqueue-payload`.
Every time librdkafka enqueues a commit or send event to
`rd-kafka-queue`, a corresponding lparallel promise should be enqueued
to `queue`.
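The lockstep invariant described above can be sketched as a small
single-threaded model. This is only an illustration under stated
assumptions: `model`, `enqueue-both`, and `in-lockstep-p` are
hypothetical names that do not exist in cl-rdkafka, and plain lists
stand in for `rd-kafka-queue` and `queue`.

```lisp
;; Hypothetical model: none of these names come from cl-rdkafka.
(defstruct model
  (events '())     ; stands in for `rd-kafka-queue`
  (promises '()))  ; stands in for `queue`

(defun enqueue-both (model id)
  "Enqueue an event and its promise in one step, as if under the mutex."
  (setf (model-events model) (append (model-events model) (list id))
        (model-promises model) (append (model-promises model) (list id)))
  model)

(defun in-lockstep-p (model)
  "True when both queues hold corresponding elements in the same order."
  (equal (model-events model) (model-promises model)))
```

As long as every enqueue goes through a step like `enqueue-both`,
`in-lockstep-p` stays true. It turns false as soon as an event is
added without its promise.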
[process-events], which is called by [poll-loop] in a background
thread after acquiring `+address->queue-lock+`, loops over
`rd-kafka-queue` until it's empty and calls
[process-commit-event]/[process-send-event] for each commit/send event
that it pops off. In turn, `process-commit-event` and
`process-send-event` pop a promise off of `queue` and fulfill it with
the commit/send event details.
Because `cl-rdkafka/ll:rd-kafka-commit-queue` and `%send` were being
called without acquiring the mutex, commit/send events continued to be
enqueued onto `rd-kafka-queue`. This caused `process-events` to
continue looping, which caused `process-commit-event` and
`process-send-event` to continue popping promises off of
`queue`. However, because `enqueue-payload` had to wait for the mutex
held by `poll-loop` before enqueuing promises onto `queue`, this
`queue` would eventually become empty, causing
`lparallel.queue:pop-queue` to block indefinitely and leaving us in a
deadlock.
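The two orderings can be replayed deterministically with a minimal
sketch. It assumes a hypothetical `drain` function that stands in for
`process-events`, and it uses plain lists instead of the real queues,
so it reports where a blocking pop would hang rather than actually
blocking.

```lisp
;; Hypothetical model: `drain` is not part of cl-rdkafka.
(defun drain (events promises)
  "Pop one promise per event. Return :WOULD-BLOCK at the point where a
blocking pop would hang on an empty promise queue, else :DRAINED."
  (dolist (event events :drained)
    (declare (ignorable event))
    (if (null promises)
        (return :would-block)
        (pop promises))))

;; Old ordering: the event is visible before its promise is enqueued.
(drain '(commit-1) '())           ; => :WOULD-BLOCK

;; New ordering: the mutex is held across both enqueues, so the
;; draining thread only ever sees an event together with its promise.
(drain '(commit-1) '(promise-1))  ; => :DRAINED
```

In the real code the `:would-block` case is the deadlock: the pop
never returns, and the thread holding the mutex never releases it.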
[1]: https://github.com/SahilKang/cl-rdkafka/blob/9119880aa85382ce815d7ee0dd404f29ec4b6136/src/high-level/consumer/consumer.lisp#L127
[2]: https://github.com/SahilKang/cl-rdkafka/blob/9119880aa85382ce815d7ee0dd404f29ec4b6136/src/high-level/producer.lisp#L83
[process-events]: https://github.com/SahilKang/cl-rdkafka/blob/9119880aa85382ce815d7ee0dd404f29ec4b6136/src/high-level/event-io/kernel.lisp#L67
[poll-loop]: https://github.com/SahilKang/cl-rdkafka/blob/9119880aa85382ce815d7ee0dd404f29ec4b6136/src/high-level/event-io/kernel.lisp#L83
[process-commit-event]: https://github.com/SahilKang/cl-rdkafka/blob/9119880aa85382ce815d7ee0dd404f29ec4b6136/src/high-level/consumer/consumer.lisp#L124
[process-send-event]: https://github.com/SahilKang/cl-rdkafka/blob/9119880aa85382ce815d7ee0dd404f29ec4b6136/src/high-level/producer.lisp#L72
Signed-off-by: Sahil Kang <sahil.kang@asilaycomputing.com>