Having read through this tutorial and this reference, I understand that block device requests pass through a hardware queue before being dispatched to the driver's queue_rq handler.
When writing a block device driver with only a single hardware queue (nr_hw_queues = 1), can the queue_rq handler still be called from multiple threads simultaneously? My assumption is that a single hardware queue implicitly serializes calls to queue_rq, so the handler never runs concurrently with itself. Is that assumption correct, or does the driver still need its own locking?
If possible, a code reference demonstrating the behavior would be much appreciated.
Cheers!