If I am building a webservice/Web API for servicing requests, there are 2 threading strategies I can think of.
I will explain this in terms of Java (though the question is likely relevant for other languages as well).
- In the main thread, I accept the request and pass it to a thread. The thread does the processing. 
- In the main thread, I accept the request and pass it to a thread. This thread just puts the request into a job queue and exits. A separate thread pool takes requests from the queue and processes them. 
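To make the two strategies concrete, here is a minimal sketch in Java using `java.util.concurrent` (the class name, pool size, and sample task are my own illustrative choices, not from any real service):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Strategies {

    // Strategy 1: spawn a brand-new thread per request (unbounded concurrency).
    static void handlePerRequest(Runnable request) {
        new Thread(request).start();
    }

    // Strategy 2: hand the request to a fixed-size worker pool.
    // The pool's internal queue holds requests while all workers are busy.
    static final ExecutorService pool = Executors.newFixedThreadPool(8);

    static void handleViaPool(Runnable request) {
        pool.submit(request);
    }

    public static void main(String[] args) {
        Runnable work = () ->
            System.out.println("processed by " + Thread.currentThread().getName());
        handlePerRequest(work); // strategy 1
        handleViaPool(work);    // strategy 2
        pool.shutdown();
    }
}
```

Note that strategy 2 as written collapses the "hand-off thread" and the queue into one `ExecutorService`: `submit` enqueues the job and returns immediately, which is effectively the same design with one fewer thread.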
How do I pick one of these methods?
In the first method, I can run as many threads as my OS/memory allows, and each request starts getting processed as soon as it is received. However, things may slow down if there are too many threads. In the second method, with a fixed number of worker threads, new requests may have to wait before processing starts whenever all the workers are busy.
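The two extremes can also be blended: `ThreadPoolExecutor` lets you bound both the thread count and the queue length, and choose what happens when both are full. A sketch with hypothetical tuning numbers (4/16/100 are placeholders I picked, not recommendations):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TunedPool {

    public static ThreadPoolExecutor build() {
        return new ThreadPoolExecutor(
                4,                              // core threads, kept alive when idle
                16,                             // max threads, created only once the queue is full
                30, TimeUnit.SECONDS,           // idle timeout for the extra threads
                new ArrayBlockingQueue<>(100),  // bounded queue of waiting requests
                // when queue AND max threads are exhausted, run the task on the
                // submitting thread, which applies back-pressure to the caller
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = build();
        pool.submit(() -> System.out.println("processed"));
        pool.shutdown();
    }
}
```

One subtlety worth knowing: `ThreadPoolExecutor` only grows beyond the core size after the queue fills, so a large queue effectively pins you at the core thread count under moderate load.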
There is an SLA that there will be an average of so many requests per hour, and each webservice call must be processed within x seconds.
Is there any theory about how to figure out the best approach? A lot of testing will be done to compare the two, but is there already some theory for this? What factors need to be considered?
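One standard piece of theory here is queueing theory, in particular Little's Law: the average number of requests in the system equals the arrival rate times the average time each request spends in the system. That gives a first estimate of how many worker threads must be busy on average to meet an SLA. A worked example with made-up numbers (36,000 requests/hour and 0.5 s per request are illustrative, not from the question):

```java
public class LittlesLaw {

    public static void main(String[] args) {
        // Hypothetical SLA figures, chosen for illustration only:
        double requestsPerHour   = 36_000;  // average arrival rate from the SLA
        double serviceTimeSecs   = 0.5;     // average processing time per request

        double arrivalsPerSecond = requestsPerHour / 3600.0;   // 10 requests/second

        // Little's Law: L = lambda * W
        // average concurrent requests = arrival rate * time per request
        double concurrent = arrivalsPerSecond * serviceTimeSecs; // 5

        System.out.printf(
            "On average about %.0f requests are in flight, so roughly that many worker threads are needed.%n",
            concurrent);
    }
}
```

This only sizes for the average; bursts above the mean arrival rate, variance in service time, and the CPU-bound vs. I/O-bound nature of the work (I/O-bound work tolerates many more threads than cores) all push the real number higher, which is what the load testing should pin down.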
This is a Java program running on Tomcat.