1

Currently, I'm running on a thread-less model that isn't working simply because I'm running out of memory before I can process the data I'm being handed. I've made all the changes that I can to optimize the code, and it's still just not quite quick enough.

Clearly I should move on to a threaded model. I'm wondering what the simplest, easiest way to do the following is:

  • The main thread passes some info to the worker
  • That worker performs some work that I'll refactor out of the main method
  • The workers will disappear and new ones will be instantiated when needed

I've never worked with Java threading, and from what I've read it seems pretty complicated, even though what I'm looking for seems pretty simple.

10
  • 5
    What makes you think this will save you any memory? The processing isn't changed. Commented Oct 31, 2012 at 0:03
  • 1
    ExecutorServices handle everything except the actual working logic for you. Commented Oct 31, 2012 at 0:03
  • 2
    Adding multithreading will likely increase, not decrease, memory requirements because multiple threads will handle data concurrently. Perhaps it will be useful to discuss the actual problem you are solving and ask how its memory requirements can be reduced. Commented Oct 31, 2012 at 0:03
  • To play devil's advocate, the "work" could be on a portion of the data loaded into memory, and the time to work on a block of data potentially takes longer than it does for new data to appear. Adding in multiple threads, while adding more overhead, provides the opportunity to work on multiple blocks, potentially bypassing the slower processing issue. Commented Oct 31, 2012 at 0:05
  • @pickypg is correct: the data is actually following high-volume events on twitter. Because of the threadless nature of the way the data is currently being processed, the data is buffering and I am eventually getting OOM errors. Commented Oct 31, 2012 at 0:10

3 Answers

1

If you have multiple independent units of work of equal priority, the best solution is generally some sort of work queue, where a limited number of threads (the number chosen to optimize performance) sit in a while(true) loop dequeuing work units from the queue and executing them.

Generally the optimum number of threads is going to be the number of processors +/- 1, though in some cases a larger number will be optimal if the threads tend to get stalled by disk I/O requests or some such.

But keep in mind that tuning the entire system may be required. For example, you may need more disk arms, and more RAM will almost certainly be required.
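As a rough illustration of the work-queue idea described above, here is a minimal sketch. It assumes the work units are plain strings; the class name `WorkQueueSketch`, the `handle` method, the queue capacity of 100, and the `STOP` sentinel are all hypothetical choices made for this example, not anything from the question. The queue is bounded so that the producer blocks instead of buffering without limit when the workers fall behind.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class WorkQueueSketch {
    // Hypothetical sentinel: a unique object that tells a worker to stop.
    private static final String STOP = new String("STOP");

    /** Runs every item through a fixed set of workers and returns how many were handled. */
    public static int processAll(List<String> items, int workerCount) {
        // Bounded queue (capacity is an arbitrary assumption): put() blocks when it is full.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(100);
        AtomicInteger processed = new AtomicInteger();
        Thread[] workers = new Thread[workerCount];

        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        String item = queue.take(); // blocks until work is available
                        if (item == STOP) {         // identity check against the sentinel
                            return;
                        }
                        handle(item);
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }

        try {
            for (String item : items) {
                queue.put(item);                    // main thread hands work to the workers
            }
            for (int i = 0; i < workerCount; i++) {
                queue.put(STOP);                    // one sentinel per worker
            }
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding the queue", e);
        }
        return processed.get();
    }

    // Placeholder for the real per-item processing refactored out of the main method.
    private static void handle(String item) {
        item.trim();
    }

    public static void main(String[] args) {
        // Start from one thread per processor, then tune as described above.
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println(processAll(Arrays.asList("a", "b", "c"), workers));
    }
}
```

Whether a bounded queue is acceptable depends on whether the producer can tolerate being blocked; if the incoming stream cannot be slowed down, the excess has to go somewhere else.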


6 Comments

Yeah, a LinkedBlockingQueue would appear to be the queue of choice.
If the units of work have equal priority, then it inherently doesn't matter which finishes first, so a vanilla ThreadPoolExecutor will be even easier to implement, as it takes care of absolutely everything beyond boxing up the units of work. Managing the queue yourself is more work than is necessary past Java 1.4. A PriorityBlockingQueue would be useful if priority becomes an issue.
Yep, I've not examined the ThreadPoolExecutor, but it may very well be an implementation of a work queue and thread pool as you suggest. Though in this case there may be an advantage to "roll your own" in that it would be easier to instrument. And if there is a good queue implementation available, "roll your own" is not at all difficult -- can be accomplished in a few dozen lines.
As long as the queue, the input to the queue and the queue processors sit in the same engine, it is very unlikely that this will result in less memory consumption.
Granted, what you said is true, in Java, past 1.4, there is not a single reason to roll your own thread management, with the single exception of managing thread prioritization (not my -1 though, as it's still a viable solution, just extra steps are necessary).
0

I'd start by having a read through Java Concurrency as a refresher ;)

In particular, I would spend some time getting to know the Executors API, as it will do most of what you've described without a lot of the overhead of dealing with too many locks ;)
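For a sense of what that looks like in practice, here is a minimal sketch built on `ThreadPoolExecutor` from `java.util.concurrent`. The class name `ExecutorSketch`, the `handle` method, the string work units, and the queue capacity of 100 are assumptions made for illustration. The pool is given a bounded queue and `CallerRunsPolicy`, so that when the queue is full the submitting thread runs the task itself, which slows submission down rather than letting tasks pile up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorSketch {
    /** Submits one task per item to a fixed-size pool and returns how many completed. */
    public static int processAll(List<String> items, int threads) {
        ExecutorService pool = new ThreadPoolExecutor(
                threads, threads,                           // fixed number of worker threads
                0L, TimeUnit.MILLISECONDS,                  // no keep-alive for extra threads
                new ArrayBlockingQueue<Runnable>(100),      // bounded task queue (assumed size)
                new ThreadPoolExecutor.CallerRunsPolicy()); // full queue: caller runs the task

        AtomicInteger processed = new AtomicInteger();
        for (String item : items) {
            // The main thread only boxes up the unit of work; the pool manages the threads.
            pool.execute(() -> {
                handle(item);
                processed.incrementAndGet();
            });
        }

        pool.shutdown(); // accept no new tasks, let the queued ones finish
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processed.get();
    }

    // Placeholder for the real per-item processing.
    private static void handle(String item) {
        item.trim();
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(processAll(Arrays.asList("a", "b", "c"), threads));
    }
}
```

`Executors.newFixedThreadPool(n)` is the shorter way to get a fixed pool, but its task queue is unbounded, so pending tasks can still accumulate in memory if work arrives faster than it is processed.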

1 Comment

That's a very fluffy 'read the standard text' answer, -1 for not addressing the underlying issues :(
0

Distributing the memory consumption across multiple threads will not change the overall memory consumption. From what I read in your question, my advice is to increase the heap of the JVM; this will help. It looks like you have to optimize the Java startup parameters, not your code. If I am wrong, then you will have to buffer the data. To disk! Not to a thread in the same memory model.
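The maximum heap size is usually set with the JVM's `-Xmx` startup option (for example `java -Xmx2g ...`, where the value is only an illustration). A small check like the following sketch, with the hypothetical class name `HeapCheck`, can confirm what limit the running JVM actually sees:

```java
public class HeapCheck {
    /** Maximum amount of memory the JVM will attempt to use, in megabytes. */
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```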
