0

I'm trying to learn multithreaded programming and I have some questions about which approach to take.
In my specific case I want to build a program that renames 1000 files, and I was thinking of creating a worker class:

public class  Worker implements Runnable {

   private List<File> files ;

   public Worker(List<File> f){
       files = f;
   }

   public void run(){
     // read all files from list and rename them
   }
}

and then, in the main class, do something like:

Worker w1 = new Worker(..list of 500 files...) ;
Worker w2 = new Worker(..list of the other 500 files...) ;

Thread t1 = new Thread(w1,"thread1");
Thread t2 = new Thread(w2,"thread2");

t1.start();
t2.start();

Running this causes no concurrency issues, so I do not need synchronized code, but I'm not sure whether this is the correct approach.

Or should I create only one instance of Worker, pass it the entire list of 1000 files, and then take care that, no matter how many threads access the object, they won't get the same File from the list?

i.e.:

Worker w1 = new Worker(..list of 1000 files...) ;

Thread t1 = new Thread(w1,"thread1");
Thread t2 = new Thread(w1,"thread2");
t1.start();
t2.start();

How should I proceed here?

  • brings me no concurrency issues... Not between t1 and t2 but what about t1 and the main thread (resp t2)? Commented Jun 19, 2014 at 6:31
  • First measure: the assumption that more threads doing disk I/O (renaming files) on the same disk at the same time is faster is usually not true. The underlying file system provides guarantees that can require locks to be used (e.g. locking a directory when a file is atomically renamed), which in turn means all or most operations are synchronized. The added benefit of being (slightly) faster with multi-threaded code does not always outweigh the simplicity and maintainability of single-threaded code. Commented Jun 19, 2014 at 8:52
  • you are right vanOekel but i found in this program a good opportunity to get some multithreading skills :) Commented Jun 19, 2014 at 13:23

3 Answers

4

The first approach you described is the correct one. You need to create two Worker instances, since each worker will work on a different list of files.

Worker w1 = new Worker(..list of 500 files...) ; // First List
Worker w2 = new Worker(..list of the other 500 files...) ;  // Second List
Thread t1 = new Thread(w1,"thread1");
Thread t2 = new Thread(w2,"thread2");

t1.start();
t2.start();

It's simple: two different threads, each with a load of 500 files, will execute concurrently.
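To make the two-worker approach concrete, here is a minimal sketch. It assumes a made-up renaming rule (appending ".renamed") and uses hypothetical names (SplitRenameSketch, renameInTwoThreads, demo) that are not part of the question. It also calls join() so the main thread waits for both workers before reading their results.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

class SplitRenameSketch {

    // Hypothetical worker: appends ".renamed" to every file in its own list.
    static class Worker implements Runnable {
        private final List<File> files;
        private int renamed; // read by the caller only after join()

        Worker(List<File> files) {
            this.files = files;
        }

        @Override
        public void run() {
            for (File f : files) {
                File target = new File(f.getParentFile(), f.getName() + ".renamed");
                if (f.renameTo(target)) {
                    renamed++;
                }
            }
        }
    }

    // Splits the list into two halves, runs one thread per half, and waits for both.
    static int renameInTwoThreads(List<File> all) throws InterruptedException {
        int mid = all.size() / 2;
        Worker w1 = new Worker(new ArrayList<>(all.subList(0, mid)));
        Worker w2 = new Worker(new ArrayList<>(all.subList(mid, all.size())));
        Thread t1 = new Thread(w1, "thread1");
        Thread t2 = new Thread(w2, "thread2");
        t1.start();
        t2.start();
        t1.join(); // join() also makes each worker's counter visible to this thread
        t2.join();
        return w1.renamed + w2.renamed;
    }

    // Demo helper: creates `count` empty files in a fresh temp directory, then renames them.
    static int demo(int count) {
        try {
            File dir = Files.createTempDirectory("split-rename").toFile();
            List<File> files = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                File f = new File(dir, "file" + i + ".txt");
                f.createNewFile();
                files.add(f);
            }
            return renameInTwoThreads(files);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Renamed " + demo(10) + " files");
    }
}
```

Because each worker owns its own list and the counters are only read after join(), no synchronized blocks are needed in this sketch.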


1

A more typical and scalable approach is one of the following:

  • create a collection (likely an array or list) of N threads to perform the work
  • use a thread pool, e.g. from Executors.newFixedThreadPool(N)
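As a rough sketch of the thread pool option, the following submits one small task per file to Executors.newFixedThreadPool(N). The renaming rule (appending ".bak") and the names PoolRenameSketch, renameOne, renameAll and demo are assumptions made for illustration only.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolRenameSketch {

    // Hypothetical rename rule: append ".bak" to the file name.
    static boolean renameOne(File f) {
        return f.renameTo(new File(f.getParentFile(), f.getName() + ".bak"));
    }

    // Submits one task per file to a pool of nThreads and counts the successful renames.
    static int renameAll(List<File> files, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (File f : files) {
                Callable<Boolean> task = () -> renameOne(f);
                results.add(pool.submit(task));
            }
            int renamed = 0;
            for (Future<Boolean> r : results) {
                if (r.get()) { // get() blocks until that task has finished
                    renamed++;
                }
            }
            return renamed;
        } finally {
            pool.shutdown(); // let the pool threads exit once the work is done
        }
    }

    // Demo helper: creates `count` empty files in a fresh temp directory, then renames them.
    static int demo(int count) {
        try {
            File dir = Files.createTempDirectory("pool-rename").toFile();
            List<File> files = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                File f = new File(dir, "file" + i + ".txt");
                f.createNewFile();
                files.add(f);
            }
            return renameAll(files, 4);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Renamed " + demo(20) + " files");
    }
}
```

With one task per file, the number of threads becomes a tuning parameter rather than something baked into how the list is split.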

You may also wish to use a producer-consumer pattern in which the threads pull from a common task pool. This allows natural balancing of the work, instead of essentially hard-coding that one thread handles 500 tasks and the other handles the same number.

Consider, after all, what would happen if all of your larger files ended up in the bucket handled by thread2: the first thread would be done and idle while the second thread had to do all of the heavy lifting.

The producer/consumer pooling approach would be to dump all of the work (generated by the producers) into a task pool, and then the consumers (your worker threads) bite off small pieces (e.g. one file) at a time. This approach keeps both threads occupied for a similar duration.
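One possible shape for this, sketched with a BlockingQueue as the task pool. To keep it short the rename is only simulated (the consumer records a new name instead of touching the disk), and QueueRenameSketch, process and the STOP marker are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class QueueRenameSketch {

    // Unique marker object telling a consumer to stop; compared by identity below.
    private static final String STOP = new String("__stop__");

    // Starts `consumers` threads that each take one name at a time from a shared queue.
    // Returns a map from the original name to the (simulated) new name.
    static Map<String, String> process(List<String> names, int consumers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Map<String, String> renamed = new ConcurrentHashMap<>();

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String name = queue.take(); // blocks until work is available
                        if (name == STOP) {
                            break;
                        }
                        // Simulated rename: a real consumer would rename the file here.
                        renamed.put(name, name + ".renamed");
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "consumer-" + i);
            t.start();
            threads.add(t);
        }

        // The calling thread acts as the producer, then queues one stop marker per consumer.
        queue.addAll(names);
        for (int i = 0; i < consumers; i++) {
            queue.add(STOP);
        }

        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return renamed;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            names.add("file" + i + ".txt");
        }
        System.out.println("Processed " + process(names, 2).size() + " names");
    }
}
```

Since every consumer takes a single name at a time, a thread that happens to get slow items simply ends up taking fewer of them.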


0

When learning multi-threaded programming, one of the important insights is that a thread is not a task. By giving a thread a part of the list of items to process you are halfway there, but the next step will take you further: constructing the task in such a way that any number of threads can execute it. To do this, you will have to get familiar with the java.util.concurrent classes. These are useful tools to help construct the tasks.

The example below separates tasks from threads. It uses AtomicInteger to ensure each thread picks a unique task, and it uses CountDownLatch to know when all work is done. The example also shows balancing: threads whose tasks complete faster execute more tasks. The example is by no means the only solution; there are other ways of doing this that could be faster, easier, better to maintain, etc.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class MultiRename implements Runnable {

public static void main(String[] args) {

    final int numberOfFnames = 50;

    MultiRenameParams params = new MultiRenameParams();
    params.fnameList = new ArrayList<String>();
    for (int i = 0; i < numberOfFnames; i++) {
        params.fnameList.add("fname " + i);
    }
    params.fnameListIndex = new AtomicInteger();

    final int numberOfThreads = 3;

    params.allDone = new CountDownLatch(numberOfThreads);
    ExecutorService tp = Executors.newCachedThreadPool();
    System.out.println("Starting");
    for (int i = 0; i < numberOfThreads; i++) {
        tp.execute(new MultiRename(params, i));
    }
    try {
        params.allDone.await();
    } catch (InterruptedException e) {
        // Restore the interrupt flag instead of swallowing the interruption.
        Thread.currentThread().interrupt();
    }
    tp.shutdownNow();
    System.out.println("Finished");
}

private final MultiRenameParams params;
private final Random random = new Random();
// Just to show there are fast and slow tasks.
// Thread with lowest delay should get most tasks done.
private final int delay;

public MultiRename(MultiRenameParams params, int delay) {
    this.params = params;
    this.delay = delay;
}

@Override
public void run() {

    final int maxIndex = params.fnameList.size();
    int i = 0;
    int count = 0;
    while ((i = params.fnameListIndex.getAndIncrement()) < maxIndex) {
        String fname = params.fnameList.get(i);
        long sleepTimeMs = random.nextInt(10) + delay;
        System.out.println(Thread.currentThread().getName() + " renaming " + fname + " for " + sleepTimeMs + " ms.");
        try {
            Thread.sleep(sleepTimeMs);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop taking new tasks.
            Thread.currentThread().interrupt();
            break;
        }
        count++;
    }
    System.out.println(Thread.currentThread().getName() + " done, renamed " + count + " files.");
    params.allDone.countDown();
}

static class MultiRenameParams {

    List<String> fnameList;
    AtomicInteger fnameListIndex;
    CountDownLatch allDone;
}

}

2 Comments

"a thread is not a task." True, but you can design a system that runs each task in its own new thread (don't ask me how I know!). It's not a smart idea though if the cost of doing the task is not substantially greater than the cost of creating and destroying the threads, and it may not be smart even then.
@jameslarge Which is where a ThreadPool comes in. I once refactored a program to use one ThreadPool and replaced new Thread(task).start() with ThreadPool.execute(task): it was relatively easy to do and the performance increase was noticeable. The only thing to watch out for is the number of similar tasks running at the same time, e.g. limit the number of tasks working with local files to 8 (or whatever is appropriate) instead of letting the ThreadPool grow to hundreds of threads executing tasks that all use the same (limited) resource.
