\$\begingroup\$

The code asks for a directory, reads the CSV file in it, and writes a file to that directory describing a social network for Pajek.

Does anyone know a way to load this CSV file to a Pajek file faster in Java?

Here is the CSV file I want to convert to a Pajek file. It contains 72180 lines and looks like this:

"","people","committers","repositoryCommitter","authors","repositoryAuthor","repository_id"
"1",1,921,183,896,178,1
"2",1,921,183,896,178,2
"3",1,921,183,896,178,6
"4",1,921,183,896,178,7
…

I have working code that produces the correct result, but it runs very slowly. Does anyone know how to speed it up?

package network;


import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.PrintStream;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Scanner;


public class NetworkBuilder
{
    static String line;
    static BufferedReader br1 = null, br2 =null;
    static ArrayList<String> pList = new ArrayList<String>();
    static ArrayList<String> pdata = new ArrayList<String>();
    static ArrayList<String> rList = new ArrayList<String>();


    public static void main(String[] args) throws IOException
    {

        String fileContent1 = "*Vertices " ;
        String fileContent2 = "*Edges" ;

        System.out.println("Enter your current directory: ");
        Scanner scanner = new Scanner(System.in);
        String directory = scanner.nextLine();

        try
        {
            br1 =  new BufferedReader(new FileReader(directory + "//people.csv"));
            br2 =  new BufferedReader(new FileReader(directory + "//repo.csv"));

        } catch(FileNotFoundException e)
        {
            System.out.println(e.getMessage() + " \n file not found re-run and try again");
            System.exit(0);
        }
        int count = 0;
        try {
            while((line = br1.readLine()) != null){ //skip first line
            while((line = br1.readLine()) != null)
            {
                pList.add(line); // add to array list
                count++ ;   
            }
            }

        } catch (IOException error) 
        {
            System.out.println(error.getMessage() + "Error reading file");
        }

        System.out.println("Process completed go to directory to see file");
        PrintStream myconsole = new PrintStream(new File(directory + "network.net"));
        System.setOut(myconsole);

        /**************Vertices ***************/

        int size = pList.size();
        int idstatus = 0; 
        int vert = 0;

        /*
         * for loop to count different people_id (*Vertices __ )
         */
        for(int i=0; i < size; i++)
        {

        String[] data=(pList.get(i)).split(",");
        if(idstatus!=Integer.parseInt(data[1])) //Skip same people_id eg (2 2)
        {
            vert++;
            idstatus = Integer.parseInt(data[1]); //identify people_id
        }
        }
        idstatus = 0;  //reset to 0 (people_id)
        System.out.println(fileContent1 +vert);


        /*
         * for loop to print the people_id without repeating the same id
         */
        for(int i=0; i < size; i++)
        {

           String[] data=(pList.get(i)).split(",");
            if(idstatus!=Integer.parseInt(data[1]))
            {
                System.out.println(data[1]);
                idstatus = Integer.parseInt(data[1]);
            }
        }


        /************* Edges****************/
        System.out.println(fileContent2);

        int[] states = new int[vert]; //to declare for later storing of vertices
        idstatus=0; //reset to 0

        /*
         * for loop to store vertices
         */
        for(int i=0; i < size; i++)
        {   
            String[] data=(pList.get(i)).split(",");
            if(idstatus!=Integer.parseInt(data[1])) 
            {

                idstatus = Integer.parseInt(data[1]); 
                states[idstatus-1]=idstatus; //to store vertices

            }
        }

        /*****************Weight*****************/
        idstatus=0;
        int[] repo = new int[count];
        int[] repo2 = new int[count];

        int vert1=0;
        int common=0;   


                for(int b=0; b<states.length; b++)
                {
                    vert1 = b+1;
                    for(int c=0; c<count;c++) // store repoid 1
                    {
                        String[] data=(pList.get(c)).split(",");
                        if(Integer.parseInt(data[1])==states[b])   // store repoid of all peopleid 1
                        {
                            repo[c]=Integer.parseInt(data[6]);

                        }   
                    }

                    for(int d=0; d<states.length; d++)
                    {
                        if(states[d]!=vert1)
                        {
                            for(int c=0; c<count;c++) // store repoid 2
                            {
                                String[] data=(pList.get(c)).split(",");
                                if(Integer.parseInt(data[1])==states[d]) 
                                {
                                    repo2[c]=Integer.parseInt(data[6]);  
                                }   
                            }

                            //Compare
                            for(int e=0; e<repo.length; e++)
                            {

                                for(int f=0; f<repo2.length; f++)
                                {

                                    if(repo[e]==repo2[f]&&repo[e]!=0&&repo2[f]!=0)
                                    {                                                                   
                                        common++;
                                    }

                                }
                            }

                            //remove null values 
                            if(common!=0){
                            System.out.println(vert1+" "+(d+1)+" "+common ); 
                            }
                            common=0;
                            // clear
                            for(int g=0; g<repo2.length; g++)
                            {
                                repo2[g]=0;
                            }
                        }
                    }

                // clear
                    for(int a=0; a<repo.length; a++)
                    {
                        repo[a]=0;
                    }

                }                                           

    } // end of main

}

The ideal output should look like this:

*Vertices 5923

1

2

3

4

...

*Edges

1 4 1

1 25 1

until 5923...

\$\endgroup\$

1 Answer

\$\begingroup\$

I think the problem is not your code but your algorithm. There are multiple nested for-loops over arrays with roughly 5000 and 70000 elements, so the algorithm has to be improved.

This loop alone performs about 70000*70000 (roughly 4.9 billion) iterations each time it runs, and it is itself nested inside other loops, so it runs many thousands of times.

for(int e=0; e<repo.length; e++) {
   for(int f=0; f<repo2.length; f++) {
     if(repo[e]==repo2[f]&&repo[e]!=0&&repo2[f]!=0){                                                                   
        common++;
     }
  }
}
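
As a minimal sketch (not the asker's code), the same count can be obtained in a single pass over each array by tallying how often each repository id occurs. The class name `CommonCount` and the method `countCommon` are hypothetical names introduced here for illustration. The method counts exactly the pairs `(e, f)` with `repo[e] == repo2[f]` and both non-zero, as the nested loop does.

```java
import java.util.HashMap;
import java.util.Map;

public class CommonCount {
    // Counts pairs (e, f) with a[e] == b[f] and both values non-zero.
    // Same result as the nested loop, but in O(a.length + b.length) time.
    public static long countCommon(int[] a, int[] b) {
        // Tally how often each non-zero value occurs in a.
        Map<Integer, Integer> freq = new HashMap<>();
        for (int v : a) {
            if (v != 0) {
                freq.merge(v, 1, Integer::sum);
            }
        }
        // Each non-zero value in b matches every occurrence of it in a.
        long common = 0;
        for (int v : b) {
            if (v != 0) {
                common += freq.getOrDefault(v, 0);
            }
        }
        return common;
    }

    public static void main(String[] args) {
        int[] repo = {1, 2, 0, 6, 7};   // made-up sample data
        int[] repo2 = {0, 2, 7, 9, 0};  // made-up sample data
        System.out.println(countCommon(repo, repo2)); // prints 2 (ids 2 and 7)
    }
}
```

This removes the quadratic inner comparison, but the surrounding loops over every pair of people would still re-scan the whole list, so it only addresses part of the cost.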

Maybe you could describe your problem and algorithm in more detail. This is probably a graph problem that some existing graph algorithm already solves, but applying it to your case would require a description of the problem, the input file and the output file.

EDIT: I am not sure I understood the problem correctly, but could you build a HashMap in a first pass that stores, for every repository, the users connected to it? Then, in a second pass, you can retrieve the repositories for every user and, through them, the connected users.

For example:

// First pass: map each repository to the users connected to it.
HashMap<Integer,List<Integer>> map = new HashMap<>();
while (/* file is being read */){
   int repositoryId = // read...
   int userId = // read...
   if (map.get(repositoryId)==null){
      List<Integer> userList = new LinkedList<>();
      map.put(repositoryId,userList);
   }
   map.get(repositoryId).add(userId); // List has add(), not put()
}
// second pass
while (/* file is being read */){
   int repositoryId = // read...
   int userId = // read...
   List<Integer> usersOfRepository = map.get(repositoryId);
   // process edges
}
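
To make the "process edges" step concrete, here is a hedged, self-contained sketch of one way the two passes could fit together. It is an illustration under assumptions, not a drop-in replacement: the class `EdgeSketch` and the method `buildEdges` are hypothetical names, each input row is assumed to be already parsed into `{personId, repositoryId}`, ids are assumed to be positive ints, and a repeated (person, repository) row is counted once. Unlike the original program, which prints both `a b` and `b a`, this emits each pair once with the smaller id first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class EdgeSketch {
    // rows: each element is {personId, repositoryId} (assumed already parsed).
    // Returns lines of the form "a b weight" with a < b.
    public static List<String> buildEdges(List<int[]> rows) {
        // First pass: repository -> distinct people connected to it.
        Map<Integer, Set<Integer>> peopleByRepo = new TreeMap<>();
        for (int[] row : rows) {
            peopleByRepo.computeIfAbsent(row[1], k -> new TreeSet<>()).add(row[0]);
        }
        // Second pass: each shared repository adds 1 to the weight of a pair.
        Map<Long, Integer> weights = new TreeMap<>();
        for (Set<Integer> people : peopleByRepo.values()) {
            List<Integer> ids = new ArrayList<>(people);
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    int a = ids.get(i);
                    int b = ids.get(j);
                    // Pack the pair into one long key (relies on positive ids).
                    long key = ((long) a << 32) | b;
                    weights.merge(key, 1, Integer::sum);
                }
            }
        }
        // Unpack the keys into "a b weight" lines, ordered by a, then b.
        List<String> edges = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : weights.entrySet()) {
            long key = e.getKey();
            edges.add((key >>> 32) + " " + (key & 0xFFFFFFFFL) + " " + e.getValue());
        }
        return edges;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not taken from the real people.csv.
        List<int[]> rows = Arrays.asList(
            new int[]{1, 1}, new int[]{1, 2}, new int[]{1, 6},
            new int[]{4, 6}, new int[]{4, 2}, new int[]{25, 1});
        for (String edge : buildEdges(rows)) {
            System.out.println(edge); // prints "1 4 2" then "1 25 1"
        }
    }
}
```

The work is proportional to the number of person pairs that actually share a repository, rather than to every pair of people multiplied by the full file length. A repository with very many contributors would still produce a large number of pairs.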
\$\endgroup\$
  • \$\begingroup\$ Is there a way to fix this? \$\endgroup\$ Commented Jun 27, 2015 at 17:24
  • \$\begingroup\$ The input is the people.csv file, which the program reads. The vertices are the people ids, which is the first column from the csv file, and the edges are the common repositories, which are recorded in the 6th column of the csv. The weight of an edge is the number of common repositories between the two people. The output writes the vertices and edges in Pajek file format. \$\endgroup\$ Commented Jun 27, 2015 at 20:18
  • \$\begingroup\$ I have not used HashMap in Java before. I've watched a few tutorials on it, but I'm still not sure how to use it in my case. Do you know how? \$\endgroup\$ Commented Jun 27, 2015 at 20:54
  • \$\begingroup\$ added example for HashMap \$\endgroup\$ Commented Jun 27, 2015 at 21:32
  • \$\begingroup\$ I'm not sure how that works. How do I read the values after initializing the variables? And how do I proceed in the "process edges" step? \$\endgroup\$ Commented Jun 28, 2015 at 7:44
