1

I have to compare two files with the following format:

Manufacturer,Model,Key
----------------------
Honda,Civic,12
Honda,Civic,13
BMW,z3,14
BMW,X3,15
BMW,z3,16

The files are identical only if, for each manufacturer and model, the keys are the same and appear in the same order. The manufacturer/model rows themselves can be interleaved in any order. For example, the file above is identical to:

Honda,Civic,12
BMW,z3,14
Honda,Civic,13
BMW,z3,16
BMW,X3,15

But it is not identical to this one (different order for the Civic keys):

Honda,Civic,13
Honda,Civic,12
BMW,z3,14
BMW,X3,15
BMW,z3,16

Or to this one (different value for a BMW z3 key):

Honda,Civic,13
Honda,Civic,12
BMW,z3,16
BMW,X3,15
BMW,z3,16

What would be the best approach to writing a Java program that can compare files this way? I know that the easiest way is to use some Unix commands (sort to get all manufacturers, grep to get the rows for each manufacturer, then sort to get all models and grep again), but I have to use Java. Possible solutions:

  1. Read the files and add each row to a nested map of manufacturer to model to list of keys (something like Map<String, Map<String, List<String>>>), then compare the corresponding lists from the two structures. Will it work? How costly or fast will it be if there are 100,000 rows in each file?
  2. Try to simulate the sort and grep commands in Java code (as far as I know, this is not easy).
  3. Iterate over the files once for each model of each manufacturer (maybe 5,000 iterations).

Any ideas?

Thanks!

2 Answers

2

Use a Map<String, List<String>>. The key is the manufacturer and model, the value is a list of the keys for that combination (or the entire line, doesn't matter). I use String because there's no need to parse these things into more specific structures, but you can do so if you like the design better.

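Parse each file into such a structure. If the two resulting maps are equal, the files are equivalent. Map equality does not depend on the order in which the manufacturer/model entries were added, while List equality compares elements in order, so the key order within each combination is checked as well.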
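A minimal sketch of this idea follows. The class and method names (GroupedKeyCompare, group, equivalent) are made up for illustration, and it assumes the header and dashed separator rows have already been dropped, so every non-blank line looks like Manufacturer,Model,Key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not from the question: groups keys per manufacturer/model.
final class GroupedKeyCompare {

    // Maps "Manufacturer,Model" to its keys, in the order the keys appear in the input.
    static Map<String, List<String>> group(List<String> lines) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            // Assumption: the key is everything after the last comma.
            int lastComma = line.lastIndexOf(',');
            if (lastComma < 0) {
                throw new IllegalArgumentException("Malformed row: " + line);
            }
            String makeModel = line.substring(0, lastComma);
            String key = line.substring(lastComma + 1).trim();
            groups.computeIfAbsent(makeModel, k -> new ArrayList<>()).add(key);
        }
        return groups;
    }

    // Map.equals ignores entry order; List.equals is order-sensitive.
    static boolean equivalent(List<String> first, List<String> second) {
        return group(first).equals(group(second));
    }
}
```

The lines could come from something like java.nio.file.Files.readAllLines for each file. This is a single pass per file, so 100,000 rows should not be a problem for memory or speed on typical hardware, though that is an expectation rather than a measurement.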


1 Comment

Ok, this was going to be my answer :)
0

If you're dealing with 100,000+ rows, you could use java.util.zip.CRC32 on the list of IDs for each make/model. A Map<String, Checksum> would have a small memory footprint and would need only one comparison per make/model at the end.
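Here is a rough sketch of how that could look. The names (ChecksumCompare, checksums, probablyEquivalent) are invented for the example, and it assumes header and separator rows are already removed. Since CRC32 does not override equals, the sketch compares the long values from getValue() rather than the Checksum objects themselves.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical helper: one running CRC32 per manufacturer/model.
final class ChecksumCompare {

    // Maps "Manufacturer,Model" to the CRC32 value of its keys, fed in file order.
    static Map<String, Long> checksums(List<String> lines) {
        Map<String, CRC32> running = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            // Assumption: the key is everything after the last comma.
            int lastComma = line.lastIndexOf(',');
            if (lastComma < 0) {
                throw new IllegalArgumentException("Malformed row: " + line);
            }
            String makeModel = line.substring(0, lastComma);
            String key = line.substring(lastComma + 1).trim();
            CRC32 crc = running.computeIfAbsent(makeModel, k -> new CRC32());
            // The newline delimiter keeps keys "1","23" distinct from "12","3".
            crc.update((key + "\n").getBytes(StandardCharsets.UTF_8));
        }
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, CRC32> entry : running.entrySet()) {
            result.put(entry.getKey(), entry.getValue().getValue());
        }
        return result;
    }

    // Different checksums prove a difference; equal checksums only make equality very likely.
    static boolean probablyEquivalent(List<String> first, List<String> second) {
        return checksums(first).equals(checksums(second));
    }
}
```

One caveat: a 32-bit checksum can collide, so a match is strong evidence rather than proof. If a false "identical" result would be costly, keeping the full key lists (or using a longer hash) is the safer choice.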
