Wildcard

What you want is the join command, which is specified by POSIX.

Here is your example pseudocode command:

sqlthingy -i tbl1.csv tbl2.csv -o 'select 1,2,3 from tbl1, tbl2 where tbl1.1 = tbl2.1'

Here is an actual working command using join that is equivalent:

join -t, tbl1.csv tbl2.csv

If both files have only two comma-separated fields and are sorted on the first field (join requires its inputs to be sorted on the join field), this join command is exactly what your pseudocode represents.
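As a concrete illustration, with made-up two-field sample data:

```shell
# Two hypothetical two-field CSV files, already sorted on field 1.
printf '1,alice\n2,bob\n' > tbl1.csv
printf '1,usa\n2,uk\n'    > tbl2.csv

# Inner-join the two files on their first fields.
join -t, tbl1.csv tbl2.csv
```

This prints one line per matching key, with the join field first, followed by the remaining fields of each file: `1,alice,usa` and `2,bob,uk`.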

If they have more fields but you only want up to the second field from each file, still joining on the first field, you would use:

join -t, -o 0,1.2,2.2 tbl1.csv tbl2.csv
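For example, with hypothetical three-field files (the third fields are simply dropped from the output):

```shell
# Hypothetical three-field files, sorted on field 1.
printf '1,alice,30\n2,bob,25\n' > tbl1.csv
printf '1,usa,ny\n2,uk,ldn\n'   > tbl2.csv

# In the -o list: 0 is the join field itself,
# 1.2 is field 2 of file 1, and 2.2 is field 2 of file 2.
join -t, -o 0,1.2,2.2 tbl1.csv tbl2.csv
```

This prints `1,alice,usa` and `2,bob,uk`, ignoring the third field of each file.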

If you want to join on a different field, there are flags for that as well: -1 and -2 select the join field for the first and second file respectively.
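A sketch of joining on a non-first field, with hypothetical data. Remember that each file must be sorted on its own join field:

```shell
# Hypothetical data: people.csv is name,city-id;
# cities.csv is city-id,city-name (already sorted on field 1).
printf 'alice,2\nbob,1\n'    > people.csv
printf '1,london\n2,paris\n' > cities.csv

# Sort the first file on its second field, then join
# field 2 of file 1 against field 1 of file 2.
sort -t, -k2,2 people.csv > people.sorted
join -t, -1 2 -2 1 people.sorted cities.csv
```

The join field comes first in the output, so this prints `1,bob,london` and `2,alice,paris`.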

It isn't a full-fledged RDBMS; for instance you are limited to just two files and a single join field. But for what you requested:

TL;DR - I basically would like a convenient way to do quick and dirty sql joins on csv files. Not looking for a full fledged text based RDBMS, but just a nicer way to do some analysis on csv RDBMS extracts.

It fits the bill perfectly.


You should also check out comm, also specified by POSIX, which prints lines common to two files (or lines present in only one or the other of them, and similar things).
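For instance, to print only the lines two sorted files have in common (hypothetical data):

```shell
# Hypothetical input files; comm requires both to be sorted.
printf 'apple\nbanana\n'  > a.txt
printf 'banana\ncherry\n' > b.txt

# -1 suppresses lines unique to a.txt, -2 suppresses lines
# unique to b.txt, leaving only lines common to both.
comm -12 a.txt b.txt
```

Here that prints just `banana`.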

Also note that both join and comm can operate on standard input by using - as a file name.
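For example, feeding one table to join on standard input (hypothetical data):

```shell
# Hypothetical left-hand table, sorted on field 1.
printf 'a,1\nb,2\n' > left.csv

# The "-" argument makes join read the second table from stdin.
printf 'a,x\nb,y\n' | join -t, left.csv -
```

This prints `a,1,x` and `b,2,y`, so join slots naturally into the middle of a pipeline.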


If you want the equivalent of an SQL count() with a "group by" clause, just extract the column you want (join's output is already ordered by the join field, since join requires sorted input; if the column comes straight from a file, sort it yourself) and pipe it through uniq -c.
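A sketch with a hypothetical orders.csv whose second field is a city, equivalent to SELECT city, COUNT(*) FROM orders GROUP BY city:

```shell
# Hypothetical data: order-id,city.
printf 'o1,paris\no2,london\no3,paris\n' > orders.csv

# Extract the column, sort so duplicate values are adjacent,
# then count each run of identical lines.
cut -d, -f2 orders.csv | sort | uniq -c
```

Each output line is a count followed by the value, e.g. `2 paris` (the exact whitespace padding before the count varies by implementation).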


Between Awk, join, uniq, comm, and sort, you can do some pretty fancy stuff with CSVs. And all of it POSIX compliant.
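As a final sketch combining several of these tools (all file names and data hypothetical): counting orders per region name by joining and then group-counting:

```shell
# Hypothetical data: sales.csv is order-id,region-id;
# regions.csv is region-id,region-name (sorted on field 1).
printf 'o1,2\no2,1\no3,2\n' > sales.csv
printf '1,north\n2,south\n' > regions.csv

# Sort sales on its join field (field 2), join against regions,
# keep only the region name (-o 2.2), then group-count.
sort -t, -k2,2 sales.csv > sales.sorted
join -t, -1 2 -2 1 -o 2.2 sales.sorted regions.csv | sort | uniq -c
```

This yields a count per region name (`1 north`, `2 south`), the moral equivalent of a joined GROUP BY query.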
