I know the thread How can I inner join two csv files in R which has a merge option, which I do not want.
I have two data CSV files. I am thinking how to query like them like SQL with R.
I really like PostgreSQL so I think it would work here great or similar syntax tools of R.
Two CSV files where primary key is data_id.
data.csv where OK to have IDs not found in log.csv (etc 4)
data_id, event_value
1, 777
1, 666
2, 111
4, 123
3, 324
1, 245
log.csv where no duplicates in ID column but duplicates can be in name
data_id, name
1, leo
2, leopold
3, lorem
Pseudocode by partial PostgreSQL syntax
- Let
data_id=1 - Show
nameandevent_valuefromdata.csvandlog.csv, respectively
Pseudocode like partial PostgreSQL select
SELECT name, event_value
FROM data, log
WHERE data_id=1;
Expected output
leo, 777
leo, 666
leo, 245
R approach
file1 <- read.table("file1.csv", col.names=c("data_id", "event_value"))
file2 <- read.table("file2.csv", col.names=c("data_id", "name"))
# TODO here something like the SQL query
# http://stackoverflow.com/a/1307824/54964
Possible approaches where I think sqldf can be sufficient here
sqldfdata.tabledplyr- PostgreSQL database
PostgreSQL thoughts
PostgreSQL Schema pseudocode to show what I am trying to do with CSV files
DROP TABLE IF EXISTS data, log;
CREATE TABLE data (
data_id SERIAL PRIMARY KEY NOT NULL,
event_value INTEGER NOT NULL
);
CREATE TABLE log (
data_id SERIAL PRIMARY KEY NOT NULL,
name INTEGER NOT NULL
);
R: 3.3.3
OS: Debian 8.7
Related: PostgreSQL approach in the relevant thread How to SELECT with two CSV files/… on PostgreSQL?