Skip to main content
Notice removed Draw attention by CommunityBot
Bounty Ended with no winning answer by CommunityBot
narrow the scope, move PostgreSQL approach elsewhere
Source Link

How to select on CSV files like SQL inby R sqldf/data.table/dplyr?

I know the thread How can I inner join two csv files in R which has a merge option, which I do not want. I have two data CSV files. I am thinking how to query like them like SQL with R. I really like PostgreSQL so I think it would work here great or similar syntax tools of R. Two CSV files where primary key is data_id.

   data.csv where OK to have IDs not found in log.csv (etc 4)

data_id, event_value
1, 777
1, 666
2, 111
4, 123 
3, 324
1, 245

log.csv where no duplicates in ID column but duplicates can be in name

data_id, name
1, leo
2, leopold
3, lorem

Pseudocode by partial PostgreSQL syntax

  1. Let data_id=1
  2. Show name and event_value from data.csv and log.csv, respectively

Pseudocode like partial PostgreSQL select

SELECT name, event_value 
    FROM data, log
    WHERE data_id=1;

Expected output

leo, 777
leo, 666 
leo, 245

R approach

file1 <- read.table("file1.csv", col.names=c("data_id", "event_value"))
file2 <- read.table("file2.csv", col.names=c("data_id", "name"))

# TODO here something like the SQL query 
# http://stackoverflow.com/a/1307824/54964

Possible approaches where I think sqldf can be sufficient here

  1. sqldf
  2. data.table
  3. dplyr
  4. PostgreSQL database

PostgreSQL thoughts

PostgreSQL Schema pseudocode to show what I am trying to do with CSV files

DROP TABLE IF EXISTS data, log;    
CREATE TABLE data (
        data_id SERIAL PRIMARY KEY NOT NULL,
        event_value INTEGER NOT NULL
);
CREATE TABLE log (
        data_id SERIAL PRIMARY KEY NOT NULL,
        name INTEGER NOT NULL
);

R: 3.3.3
OS: Debian 8.7
Related: PostgreSQL approach in the relevant thread How to SELECT with two CSV files/… on PostgreSQL?

How to select on CSV files like SQL in R?

I know the thread How can I inner join two csv files in R which has a merge option, which I do not want. I have two data CSV files. I am thinking how to query like them like SQL with R. I really like PostgreSQL so I think it would work here great or similar syntax tools of R. Two CSV files where primary key is data_id.

 data.csv where OK to have IDs not found in log.csv (etc 4)

data_id, event_value
1, 777
1, 666
2, 111
4, 123 
3, 324
1, 245

log.csv where no duplicates in ID column but duplicates can be in name

data_id, name
1, leo
2, leopold
3, lorem

Pseudocode by partial PostgreSQL syntax

  1. Let data_id=1
  2. Show name and event_value from data.csv and log.csv, respectively

Pseudocode like partial PostgreSQL select

SELECT name, event_value 
    FROM data, log
    WHERE data_id=1;

Expected output

leo, 777
leo, 666 
leo, 245

R approach

file1 <- read.table("file1.csv", col.names=c("data_id", "event_value"))
file2 <- read.table("file2.csv", col.names=c("data_id", "name"))

# TODO here something like the SQL query 
# http://stackoverflow.com/a/1307824/54964

Possible approaches where I think sqldf can be sufficient here

  1. sqldf
  2. data.table
  3. dplyr
  4. PostgreSQL database

PostgreSQL thoughts

Schema

DROP TABLE IF EXISTS data, log;    
CREATE TABLE data (
        data_id SERIAL PRIMARY KEY NOT NULL,
        event_value INTEGER NOT NULL
);
CREATE TABLE log (
        data_id SERIAL PRIMARY KEY NOT NULL,
        name INTEGER NOT NULL
);

R: 3.3.3
OS: Debian 8.7

How to select on CSV files by R sqldf/data.table/dplyr?

I know the thread How can I inner join two csv files in R which has a merge option, which I do not want. I have two data CSV files. I am thinking how to query like them like SQL with R. Two CSV files where primary key is data_id.  data.csv where OK to have IDs not found in log.csv (etc 4)

data_id, event_value
1, 777
1, 666
2, 111
4, 123 
3, 324
1, 245

log.csv where no duplicates in ID column but duplicates can be in name

data_id, name
1, leo
2, leopold
3, lorem

Pseudocode by partial PostgreSQL syntax

  1. Let data_id=1
  2. Show name and event_value from data.csv and log.csv, respectively

Pseudocode like partial PostgreSQL select

SELECT name, event_value 
    FROM data, log
    WHERE data_id=1;

Expected output

leo, 777
leo, 666 
leo, 245

R approach

file1 <- read.table("file1.csv", col.names=c("data_id", "event_value"))
file2 <- read.table("file2.csv", col.names=c("data_id", "name"))

# TODO here something like the SQL query 
# http://stackoverflow.com/a/1307824/54964

Possible approaches where I think sqldf can be sufficient here

  1. sqldf
  2. data.table
  3. dplyr

PostgreSQL Schema pseudocode to show what I am trying to do with CSV files

CREATE TABLE data (
        data_id SERIAL PRIMARY KEY NOT NULL,
        event_value INTEGER NOT NULL
);
CREATE TABLE log (
        data_id SERIAL PRIMARY KEY NOT NULL,
        name INTEGER NOT NULL
);

R: 3.3.3
OS: Debian 8.7
Related: PostgreSQL approach in the relevant thread How to SELECT with two CSV files/… on PostgreSQL?

added 206 characters in body
Source Link

I know the thread How can I inner join two csv files in R which has a merge option, which I do not want. I have two data CSV files. I am thinking how to query like them like SQL with R. I really like PostgreSQL so I think it would work here great or similar syntax tools of R. Two CSV files where primary key is data_id.

data.csv where OK to have IDs not found in log.csv (etc 4)

data_id, event_value
1, 777
1, 666
2, 111
4, 123 
3, 324
1, 245

log.csv where no duplicates in ID column but duplicates can be in name

data_id, name
1, leo
2, leopold
3, lorem

Pseudocode by partial PostgreSQL syntax

  1. Let data_id=1
  2. Show name and event_value from data.csv and log.csv, respectively

Pseudocode like partial PostgreSQL select

SELECT name, event_value 
    FROM data, log
    WHERE data_id=1;

Expected output

leo, 777
leo, 666 
leo, 245

R approach

file1 <- read.table("file1.csv", col.names=c("data_id", "event_value"))
file2 <- read.table("file2.csv", col.names=c("data_id", "name"))

# TODO here something like the SQL query 
# http://stackoverflow.com/a/1307824/54964

Possible approaches where I think sqldf can be sufficient here

  1. sqldf
  2. data.table
  3. dplyr
  4. PostgreSQL database

PostgreSQL thoughts

Schema

DROP TABLE IF EXISTS data, log;    
CREATE TABLE data (
        data_id SERIAL PRIMARY KEY NOT NULL,
        event_value INTEGER NOT NULL
);
CREATE TABLE log (
        data_id SERIAL PRIMARY KEY NOT NULL,
        name INTEGER NOT NULL
);

R: 3.3.3
OS: Debian 8.7

I know the thread How can I inner join two csv files in R which has a merge option, which I do not want. I have two data CSV files. I am thinking how to query like them like SQL with R. I really like PostgreSQL so I think it would work here great or similar syntax tools of R. Two CSV files where primary key is data_id.

data.csv where OK to have IDs not found in log.csv (etc 4)

data_id, event_value
1, 777
1, 666
2, 111
4, 123 
3, 324
1, 245

log.csv where no duplicates in ID column but duplicates can be in name

data_id, name
1, leo
2, leopold
3, lorem

Pseudocode by partial PostgreSQL syntax

  1. Let data_id=1
  2. Show name and event_value from data.csv and log.csv, respectively

Pseudocode like partial PostgreSQL select

SELECT name, event_value 
    FROM data, log
    WHERE data_id=1;

Expected output

leo, 777
leo, 666 
leo, 245

R approach

file1 <- read.table("file1.csv", col.names=c("data_id", "event_value"))
file2 <- read.table("file2.csv", col.names=c("data_id", "name"))

# TODO here something like the SQL query 

PostgreSQL thoughts

Schema

DROP TABLE IF EXISTS data, log;    
CREATE TABLE data (
        data_id SERIAL PRIMARY KEY NOT NULL,
        event_value INTEGER NOT NULL
);
CREATE TABLE log (
        data_id SERIAL PRIMARY KEY NOT NULL,
        name INTEGER NOT NULL
);

R: 3.3.3
OS: Debian 8.7

I know the thread How can I inner join two csv files in R which has a merge option, which I do not want. I have two data CSV files. I am thinking how to query like them like SQL with R. I really like PostgreSQL so I think it would work here great or similar syntax tools of R. Two CSV files where primary key is data_id.

data.csv where OK to have IDs not found in log.csv (etc 4)

data_id, event_value
1, 777
1, 666
2, 111
4, 123 
3, 324
1, 245

log.csv where no duplicates in ID column but duplicates can be in name

data_id, name
1, leo
2, leopold
3, lorem

Pseudocode by partial PostgreSQL syntax

  1. Let data_id=1
  2. Show name and event_value from data.csv and log.csv, respectively

Pseudocode like partial PostgreSQL select

SELECT name, event_value 
    FROM data, log
    WHERE data_id=1;

Expected output

leo, 777
leo, 666 
leo, 245

R approach

file1 <- read.table("file1.csv", col.names=c("data_id", "event_value"))
file2 <- read.table("file2.csv", col.names=c("data_id", "name"))

# TODO here something like the SQL query 
# http://stackoverflow.com/a/1307824/54964

Possible approaches where I think sqldf can be sufficient here

  1. sqldf
  2. data.table
  3. dplyr
  4. PostgreSQL database

PostgreSQL thoughts

Schema

DROP TABLE IF EXISTS data, log;    
CREATE TABLE data (
        data_id SERIAL PRIMARY KEY NOT NULL,
        event_value INTEGER NOT NULL
);
CREATE TABLE log (
        data_id SERIAL PRIMARY KEY NOT NULL,
        name INTEGER NOT NULL
);

R: 3.3.3
OS: Debian 8.7

added 206 characters in body
Source Link

I know the thread How can I inner join two csv files in R which has a merge option, which I do not want. I have two data CSV files. I am thinking how to query like them like SQL with R. I I really like PostgreSQL so I think it would work here great or similar syntax tools of R. Two CSV files where primary key is data_id.

data.csv where OK to have IDs not found in log.csv (etc 4)

data_id, event_value
1, 777
1, 666
2, 111
4, 123 
3, 324
1, 245

log.csv where no duplicates in ID column but duplicates can be in name

data_id, name
1, leo
2, leopold
3, lorem

Pseudocode by partial PostgreSQL syntax

  1. Let data_id=1
  2. Show name and event_value from data.csv and log.csv, respectively

Pseudocode like partial PostgreSQL select

SELECT name, event_value 
    FROM data, log
    WHERE data_id=1;

Expected output

leo, 777
leo, 666 
leo, 245

R approach

file1 <- read.table("file1.csv", col.names=c("data_id", "event_value"))
file2 <- read.table("file2.csv", col.names=c("data_id", "name"))

# TODO here something like the SQL query 

PostgreSQL thoughts

Schema

DROP TABLE IF EXISTS data, log;    
CREATE TABLE data (
        data_id SERIAL PRIMARY KEY NOT NULL,
        event_value INTEGER NOT NULL
);
CREATE TABLE log (
        data_id SERIAL PRIMARY KEY NOT NULL,
        name INTEGER NOT NULL
);

R: 3.3.3
OS: Debian 8.7

I have two data CSV files. I am thinking how to query them with R. I really like PostgreSQL so I think it would work here great. Two CSV files where primary key is data_id.

data.csv where OK to have IDs not found in log.csv (etc 4)

data_id, event_value
1, 777
1, 666
2, 111
4, 123 
3, 324
1, 245

log.csv where no duplicates in ID column but duplicates can be in name

data_id, name
1, leo
2, leopold
3, lorem

Pseudocode by partial PostgreSQL syntax

  1. Let data_id=1
  2. Show name and event_value from data.csv and log.csv, respectively

Pseudocode like partial PostgreSQL select

SELECT name, event_value 
    FROM data, log
    WHERE data_id=1;

Expected output

leo, 777
leo, 666 
leo, 245

PostgreSQL thoughts

Schema

DROP TABLE IF EXISTS data, log;    
CREATE TABLE data (
        data_id SERIAL PRIMARY KEY NOT NULL,
        event_value INTEGER NOT NULL
);
CREATE TABLE log (
        data_id SERIAL PRIMARY KEY NOT NULL,
        name INTEGER NOT NULL
);

R: 3.3.3
OS: Debian 8.7

I know the thread How can I inner join two csv files in R which has a merge option, which I do not want. I have two data CSV files. I am thinking how to query like them like SQL with R. I really like PostgreSQL so I think it would work here great or similar syntax tools of R. Two CSV files where primary key is data_id.

data.csv where OK to have IDs not found in log.csv (etc 4)

data_id, event_value
1, 777
1, 666
2, 111
4, 123 
3, 324
1, 245

log.csv where no duplicates in ID column but duplicates can be in name

data_id, name
1, leo
2, leopold
3, lorem

Pseudocode by partial PostgreSQL syntax

  1. Let data_id=1
  2. Show name and event_value from data.csv and log.csv, respectively

Pseudocode like partial PostgreSQL select

SELECT name, event_value 
    FROM data, log
    WHERE data_id=1;

Expected output

leo, 777
leo, 666 
leo, 245

R approach

file1 <- read.table("file1.csv", col.names=c("data_id", "event_value"))
file2 <- read.table("file2.csv", col.names=c("data_id", "name"))

# TODO here something like the SQL query 

PostgreSQL thoughts

Schema

DROP TABLE IF EXISTS data, log;    
CREATE TABLE data (
        data_id SERIAL PRIMARY KEY NOT NULL,
        event_value INTEGER NOT NULL
);
CREATE TABLE log (
        data_id SERIAL PRIMARY KEY NOT NULL,
        name INTEGER NOT NULL
);

R: 3.3.3
OS: Debian 8.7

Notice added Draw attention by Léo Léopold Hertz 준영
Bounty Started worth 50 reputation by Léo Léopold Hertz 준영
clearer
Source Link
Loading
Source Link
Loading