Notice removed Draw attention by CommunityBot

occurred May 21, 2017 at 15:12

Bounty Ended with no winning answer by CommunityBot

occurred May 21, 2017 at 15:12

narrow the scope, move PostgreSQL approach elsewhere

edited May 14, 2017 at 9:53

Léo Léopold Hertz 준영

How to select from CSV files like SQL in R with sqldf/data.table/dplyr?

I know the thread "How can I inner join two csv files in R", which suggests merge, but that is not what I want. I have two CSV data files and I am wondering how to query them with SQL-like syntax in R. I really like PostgreSQL, so I think its syntax, or a tool in R with similar syntax, would work well here. Both CSV files are keyed on data_id.

data.csv, where it is OK to have IDs that are not found in log.csv (e.g. 4)

data_id, event_value
1, 777
1, 666
2, 111
4, 123 
3, 324
1, 245

log.csv, where the ID column has no duplicates but the name column may contain duplicates

data_id, name
1, leo
2, leopold
3, lorem

Pseudocode in words

Let data_id=1
Show name from log.csv and event_value from data.csv

Pseudocode as a partial PostgreSQL select

SELECT log.name, data.event_value
    FROM data
    JOIN log ON log.data_id = data.data_id
    WHERE data.data_id = 1;

Expected output

leo, 777
leo, 666 
leo, 245
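
For reference, the expected output can be reproduced in base R without merge. This is only a sketch: the data frames data_df and log_df are hypothetical in-memory copies of the two files, and match() stands in for the join as a lookup by data_id.

```r
# Hypothetical in-memory copies of data.csv and log.csv from the question.
data_df <- data.frame(data_id = c(1L, 1L, 2L, 4L, 3L, 1L),
                      event_value = c(777L, 666L, 111L, 123L, 324L, 245L))
log_df <- data.frame(data_id = 1:3,
                     name = c("leo", "leopold", "lorem"),
                     stringsAsFactors = FALSE)

wanted_id <- 1L

# Keep only the events for the wanted ID.
rows <- data_df[data_df$data_id == wanted_id, , drop = FALSE]

# Look up the name for each remaining row; match() returns NA for unknown IDs.
result <- data.frame(
  name = log_df$name[match(rows$data_id, log_df$data_id)],
  event_value = rows$event_value,
  stringsAsFactors = FALSE
)

# Drop rows whose ID has no entry in the log table (this would apply to data_id 4).
result <- result[!is.na(result$name), , drop = FALSE]
print(result)
```

This works because data_id is unique in log.csv, so a single lookup per row is enough.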

R approach

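# read.csv expects a header row and comma separators; strip.white trims the spaces around unquoted fields
file1 <- read.csv("data.csv", col.names=c("data_id", "event_value"), strip.white=TRUE)
file2 <- read.csv("log.csv", col.names=c("data_id", "name"), strip.white=TRUE, stringsAsFactors=FALSE)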

# TODO here something like the SQL query 
# http://stackoverflow.com/a/1307824/54964
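
One way the TODO could be filled in is with sqldf, which runs SQL (SQLite by default) directly against data frames. The following is a minimal sketch and not a definitive answer: it assumes the sqldf package is installed, and the names data_df and log_df are hypothetical. The file contents are inlined with read.csv(text = ...) so the example is self-contained.

```r
# Sketch only: assumes the sqldf package is installed.
library(sqldf)

# Inline copies of the two files; with real files, pass the file name instead of text =.
data_df <- read.csv(text = "data_id, event_value
1, 777
1, 666
2, 111
4, 123
3, 324
1, 245", col.names = c("data_id", "event_value"), strip.white = TRUE)

log_df <- read.csv(text = "data_id, name
1, leo
2, leopold
3, lorem", col.names = c("data_id", "name"), strip.white = TRUE,
  stringsAsFactors = FALSE)

# sqldf resolves the table names in the query to data frames of the same name.
result <- sqldf("
  SELECT log_df.name, data_df.event_value
  FROM data_df
  JOIN log_df ON log_df.data_id = data_df.data_id
  WHERE data_df.data_id = 1
")
print(result)
```

The explicit JOIN ... ON condition matters: listing both tables in FROM without it would pair every data row with every log row.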

Possible approaches; I think sqldf may be sufficient here

sqldf
data.table
dplyr
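
For comparison, the same selection might look as follows with dplyr and data.table. This is a sketch under assumptions: both packages are installed, and data_df, log_df, data_dt and log_dt are hypothetical names for in-memory copies of the two files.

```r
# Sketch only: assumes the data.table and dplyr packages are installed.
library(data.table)
library(dplyr)

data_df <- data.frame(data_id = c(1L, 1L, 2L, 4L, 3L, 1L),
                      event_value = c(777L, 666L, 111L, 123L, 324L, 245L))
log_df <- data.frame(data_id = 1:3,
                     name = c("leo", "leopold", "lorem"),
                     stringsAsFactors = FALSE)

# dplyr: filter first, then an inner join on the shared key.
res_dplyr <- data_df %>%
  filter(data_id == 1) %>%
  inner_join(log_df, by = "data_id") %>%
  select(name, event_value)

# data.table: the same filter, then a join on data_id; nomatch = 0L drops
# IDs that have no row in the log table, which makes it an inner join.
data_dt <- as.data.table(data_df)
log_dt <- as.data.table(log_df)
res_dt <- log_dt[data_dt[data_id == 1], on = "data_id", nomatch = 0L][
  , .(name, event_value)]

print(res_dplyr)
print(res_dt)
```

Both variants avoid merge; which one reads better is largely a matter of taste.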

PostgreSQL schema pseudocode to show what I am trying to do with the CSV files

DROP TABLE IF EXISTS data, log;
CREATE TABLE data (
        data_id INTEGER NOT NULL,      -- not unique: one row per event
        event_value INTEGER NOT NULL
);
CREATE TABLE log (
        data_id INTEGER PRIMARY KEY,   -- unique: one name per ID
        name TEXT NOT NULL
);

R: 3.3.3
OS: Debian 8.7
Related: the PostgreSQL approach is in the thread "How to SELECT with two CSV files/… on PostgreSQL?"

csv postgresql r sql

added 206 characters in body

edited May 13, 2017 at 13:31

Léo Léopold Hertz 준영

added 206 characters in body

edited May 13, 2017 at 13:24

Léo Léopold Hertz 준영

Notice added Draw attention by Léo Léopold Hertz 준영

occurred May 13, 2017 at 13:11

Bounty Started worth 50 reputation by Léo Léopold Hertz 준영

occurred May 13, 2017 at 13:11

clearer

edited May 13, 2017 at 13:09

Léo Léopold Hertz 준영

asked May 7, 2017 at 18:37

Léo Léopold Hertz 준영
