added 2 characters in body; edited tags
Jeff Schaller

cdxun

Join two csv files by matching columns, join command

I have two .csv files that I need to match based on column 1.

The two file structures look like this.

FILE1

gopAga1_00004004-RA,1122.825534, -2.497919969, 0.411529843
gopAga1_00010932-RA,440.485381, 1.769511316, 0.312853434
gopAga1_00007012-RA, 13.37565185, -1.973108929, 0.380227982
etc...

FILE2

gopAga1_00004004-RA, ENSACAP00000013845
gopAga1_00009937-RA, ENSACAP00000000905
gopAga1_00010932-RA, ENSACAP00000003279
gopAga1_00000875-RA, ENSACAP00000000296
gopAga1_00010837-RA, ENSACAP00000011919
gopAga1_00007012-RA, ENSACAP00000012682
gopAga1_00017831-RA, ENSACAP00000016147
gopAga1_00005588-RA, ENSACAP00000011117
etc...

This is the join command I am currently running, based on what I have read in other threads here:

join -1 1 -2 1 -t , -a 1 -e "NA" -o "2.2,1.1,1.2,1.3" <(sort -k 1 healthy_vs_unhealthy_de.csv) <(sort RBH.csv) > output.txt

However, every time I run this command it only writes the first row to the output.

Does anyone know why the command behaves like this instead of merging the two files on the gop ID?
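
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample rows above and runs the same join options on them. The file names `file1.csv` and `file2.csv` and the helper `clean_sort` are hypothetical. The cleanup step (dropping carriage returns and blanks around the commas) is only an assumption about what might be interfering with the match, not a confirmed cause. `LC_ALL=C` is set so that `sort` and `join` use the same collation order, and temporary sorted files stand in for the process substitution so the sketch also runs in a plain POSIX shell.

```shell
# Sketch only: file names and the cleanup step are assumptions, not part of the original command.
set -eu
export LC_ALL=C   # make sort and join agree on collation order

workdir=$(mktemp -d)
cd "$workdir"

# Sample rows copied from FILE1 and FILE2 above.
cat > file1.csv <<'EOF'
gopAga1_00004004-RA,1122.825534, -2.497919969, 0.411529843
gopAga1_00010932-RA,440.485381, 1.769511316, 0.312853434
gopAga1_00007012-RA, 13.37565185, -1.973108929, 0.380227982
EOF

cat > file2.csv <<'EOF'
gopAga1_00004004-RA, ENSACAP00000013845
gopAga1_00009937-RA, ENSACAP00000000905
gopAga1_00010932-RA, ENSACAP00000003279
gopAga1_00000875-RA, ENSACAP00000000296
gopAga1_00010837-RA, ENSACAP00000011919
gopAga1_00007012-RA, ENSACAP00000012682
gopAga1_00017831-RA, ENSACAP00000016147
gopAga1_00005588-RA, ENSACAP00000011117
EOF

# Hypothetical helper: strip carriage returns, blanks around commas and
# trailing blanks, then sort on the first comma-separated field only.
clean_sort() {
    tr -d '\r' < "$1" |
        sed 's/[[:space:]]*,[[:space:]]*/,/g; s/[[:space:]]*$//' |
        sort -t , -k 1,1
}

clean_sort file1.csv > file1.sorted
clean_sort file2.csv > file2.sorted

# Same options as the command in the question, applied to the cleaned inputs.
join -t , -1 1 -2 1 -a 1 -e NA -o 2.2,1.1,1.2,1.3 file1.sorted file2.sorted > output.txt
cat output.txt
```

On these sample rows the sketch writes one output line per FILE1 row. Whether the same cleanup helps on the real files depends on what they actually contain, which `od -c` on the first few lines would show.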