Skip to main content
Became Hot Network Question
added 9 characters in body
Source Link

I'm stuck creating an awk script that prepares a csv file before analysis. I need to create an output file with columns 1-2, 10, 13-15, 19-21. Also I need to replace the numbers on column 2 to the days of the week (so, 1 = Monday, 2 = Tuesday...) and convert the 21th column from nautical miles to km; and delete ""of columns 10, 13 and 14.

Input:

"DAY_OF_MONTH","DAY_OF_WEEK","OP_UNIQUE_CARRIER","OP_CARRIER_AIRLINE_ID","OP_CARRIER","TAIL_NUM","OP_CARRIER_FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN_AIRPORT_SEQ_ID","ORIGIN","DEST_AIRPORT_ID","DEST_AIRPORT_SEQ_ID","DEST","DEP_TIME","DEP_DEL15","DEP_TIME_BLK","ARR_TIME","ARR_DEL15","CANCELLED","DIVERTED","DISTANCE",
1,2,"EV",20366,"EV","N48901","4397",13930,1393007,"ORD",11977,1197705,"GRB","1003",0.00,"1000-1059","1117",0.00,0.00,0.00,174.00,
1,2,"EV",20366,"EV","N16976","4401",15370,1537002,"TUL",13930,1393007,"ORD","1027",0.00,"1000-1059","1216",0.00,0.00,0.00,585.00,
1,2,"EV",20366,"EV","N12167","4404",11618,1161802,"EWR",15412,1541205,"TYS","1848",0.00,"1800-1859","2120",0.00,0.00,0.00,631.00,

Output:

"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61

So far, I've got the command to take the columns needed:

cut -d "," -f1-2,10,13-15,19-21 'Jan_2020_ontime.csv' > 'flights_jan_20.csv'

And also the code to replace the numbers in column 2 with their respective days of the week:

awk 'BEGIN {FS = OFS = ","} 
     $2 == 1 {$2 = "Monday"} 
     $2 == 2 {$2 = "Tuesday"} 
     $2 == 3 {$2 = "Wednesday"} 
     $2 == 4 {$2 = "Thursday"} 
     $2 == 5 {$2 = "Friday"} 
     $2 == 6 {$2 = "Saturday"} 
     $2 == 7 {$2 = "Sunday"} 
     {print}' 
file.csv

I am also missing a way to wrap all the code into the script to execute it later.

I'm stuck creating an awk script that prepares a csv file before analysis. I need to create an output file with columns 1-2, 10, 13-15, 19-21. Also I need to replace the numbers on column 2 to the days of the week (so, 1 = Monday, 2 = Tuesday...) and convert the 21th column from miles to km; and delete ""of columns 10, 13 and 14.

Input:

"DAY_OF_MONTH","DAY_OF_WEEK","OP_UNIQUE_CARRIER","OP_CARRIER_AIRLINE_ID","OP_CARRIER","TAIL_NUM","OP_CARRIER_FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN_AIRPORT_SEQ_ID","ORIGIN","DEST_AIRPORT_ID","DEST_AIRPORT_SEQ_ID","DEST","DEP_TIME","DEP_DEL15","DEP_TIME_BLK","ARR_TIME","ARR_DEL15","CANCELLED","DIVERTED","DISTANCE",
1,2,"EV",20366,"EV","N48901","4397",13930,1393007,"ORD",11977,1197705,"GRB","1003",0.00,"1000-1059","1117",0.00,0.00,0.00,174.00,
1,2,"EV",20366,"EV","N16976","4401",15370,1537002,"TUL",13930,1393007,"ORD","1027",0.00,"1000-1059","1216",0.00,0.00,0.00,585.00,
1,2,"EV",20366,"EV","N12167","4404",11618,1161802,"EWR",15412,1541205,"TYS","1848",0.00,"1800-1859","2120",0.00,0.00,0.00,631.00,

Output:

"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61

So far, I've got the command to take the columns needed:

cut -d "," -f1-2,10,13-15,19-21 'Jan_2020_ontime.csv' > 'flights_jan_20.csv'

And also the code to replace the numbers in column 2 with their respective days of the week:

awk 'BEGIN {FS = OFS = ","} 
     $2 == 1 {$2 = "Monday"} 
     $2 == 2 {$2 = "Tuesday"} 
     $2 == 3 {$2 = "Wednesday"} 
     $2 == 4 {$2 = "Thursday"} 
     $2 == 5 {$2 = "Friday"} 
     $2 == 6 {$2 = "Saturday"} 
     $2 == 7 {$2 = "Sunday"} 
     {print}' 
file.csv

I am also missing a way to wrap all the code into the script to execute it later.

I'm stuck creating an awk script that prepares a csv file before analysis. I need to create an output file with columns 1-2, 10, 13-15, 19-21. Also I need to replace the numbers on column 2 to the days of the week (so, 1 = Monday, 2 = Tuesday...) and convert the 21th column from nautical miles to km; and delete ""of columns 10, 13 and 14.

Input:

"DAY_OF_MONTH","DAY_OF_WEEK","OP_UNIQUE_CARRIER","OP_CARRIER_AIRLINE_ID","OP_CARRIER","TAIL_NUM","OP_CARRIER_FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN_AIRPORT_SEQ_ID","ORIGIN","DEST_AIRPORT_ID","DEST_AIRPORT_SEQ_ID","DEST","DEP_TIME","DEP_DEL15","DEP_TIME_BLK","ARR_TIME","ARR_DEL15","CANCELLED","DIVERTED","DISTANCE",
1,2,"EV",20366,"EV","N48901","4397",13930,1393007,"ORD",11977,1197705,"GRB","1003",0.00,"1000-1059","1117",0.00,0.00,0.00,174.00,
1,2,"EV",20366,"EV","N16976","4401",15370,1537002,"TUL",13930,1393007,"ORD","1027",0.00,"1000-1059","1216",0.00,0.00,0.00,585.00,
1,2,"EV",20366,"EV","N12167","4404",11618,1161802,"EWR",15412,1541205,"TYS","1848",0.00,"1800-1859","2120",0.00,0.00,0.00,631.00,

Output:

"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61

So far, I've got the command to take the columns needed:

cut -d "," -f1-2,10,13-15,19-21 'Jan_2020_ontime.csv' > 'flights_jan_20.csv'

And also the code to replace the numbers in column 2 with their respective days of the week:

awk 'BEGIN {FS = OFS = ","} 
     $2 == 1 {$2 = "Monday"} 
     $2 == 2 {$2 = "Tuesday"} 
     $2 == 3 {$2 = "Wednesday"} 
     $2 == 4 {$2 = "Thursday"} 
     $2 == 5 {$2 = "Friday"} 
     $2 == 6 {$2 = "Saturday"} 
     $2 == 7 {$2 = "Sunday"} 
     {print}' 
file.csv

I am also missing a way to wrap all the code into the script to execute it later.

edited tags
Link
Kusalananda
  • 355.8k
  • 42
  • 735
  • 1.1k
Missing quotes on input first column
Source Link

I'm stuck creating an awk script that prepares a csv file before analysis. I need to create an output file with columns 1-2, 10, 13-15, 19-21. Also I need to replace the numbers on column 2 to the days of the week (so, 1 = Monday, 2 = Tuesday...) and convert the 21th column from miles to km; and delete ""of columns 10, 13 and 14.

Input:

DAY_OF_MONTH"DAY_OF_MONTH","DAY_OF_WEEK","OP_UNIQUE_CARRIER","OP_CARRIER_AIRLINE_ID","OP_CARRIER","TAIL_NUM","OP_CARRIER_FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN_AIRPORT_SEQ_ID","ORIGIN","DEST_AIRPORT_ID","DEST_AIRPORT_SEQ_ID","DEST","DEP_TIME","DEP_DEL15","DEP_TIME_BLK","ARR_TIME","ARR_DEL15","CANCELLED","DIVERTED","DISTANCE",
1,2,"EV",20366,"EV","N48901","4397",13930,1393007,"ORD",11977,1197705,"GRB","1003",0.00,"1000-1059","1117",0.00,0.00,0.00,174.00,
1,2,"EV",20366,"EV","N16976","4401",15370,1537002,"TUL",13930,1393007,"ORD","1027",0.00,"1000-1059","1216",0.00,0.00,0.00,585.00,
1,2,"EV",20366,"EV","N12167","4404",11618,1161802,"EWR",15412,1541205,"TYS","1848",0.00,"1800-1859","2120",0.00,0.00,0.00,631.00,

Output:

"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61

So far, I've got the command to take the columns needed:

cut -d "," -f1-2,10,13-15,19-21 'Jan_2020_ontime.csv' > 'flights_jan_20.csv'

And also the code to replace the numbers in column 2 with their respective days of the week:

awk 'BEGIN {FS = OFS = ","} 
     $2 == 1 {$2 = "Monday"} 
     $2 == 2 {$2 = "Tuesday"} 
     $2 == 3 {$2 = "Wednesday"} 
     $2 == 4 {$2 = "Thursday"} 
     $2 == 5 {$2 = "Friday"} 
     $2 == 6 {$2 = "Saturday"} 
     $2 == 7 {$2 = "Sunday"} 
     {print}' 
file.csv

I am also missing a way to wrap all the code into the script to execute it later.

I'm stuck creating an awk script that prepares a csv file before analysis. I need to create an output file with columns 1-2, 10, 13-15, 19-21. Also I need to replace the numbers on column 2 to the days of the week (so, 1 = Monday, 2 = Tuesday...) and convert the 21th column from miles to km; and delete ""of columns 10, 13 and 14.

Input:

DAY_OF_MONTH,"DAY_OF_WEEK","OP_UNIQUE_CARRIER","OP_CARRIER_AIRLINE_ID","OP_CARRIER","TAIL_NUM","OP_CARRIER_FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN_AIRPORT_SEQ_ID","ORIGIN","DEST_AIRPORT_ID","DEST_AIRPORT_SEQ_ID","DEST","DEP_TIME","DEP_DEL15","DEP_TIME_BLK","ARR_TIME","ARR_DEL15","CANCELLED","DIVERTED","DISTANCE",
1,2,"EV",20366,"EV","N48901","4397",13930,1393007,"ORD",11977,1197705,"GRB","1003",0.00,"1000-1059","1117",0.00,0.00,0.00,174.00,
1,2,"EV",20366,"EV","N16976","4401",15370,1537002,"TUL",13930,1393007,"ORD","1027",0.00,"1000-1059","1216",0.00,0.00,0.00,585.00,
1,2,"EV",20366,"EV","N12167","4404",11618,1161802,"EWR",15412,1541205,"TYS","1848",0.00,"1800-1859","2120",0.00,0.00,0.00,631.00,

Output:

"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61

So far, I've got the command to take the columns needed:

cut -d "," -f1-2,10,13-15,19-21 'Jan_2020_ontime.csv' > 'flights_jan_20.csv'

And also the code to replace the numbers in column 2 with their respective days of the week:

awk 'BEGIN {FS = OFS = ","} 
     $2 == 1 {$2 = "Monday"} 
     $2 == 2 {$2 = "Tuesday"} 
     $2 == 3 {$2 = "Wednesday"} 
     $2 == 4 {$2 = "Thursday"} 
     $2 == 5 {$2 = "Friday"} 
     $2 == 6 {$2 = "Saturday"} 
     $2 == 7 {$2 = "Sunday"} 
     {print}' 
file.csv

I am also missing a way to wrap all the code into the script to execute it later.

I'm stuck creating an awk script that prepares a csv file before analysis. I need to create an output file with columns 1-2, 10, 13-15, 19-21. Also I need to replace the numbers on column 2 to the days of the week (so, 1 = Monday, 2 = Tuesday...) and convert the 21th column from miles to km; and delete ""of columns 10, 13 and 14.

Input:

"DAY_OF_MONTH","DAY_OF_WEEK","OP_UNIQUE_CARRIER","OP_CARRIER_AIRLINE_ID","OP_CARRIER","TAIL_NUM","OP_CARRIER_FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN_AIRPORT_SEQ_ID","ORIGIN","DEST_AIRPORT_ID","DEST_AIRPORT_SEQ_ID","DEST","DEP_TIME","DEP_DEL15","DEP_TIME_BLK","ARR_TIME","ARR_DEL15","CANCELLED","DIVERTED","DISTANCE",
1,2,"EV",20366,"EV","N48901","4397",13930,1393007,"ORD",11977,1197705,"GRB","1003",0.00,"1000-1059","1117",0.00,0.00,0.00,174.00,
1,2,"EV",20366,"EV","N16976","4401",15370,1537002,"TUL",13930,1393007,"ORD","1027",0.00,"1000-1059","1216",0.00,0.00,0.00,585.00,
1,2,"EV",20366,"EV","N12167","4404",11618,1161802,"EWR",15412,1541205,"TYS","1848",0.00,"1800-1859","2120",0.00,0.00,0.00,631.00,

Output:

"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61

So far, I've got the command to take the columns needed:

cut -d "," -f1-2,10,13-15,19-21 'Jan_2020_ontime.csv' > 'flights_jan_20.csv'

And also the code to replace the numbers in column 2 with their respective days of the week:

awk 'BEGIN {FS = OFS = ","} 
     $2 == 1 {$2 = "Monday"} 
     $2 == 2 {$2 = "Tuesday"} 
     $2 == 3 {$2 = "Wednesday"} 
     $2 == 4 {$2 = "Thursday"} 
     $2 == 5 {$2 = "Friday"} 
     $2 == 6 {$2 = "Saturday"} 
     $2 == 7 {$2 = "Sunday"} 
     {print}' 
file.csv

I am also missing a way to wrap all the code into the script to execute it later.

Source Link
Loading