csvfile1
status,longitude,latitude,timestamp ok,10.12,17.45,14569003 ok,11.34,16.78,14569000
csvfile2
weather,timestamp,latitude1,longitude1,latitude2,longitude2 rainy,14569003,17.45,10.12,17.50,11.25 sunny,14569000,13.76,12.44,16.78,11.34
expected output
status,weather,longitude,latitude,timestamp ok,rainy,10.12,17.45,14569003 ok,sunny,11.34,16.78,14569000
I would like to combine the columns longitude,latitude and timestamp of both the files.
There are two longitudes and two latitudes in csvfile2. So i want to compare if it matches any one of the longitude-latitude pairs along with the timestamp.
And the column name order is also different in both the files.
Any help would be appreciated.
Thank you.
Advertisement
Answer
awk solution:
join_csv.awk script:
#!/bin/awk -f BEGIN { FS=OFS=","; # field separator print "status,weather,longitude,latitude,timestamp" # header line } NR==FNR && NR>1 { # processing the first file a[$4]=$1 FS $2 FS $3 # accumulating the needed values (status, longitude, latitude) } FNR>1 { # processing the second file if ($2 in a) { # if `timestamp` matches split(a[$2],data,FS); # extracting items for further comparison if ((data[2]==$4 || data[2]==$6) && (data[3]==$3 || data[3]==$5)) { print data[1],$1,data[2],data[3],$2 } } }
Usage:
awk -f join_csv.awk file1 file2
The output:
status,weather,longitude,latitude,timestamp ok,rainy,10.12,17.45,14569003 ok,sunny,11.34,16.78,14569000