Skip to content
Advertisement

Join two csv files

csvfile1

status,longitude,latitude,timestamp    
ok,10.12,17.45,14569003    
ok,11.34,16.78,14569000

csvfile2

weather,timestamp,latitude1,longitude1,latitude2,longitude2
rainy,14569003,17.45,10.12,17.50,11.25    
sunny,14569000,13.76,12.44,16.78,11.34

expected output

status,weather,longitude,latitude,timestamp    
ok,rainy,10.12,17.45,14569003    
ok,sunny,11.34,16.78,14569000    

I would like to combine the columns longitude,latitude and timestamp of both the files.

There are two longitudes and two latitudes in csvfile2. So i want to compare if it matches any one of the longitude-latitude pairs along with the timestamp.

And the column name order is also different in both the files.

Any help would be appreciated.

Thank you.

Advertisement

Answer

awk solution:

join_csv.awk script:

#!/bin/awk -f
BEGIN {
    FS=OFS=",";   # field separator
    print "status,weather,longitude,latitude,timestamp"  # header line
}
NR==FNR && NR>1 {          # processing the first file
    a[$4]=$1 FS $2 FS $3   # accumulating the needed values (status, longitude, latitude) 
}
FNR>1 {                    # processing the second file
    if ($2 in a) {         # if `timestamp` matches                                                                                                                                             
        split(a[$2],data,FS);  # extracting items for further comparison
        if ((data[2]==$4 || data[2]==$6) && (data[3]==$3 || data[3]==$5)) {
            print data[1],$1,data[2],data[3],$2
        }
    }
}

Usage:

awk -f join_csv.awk file1 file2

The output:

status,weather,longitude,latitude,timestamp
ok,rainy,10.12,17.45,14569003
ok,sunny,11.34,16.78,14569000
Advertisement