I have two large files (~10GB) as follows:
file1.csv
name,id,dob,year,age,score Mike,1,2014-01-01,2016,2,20 Ellen,2, 2012-01-01,2016,4,35 . .
file2.csv
id,course_name,course_id 1,math,101 1,physics,102 1,chemistry,103 2,math,101 2,physics,102 2,chemistry,103 . .
I want to subtract 1 from the “id” columns of these files:
file1_updated.csv
name,id,dob,year,age,score Mike,0,2014-01-01,2016,2,20 Ellen,0, 2012-01-01,2016,4,35
file2_updated.csv
id,course_name,course_id 0,math,101 0,physics,102 0,chemistry,103 1,math,101 1,physics,102 1,chemistry,103
I have tried awk '{print ($1 - 1) "," $0}' file2.csv
, but did not get the correct result:
-1,id,course_name,course_id 0,1,math,101 0,1,physics,102 0,1,chemistry,103 1,2,math,101 1,2,physics,102 1,2,chemistry,103
Advertisement
Answer
You’ve added an extra column in your attempt. Instead set your first field $1
to $1-1
:
awk -F"," 'BEGIN{OFS=","} {$1=$1-1;print $0}' file2.csv
That semicolon separates the commands. We set the delimiter to comma (-F","
) and the Output Field Seperator to comma BEGIN{OFS=","}
. The first command to subtract 1 from the first field executes first, then the print command executes second, so the entire record, $0
, will now contain the new $1
value when it’s printed.
It might be helpful to only subtract 1 from records that are not your header. So you can add a condition to the first command:
awk -F"," 'BEGIN{OFS=","} NR>1{$1=$1-1} {print $0}' file2.csv
Now we only subtract when the record number (NR
) is greater than 1. Then we just print the entire record.