Following a suggestion from @tripleee, I’m posting another question for a coding issue I’m having trying to be more specific with sources and expected results.
My source .txt
file is quite over populated with lines and using AWK
, I:
- extract only rows identified by a specific code.
- parse the content of the line in order to get only certain values.
- remove leading zeros from substring.
After these first steps, my saved output is like this (I can change the order of the columns as I wish).
1646 | 01 | 1602 | 4 1646 | 01 | 1604 | 19 1646 | 01 | 1605 | 35 1646 | 01 | 1606 | 7 1646 | 01 | 1607 | 37 1646 | 01 | 1609 | 9 1646 | 01 | 1610 | 3 1646 | 01 | 1611 | 2 1646 | 01 | 1612 | 90 1646 | 01 | 1613 | 107 1646 | 01 | 1614 | 12 1646 | 01 | 1615 | 12 1646 | 01 | 1616 | 4 1646 | 01 | 1617 | 1 1646 | 01 | 1618 | 39 3625 | 01 | 2311 | 14 3625 | 01 | 2312 | 9 3625 | 01 | 2313 | 11 3625 | 01 | 2314 | 3 3625 | 01 | 2315 | 1 3625 | 01 | 2316 | 1 3625 | 01 | 2317 | 28 3625 | 01 | 2318 | 9 3625 | 01 | 2320 | 12 3625 | 01 | 2321 | 5 3625 | 01 | 2322 | 29 3625 | 01 | 2324 | 3 1646 | 06 | 1600 | 12 1646 | 06 | 1603 | 19 1646 | 06 | 1608 | 11 3625 | 06 | 2319 | 6 3625 | 06 | 2323 | 16 3547 | AF | 1525 | 1 3547 | AF | 1526 | 1
What I would like to achieve is a group by on the second column, summing the fourth column, with a line for each value in the first column.
In other words, the expected output should be:
DEP SUM 01 1646 381 3625 125 06 1646 42 3625 22 AF 3547 2
Or with a better presentation:
DEP | PDV | | SUM 01 | | | | 1646 | | 381 | 3625 | | 125 06 | | | | 1646 | | 42 | 3625 | | 22 AF | | | | 3547 | | 2
So far I’ve tried this:
awk '{sum[$2]+=$4} END{for (date in sum) print date, sum[date], freq[date]}' Test_Awk_2.txt
But results in not what I expected:
AF 2
01 506
06 64
Advertisement
Answer
You’re only grouping your sum by one column, but you want to group it by 2 columns, so you need to use both of them as the key in the array.
You’re also printing a freq
array, but you never assigned it. It also doesn’t exist in your desired output.
awk '{sum[$2" "$1]+=$4} END{for (date in sum) print date, sum[date]}' Test_Awk_2.txt | sort -n -k1,2
The sort
command will put all the same values of $2
in consecutive rows.