I have a problem with the output produced by the masscan utility with the -oG option (“grep-able” output); for instance, it outputs this:
# Masscan 1.0.3 scan initiated Wed Jun 4 01:35:02 2014
# Ports scanned: TCP(3;21-23,) UDP(0;) SCTP(0;) PROTOCOLS(0;)
Host: 192.168.100.19 () Ports: 2222/open/tcp////
Host: 192.168.100.13 () Ports: 2222/open/tcp////
Host: 192.168.100.16 () Ports: 443/open/tcp////
Host: 192.168.100.8 () Ports: 21/open/tcp////
Host: 192.168.100.5 () Ports: 22/open/tcp////
Host: 192.168.100.5 () Ports: 443/open/tcp////
Host: 192.168.100.16 () Ports: 80/open/tcp////
Host: 192.168.100.19 () Ports: 22/open/tcp////
Host: 192.168.100.7 () Ports: 80/open/tcp////
Host: 192.168.100.8 () Ports: 80/open/tcp////
Host: 192.168.100.12 () Ports: 2222/open/tcp////
Host: 192.168.100.13 () Ports: 22/open/tcp////
# Masscan done at Wed Jun 4 01:35:16 2014
The above is neither very readable nor easy to understand.
How can I use Linux command-line utilities, e.g. sed, awk, or grep, to output something as follows, using the file above?
Host: 192.168.100.5
Ports: 22, 443
Host: 192.168.100.7
Ports: 80
Host: 192.168.100.8
Ports: 21, 80
Host: 192.168.100.12
Ports: 2222
Host: 192.168.100.13
Ports: 2222, 22
As you can see, the output is much more readable in this layout: sorted by IP address, with all associated ports listed below, consolidated across multiple input lines with the same IP address.
Answer
Try this:
awk -F' +|/' '
  !/^[ \t]*#/ { # ignore comment lines
    # Add the port on the current line to the associative array
    # element for the IP address on the current line.
    ips[$2] = ips[$2] (ips[$2] == "" ? $5 : ", " $5)
  }
  END {
    # Enumerate all IPs and the ports for each.
    # Since the IPs will be listed in no specific order, the output
    # is piped as a _single_ line to "sort" in order to sort by IP address,
    # and then expanded into 2 lines via "tr".
    for (ip in ips) {
      printf "Host: %s@Ports: %s@\n", ip, ips[ip] | \
        "sort -t. -n -k 1.6,1 -k 2,2 -k 3,3 -k 4,4 | tr @ \"\\n\""
    }
  }
' file
- This solution properly sorts the output by IP address and separates the ports with commas.
- By contrast, for a given IP address, the port numbers are listed in the order they were encountered in the input (as in the sample output data in the question).
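If the ports should also be listed in ascending numeric order for each host, one possible variant is to sort first and group afterwards. The following is only a sketch under stated assumptions: the function name group_ports is hypothetical, the input is assumed to have exactly the "Host: IP () Ports: PORT/..." layout shown in the question (one port per line, IPv4 only), and duplicate host/port pairs are dropped.

```shell
# group_ports (hypothetical helper name): reads masscan -oG output on stdin
# and prints two lines per host, with hosts and ports in numeric order.
group_ports() {
  # Step 1: keep only "Host:" lines and emit "a.b.c.d.port", so that a
  # single separator (".") covers both the IP octets and the port.
  awk -F' +|/' '$1 == "Host:" { print $2 "." $5 }' |
    # Step 2: sort numerically on all five dot-separated fields;
    # -u drops repeated host/port pairs.
    sort -u -t. -k1,1n -k2,2n -k3,3n -k4,4n -k5,5n |
    # Step 3: start a new "Host:" block whenever the IP changes,
    # otherwise append the port to the current "Ports:" line.
    awk -F. '
      {
        ip = $1 "." $2 "." $3 "." $4
        if (ip != prev) {
          if (NR > 1) printf "\n"
          printf "Host: %s\nPorts: %s", ip, $5
          prev = ip
        } else {
          printf ", %s", $5
        }
      }
      END { if (NR) printf "\n" }
    '
}
```

Usage, with the same input file as above: group_ports < file. Because the sorting happens before the grouping, this variant does not need the "@" placeholder trick, at the cost of running awk twice.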