I have a problem with the output produced by the masscan utility with the -oG option (“grep-able” output); for instance, it outputs this:
    # Masscan 1.0.3 scan initiated Wed Jun 4 01:35:02 2014
    # Ports scanned: TCP(3;21-23,) UDP(0;) SCTP(0;) PROTOCOLS(0;)
    Host: 192.168.100.19 () Ports: 2222/open/tcp////
    Host: 192.168.100.13 () Ports: 2222/open/tcp////
    Host: 192.168.100.16 () Ports: 443/open/tcp////
    Host: 192.168.100.8 () Ports: 21/open/tcp////
    Host: 192.168.100.5 () Ports: 22/open/tcp////
    Host: 192.168.100.5 () Ports: 443/open/tcp////
    Host: 192.168.100.16 () Ports: 80/open/tcp////
    Host: 192.168.100.19 () Ports: 22/open/tcp////
    Host: 192.168.100.7 () Ports: 80/open/tcp////
    Host: 192.168.100.8 () Ports: 80/open/tcp////
    Host: 192.168.100.12 () Ports: 2222/open/tcp////
    Host: 192.168.100.13 () Ports: 22/open/tcp////
    # Masscan done at Wed Jun 4 01:35:16 2014
The above is neither very readable nor easy to understand.
How can I use Linux command-line utilities, e.g. sed, awk, or grep, to output something as follows, using the file above?
    Host: 192.168.100.5
    Ports: 22, 443

    Host: 192.168.100.7
    Ports: 80

    Host: 192.168.100.8
    Ports: 21, 80

    Host: 192.168.100.12
    Ports: 2222

    Host: 192.168.100.13
    Ports: 2222, 22

    ......
As you can see, the output is much more readable in this layout: sorted by IP address, with all associated ports listed below, consolidated across multiple input lines with the same IP address.
Answer
Try this:
    awk -F' +|/' '
      !/^[[:space:]]*#/ { # ignore comment lines
        # Add the port on the current line to the associative array
        # element for the IP address on the current line.
        ips[$2] = ips[$2] (ips[$2] == "" ? $5 : ", " $5)
      }
      END {
        # Enumerate all IPs and the ports for each.
        # Since the IPs will be listed in no specific order, the output
        # is piped as a _single_ line to "sort" in order to sort by IP address,
        # and then expanded into 2 lines via "tr".
        for (ip in ips) {
          printf "Host: %s@Ports: %s@\n", ip, ips[ip] | "sort -t. -n -k 1.6,1 -k 2,2 -k 3,3 -k 4,4 | tr @ \"\\n\""
        }
      }
    ' file
- This solution properly sorts the output by IP address and separates the ports with commas.
- By contrast, for a given IP address, the port numbers are listed in the order they were encountered in the input (as in the sample output data in the question).
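If the ports should also be listed in ascending order for each host, one possible variation is to sort the (IP address, port) pairs before grouping them. The following is only a sketch and not part of the original answer: the function name `consolidate_ports` is hypothetical, and it assumes every data line has the `Host: <ip> () Ports: <port>/...` shape shown in the question. Unlike the command above, it also drops duplicate pairs.

```shell
# Hypothetical helper (an assumption, not part of masscan or the answer above):
# consolidate "masscan -oG" output, sorting hosts by IP and ports numerically.
consolidate_ports() {
  # Step 1: split on spaces, dots, and slashes so each octet is its own field;
  #         on "Host:" lines, $2-$5 are the four octets and $8 is the port.
  # Step 2: sort numerically by each octet, then by port; -u drops duplicates.
  # Step 3: the pairs now arrive grouped by IP, so a new "Host:" block starts
  #         whenever the IP differs from the previous line.
  awk -F'[ ./]+' '$1 == "Host:" { print $2, $3, $4, $5, $8 }' "$1" |
    sort -u -n -k1,1 -k2,2 -k3,3 -k4,4 -k5,5 |
    awk '{
      ip = $1 "." $2 "." $3 "." $4
      if (ip != prev) {
        if (NR > 1) printf "\n\n"      # blank line between host blocks
        printf "Host: %s\nPorts: %s", ip, $5
        prev = ip
      } else {
        printf ", %s", $5              # same host: append the port
      }
    }
    END { if (NR > 0) printf "\n" }'
}
```

Call it as `consolidate_ports file`, where `file` is the saved masscan output. Because the sorting happens before the grouping, the second `awk` stage does not need an associative array.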