for loop in bash simply prints n times the command instead of reiterating

I have an input.txt file with over 6000 lines.

If a line has over 10 words, I want it to be split, not at the 10th word but where the first comma appears. If the new line also has more than 10 words, it should be split again, repeating this process 7 times.

End product: no lines with more than 10 words and commas because they have all been split.

Example:

Input

Line 1: This is me, and my sample test line that I like to get working, and I want to be able to kick some ass while doing it

Expected output:

Line 1: This is me, 
Line 2: and my sample test line that I like to get working,
Line 3: and I want to be able to kick some ass while doing it

I’m using the following code:

#! /bin/bash

for run in {1..7}
do

awk 'NF >= 10 {
sub (", ", ",\n")

}1' input.txt

done

This code is not giving the desired result. Instead I get the following output 7 times.

line 1: This is me,

line 2: and my sample test line that I like to get working, and I want to be able to kick some ass while doing it.

I am leaning toward sed, but I’m not clear on something. I see three approaches:

1) The code reads a line (say line 7), finds it is over 10 words, breaks it at the comma (without checking whether the newly broken line is over 10 words), and moves on to the next line. At the end of the file, it repeats this process (say 7 times) to ensure that the newly broken lines are also under 10 words. Then it takes the output of this process and does the same thing with a new condition (e.g. the word “and ”). Then it takes the output of that, and so on; I can add endless conditions. This is the approach I prefer. I also think it is easier to code.

2) The code reads a line, and if it’s over 10 words it breaks it at the comma; if the remainder is still over 10 words it breaks that further at the next comma, and so on until it is under 10 words. Only then does it move on to the next line. I think this is what Ghoti’s code does. But then it’s complicated to add additional conditions.

3) The code breaks a line over 10 words at the comma, then the remainder gets broken at “and ”, and so on. At the end, this whole process gets repeated a few times. IMHO this is also not the best way to do it.

Can someone please help?

Thank you in advance!

Answer

I think I see what you’re after. There are a few problems with your approach:

  • awk doesn’t process files in-place. So your sub() makes a change and the trailing 1 prints each record to stdout, but your input file never changes.
  • When you sub(), you don’t insert a new record into the input stream that awk is processing. Your command merely adds a newline to the current record.
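The second point can be seen in isolation with a minimal sketch. The function name demo_single_record and the sample text are made up for illustration. Each record is printed with its record number (NR) in front; the prefix shows up only once, so the text after the inserted newline is still part of record 1 and is never tested against the NF condition on its own.

```shell
#!/usr/bin/env bash

# Hypothetical demo: sub() inserts a newline into the current record,
# but awk still treats the result as a single record.
demo_single_record() {
  printf '%s\n' 'one, two' |
    awk '{ sub(/, /, ",\n"); print NR ": " $0 }'
}

demo_single_record
# Prints:
# 1: one,
# two
```

If awk had created a second record, the second output line would carry its own "2: " prefix.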

Given these, you could get away with processing the input multiple times, as you’ve suggested. But rather than arbitrarily assuming that you’ll have a maximum of seven 10-word phrases on a line, it might be better to actually detect whether you need to continue. Something like this:

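#!/usr/bin/env bash

input=input.txt
# Temporary file next to the input; removed automatically when the script exits.
temp=$(mktemp "${input}.XXXX")
trap 'rm -f "$temp"' EXIT

# Each pass splits long lines at their first ", ". awk exits 0 if it made
# at least one split (so the loop runs again) and 1 if it made none.
while awk '
  BEGIN { retval=1 }
  NF >= 10 && /, / {    # same threshold as the question; use NF > 10 for "over 10 words"
    sub(/, /, ","ORS)   # break the record at its first ", "
    retval=0
  }
  1                     # print every record, changed or not
  END { exit retval }
' "$input" > "$temp"; do
  mv -v "$temp" "$input"
done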

This uses awk’s exit status to decide whether the bash loop needs another iteration. If awk finds that no substitutions were needed, it exits with a non-zero status and the loop stops.
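If rewriting the file several times is undesirable, the same splitting can probably be done in one pass by looping inside awk over each record, which is roughly the second approach described in the question. This is a sketch of a different technique, not the method above: split_long_lines and its max argument are hypothetical names, and it assumes “over 10 words” means strictly more than max whitespace-separated words.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads lines on stdin and writes them to stdout,
# breaking any line with more than max words at each ", " in turn
# until the remainder is short enough or has no ", " left.
split_long_lines() {
  awk -v max="${1:-10}" '
  {
    line = $0
    # split() with a " " separator returns the number of whitespace-separated words
    while (split(line, words, " ") > max && (pos = index(line, ", ")) > 0) {
      print substr(line, 1, pos)     # everything up to and including the comma
      line = substr(line, pos + 2)   # continue with the text after ", "
    }
    print line
  }'
}
```

Because it reads stdin and writes stdout, something like split_long_lines 10 < input.txt > output.txt leaves the original file untouched. Further separators such as “and ” are not handled here and would need extra logic.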

User contributions licensed under: CC BY-SA