
How to run iterations asynchronously in shell script

I have a few .csv files like below.

  1. xyz0900@1#-1637746436.csv
  2. xxx0900@1#-1637746436.csv
  3. zzz0900@2#-1637746439.csv
  4. yyy0900@1#-1637746436.csv
  5. sss0900@2#-1637746439.csv

I have written a script to perform the tasks below:

  1. Find the largest file matching the pattern passed as an argument to the script.
  2. Merge all other files that match the same pattern into a new file.
  3. Remove the duplicate headers from the new file.
  4. Move the new file to the destination passed as an argument.

Example: I pass “1637746436@home/dest1,1637746439@home/dest2” as the second argument to the script. The script below extracts the pattern (1637746436), takes the biggest matching file, merges all other files with the same pattern into it, creates a new file, and moves it to the destination (home/dest1).
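As an aside, the pattern/destination pairs in that argument can also be split with plain bash parameter expansion instead of temp files; a minimal sketch (the variable names `pairs`, `pattern`, and `dest` are illustrative, not from the script):

```shell
#!/bin/bash
# Sketch: split "pattern@dest,pattern@dest" into its parts.
arg="1637746436@home/dest1,1637746439@home/dest2"

IFS=',' read -ra pairs <<< "$arg"   # split on commas into an array
for pair in "${pairs[@]}"; do
    pattern=${pair%%@*}             # text before the first @
    dest=${pair#*@}                 # text after the first @
    echo "$pattern -> $dest"
done
```

This prints `1637746436 -> home/dest1` and `1637746439 -> home/dest2`.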

The script below performs the pattern matching and processing sequentially.

How can I make the for-loop iterations run in parallel? That is, the pattern matching for “1637746436@home/dest1,1637746439@home/dest2” should happen simultaneously, not one after another.

Please help on this.

$ merge.sh /home/dummy/17 "1637746436@home/dest1,1637746439@home/dest2"


 #!/bin/bash
 current=`pwd`
 source=$1
 destination=$2
 echo "$destination" | tr "," "\n" > $current/out.txt
 cat out.txt | cut -d "@" -f1 > $current/pattern.txt

 for var in `cat pattern.txt`
 do
     getBiggerfile=$(ls -Sl $source/*$var.csv | head -1)
     cd $source
     getFileName=$(echo $getBiggerfile | cut -d " " -f9-)
     newFileName=$(echo $getFileName | cut -d "@" -f1)
     cat *$var.csv >> $getFileName
     header=$(head -n 1 $getFileName)
     (printf "%s\n" "$header";
      grep -vFxe "$header" $getFileName
     ) > $newFileName.csv
     rm -rf *$var.csv
     cd $current

     for var1 in `cat out.txt`
     do
         target=`echo $var1 | cut -d "@" -f2`
         id=$(echo $var1 | cut -c-10)
         if [ $id = $var ]
         then
             mv $newFileName.csv $target
         fi
     done
 done


Answer

The cleanest approach is to move the body of the loop into a function, then call that function inside the loop with `&` so each call runs in the background (as a child process), and finally `wait` for all the background processes to finish:

function do_the_thing(){
    source="$1"
    current="$2"
    var="$3"
    getBiggerfile=$(ls -Sl $source/*$var.csv | head -1)
    cd $source
    getFileName=$(echo $getBiggerfile | cut -d " " -f9-)
    newFileName=$(echo $getFileName | cut -d "@" -f1)
    cat *$var.csv >> $getFileName
    header=$(head -n 1 $getFileName)
    (printf "%s\n" "$header";
       grep -vFxe "$header" $getFileName
    ) > $newFileName.csv
    rm -rf *$var.csv
    cd $current

    for var1 in `cat out.txt`
    do
        target=`echo $var1 | cut -d "@" -f2`
        id=$(echo $var1 | cut -c-10)
        if [ $id = $var ]
        then
            mv $newFileName.csv $target
        fi
    done
}

for var in `cat pattern.txt`
do
    do_the_thing "$source" "$current" "$var" &
done

wait
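If the pattern list can grow long, you may also want to cap how many iterations run at once so the script cannot fork an unbounded number of children. A minimal sketch of that idea (the function name, the cap of 4, and `wait -n`, which requires bash 4.3+, are assumptions, not part of the answer above):

```shell
#!/bin/bash
# Sketch: run one background job per pattern, but keep at most
# max_jobs of them running at the same time.
max_jobs=4

process_pattern() {
    # placeholder for the real merge/move work on pattern "$1"
    echo "processing $1"
}

for var in 1637746436 1637746439; do
    # if max_jobs children are already running, wait for one to exit
    while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]; do
        wait -n    # bash 4.3+: waits for any single child to finish
    done
    process_pattern "$var" &
done

wait    # wait for the remaining children
```

The same `for var in \`cat pattern.txt\`` loop from the answer can use this throttled form unchanged apart from the inner `while`.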
User contributions licensed under: CC BY-SA