I have data in a source file as below (file.txt):
N4*WALTER*WHITE~DMG*D8*19630625~N4*JESSI*PINKMAN*15108~
Input command (N4 = segment identifier, 1 = position, ref.txt = reference file):
N4*1*ref.txt
ref.txt has data as below (one word per line):
BILL
LEONARDO
BALE
BRAD
PITT
I have the below code, which displays the data at position x (the input) for N4:

identifier=N4
position=1
refile=ref.txt
awk -F'[*~]' -v id="$identifier" -v pos="$position" '
id {
    for (i = 1; i <= NF; i++)
        if ($i == id) {
            if (i + pos <= NF)
                print $(i + pos)
            else
                print "invalid position"
        }
}' file.txt

Output:
WALTER
JESSI

With position=2:
WHITE
PINKMAN
Now, how can I integrate ref.txt into the above code so that WALTER and JESSI in file.txt are replaced with random text taken from ref.txt?
I know the shuf command returns random lines from ref.txt, but I am not sure how to integrate it into the above awk command:
shuf -n 1 ref.txt
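As a quick sanity check of that building block (the count goes after `-n`, so one random line is `shuf -n 1`), the following sketch recreates ref.txt and picks a single random word from it; the `pick.txt` file name is just for illustration:

```shell
# recreate ref.txt with one candidate word per line
printf '%s\n' BILL LEONARDO BALE BRAD PITT > ref.txt

# shuf -n 1 prints a single random line from the file
pick=$(shuf -n 1 ref.txt)
echo "$pick" > pick.txt
cat pick.txt
```

Every run prints exactly one of the five words, chosen at random.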
Expected output: file.txt, with the position 1 data for each N4 segment updated with random text from ref.txt:
N4*BALE*WHITE~DMG*D8*19630625~N4*PITT*PINKMAN*15108~
Answer
Well, I can do it in bash, but it will be slow with a while read loop:

# recreate the input files
cat <<EOF >file.txt
N4*WALTER*WHITE~DMG*D8*19630625~N4*JESSI*PINKMAN*15108~
EOF
cat <<EOF >input
N4*1*ref.txt
EOF
cat <<EOF >ref.txt
BILL
LEONARDO
BALE
BRAD
PITT
EOF
# read the input
IFS='*' read -r segment position reference_file <input
{
    # for each record
    while IFS='*' read -r -d'~' -a data; do
        # if the segment id is the segment we want
        if [ "${data[0]}" = "$segment" ]; then
            # update the data
            data[$position]=$(shuf -n1 "$reference_file")
        fi
        # and output the data
        ( IFS=*; printf "%s~" "${data[*]}"; )
    done
    # append a newline on the end
    echo
} < file.txt
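Since the question asked about wiring the replacement into awk itself, here is a minimal sketch of one way to do it: load ref.txt into an array with getline, then pick a random entry per matching segment. It treats `~` as the record separator and `*` as the field separator; the `updated.txt` output name is just for illustration, and it picks the random word with awk's own rand() instead of shuf:

```shell
# recreate the sample inputs from the question
printf 'N4*WALTER*WHITE~DMG*D8*19630625~N4*JESSI*PINKMAN*15108~\n' > file.txt
printf '%s\n' BILL LEONARDO BALE BRAD PITT > ref.txt

awk -v id="N4" -v pos=1 -v ref="ref.txt" '
BEGIN {
    srand()
    # read the reference words into an array once
    while ((getline w < ref) > 0) words[++n] = w
    # records are separated by ~, fields by *
    RS = "~"; FS = OFS = "*"; ORS = "~"
}
# replace the field at the requested position in matching segments
$1 == id && pos + 1 <= NF { $(pos + 1) = words[int(rand() * n) + 1] }
# skip the empty record after the trailing ~ (it is just a newline)
/[^\n]/ { print }
END { printf "\n" }
' file.txt > updated.txt

cat updated.txt
```

The structure of the record stays intact; only the field at the requested position in each N4 segment is swapped for a random word from ref.txt.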
I wanted to try it with sed to iterate over the segments, but ended up preprocessing sed's inputs. Below it is, with comments:

IFS='*' read -r segment position reference_file <input
# remove the newline from the input
# and substitute each ~ with a newline
# so we can nicely process the file in sed
<file.txt tr -d '\n' | tr '~' '\n' >tmp.txt
# count of segments inside the input we are interested in
segmentscnt=$( grep "^${segment}\*" tmp.txt | wc -l )
# generate a single line with random words from reference_file,
# words separated by *
# there should be as many words as there are
# matching segments in the input file
randoms=$(
    while shuf -n1 "$reference_file"; do :; done |
    head -n"$segmentscnt" | tr '\n' '*'
)
sed -E -n "
# the first line should be the random words from reference_file
# load it into hold space
1{
    h
    d
}
# if this is our segment
/^$segment\*/{
    # append the random words to our pattern space
    G
    # remember as many fields as the position we want to insert,
    # each one word more
    # remember the rest of the line
    # remember the first word from the randoms that came from hold space
    # then just substitute the words in the proper order
    s/^(([^*]*\*){$position})[^*]*([^\n]*)\n([^*]*)\*.*/\1\4\3/
    # remove the first word from hold space
    x
    s/^([^*]*)\*//
    x
}
p
# the first input is the random words separated by *
# the words are on a single line
# then the input file
" - tmp.txt <<<"$randoms" |
# then replace newlines with ~
# also append a newline with echo
# as it will be missing
tr '\n' '~'; echo