
awk renamed some files and left others unrenamed

I am trying to replace part of each filename based on a matching string from another file. The filenames have the following format:

36872_20190806_00.csv  40800_20190806_00.csv  41883_20190806_00.csv  
38064_20190806_00.csv  40848_20190806_00.csv  41891_20190806_00.csv  
38341_20190806_00.csv  40856_20190806_00.csv  41923_20190806_00.csv  
40417_20190806_00.csv  40948_20190806_00.csv  44373_20190806_00.csv  
40745_20190806_00.csv  41217_20190806_00.csv  45004_20190806_00.csv 
40754_20190806_00.csv  41256_20190806_00.csv                

where the digits before the first _ are the station code, which I want to replace with the station name from another file, radiosonde.csv. For example, I want to:

change 36872_20190806_00.csv to ALMATY_20190806_00.csv

change 38064_20190806_00.csv to KYZYLORDA_20190806_00.csv
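The split described above (station code before the first _, everything after it kept unchanged) maps directly onto POSIX parameter expansion. A minimal sketch with a single hard-coded filename and station name, purely for illustration; the variable names f, code, rest and new are not from the question:

```shell
#!/bin/sh
f=36872_20190806_00.csv
code=${f%%_*}          # remove the longest "_*" suffix: 36872
rest=${f#*_}           # remove the shortest "*_" prefix: 20190806_00.csv
new="ALMATY_${rest}"   # station name hard-coded here instead of looked up
echo "$code -> $new"   # prints: 36872 -> ALMATY_20190806_00.csv
```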

The contents of radiosonde.csv are:

CODE,LAT,LON,Elevation,STN_NAME
41620,31.35,69.467,1407,ZHOB
41600,32.5,74.5333,255,SIALKOT
41598,32.9333,73.7167,232,JHELUM
41594,32.05,72.667,188,SARGODHA
41571,33.6167,73.1,507,ISLAMABAD_AIRPORT
41560,33.8667,70.0833,1725,PARACHINAR
41529,34.0333,71.9333,329,PESHAWAR
41516,35.9167,74.3333,1453,GILGIT
41515,35.5667,71.7833,1464,DROSH
41506,35.9217,71.8,1499,CHITRAL
41316,17.0439,54.1022,23,SALALAH_AIRPORT
41288,20.667,58.9,19,MASIRAH
41256,23.5953,58.2983,8.4,MUSCAT_INTL_AIRPORT
41217,24.4333,54.65,16,ABU_DHABI_INTL_AIRPOR
41169,25.2731,51.6081,4,HAMAD_INTL_AIRPORT
40990,31.5,65.85,1010,KANDAHAR_AIRPORT
40948,34.55,69.2167,1791,KABUL_AIRPORT
40938,34.217,62.217,977,HERAT
40913,36.6667,68.9167,433,KUNDUZ
40911,36.7,67.2,378,MAZAR-I-SHARIF
40875,27.2167,56.3667,10,BANDARABBASS
40856,29.4667,60.8833,1370,ZAHEDAN
40848,29.5333,52.6,1484,SHIRAZ
40841,30.25,56.9667,1748,KERMAN
40821,31.9,54.2833,1238,YAZD
40811,31.3333,48.6667,20,AHWAZ
40809,32.8667,59.2,1491,BIRJAND
40800,32.5175,51.7061,1550.4,ESFAHAN
40754,35.6833,51.3167,1204,TEHRAN-MEHRABAD
40745,36.2667,59.6333,999,MASHHAD
40427,26.267,50.617,2,BAHRAIN
40417,26.45,49.8167,22,KING_FAHD_INTL_AIRPORT
40416,26.267,50.167,19,DHAHRAN
3992,10.83,106.97,11,AN_LOC
38989,35.9,62.9667,375,TAGTABAZAR
38954,37.5,71.5,2077,KHOROG
38927,37.233,67.267,310,TERMEZ
38880,37.987,58.361,211,ASHGABAT_KESHI
38836,38.55,68.783,800,DUSHANBE
38750,37.467,53.967,-22,ESENGYLY
38687,39.083,63.6,190,CHARDZHEV
38613,40.917,72.95,765,DZHALAL-ABAD
38606,40.55,70.95,499,KOKAND
38599,40.217,69.733,427,KHUDJAND
38507,40.0333,52.9833,90,TURKMENBASHI
38457,41.267,69.267,493,TASHKENT
38413,41.733,64.617,237,TAMDY
38392,41.833,59.983,87,DASHKHOVUZ
38353,42.833,74.583,760,BISHKEK
38341,42.85,71.3,652,TARAZ
38064,44.7667,65.5167,133.4,KYZYLORDA
38001,44.55,50.25,-25,FORT SHEVCHENKO
37985,38.733,48.833,-11,LANKARAN
37860,40.5333,50,27,MASHTAGA
36974,41.433,76,2041,NARYN
36872,43.3633,77.0042,662.7,ALMATY
36859,44.167,80.067,645,ZHARKENT
3369,22.77,88.37,0,BARAKPUR
3368,25.88,89.43,0,LALMANIR_HAT

I looked into this question and, as suggested there, tried:

sort -r radiosonde.csv | awk -F"," '{print "for files in *00.csv; do mv $files ${files/" $1 "/" $5 "}; done" }'  | bash

It partly worked: it renamed some files, left others unchanged, and gave these errors:

bash: line 25: unexpected EOF while looking for matching `''
bash: line 113: syntax error: unexpected end of file
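These two messages are what bash prints when a line of its input opens a single quote that is never closed. bash reads on to the end of the input looking for the closing ', so the command on that line (bash reports the line where the quote started, here line 25) and every later command never run, while the earlier commands have already run. That would explain why only some files were renamed and why the same command works on test.csv. The excerpt of radiosonde.csv above contains no ', but the error mentions line 113, so the real file seems to be longer than the excerpt; the assumption here is that one of its extra lines contains an apostrophe. The snippet below reproduces the effect with three hypothetical script lines:

```shell
#!/bin/sh
# Feed bash a three-line script whose second line opens a ' that is never closed.
# Only the first command runs; bash then reads to EOF looking for the closing quote.
out=$(printf '%s\n' "echo first" "echo it's broken" "echo third" | bash 2>/dev/null)
echo "$out"   # prints: first
```

To look for the offending line in the real data, grep -n "'" radiosonde.csv lists every line that contains a single quote.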

I don’t understand why it behaves so strangely with some files. If I put the names of the files that were left over into another file, say test.csv, and run the same command again, i.e.

sort -r test.csv | awk -F"," '{print "for files in *00.csv; do mv $files ${files/" $1 "/" $5 "}; done" }'  | bash

then it renames all the files that were left earlier. Is there any way to do this with a shell script? I tried the following script, but it didn’t work:

for file in *00.csv ; do 
         mv $files ${files/" $1 "/" $5 "}; 
done < radiosonde.csv
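A plain shell version can read radiosonde.csv line by line and rename each station’s files with quoted variables, so no generated code is passed to bash and apostrophes or spaces in names (such as FORT SHEVCHENKO) cannot break anything. A minimal sketch, assuming the header is the first line, the files are in the current directory and match CODE_*00.csv, and the CSV might have Windows line endings (the tr -d '\r' is a precaution, not something the question confirms). The function name rename_stations is hypothetical:

```shell
#!/bin/sh
# rename_stations CSV (hypothetical helper): rename CODE_rest files in the
# current directory to STN_NAME_rest, using columns CODE,LAT,LON,Elevation,STN_NAME.
rename_stations() {
    tail -n +2 "$1" | tr -d '\r' |                    # skip header, drop CRs
    while IFS=, read -r code lat lon elev name || [ -n "$code" ]; do
        [ -n "$code" ] && [ -n "$name" ] || continue  # ignore blank/incomplete lines
        for f in "${code}"_*00.csv; do                # code anchored at the start
            [ -e "$f" ] || continue                   # glob matched nothing
            mv -- "$f" "${name}_${f#*_}"              # keep everything after the first _
        done
    done
}
```

Run it from the directory that holds the files as rename_stations radiosonde.csv; putting echo in front of mv gives a dry run that only prints the planned renames.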


Answer

What about this:

Make sure that radiosonde.csv and all the CSV files you want to rename are in the same directory.

$ cd <directory of radiosonde.csv, 36872_20190806_00.csv, 38064_20190806_00.csv and so on...>
$ ls *.csv > .tmp; awk -F ',' '{name[$1]=$5}END{for(;(getline filename < ".tmp")>0;){ori=filename;sub(/_.+$/,"",filename);pre=filename;sub(/^[0-9]+/,"",ori);post=ori;if(name[pre]!="")system("mv " pre post " " name[pre] post)}} ' 'radiosonde.csv'
$ rm -f '.tmp'

Explanation:

  • ls *.csv > .tmp -> Lists all .csv files in the current directory and writes them to .tmp.
  • awk -F ',' -> Sets , (comma) as awk’s field separator, so that a line like 41620,31.35,69.467,1407,ZHOB is split into fields available as $1, $2, $3 and so on.
  • '{ ... }END{}' -> These are awk’s blocks. The first block runs once for every input line; the END{} block runs once, after all input has been read, just before awk exits.
  • 'radiosonde.csv' -> The input file that awk reads.
  • '{name[$1]=$5}' -> $1 is the first field and $5 is the fifth. Here $1 is 41620, 41600 and so on, and $5 is ZHOB, SIALKOT, etc. name is an array keyed by station code: reading the header line sets name["CODE"]="STN_NAME", and the second line sets name["41620"]="ZHOB".
  • END{} -> Once the array is filled, the files can be renamed, and END{} is the block used for that.
  • for(;(getline filename < ".tmp")>0;) {} -> Reads .tmp, the list of files to rename, one line at a time into filename. getline returns 1 on success, 0 at end of file and -1 on error, so the loop stops after the last line (or if .tmp cannot be read).
  • ori=filename; -> Copies filename into ori. sub() modifies its target in place, so a second copy is needed to extract the other part of the filename.
  • sub(/_.+$/,"",filename); -> Removes everything from the first _ to the end of the name. For example, if filename is 41620_20190806_00.csv, _20190806_00.csv is removed and filename becomes 41620.
  • pre=filename; -> Stores the station code in pre, for clarity.
  • sub(/^[0-9]+/,"",ori); -> Removes the leading digits, so ori becomes _20190806_00.csv.
  • post=ori; -> Stores the rest of the name in post.
  • if(name[pre]!="") -> .tmp also lists radiosonde.csv, which is not a file to rename. For it, pre is the whole name radiosonde.csv (there is no _ to strip) and name["radiosonde.csv"] is empty, so this check skips it instead of running a mv that would fail. It also skips any file whose code is not in radiosonde.csv.
  • system("mv " pre post " " name[pre] post) -> Renames the file. If pre is 41620 and post is _20190806_00.csv, this runs "mv 41620_20190806_00.csv ZHOB_20190806_00.csv".
  • rm -f '.tmp' -> Deletes .tmp, which is no longer needed.
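The same idea can also be written without the .tmp file and without building a shell command inside system(), which breaks on names containing spaces (FORT SHEVCHENKO in the data above) or quotes. In this sketch awk only prints old/new name pairs and the shell does the renaming with quoted arguments. It swaps the getline loop for the common NR == FNR idiom for reading two inputs (radiosonde.csv first, then the file list on standard input), and uses / as the pair separator because it cannot occur in a filename. The function name rename_with_awk is hypothetical:

```shell
#!/bin/sh
# rename_with_awk CSV (hypothetical helper): same logic as the one-liner,
# but awk only prints "old/new" pairs and the shell runs mv with quoting.
rename_with_awk() {
    printf '%s\n' *.csv | awk -F, '
        NR == FNR {                        # first input: the station table
            if (FNR > 1) { sub(/\r$/, ""); name[$1] = $5 }
            next
        }
        {                                  # second input: one filename per line
            pre = $0;  sub(/_.+$/, "", pre)      # station code before the first _
            post = $0; sub(/^[0-9]+/, "", post)  # remainder, e.g. _20190806_00.csv
            if (pre in name) print $0 "/" name[pre] post
        }' "$1" - |
    while IFS=/ read -r old new; do
        mv -- "$old" "$new"
    done
}
```

Call it as rename_with_awk radiosonde.csv from the files’ directory. Files whose code is not in the table, including radiosonde.csv itself, produce no pair, so no separate if check is needed.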

Ignore my comment below: the if statement is needed.

User contributions licensed under: CC BY-SA