Make awk gsub take value of for loop


# 8  
You still have 2 invocations of awk when you only need 1 and you still have too many braces in your 2nd awk script. Try changing:
Code:
awk -v vkeyIn="$keyIn" -v vkeyOut="$keyOut" ' $0 ~ vkeyIn, $0 ~ vkeyOut {printf $0; }' $XMLdir$XMLfile |
awk -v vdefTagIn="$defTagIn" -v vdefTagOut="$defTagOut" -v tagIn="$tagIn" -v tagOut="$tagOut" -v vkeySepA="$keySepA" -v vkeySepB="$keySepB" -v vdefSep="$defSep" -v vetySepA="$etySepA" -v vetySepB="$etySepB" -v vemSep="$emSep" '
{
	# Separation after main key word
	gsub("<orth>", vkeySepA)
... ... ...
	gsub(/\.\.\. /, "...")

}

{ print "\n" $0 "\n" } '

to:
Code:
awk -v vkeyIn="$keyIn" -v vkeyOut="$keyOut" -v vdefTagIn="$defTagIn" -v vdefTagOut="$defTagOut" -v tagIn="$tagIn" -v tagOut="$tagOut" -v vkeySepA="$keySepA" -v vkeySepB="$keySepB" -v vdefSep="$defSep" -v vetySepA="$etySepA" -v vetySepB="$etySepB" -v vemSep="$emSep" '
$0 ~ vkeyIn, $0 ~ vkeyOut {
	# Separation after main key word
	gsub("<orth>", vkeySepA)
... ... ...
	gsub(/\.\.\. /, "...")
	print "\n" $0 "\n"
}' $XMLdir$XMLfile

It should give you exactly the same results with a single awk instead of two awks piped together.
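
A minimal sketch of the range pattern on its own, assuming hypothetical `<entry>`/`</entry>` markers and made-up sample input in place of the real XML file. It only shows that a single awk can both select the lines between the two keys and edit them:

```shell
# Hypothetical markers and input; the real values come from $keyIn/$keyOut.
keyIn='<entry>'
keyOut='</entry>'
range_out=$(printf '%s\n' 'skip me' '<entry>' 'body' '</entry>' 'skip me too' |
awk -v vkeyIn="$keyIn" -v vkeyOut="$keyOut" '
# Range pattern: true from a line matching vkeyIn through the next line
# matching vkeyOut, inclusive. The action runs once per selected line.
$0 ~ vkeyIn, $0 ~ vkeyOut { print "[" $0 "]" }')
printf '%s\n' "$range_out"
```

Note that the action here sees one input line at a time, whereas the first awk in the original pipeline used printf $0 and so passed the selected lines on without newlines between them.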
# 9  
Quote:
Originally Posted by Don Cragun
You still have 2 invocations of awk when you only need 1 and you still have too many braces in your 2nd awk script. [...] It should give you exactly the same results with a single awk instead of two awks piped together.
I've been wanting to merge the two awks into one, but have not been successful. This code does not make the program run as intended. It just gives a splash of ongoing text from the xml file.

The reason I used two awks is because the xml file has text that is all broken up between lines; I needed first to concatenate those lines into one line (only the ones of the key phrase), and then that line is easy to edit in the second awk; else, I would be having to edit one line of text between several lines, and that is beyond my knowledge at this point. If there is a way to first do one task (concatenate the text), and then do another (the rest of the text manipulation with that concatenated text), that would be great. I've tried several variations and have not been successful.
# 10  
Untested: Instead of printf $0 in the first awk script, concatenate $0 to a working variable, like WRK = WRK " " $0, then assign WRK back to $0 for the further processing.
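
A minimal sketch of that idea, assuming hypothetical `<entry>`/`</entry>` markers and sample input; the gsub() in the END block is only a stand-in for the real edits:

```shell
joined=$(printf '%s\n' 'noise' '<entry>' '<orth>word</orth>' '</entry>' 'noise' |
awk -v vkeyIn='<entry>' -v vkeyOut='</entry>' '
# Collect every line of the range into WRK instead of printing it.
$0 ~ vkeyIn, $0 ~ vkeyOut { WRK = WRK " " $0; next }

END {
	$0 = WRK                # hand the joined text back to $0
	sub(/^ /, "")           # drop the leading separator added above
	gsub(/<\/?orth>/, "|")  # hypothetical edit on the joined line
	print
}')
printf '%s\n' "$joined"
```

With this input the sketch prints the single line `<entry> |word| </entry>`. Because the edits run in END, they see the whole key phrase as one string rather than one fragment per input line.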

Untested, and a hint only: Methinks replacing the above five lines with
Code:
# Get rid of potential extra definition markers
     gsub (/(\.|<\/usg>) ?— ?<\/sense>/,    ".")

yields the same result. Same might be true for other opportunities.
# 11  
Quote:
Originally Posted by bedtime
I've been wanting to merge the two awks into one, but have not been successful. This code does not make the program run as intended. It just gives a splash of ongoing text from the xml file.

The reason I used two awks is because the xml file has text that is all broken up between lines; I needed first to concatenate those lines into one line (only the ones of the key phrase), and then that line is easy to edit in the second awk; else, I would be having to edit one line of text between several lines, and that is beyond my knowledge at this point. If there is a way to first do one task (concatenate the text), and then do another (the rest of the text manipulation with that concatenated text), that would be great. I've tried several variations and have not been successful.
Did you figure out what needs to be done based on what RudiC suggested, or do you still need help completing it?
# 12  
Quote:
Originally Posted by RudiC
Untested: Instead of printf $0 in the first awk script, concatenate $0 to a working variable, like WRK = WRK " " $0, then assign WRK back to $0 for the further processing.
Works perfectly!
Code:
awk -v vkeyIn="$keyIn" -v vkeyOut="$keyOut" -v vdefTagIn="$defTagIn" -v vdefTagOut="$defTagOut" -v tagIn="$tagIn" -v tagOut="$tagOut" -v vkeySepA="$keySepA" -v vkeySepB="$keySepB" -v vdefSep="$defSep" -v vetySepA="$etySepA" -v vetySepB="$etySepB" -v vemSep="$emSep" '
$0 ~ vkeyIn, $0 ~ vkeyOut { WRK = WRK $0; next; }END{

	$0 = WRK

	# Separation after main key word
	sub(/<orth>/, vkeySepA)
	sub(/<\/orth>/, vkeySepB)

... ... ...

	gsub(/^ /,  "" )
	gsub(/\.\.\. /, "...")

	print "\n" $0 "\n"

}  ' $XMLdir$XMLfile


Quote:
Untested, and a hint only: Methinks replacing the above five lines with
Code:
# Get rid of potential extra definition markers
     gsub (/(\.|<\/usg>) ?— ?<\/sense>/,    ".")

yields the same result. Same might be true for other opportunities.
Hmmm, a 5:1 reduction in code, not bad.

Quote:
Did you figure out what needs to be done based on what RudiC suggested, or do you still need help completing it?
Yes, above. I've been reading the GNU Awk User's Guide (https://www.gnu.org/software/gawk/manual/gawk.html), so I haven't been doing as much coding.

Though atm, I do need help with using an array/variable in sub(). RudiC had pointed out in post #4 https://www.unix.com/303013671-post4.html a working solution which involved this, but I just could not break it down and make it execute properly.

I am trying to pass elements of an array of strings (which is working fine) to sub() as the pattern and the replacement (which is not working):
Code:
# Separation after main key word
# sub(/<orth>/, vkeySepA)

{ split ("<orth> "vkeySepA"", VNARR)
		VNARR[0] = 0

		# Will use when other issues are sorted
		# for (a = 1; a <=20; a+=2)

		# VNARR seems to print fine
		print "1: " VNARR[1] "\n2: " VNARR[2]

		# Faulty code below; does not match the data
		sub(VNARR[1], VNARR[2])

}

# 13  
Quote:
Originally Posted by bedtime
Works perfectly!
I'm very happy that you got this to work for you.

But, please stop trying to hide the logic in your code! Make it obvious. Change:
Code:
$0 ~ vkeyIn, $0 ~ vkeyOut { WRK = WRK $0; next; }END{

	$0 = WRK

to:
Code:
$0 ~ vkeyIn, $0 ~ vkeyOut { WRK = WRK $0; next}

END {
	$0 = WRK

Quote:
Though atm, I do need help with using an array/variable in SUB. Rudi had pointed out in post #4 https://www.unix.com/303013671-post4.html a working solution which involved this, but I just could not break it down and make it execute properly:

I am trying to use an array of strings/variables (which is working fine) to insert into sub and be replaced (which is not working):
Code:
# Separation after main key word
# sub(/<orth>/, vkeySepA)

{ split ("<orth> "vkeySepA"", VNARR)
		VNARR[0] = 0

		# Will use when other issues are sorted
		# for (a = 1; a <=20; a+=2)

		# VNARR seems to print fine
		print "1: " VNARR[1] "\n2: " VNARR[2]

		# Faulty code below; does not match the data
		sub(VNARR[1], VNARR[2])

}

It isn't immediately obvious what isn't working in this example and you don't give us any indication of what you think it is doing wrong.

The call to split() could be rewritten more simply as:
Code:
split("<orth> "vkeySepA, VNARR)

and always give you identical results. Given that vkeySepA has been defined to be an empty string in your earlier code and assuming that it still is when you ran this, one might note that the call above would return 1 (not the 2 that you seem to be assuming). But since unassigned array elements (like any other unassigned variables) will have a 0 value if used as a number or an empty string value if used as a string, that won't make any difference in this case. Your calls to sub() with those array values should change the first occurrence of the string <orth> in $0 to an empty string.
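
A small sketch that checks this, assuming vkeySepA is empty and using a made-up input string:

```shell
split_demo=$(awk -v vkeySepA="" 'BEGIN {
	# Default separator: the trailing blank is ignored, so only one element.
	n = split("<orth> " vkeySepA, VNARR)
	printf "n=%d 1=[%s] 2=[%s]\n", n, VNARR[1], VNARR[2]

	line = "<orth>word</orth>"     # hypothetical input
	sub(VNARR[1], VNARR[2], line)  # dynamic regex, empty replacement
	print line
}')
printf '%s\n' "$split_demo"
```

This prints `n=1 1=[<orth>] 2=[]` followed by `word</orth>`: split() returns 1, the unassigned VNARR[2] reads as an empty string, and sub() removes the first `<orth>`.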

Note that the <space> after <orth> will be treated as a field separator, not as part of the string to be replaced. Note also that with many of your search patterns (many of which contain <space>s) and replacement patterns (many of which contain <space>s), using code like the above will give you more than 2 fields in the created array unless you use a different array element separator and add an ERE to your split() call specifying the character(s) in your separator as the element separator.
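
One possible way to do that, as a sketch only: build the pattern/replacement pairs with a separator character that cannot occur in them and pass that separator to split(). The choice of SUBSEP, the three pairs, and the sample input are all assumptions made for illustration:

```shell
looped=$(printf '%s\n' '<orth>word</orth> <def>a thing</def>' |
awk -v vkeySepA='[' -v vkeySepB='] ' '
BEGIN {
	SEP = SUBSEP  # "\034" by default; hypothetical choice of separator
	# Odd elements are patterns, even elements are their replacements.
	pairs = "<orth>" SEP vkeySepA SEP "</orth> ?" SEP vkeySepB SEP "</?def>" SEP ""
	n = split(pairs, VNARR, SEP)
}
{
	# Walk the pairs; blanks inside a pattern or replacement are preserved.
	for (a = 1; a <= n; a += 2)
		gsub(VNARR[a], VNARR[a + 1])
	print
}')
printf '%s\n' "$looped"
```

For the sample line this prints `[word] a thing`. The patterns are used as dynamic regular expressions, so regex metacharacters in them keep their special meaning, and an `&` in a replacement would expand to the matched text.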
# 14  
Quote:
Originally Posted by Don Cragun
But, please stop trying to hide the logic in your code! Make it obvious.
It was not intentional, just force of habit.

Quote:
It isn't immediately obvious what isn't working in this example and you don't give us any indication of what you think it is doing wrong.
Yes, I have to be more clear. I got it working though. It was a silly mistake of not watching the correct text that I was replacing; it was actually working all along.

Quote:
Note that the <space> after <orth> will be treated as a field separator, not as part of the string to be replaced. Note also that with many of your search patterns (many of which contain <space>s) and replacement patterns (many of which contain <space>s), using code like the above will give you more than 2 fields in the created array unless you use a different array element separator and add an ERE to your split() call specifying the character(s) in your separator as the element separator.
Yes, found this to be the case. And I've decided to keep the code as it was for now; this is a little too deep for me atm and not entirely necessary for the program to work. As for the other code, I've shortened it up a bit.

Just a note: I've started to make this program in C++; it seems that it could benefit from the speed and features. Already, the C++ program can open a file and extract, replace, and print a few text combinations; so you may not see me posting for a little bit in the Shell Programming Forum, but I will still be using awk for the many things that it can do!