Make awk gsub take value of for loop
Post 303013677 by bedtime on Sunday 25th of February 2018 05:39:05 PM
Quote:
Originally Posted by RudiC
- Although braces don't hurt and the parser will understand / eliminate them, too many of them make the code difficult to read. {if (vn<=10) {vnx=vn } } can be written as if (vn<=10) vnx=vn without sacrificing logic while improving readability.
Yes, I will try to ensure all new code is trimmed down. For now, I don't want to mess with the other braces.

Quote:
- for every single input line, you execute those nested loops 20 x 4 x 26, i.e. 2080 times - quite lengthy for more than a few input lines.
Yes, I know of no other working alternative.
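One possible alternative, as a minimal sketch only: a single gsub with character classes can stand in for the 20 x 4 x 26 literal patterns. The sample line and the `' | '` separator below are made up for illustration, and the regex assumes every tag has the shape `<sense id="n….N" level="N" n="…" opt="n">`. Unlike the loop, it accepts any value of `n`, not only the ones listed in VNARR.

```shell
# Hypothetical one-line sample; the real entries come from the Perseus XML file.
sample='<sense id="n1.0" level="1" n="I" opt="n">to love<sense id="n1.1" level="2" n="A" opt="n">to like'

# One gsub with character classes covers every id/level/n combination,
# instead of 2080 gsub calls with literal patterns.
result=$(printf '%s\n' "$sample" | awk -v vdefSep=' | ' '
{
	gsub(/<sense id="n[0-9]+\.[0-9]+" level="[0-9]+" n="[^"]*" opt="n">/, vdefSep)
	print
}')

printf '%s\n' "$result"
```

In the script itself, vdefSep would keep its existing value and the three nested for loops would no longer be needed, provided the assumed tag shape holds for the whole file.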

Quote:
- instead of the 16 ifs for the vnx constants assignment, you could use an array.
How eloquent that is! This is the type of thing I was looking for; so many lines saved!
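A small sketch of the array idea, using a shortened, made-up label list rather than the full one from the script: split() fills the array once, and a label is then looked up by index instead of being chosen by a chain of ifs.

```shell
result=$(awk 'BEGIN {
	# split() fills VNARR[1..n] from a space-separated list; index 0 is set by hand.
	n = split("1 2 3 I II III A B C", VNARR, " ")
	VNARR[0] = 0
	# Look a label up by index instead of testing vn in a chain of ifs.
	printf "%d %s %s %s\n", n, VNARR[0], VNARR[4], VNARR[9]
}')
printf '%s\n' "$result"
```

split() returns the number of elements, which could also serve as the upper bound of the vn loop instead of a hard-coded 26.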

Quote:
- you seem to execute 2080 gsubs on $0 with different patterns, each and every one overwriting the former ones - not sure if each of those really makes sense and is necessary.
I hadn't known that; I thought gsub only executed on a match. Maybe something like this (I've been trying to make it work for a while):
Code:
# Make a variable for easy access:
IDvar="<sense " vID "." vid "\" level=\"" vl "\" n=\"" VNARR[vn] "\" opt=\"n\">"

# If that variable exists, then run gsub.
if (/IDvar/) gsub(IDvar, vdefSep)
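A likely reason the attempt above does not work: between slashes, awk treats IDvar as the literal text "IDvar", not as the variable. A dynamic regex needs the match operator, as in `$0 ~ IDvar`. The sketch below uses a hypothetical fixed pattern and a made-up `@@` separator to show the difference. It also shows that gsub returns the number of replacements and leaves the line untouched when nothing matches, so the extra test mainly documents intent rather than saving work.

```shell
line='<sense id="n1.0" level="1" n="I" opt="n">to love'

result=$(printf '%s\n' "$line" | awk -v vdefSep='@@' '
{
	# Hypothetical fixed pattern; in the script it would be built from vID, vid, vl and VNARR[vn].
	IDvar = "<sense id=\"n1.0\" level=\"1\" n=\"I\" opt=\"n\">"

	literal = (/IDvar/) ? "yes" : "no"      # /IDvar/ looks for the five characters "IDvar"
	dynamic = ($0 ~ IDvar) ? "yes" : "no"   # $0 ~ IDvar uses the string held in the variable

	# gsub returns the number of replacements and leaves $0 alone when nothing matches.
	count = gsub(IDvar, vdefSep)
	print literal, dynamic, count, $0
}')
printf '%s\n' "$result"
```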

Quote:
I could imagine that if you explain your problem verbosely in plain English, supporting this with a few meaningful examples, people in here could come up with a tailored, crisp proposal on how to improve and accelerate the solution.
Thank you. I try to keep speech to a min, so people aren't overwhelmed...

Quote:
EDIT:
This
Code:
{ /<sense id=\"n.*" level/; {vID = substr($2, 1, length($2)-1)}}

is NOT a pattern {action} pair and will change vID with every new input line. Is that intended? Why then the /<sense id=\"n.*" level/?
It would actually be the same due to how the XML file stores things, but, that said, you are right; it is not necessary to run and rerun that variable assignment, even if the info is constant. I've taken it out of the for loop and put it just before.
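To illustrate the point about pattern {action} pairs, here is a small sketch with made-up input lines. In the first rule the regex guards the action; in the second, the regex sits inside the block, where it is evaluated and thrown away, so the assignment runs for every line.

```shell
result=$(printf '%s\n' 'alpha id=1' 'beta id=2' | awk '
# Pattern {action} pair: the action runs only for lines matching the pattern.
/alpha/ { guarded = $2 }

# A bare regex inside a block is evaluated and discarded,
# so the inner block runs for every input line.
{ /alpha/; { unguarded = $2 } }

END { print guarded, unguarded }')
printf '%s\n' "$result"
```

The guarded variable keeps the value from the matching line, while the unguarded one ends up holding the value from the last line read.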


Here is the updated version:
Code:
#!/bin/mksh

# This program requires an xml dictionary file to run. If it is not on your machine,
# this program will automatically download it and store in ~/.config/latin/.

# Name this file as 'latin' and run:
#
# $ chmod +x latin
#
# To run:
# $ ./latin amo
#
# To enable internet auto-decline:
# $ ./latin -d amo
#
# To run with only auto-decline:
# $ ./latin -c amo
#
# Where 'amo' is the term searched.

searchTerm=$2

URL="http://www.perseus.tufts.edu/hopper/morph?l=$searchTerm&la=la"

wFIN='<h4 class="la">'
wFOUT='</h4>'
wDefIn='<span class="lemma_definition">'
wDefOut='</span>'
wFormIn='<td class="la">'$searchTerm'</td>'
wFormOut='<td style="font-size: x-small">'

## Connect to Perseus to obtain the 1st person singular (needed as the key for the XML file)
if [[ ("$1" == "-d") ]]; then

	searchTerms=$(wget -q -O- "$URL" | mawk -v vWFIN="$wFIN" -v vWFOUT="$wFOUT" \
	' $0 ~ vWFIN,$0 ~ vWFOUT {printf "%s\n", substr($0, 18, length($0)-22); next}')

elif [[ ("$1" == "-c") ]]; then

	wget -q -O- "$URL" | mawk -v vDefIn="$wDefIn" -v vDefOut="$wDefOut" -v vFormIn="$wFormIn" -v vFormOut="$wFormOut" -v vWFIN="$wFIN" -v vWFOUT="$wFOUT" \
	' $0 ~ vWFIN,$0 ~ vWFOUT {printf "\n[ " substr($0,18, length($0)-22)" ]"; next;}   $0 ~ vDefIn,$0 ~ vDefOut {{ if (!/>/) {{$1=$1}1; x+=1; print " "$0"";} }}   $0 ~ vFormIn,$0 ~ vFormOut {{ if (!/td /) {{$1=$1}1;   $0=substr($0,5, length($0)-9); print "-"$0; next;} } }'

else
	searchTerms=$1
fi

if [ "$1" == "-c" ]; then
	exit
fi

XMLfile=Perseus_text_1999.04.0060.xml
XMLdir=~/.config/latin/
XMLlink="http://www.perseus.tufts.edu/hopper/dltext?doc=Perseus:text:1999.04.0060"

if [ ! -e $XMLdir$XMLfile ]; then
        echo "\nFile:" $XMLdir$XMLfile "not found.\n\nDownloading from" $XMLlink "...\n"
	mkdir -p ~/.config/latin
	wget -qO- $XMLlink | tr -d '\r' > $XMLdir$XMLfile
fi

for searchTerm in $searchTerms
do

#echo "Searching for:"$searchTerms

keyIn='key="'$searchTerm'"'	# Which tag shall be searched?
keyOut='</entry>'	#
tagIn='<'		# How are tags to be distinguished?
tagOut='>'		#
keySepA=''		# Separates the main word from its roots
keySepB=','		#
etySepA='['		# Etymology left
etySepB=']\n\n • '	# Etymology right
defSep='\n\n '          # Separates individual definitions
emSep='\n\n • '		# Separates em-dashes

#echo $keyIn

# First concatenate the result into a usable string
awk -v vkeyIn="$keyIn" -v vkeyOut="$keyOut" ' $0 ~ vkeyIn, $0 ~ vkeyOut { printf "%s", $0 }' "$XMLdir$XMLfile" |
awk -v tagIn="$tagIn" -v tagOut="$tagOut" -v vkeySepA="$keySepA" -v vkeySepB="$keySepB" -v vdefSep="$defSep" -v vetySepA="$etySepA" -v vetySepB="$etySepB" -v vemSep="$emSep" '

	# Separation after main key word
	{ gsub("<orth>", vkeySepA) }
	{ gsub("</orth>", vkeySepB) }

	# Add separation for several variations of definitions
	#{gsub(/<etym lang="la" opt="n">/, vetySepA)}
	# Testing
	{ gsub(/<sense id.*><etym lang="la" opt="n">/, vetySepA) }
	{ gsub(/<\/etym>\. —<\/sense>/, "]") }
	{ gsub(/<\/etym>\, <trans opt="n">/, vetySepB) }
	{ gsub(/<\/etym>\.—/, vetySepB) }
	{ gsub(/<\/etym>\. /, "]") }

	# Get rid of potential extra definition markers
	{ gsub(/\.—<\/sense>/, ".") }
	{ gsub(/\.— <\/sense>/, ".") }
	{ gsub(/\. — <\/sense>/, ".") }
	{ gsub(/<\/usg>—<\/sense>/, ".") }

	{ vID = substr($2, 1, length($2)-1) }

BEGIN   { split ("1 2 3 4 5 6 7 8 9 10 I II III IV IV. V V. A B C C. D E F G H", VNARR)
         VNARR[0] = 0
        }

        {

	#If matched then print section divider
	for (vid = 0; vid <= 19; vid++)
	  for (vl = 0; vl <= 3; vl++)
	    for (vn = 0; vn <=26; vn++) {

		#IDvar="<sense " vID "." vid "\" level=\"" vl "\" n=\"" VNARR[vn] "\" opt=\"n\">"
		#print IDvar

		gsub("<sense " vID "." vid "\" level=\"" vl "\" n=\"" VNARR[vn] "\" opt=\"n\">", vdefSep )

		}
	}

	# Add missing dot after gender
	{ gsub(/<\/gen>/, ". ") }

	# Collapse all remaining tags
	{ gsub(tagIn "[^" tagOut "]*" tagOut, "") }

	# Separate em-dash text
	{ if ((!/—\\,/) && (!/[a-zA-Z]—/) && (!/ —/)) {gsub (/—/, vemSep) }}
        { if ((!/—\\,/) ) {gsub (/\.—/, "." vemSep)}}
        { gsub (/ — /, vemSep)}

	# Remove double spaces and spaces between certain characters
	{ gsub(/ +/,  " ") }
	{ gsub(/ ,/,  ",") }
	{ gsub(/\( /, "(") }
	{ gsub(/ \)/, ")") }
	{ gsub(/ \./, ".") }
	{ gsub(/ \:/, ":") }
	{ gsub(/ \?/, "?") }
	{ gsub(/\‘ /, "‘") }
	{ gsub(/ \’/, "’") }
	{ gsub(/^ /,  "" ) }
	{ gsub(/\.\.\. /, "...") }

{ print "\n" $0 "\n" }'

done

I had made a version with much better notes, but upon finishing it, there was an error which I could not fix. Likely I lost a bracket somewhere.

Once again, thank you all. I am still (always) open to any other suggestions.


*EDIT*

Updated script: the XML dictionary file is now automatically downloaded to ~/.config/latin/ if not present. No manual downloading is required; just run the script.

Last edited by bedtime; 02-26-2018 at 04:21 AM.. Reason: Updated script
 
