As already proposed, if you describe what is needed someone could come up with some nifty trick e.g. regex. Pls be aware that if the substitution has taken place in the first loop, another 2079 loops will be executed nevertheless.
where the tags were defined as '<sense' and '>'. No more need for 2079 loops of madness.
Anyways, nothing was wasted; all the ideas posted will help in future scripting.
As of now, I will be working on merging some gsub commands with regex tricks.
Oh-and about the braces, when I removed certain ones the program would not operate correctly; it would scatter text and such. I just added a brace between the beginning of the program (after the variables) and before { print $0 }, and I was able to remove all the other braces!
If anyone is interested:
latin:
Code:
#!/bin/mksh
# This program requires an xml dictionary file to run. If it is not on your machine,
# this program will automatically download it and store in ~/.config/latin/.
# Name this file as 'latin' and run:
#
# $ chmod +x latin
#
# To run:
# $ ./latin amo
#
# To enable internet auto-decline:
# $ ./latin -d amo
#
# To run with only auto-decline:
# $ ./latin -c amo
#
# Where 'amo' is the term searched.
key=$2
URL="http://www.perseus.tufts.edu/hopper/morph?l=$key&la=la"
wFIN='<h4 class="la">'
wFOUT='</h4>'
wDefIn='<span class="lemma_definition">'
wDefOut='</span>'
wFormIn='<td class="la">'$key'</td>'
wFormOut='<td style="font-size: x-small">'
## Code which connects to perseus to attain 1st per. sg. (needed as key for xml file)
if [[ ("$1" == "-d") ]]; then
searchTerms=$(wget -q -O- "$URL" | mawk -v vWFIN="$wFIN" -v vWFOUT="$wFOUT" \
' $0 ~ vWFIN,$0 ~ vWFOUT {printf substr($0,18, length($0)-22)"\n"; next;}')
elif [[ ("$1" == "-c") ]]; then
wget -q -O- "$URL" | mawk -v vDefIn="$wDefIn" -v vDefOut="$wDefOut" -v vFormIn="$wFormIn" -v vFormOut="$wFormOut" -v vWFIN="$wFIN" -v vWFOUT="$wFOUT" \
' $0 ~ vWFIN,$0 ~ vWFOUT {printf "\n[ " substr($0,18, length($0)-22)" ]"; next;} $0 ~ vDefIn,$0 ~ vDefOut {{ if (!/>/) {{$1=$1}1; x+=1; print " "$0"";} }} $0 ~ vFormIn,$0 ~ vFormOut {{ if (!/td /) {{$1=$1}1; $0=substr($0,5, length($0)-9); print "-"$0; next;} } }'
else
searchTerms=$1
fi
if [ "$1" == "-c" ]; then
exit
fi
XMLfile=Perseus_text_1999.04.0060.xml
XMLdir=~/.config/latin/
XMLlink="http://www.perseus.tufts.edu/hopper/dltext?doc=Perseus:text:1999.04.0060"
if [ ! -e $XMLdir$XMLfile ]; then
echo "\nFile:" $XMLdir$XMLfile "not found.\n\nDownloading from" $XMLlink "...\n"
mkdir -p ~/.config/latin
wget -qO- $XMLlink | tr -d '\r' > $XMLdir$XMLfile
fi
for key in $searchTerms; do
keyIn='key="'$key'"' # Which tag shall be searched?
keyOut='</entry>' #
tagIn='<' # How are tags to be distinguished?
tagOut='>' #
defTagIn='<sense' # How are definitions defined?
defTagOut='>'
keySepA='' # Separates the main word from its roots
keySepB=',' #
etySepA='[' # Etymology left
etySepB=']\n\n • ' # Etymology right
defSep='\n\n ' # Separates individual definitions
emSep='\n\n • ' # Separates em-dashes
# First concatenate the result into a usable string
awk -v vkeyIn="$keyIn" -v vkeyOut="$keyOut" ' $0 ~ vkeyIn, $0 ~ vkeyOut {printf $0; }' $XMLdir$XMLfile |
awk -v vdefTagIn="$defTagIn" -v vdefTagOut="$defTagOut" -v tagIn="$tagIn" -v tagOut="$tagOut" -v vkeySepA="$keySepA" -v vkeySepB="$keySepB" -v vdefSep="$defSep" -v vetySepA="$etySepA" -v vetySepB="$etySepB" -v vemSep="$emSep" '
{
# Separation after main key word
gsub("<orth>", vkeySepA)
gsub("</orth>", vkeySepB)
# Add separation for several variations of definitions
#gsub(/<etym lang="la" opt="n">/, vetySepA)
gsub(/<sense id.*><etym lang="la" opt="n">/, vetySepA)
gsub(/<\/etym>\. -<\/sense>/, "]")
gsub(/<\/etym>\, <trans opt="n">/, vetySepB)
gsub(/<\/etym>\.-/, vetySepB)
gsub(/<\/etym>\. /, "]")
# Get rid of potential extra definition markers
gsub(/\.-<\/sense>/, ".")
gsub(/\.- <\/sense>/, ".")
gsub(/\. - <\/sense>/, ".")
gsub(/<\/usg>-<\/sense>/, ".")
gsub(/<\/usg> -<\/sense>/, ".")
# Add missing dot after gender
gsub(/<\/gen>/, ". ")
# Collapse all definition tags and add formatting in their place
gsub(vdefTagIn "[^" vdefTagOut "]*" vdefTagOut, vdefSep)
# Collapse all remaining tags
gsub(tagIn "[^" tagOut "]*" tagOut, "")
# Separate em-dash text
if ((!/-\\,/) && (!/[a-zA-Z]-/) && (!/ -/)) gsub (/-/, vemSep)
if ((!/-\\,/) ) gsub (/\.-/, "." vemSep)
gsub (/ - /, vemSep)
gsub (/ -/, vemSep)
if (!/-\\,/) gsub (/\.-/, "." vemSep)
# Remove double spaces and spaces between certain characters
gsub(/ +/, " ")
gsub(/ ,/, ",")
gsub(/\( /, "(")
gsub(/ \)/, ")")
gsub(/ \./, ".")
gsub(/ \:/, ":")
gsub(/ \?/, "?")
gsub(/\‘ /, "‘")
gsub(/ \'/, "'")
gsub(/^ /, "" )
gsub(/\.\.\. /, "...")
}
{ print "\n" $0 "\n" } '
done
Hi all,
This problem has cost me half a day, and i still do not know how to do.
Any help will be appreciated. Thanks advance.
I want to use a variable as the first parameters of gsub function of awk.
Example:
{
...
arri]=gsub(i,tolower(i),$1)
(which should be ambraced by //)
...
} (1 Reply)
Hello,
I have a variable that displays the following results from a JVM....
1602100K->1578435K
I would like to collect the value of 1578435 which is the value after a garbage collection. I've tried the following command but it looks like I can't get the > to work. Any suggestions as... (4 Replies)
Hi all
I want to do a simple substitution in awk but I am getting unexpected output. My function accepts a time and then prints out a validation message if the time is valid. However some times may include a : and i want to strip this out if it exists before i get to the validation. I have shown... (4 Replies)
Hi,
Can some one please explain the following line please throw some light on the ones marked in red
awk '{print $9}' ${FTP_LOG} | awk -v start=${START_DATE} 'BEGIN { FS = "." } { old_line1=$0; gsub(/\-/,""); if ( $3 >= start ) print old_line1 }' | awk -v end=${END_DATE} 'BEGIN { FS="." } {... (3 Replies)
I want to replace comma with space and "*646#" with space.
I am using the following code:
nawk -F"|" '{gsub(","," ",$3); gsub(/\*646\#/"," ",$3);print}' OFS="|" file
I am getting following error:
Help is appreciated (5 Replies)
Hey,
I would like to replace a string by a new one. Teh problem is that both strings should be variables to be flexible, because I am having a lot of files (with the same structure, but in different folders)
for i in daysim_*
do
cd $i/5/
folder=`pwd |awk '{print $1}'`
awk '{ if... (3 Replies)
Hi, I want to print the first column with original value and without any double quotes
The output should look like
<original column>|<column without quotes>
$ cat a.txt
"20121023","19301229712","100397"
"20121023","19361629712","100778"
"20121030A","19361630412","100838"... (3 Replies)
Hello,
I'm trying to substitute a string with leading zero for all the records except the trailer record using awk command and with variables. The input file test_med1.txt has data like below
1234ABC...........................9200............LF... (2 Replies)
Hi ALL,
I want to replace string occurrence in my file "Config" using a external file named "Mapping" using awk.
$cat Config
! Configuration file for RAVI
! Configuration file for RACHANA
! Configuration file for BALLU
$cat Mapping
ravi:ram
rachana:shyam
ballu:hameed
The... (5 Replies)