How do I extract parameter value after name="value" accurately?


 
# 8  
Old 02-13-2016
Here is a pure bash/ksh93 solution. It does not use any external commands such as awk, perl, grep, or sed.

If your bash is too old, update it. Older bash parsers (for example 4.1) have a bug: they cannot correctly parse case syntax inside a command substitution.

Code:
while read line
do
        # grep using case
        case "$line" in
           \<jvmEntries*)  ;;
                *) continue ;;
        esac
        # sed using builtin properties
        line=${line// =/=}
        line=${line//= /=}

        # parse line elements to the array using delimiter $IFS
        elem=($line)

        # create var=value lines and source it
        . /dev/stdin <<< $(
                for e in ${elem[@]}
                do
                        # grep only xxx=xxx and xxx="xxx" values
                        case "$e" in
                                -*) continue ;;
                                *:*=*) continue ;; #
                                *=\"*\") ;; # set value
                                *=\"*) continue ;; #
                                *=*) ;; # set value ...
                                *) continue ;; # something else
                        esac
                        # this was interesting element, take it
                        echo "$e"
                done
                )

        # show the variables
        echo "initialHeapSize $initialHeapSize"
        echo "maximumHeapSize $maximumHeapSize"

done < some.xml

# 9  
Old 02-13-2016
Nice one! Alas, it doesn't extract attributes such as debugArgs or enericJvmArguments, whose values are lists of space-separated strings enclosed in double quotes.
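One way to keep values that contain spaces, still without external commands, is to cut the value out with parameter expansion instead of splitting the line on whitespace. The following is only a minimal sketch, not the method posted above: get_attr is a hypothetical helper name, and it assumes the value is double-quoted with no spaces around the equal sign.

```shell
# get_attr LINE NAME: print the double-quoted value of attribute NAME.
# Hypothetical helper; returns 1 when NAME="..." is not present on LINE.
# Plain POSIX sh has no local variables, so line, name and rest are global.
get_attr() {
    line=$1 name=$2
    case $line in
        *[[:space:]:]"$name"=\"*\"*) ;;    # NAME="..." preceded by a space or colon
        *) return 1 ;;
    esac
    rest=${line#*[[:space:]:]"$name"=\"}   # drop everything up to the opening quote
    printf '%s\n' "${rest%%\"*}"           # keep everything before the closing quote
}

sample='<jvmEntries initialHeapSize="256" debugArgs="-Xdebug -Xnoagent">'
get_attr "$sample" debugArgs               # prints: -Xdebug -Xnoagent
```

Because nothing is sourced, the attribute text is never executed as shell code. Single-quoted or unquoted values would need extra case branches.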
# 10  
Old 02-13-2016
Thanks kshji, I will try that and see if there are any shortcomings.
# 11  
Old 02-13-2016
With the GNU utilities that you have, you could use:
Code:
sed -r 's/.*initialHeapSize[^0-9]*=[^0-9]*([0-9]+).*/\1/'

or
Code:
grep -Eo 'initialHeapSize[^0-9]*=[^0-9]*[0-9]+' | sed 's/.*[^0-9]//'

or with GNU grep's -P (Perl-compatible regular expression) option:
Code:
grep -Po 'initialHeapSize\D*=\D*\d+' | sed 's/.*[^0-9]//'

# 12  
Old 02-13-2016
This is not working. It prints the whole XML file:
Code:
sed -r 's/.*initialHeapSize[^0-9]*=[^0-9]*([0-9]+).*/\1/' server.xml
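
The whole file appears because sed prints every input line by default, whether or not the substitution matched. A likely fix, sketched here under the assumption that only the matching line is wanted, is to suppress the default output with -n and add the p flag to the s command. The sample file content is made up for the demonstration, and -E is used in place of -r because both GNU and BSD sed accept it.

```shell
# Build a small sample file (hypothetical content standing in for server.xml).
xml=$(mktemp)
cat > "$xml" <<'EOF'
<server name="demo">
  <jvmEntries xmi:id="JVM_1" initialHeapSize="256" maximumHeapSize="512">
</server>
EOF

# -n suppresses the default output; the trailing p prints only the lines
# on which the substitution succeeded.
heap=$(sed -n -E 's/.*initialHeapSize[^0-9]*=[^0-9]*([0-9]+).*/\1/p' "$xml")
echo "$heap"
rm -f "$xml"
```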

The two solutions below work (the file name belongs to grep, not to sed):
Code:
grep -Eo 'initialHeapSize[^0-9]*=[^0-9]*[0-9]+' server.xml | sed 's/.*[^0-9]//'
grep -Po 'initialHeapSize\D*=\D*\d+' server.xml | sed 's/.*[^0-9]//'

However, I think initialHeapSize[^0-9]*= could match arbitrary non-digit text between the key and the =.
A better approach is to allow only optional spaces between the key and the =, then zero or more spaces after the =, then an optional single or double quote followed by digits.

Code:
# \x27 is not an escape inside a POSIX bracket expression, so both quote characters are written literally
grep '<jvmEntries' server.xml | grep -iEo "initialHeapSize[[:space:]]*=[[:space:]]*['\"]?[0-9]+" | sed 's/.*[^0-9]//'
256

If I could use the above regex in a single sed statement, it would be even easier to understand and maintain.
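A single sed statement along those lines might look like the sketch below. It is an assumption-laden example rather than a tested production command: the sample file is invented, the attribute name is required to follow whitespace, and matching is case-sensitive (GNU sed accepts an I flag on the address and on s for case-insensitive matching, but that is not portable).

```shell
# Hypothetical sample standing in for server.xml.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<NotjvmEntries initialHeapSize="2560" maximumHeapSize="5120">
  <jvmEntries xmi:id="JVM_1" initialHeapSize = '256' maximumHeapSize="512">
EOF

# The /<jvmEntries/ address restricts the substitution to lines with that tag.
# The sed script is double-quoted for the shell so that both quote characters
# can appear literally inside the bracket expression.
heap=$(sed -n -E "/<jvmEntries/ s/.*[[:space:]]initialHeapSize[[:space:]]*=[[:space:]]*['\"]?([0-9]+).*/\1/p" "$xml")
echo "$heap"
rm -f "$xml"
```

Here the address replaces the first grep, the s command replaces grep -o plus the trailing sed, and -n with p keeps non-matching lines out of the output.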

I think this is also a great solution, if only I could push the grep condition into the perl regex. It is case-insensitive and tries to be precise about key=value matching, allowing for optional spaces around the = and an optional quote.
Code:
cat server.xml | grep -i '<jvmEntries' |  perl -ne 'print /initialheapSize[[:space:]]*=[[:space:]]*[\x27"]?(\d+)/i'


Last edited by kchinnam; 02-14-2016 at 12:44 AM.. Reason: corrected text
# 13  
Old 02-14-2016
You could also try this awk script. It can handle single-quoted strings, double-quoted strings, and unquoted strings terminated by a space or ">". It requires an equal-sign (with optional leading and trailing spaces) between keyword and its value. If the value is an empty string, it must be quoted; otherwise the value doesn't need to be quoted unless the value contains a space or ">". Single-quotes can be included in double-quoted strings and double-quotes can be included in single-quoted strings.
Code:
#!/bin/ksh
file="$1"
tag="$2"
shift 2
printf '%s\n' "$@" | awk -v tag="$tag" -v sq="'" -v dq='"' '
FNR == NR {
	# Get keyword list.
	list[++n] = $0
#printf("list[%d] set to %s\n", n, list[n])
}
$1 == "<" tag {
	# Look for the requested keywords in this tag...
	for(i = 1; i <= n; i++) {
		if(match($0, "[: ]" list[i] " *= *") <= 0) {
			# No match for this keyword.
			print "***No match"
			continue
		}
		val1 = RSTART + RLENGTH
		if((c1 = substr($0, val1, 1)) == dq || c1 == sq) {
			# We have a single-quoted string or double-quoted
			# string value.  Find the end of the string value.
			val_len = index(substr($0, val1 + 1), c1) - 1
			# Extract the string value.
			val = substr($0, val1 + 1, val_len)
		} else {# We have a space or ">" terminated value.
			# Find the end of the value.
			val_len = match(substr($0, val1), /[ >]/) - 1
			val = substr($0, val1, val_len)
		}
		print val
	}
}' - "$file" | (
	while [ $# -gt 0 ]
	do	read -r value
		printf 'tag %s keyword %s=%s\n' "$tag" "$1" "$value"
		shift
	done
)

Invoke it with the 1st operand being the name of the XML file to be processed, the 2nd operand being the tag on the line to be processed, and the remaining operands being the keywords on that line whose values are to be printed. One output line is printed for each keyword requested, in the same order as the keywords were given on the command line.

As always, if you want to try this script on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk. (Note that nawk will not correctly process this script.)

If you have a file named file.xml containing:
Code:
 <NotjvmEntries xmi:id="NotJavaVirtualMachine_1337159909831" verboseModeClass="true" verboseModeGarbageCollection="false" verboseModeJNI="true" initialHeapSize="2560" maximumHeapSize="5120" runHProf="true" hprofArguments="null" debugMode="true" debugArgs='-DDQ=" -Djava.compiler=NONE -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=7777' enericJvmArguments="-DSq=' -Dawt.headless=true -Xjit:disableIdiomRecognition -Dsun.net.inetaddr.ttl=120">
  <jvmEntries xmi:id="JavaVirtualMachine_1337159909831" verboseModeClass="false" verboseModeGarbageCollection="true" verboseModeJNI="false" initialHeapSize="256" maximumHeapSize="512" runHProf="false" hprofArguments="" debugMode="false" debugArgs="-Djava.compiler=NONE -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=7777" enericJvmArguments="-Dawt.headless=true -Xjit:disableIdiomRecognition -Dsun.net.inetaddr.ttl=120">
xmi:id="JavaVirtualMachine_1337159909831" verboseModeClass="false" verboseModeGarbageCollection="true" verboseModeJNI="false" initialHeapSize="256" maximumHeapSize="512" runHProf="false" hprofArguments="" debugMode="false" debugArgs="-Djava.compiler=NONE -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=7777" enericJvmArguments="-Dawt.headless=true -Xjit:disableIdiomRecognition -Dsun.net.inetaddr.ttl=120"
 <test xmi:id=NotJavaVirtualMachine_1337159909831 verboseModeClass=true verboseModeGarbageCollection=false verboseModeJNI=true initialHeapSize=2560 maximumHeapSize=5120 runHProf=true hprofArguments=null debugMode=true debugArgs='-DDQ=" -Djava.compiler=NONE -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=7777' genericJvmArguments="-DSq=' -Dawt.headless=true -Xjit:disableIdiomRecognition -Dsun.net.inetaddr.ttl=120">

and you have saved the above script as an executable script named tester, then the command:
Code:
tester file.xml test id verboseModeClass hprofArguments maximumHeapSize minimumHeapSize initialHeapSize debugArgs genericJvmArguments enericJvmArguments

produces the output:
Code:
tag test keyword id=NotJavaVirtualMachine_1337159909831
tag test keyword verboseModeClass=true
tag test keyword hprofArguments=null
tag test keyword maximumHeapSize=5120
tag test keyword minimumHeapSize=***No match
tag test keyword initialHeapSize=2560
tag test keyword debugArgs=-DDQ=" -Djava.compiler=NONE -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=7777
tag test keyword genericJvmArguments=-DSq=' -Dawt.headless=true -Xjit:disableIdiomRecognition -Dsun.net.inetaddr.ttl=120
tag test keyword enericJvmArguments=***No match

and the command:
Code:
tester file.xml jvmEntries id verboseModeClass hprofArguments maximumHeapSize minimumHeapSize initialHeapSize debugArgs genericJvmArguments enericJvmArguments

produces the output:
Code:
tag jvmEntries keyword id=JavaVirtualMachine_1337159909831
tag jvmEntries keyword verboseModeClass=false
tag jvmEntries keyword hprofArguments=
tag jvmEntries keyword maximumHeapSize=512
tag jvmEntries keyword minimumHeapSize=***No match
tag jvmEntries keyword initialHeapSize=256
tag jvmEntries keyword debugArgs=-Djava.compiler=NONE -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=7777
tag jvmEntries keyword genericJvmArguments=***No match
tag jvmEntries keyword enericJvmArguments=-Dawt.headless=true -Xjit:disableIdiomRecognition -Dsun.net.inetaddr.ttl=120


Last edited by Don Cragun; 02-14-2016 at 03:53 PM.. Reason: Consolidate extraction of quoted string values.
# 14  
Old 02-14-2016
Thanks, Don, for taking the time to hammer it out. This is certainly a comprehensive solution.