Cutting all xml tags out of a line


 
# 1  
02-22-2018

I would like to search an XML file for a word (easily identified by its 'key' attribute), then cut all of the tags (anything between a < and a >) out of the matching line and display whatever remains. I am running Debian and the mksh shell.

dictionary.sh:
Code:
#!/bin/sh

key='key="'$1'"><form'
tagIn='<'
tagOut='>'

awk -v vkey="$key" -v vtagIn="$tagIn" -vtagOut="$tagOut" \
'$0 ~ vkey {print "Found:     " vkey "\n"}       $0 ~ vtagIn ".*" vtagOut {sub(vtagOut ".*", ""); sub(".*" vtagIn, ""); printf $0;}' \
words.txt

words.txt:
Code:
<entry id="n53" type="main" key="abolesco"><form opt="n"><orth extent="full" lang="la" opt="n">abolēscō</orth></form><gramGrp opt="n"><itype opt="n"> olēvī, -, ere, </itype>incept. </gramGrp><sense id="n53.0" level="0" n="0" opt="n"><etym lang="la" opt="n">aboleo</etym>, <trans opt="n"><tr opt="n">to decay gradually, vanish, disappear, die out</tr></trans>: <foreign lang="la">nomen vetustate</foreign>, <usg opt="n">L.</usg>: <foreign lang="la">tanti gratia facti</foreign>, <usg opt="n">V.</usg> </sense></entry>
...
<entry id="n54" type="main" key="abolitio"><form opt="n"><orth extent="full" lang="la" opt="n">abolitiō</orth></form><gramGrp opt="n"><itype opt="n"> ōnis, </itype><gen opt="n">f</gen> </gramGrp><sense id="n54.0" level="0" n="0" opt="n"><etym lang="la" opt="n">aboleo</etym>, <trans opt="n"><tr opt="n">an abolition</tr></trans>: <foreign lang="la">tributorum</foreign>, <usg opt="n">Ta.</usg>-<trans opt="n"><tr opt="n">An annulling</tr></trans>: <foreign lang="la">sententiae</foreign>, <usg opt="n">Ta.</usg> </sense></entry>

Run as:
Code:
$ ./dictionary.sh abolesco

The result is:
Code:
Found:     key="abolesco"><form

As you can see, the code finds the word, but I don't know how to remove the tags. I suspect I'm quite far off with this one.

I would like the result to contain no tags and to show only the entry for the word searched for, like this:
Code:
abolēscō olēvī, -, ere, incept. aboleo to decay gradually, vanish, disappear, die out: nomen vetustate, L.: tanti gratia facti, V.

Even a pointer in the right direction would be great.
# 2  
02-22-2018
Try
Code:
awk -v vkey="$key" -v tagIn="$tagIn" -v tagOut="$tagOut" '
$0 ~ vkey       {print "Found:     " vkey "\n"
                 gsub (tagIn "[^" tagOut "]*" tagOut, "")
                 print
                }
' file
Output:
Found:     key="abolesco"><form

abolēscō olēvī, -, ere, incept. aboleo, to decay gradually, vanish, disappear, die out: nomen vetustate, L.: tanti gratia facti, V.
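For reference, here is a sketch of a complete dictionary.sh with the gsub() fix folded in. It is a rewrite under a few assumptions, not the poster's final script: one <entry> per line, the word passed as the first argument, and the file name defaulting to words.txt. The helper name lookup is hypothetical. The two changes that matter compared with the original attempt are that gsub() replaces every match on the line (sub() replaces only the first), and that the pattern <[^>]*> stops at the first > instead of using the greedy .* from the question. The key is also matched with index(), a plain substring test, rather than as a regex.

```shell
#!/bin/sh
# dictionary.sh (sketch): print a dictionary entry with all XML tags removed.
# Assumes each <entry> sits on a single line of the file.

# lookup is a hypothetical helper name; usage: lookup WORD [FILE]
lookup() {
    word=$1
    file=${2:-words.txt}
    # key="WORD" (with both quotes) avoids matching longer words such as key="WORDx".
    key='key="'$word'"'
    awk -v vkey="$key" '
        # index() is a literal substring test, so a "." in the word is not a regex.
        index($0, vkey) {
            gsub(/<[^>]*>/, "")   # delete every <...> tag, not just the first
            print
        }
    ' "$file"
}

if [ $# -gt 0 ]; then
    lookup "$@"
fi
```

Run as ./dictionary.sh abolesco. On the sample line from the question it should print the same tag-free text as RudiC's version, including a trailing space left over from just before </sense>.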

# 3  
02-22-2018
Hi.

In thread Extract a value from an xml file, post #11, there are examples for extraction using:
Code:
xml_grep /usr/bin/xml_grep version 0.9
xmlstarlet - ( /usr/bin/xmlstarlet, 2014-09-14 )
xmllint: using libxml version 20901
xml2 - ( /usr/bin/xml2, 2012-04-16 )

Best wishes ... cheers, drl
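As a rough illustration of the parser-based route, here is a sketch with xmllint (from libxml2). It assumes xmllint is installed and that words.txt holds nothing but a sequence of <entry> elements (no XML declaration, no stray text), so wrapping the file in a dummy root element turns it into one well-formed document. The XPath string() function then returns the text of the first matching entry with every tag removed. The helper name lookup_xmllint and the <root> wrapper are hypothetical; the word is pasted into the XPath expression unescaped, so a word containing a single quote would break it.

```shell
#!/bin/sh
# Sketch: let an XML parser remove the tags instead of a regular expression.
# Assumes xmllint (libxml2) is installed and the file holds only <entry> elements.

# lookup_xmllint is a hypothetical helper name; usage: lookup_xmllint WORD [FILE]
lookup_xmllint() {
    word=$1
    file=${2:-words.txt}
    # Wrap the entries in a dummy <root> so the input is a single XML document,
    # then print the string value (all text, no tags) of the first matching entry.
    { printf '<root>\n'; cat "$file"; printf '</root>\n'; } |
        xmllint --xpath "string(//entry[@key='$word'])" -
}

if [ $# -gt 0 ]; then
    lookup_xmllint "$@"
fi
```

Unlike the regex approaches, this does not care how the entries are split across lines, because the parser sees the whole document.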
# 4  
02-22-2018
Quote:
Originally Posted by RudiC
Try
Code:
awk -v vkey="$key" -v tagIn="$tagIn" -vtagOut="$tagOut" '
$0 ~ vkey       {print "Found:     " vkey "\n"
                 gsub (tagIn "[^" tagOut "]*" tagOut, "")
                 print
                }
' file
Found:     key="abolesco"><form

abolēscō olēvī, -, ere, incept. aboleo, to decay gradually, vanish, disappear, die out: nomen vetustate, L.: tanti gratia facti, V.

Thank you. This worked perfectly... until the XML I was working with suddenly changed so that each entry spans several lines instead of one. I got that sorted out, though, and all is fine.
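For entries that span several lines, one portable approach (a sketch, not necessarily how the poster solved it) is to collect each <entry> ... </entry> block into a single string before matching and stripping. It assumes every entry begins with <entry and ends with </entry>, and that no line holds the end of one entry together with the start of the next. The helper name lookup_multiline is hypothetical. It sticks to POSIX awk features, so it should run under mawk (Debian's default awk) as well as gawk. Note that it also squeezes runs of spaces and indentation into single spaces.

```shell
#!/bin/sh
# Sketch for a dictionary file in which one <entry> ... </entry> may span
# several lines. Assumes each entry starts with "<entry" and ends with "</entry>".

# lookup_multiline is a hypothetical helper name; usage: lookup_multiline WORD [FILE]
lookup_multiline() {
    word=$1
    file=${2:-words.txt}
    key='key="'$word'"'
    awk -v vkey="$key" '
        /<entry[ >]/ { buf = ""; inentry = 1 }     # an entry opens: start a new buffer
        inentry      { buf = buf " " $0 }          # join its lines with spaces
        inentry && /<\/entry>/ {                   # the entry closes: process it
            inentry = 0
            if (index(buf, vkey)) {
                gsub(/<[^>]*>/, "", buf)           # strip every tag in the entry
                gsub(/[ \t]+/, " ", buf)           # squeeze runs of blanks/indentation
                sub(/^ /, "", buf)                 # trim the leading space
                sub(/ $/, "", buf)                 # and the trailing one
                print buf
            }
        }
    ' "$file"
}

if [ $# -gt 0 ]; then
    lookup_multiline "$@"
fi
```

Because a one-line entry is just the special case where <entry and </entry> share a line, the same function also handles the original single-line layout.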

Quote:
Originally Posted by drl
Hi.

In thread https://www.unix.com/shell-programmi...-xml-file.html, post #11, there are examples for extraction using:


Code:
xml_grep /usr/bin/xml_grep version 0.9
xmlstarlet - ( /usr/bin/xmlstarlet, 2014-09-14 )
xmllint: using libxml version 20901
xml2 - ( /usr/bin/xml2, 2012-04-16 )

Thanks. I think this will come in handy.