Extracting data between continuous non empty xml tags


 
# 1  
Old 09-14-2015

Hi,

I need help extracting only the phone numbers from between contiguous, non-empty XML tags in Unix. I searched through a lot of forums but did not find an exact answer to my question. Please help.

Given below is a sample pipe-delimited file. There are a lot of other tags before and after the ...<phone>...</phone>... tags.

Sample file:

Code:
|<phone>|<number>1234567890</number>|<type>primary</type>|</phone>|

|<phone>|<number>2345678999</number>|<type >primary</type>|</phone>|

|<phone>|<number>3214325432</number>|<type>primary</type>|</phone>|

|<phone>|<number>9876543210</number>|<type>primary</type>|</phone>|

|<phone>|<number>4567896789</number>|<type>primary</type>|</phone>|

Expected Output:
Code:
1234567890
2345678999
3214325432
9876543210
4567896789


Last edited by zen01234; 09-14-2015 at 06:16 PM.. Reason: Formatting
# 2  
Old 09-14-2015
To keep the forums high quality for all users, please take the time to format your posts correctly.

First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)



Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.

Third, be careful when you cut and paste: edit out any odd characters and make sure all links are working properly.

Thank You.

The UNIX and Linux Forums
# 3  
Old 09-14-2015
I tried these commands, and they worked!

1st command:
Code:
sed -n '/phone/{s/.*<phone>\(.*\)<\/phone>.*/\1/;p}' file1 > file2

I got the output below:
Code:
|<number>0000000000</number>|<type>primary</type>|

2nd command:
Code:
awk -F '[<>]' '{n = split($3,array," "); print array[n] >> "file3" }' file2

I got the expected output:
Code:
1234567890
2345678999
3214325432
9876543210
4567896789
# 4  
Old 09-15-2015
If we take your two commands (modified to create file3 instead of appending to it):
Code:
sed -n '/phone/{s/.*<phone>\(.*\)<\/phone>.*/\1/;p;}' file1 > file2  
awk -F '[<>]' '{n = split($3,array," "); print array[n] > "file3" }' file2

you get what you want in file3. If you change the 2nd command to:
Code:
awk -F '[<>]' '{print $3}' file2 > file4

you get the same output in file4. And, if you change the 1st command to:
Code:
sed -n '/phone/{s/.*<number>\(.*\)<\/number>.*/\1/;p;}' file1 > file5

you get the same output in file5 with one command instead of two. And, if you prefer to use awk instead of sed, with your sample input, the following:
Code:
awk -F '[<>]' '/phone/{print $5}' file1 > file6

produces the same output in file6. But, if there could be other tags between the phone tags, the following would be safer:
Code:
awk -F '<number>|</number>' '/phone/{print $2}' file1 > file7

still producing the same output in file7.
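
To make the difference concrete, here is a small sketch. It assumes a hypothetical <ext> tag sitting between <phone> and <number>; no such tag appears in the sample data, so treat it purely as an illustration. Counting fields picks up the wrong value, while splitting on the number tags still finds the phone number:

```shell
# Hypothetical line: an extra <ext> tag (not in the sample data) comes
# before <number>.
line='|<phone>|<ext>12</ext>|<number>1234567890</number>|<type>primary</type>|</phone>|'

# Splitting on every "<" or ">" and counting fields: $5 is now the text
# inside <ext>, not the phone number.
by_position=$(printf '%s\n' "$line" | awk -F '[<>]' '/phone/{print $5}')

# Splitting only on the number tags: $2 is whatever sits between them.
by_tag=$(printf '%s\n' "$line" | awk -F '<number>|</number>' '/phone/{print $2}')

printf 'by position: %s\n' "$by_position"
printf 'by tag:      %s\n' "$by_tag"
```

The first awk prints 12 and the second prints 1234567890.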

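Of course, the first two approaches above (the ones that read file2, which sed builds from the text between <phone> and </phone>) work only if the number tags are on the same line as both the starting and closing phone tags; the others (the commands writing file5, file6, and file7) work as long as the starting and closing number tags are on the same line as a starting or closing phone tag.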

None of the above work if the starting and closing number tags are not on the same line. And, none of the above will find all of the numbers you want if there is more than one set of number tags on a single line.
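
For input that runs into both of those limitations, one possible sketch is to have awk read one tag at a time instead of one line at a time. This is not a real XML parser: it assumes tags are written exactly as <phone>, </phone>, and <number> (no attributes, no extra spaces inside the tag), and the multi-line input below is made up for the example.

```shell
# Hypothetical input: the phone block spans several lines and holds two
# numbers; store and address also contain number tags.
input='<store><number>1234</number></store>
<phone>
  <number>1234567890</number><type>primary</type>
  <number>2345678999</number>
</phone>
<address><number>11</number></address>'

# RS = "<" makes every tag start a new record; FS = ">" then splits each
# record into the tag name ($1) and the text that follows it ($2).
numbers=$(printf '%s\n' "$input" | awk '
    BEGIN { RS = "<"; FS = ">" }
    $1 == "phone"  { in_phone = 1 }   # entered a phone block
    $1 == "/phone" { in_phone = 0 }   # left the phone block
    in_phone && $1 == "number" { print $2 }
')
printf '%s\n' "$numbers"
```

With this input, only the two numbers inside the phone block are printed; the number tags under store and address are skipped because in_phone is 0 when they are read.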
# 5  
Old 09-15-2015
Thank you, Don Cragun, for your code, but unfortunately it didn't work. My file is huge, and it has the same fields repeating over and over under different tag headings, as shown below.

Code:
<store><number>1234</number></store>
<phone><number>1234567890</number><type>primary</type></phone>
<address><street>abc</street><appt><number>11</number></address>

I tried your command

1st command
Code:
sed -n '/phone/{s/.*<number>\(.*\)<\/number>.*/\1/;p;}' file1 > file5

This command finds the first occurrence of the number tag instead of the number tag that follows the phone tag.

The 2nd command also finds the first occurrence of the number tag instead of the number tag that follows the phone tag:
Code:
awk -F '<number>|</number>' '/phone/{print $2}' file1 > file7

Both commands gave me 1234 as output instead of 1234567890.
Moderator's Comments:
Please use CODE tags (not ICODE tags) when showing full lines, and especially when showing multi-line output.

Last edited by Don Cragun; 09-15-2015 at 08:56 PM.. Reason: Fix tags.
# 6  
Old 09-15-2015
With the three lines of sample input shown above in file1, both the sed command and the awk command shown above will only write:
Code:
1234567890

into files file5 and file7, respectively, not:
Code:
1234
1234567890
11

Neither script explicitly checks that <number>string</number> appears after <phone>, but they both only look for number tags on lines that contain the string phone.
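
If an explicit check is wanted, one possible sketch is to make sed match <phone> first and then the number tags that follow it. This assumes <number> is the first tag after <phone> on the same line, with nothing between them except non-tag characters such as the | delimiter. The sample data in this thread fits that assumption; other data may not.

```shell
# Sample lines taken from this thread: three bare XML lines plus one
# pipe-delimited line.
input='<store><number>1234</number></store>
<phone><number>1234567890</number><type>primary</type></phone>
<address><street>abc</street><appt><number>11</number></address>
|<phone>|<number>2345678999</number>|<type >primary</type>|</phone>|'

# Print only when <number> directly follows <phone>; [^<]* skips the
# optional "|" between the two tags but will not cross another tag.
phone_numbers=$(printf '%s\n' "$input" |
    sed -n 's/.*<phone>[^<]*<number>\([^<]*\)<\/number>.*/\1/p')
printf '%s\n' "$phone_numbers"
```

Here the store and address lines produce no output at all, because the substitution only succeeds on lines where <phone> comes before <number>.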

Please reread the last two paragraphs I wrote in post #4 in this thread. They clearly state the limitations of the scripts presented above (and the other suggested scripts in that post).

If these results don't match what you get on your system with the above data, please show us the exact output you are getting and tell us what operating system and shell you are using.

If the code does work for this example but fails for some other data, show us an example of the exact input lines that are producing the wrong output, show the exact output produced, AND clearly describe the exact arrangement of tags on lines in the files you are trying to process. Everything you have shown us says that the lines you want to process have <phone> as the first tag on the line, the string you want to retrieve between <number> and </number> tags, and </phone> as the last tag on the line. All of the suggestions that have been provided assume this is an accurate description of the lines that need to be processed.
# 7  
Old 09-15-2015
Got Perl?
Code:
$ perl -nle '/phone.+number>(\d+)<\/number/ and print $1' zen01234.file
1234567890
2345678999
3214325432
9876543210
4567896789
1234567890

Code:
$ cat zen01234.file
|<phone>|<number>1234567890</number>|<type>primary</type>|</phone>|
|<phone>|<number>2345678999</number>|<type >primary</type>|</phone>|

|<phone>|<number>3214325432</number>|<type>primary</type>|</phone>|

|<phone>|<number>9876543210</number>|<type>primary</type>|</phone>|

|<phone>|<number>4567896789</number>|<type>primary</type>|</phone>|
<store><number>1234</number></store>
<phone><number>1234567890</number><type>primary</type></phone>
<address><street>abc</street><appt><number>11</number></address>

 