XML - Split And Extract String between Chars


 
# 1  
Old 12-19-2012

Hi,

I am trying to read the records from a file and split them into multiple files, one file per record, each named after the value in that record's <VName> tag.
SourceFile.txt
Code:
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>John</Recipient><BookID>20121212160208080</BookID><Created>2012-12-12T16:02:08.080-08:00</Created><VName>VN_010203.xml</VName></Info>........
...........................
............................</BOOK>
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>zz</Recipient><BookID>20111212160208080</BookID><Created>2011-12-12T16:02:08.080-08:00</Created><VName>VN_010004.xml</VName></Info>........
...........................
............................</BOOK>
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>bb</Recipient><BookID>20101212160208080</BookID><Created>2010-12-12T16:02:08.080-08:00</Created><VName>VN_000001.xml</VName></Info>........
...........................
............................</BOOK>
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>J</Recipient><BookID>20091212160208080</BookID><Created>2009-12-12T16:02:08.080-08:00</Created><VName>VN_999999.xml</VName></Info>........
...........................
............................</BOOK>
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>cc</Recipient><BookID>20071212160208080</BookID><Created>2007-12-12T16:02:08.080-08:00</Created><VName>VN_011111.xml</VName></Info>........
...........................
............................</BOOK>
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>dd</Recipient><BookID>20081212160208080</BookID><Created>2008-12-12T16:02:08.080-08:00</Created><VName>VN_022222.xml</VName></Info>........
...........................
............................</BOOK>

My Output should be:
VN_010203.xml
Code:
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>John</Recipient><BookID>20121212160208080</BookID><Created>2012-12-12T16:02:08.080-08:00</Created><VName>VN_010203.xml</VName></Info>........
...........................
............................</BOOK>

VN_010004.xml
Code:
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>zz</Recipient><BookID>20111212160208080</BookID><Created>2011-12-12T16:02:08.080-08:00</Created><VName>VN_010004.xml</VName></Info>........
...........................
............................</BOOK>

VN_000001.xml
Code:
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>bb</Recipient><BookID>20101212160208080</BookID><Created>2010-12-12T16:02:08.080-08:00</Created><VName>VN_000001.xml</VName></Info>........
...........................
............................</BOOK>

VN_999999.xml
Code:
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>J</Recipient><BookID>20091212160208080</BookID><Created>2009-12-12T16:02:08.080-08:00</Created><VName>VN_999999.xml</VName></Info>........
...........................
............................</BOOK>

VN_011111.xml
Code:
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>cc</Recipient><BookID>20071212160208080</BookID><Created>2007-12-12T16:02:08.080-08:00</Created><VName>VN_011111.xml</VName></Info>........
...........................
............................</BOOK>

VN_022222.xml
Code:
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>dd</Recipient><BookID>20081212160208080</BookID><Created>2008-12-12T16:02:08.080-08:00</Created><VName>VN_022222.xml</VName></Info>........
...........................
............................</BOOK>

Below is the code I have tried so far, but I am unable to capture the file name. Is there a way to do this in a single script or command?

Code:
awk 'NR%2==1{x="VName_" ++i;}{print > x}' input
 
awk -F"[\>\<]" '/<VName>/ {print $5}' input

Thanks in advance
--Ulf
# 2  
Old 12-19-2012
Code:
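awk -F "<VName>|</VName>" '
{ s = s ? s "\n" $0 : $0 }                       # accumulate the lines of the current record
/^<BOOK><Info>/ { fn = $2 }                      # $2 is the text between <VName> and </VName>
/<\/BOOK>$/ { print s > fn; close(fn); s = "" }  # write the record, close the file so many records do not exhaust open files
' file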
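
For anyone who wants to try the idea on a small sample first, here is a minimal sketch of the same approach. It is not the exact command above: it pulls the name out with match() instead of -F, and it writes into a directory passed in as outdir. The names split_demo and outdir, and the <Body> lines standing in for the elided record content, are assumptions made up for the illustration.

```shell
#!/bin/sh
# Sketch only: a two-record sample with abbreviated, made-up record bodies.
mkdir -p split_demo

cat > split_demo/SourceFile.txt <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>John</Recipient><VName>VN_010203.xml</VName></Info>
<Body>first record</Body></BOOK>
<?xml version="1.0" encoding="UTF-8"?>
<BOOK><Info><Sender>O'Relly</Sender><Recipient>zz</Recipient><VName>VN_010004.xml</VName></Info>
<Body>second record</Body></BOOK>
EOF

awk -v outdir=split_demo '
# Collect every line of the current record, XML declaration included.
{ rec = (rec == "" ? $0 : rec "\n" $0) }

# Remember the text between <VName> (7 chars) and </VName> (8 chars).
match($0, /<VName>[^<]*<\/VName>/) {
    name = substr($0, RSTART + 7, RLENGTH - 15)
}

# A line ending in </BOOK> closes the record: write it out and reset.
/<\/BOOK>$/ {
    if (name != "") {
        file = outdir "/" name
        print rec > file
        close(file)
    }
    rec = ""
    name = ""
}
' split_demo/SourceFile.txt

ls split_demo
```

Each output file should hold one complete record, starting with its own XML declaration. Note that a record whose closing </BOOK> is followed by trailing whitespace or a carriage return would not be flushed by this pattern, so files with Windows line endings would need converting first.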

This User Gave Thanks to pamu For This Post:
# 3  
Old 12-19-2012
Thanks a lot, it's working fine.