Using SED/AWK to extract xml at end of file


 
# 1  
10-26-2010

Hello everyone,

Firstly, I do not need a lot of help. I am right at the end of finishing my script but cannot find a solution to the last part.

What I need to do is prompt the user for a file to work with and then for an output file; both of those parts are done.
Code:
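#!/bin/bash
echo "Get my XML"
echo -n "Enter the source file name : "
read -r infile            # -r: keep any backslashes in the typed name literal
echo -n "Enter output file name : "
read -r outfile
# Fixed line range: only correct for files whose XML sits on exactly these lines.
# ">>" appends, so rerunning the script adds another copy to the output file.
sed -n 1433,1615p "$infile" >> "$outfile"
echo "Data should be in $outfile if this compiled correctly"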


The file is a .txt file and is massive; I only need the last 200 lines or so, which are XML. I know I can use SED to specify which line numbers to extract to the output file, but not every document this script is used on will need the last 200 lines; it could be the middle 50.

Which leads me to my problem: using SED or AWK, I would like to extract all the XML after 'Sending XML', which is consistent across all documents, up until the words 'Message sending ended.'

I have been reading various articles and forums, which have helped and led me to my current example. However, using line numbers is not feasible because they differ across documents, whereas the words above are always present.
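If you want to keep the line-number approach of the script above, one option is to look the numbers up first with `grep -n` and then pass them to the same `sed -n START,ENDp` command. This is a minimal sketch, assuming each marker occurs once and 'Sending XML' comes before 'Message sending ended'; the function name `extract_by_markers` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: find the line numbers of the two markers with grep -n,
# then print that inclusive range with sed, as the original script did with
# hard-coded numbers. Assumes each marker occurs once, start before end.
extract_by_markers() {
    # grep -n prints "LINE:text"; keep only the number of the first match
    start=$(grep -n 'Sending XML' "$1" | head -n 1 | cut -d: -f1)
    end=$(grep -n 'Message sending ended' "$1" | head -n 1 | cut -d: -f1)
    if [ -z "$start" ] || [ -z "$end" ]; then
        echo "markers not found in $1" >&2
        return 1
    fi
    sed -n "${start},${end}p" "$1"
}
```

In the script it would be used as `extract_by_markers "$infile" > "$outfile"`. It reads the file three times, so a single sed or awk pass over the markers is the simpler choice for very large files.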

I really hope someone can help, as I have spent far too much time on this!

Thanks!

H

Last edited by hugh86; 10-26-2010 at 12:00 PM.. Reason: english
# 2  
10-26-2010
Code:
sed -n '/Sending XML/,/Message sending ended/p' "${infile}" > "${outfile}"
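The address range `/start/,/end/` is inclusive, so the 'Sending XML' and 'Message sending ended' lines are copied as well. If only the XML between them is wanted, a common awk flag idiom does that. This is a sketch assuming the markers sit on lines of their own; `between_markers` is a hypothetical name.

```shell
#!/bin/sh
# Print only the lines strictly between the two marker lines.
# The order of the three rules matters:
#   1. the end marker clears the flag before anything is printed,
#   2. the bare pattern "f" prints the current line while the flag is set,
#   3. the start marker sets the flag after its own line was skipped.
between_markers() {
    awk '/Message sending ended/ { f = 0 }
         f
         /Sending XML/ { f = 1 }' "$1"
}
```

Like the sed range, this also handles several marker pairs in one file, printing each block in turn.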

This User Gave Thanks to vgersh99 For This Post:
# 3  
10-26-2010
I cannot thank you enough! I have posted on so many forums and no one ever gets back to me. I will be using this more often!

I now have another issue: the XML I have been left with has line breaks, see below:

Code:
 Sending XML          :

 <document> <docRequestID>2010-10-22-11.57.22.903813</docRequestID><docStylesh

 eet>Thunderhead</docStylesheet><requestType>claim</requestType><level0Object>

  <objectType>transaction</objectType><objectID>900</objectID><objectSeq>1</ob
 Line break has affected tag
 jectSeq><level1Object> <objectType>lifelite</objectType><objectID>901</object

As you can see, the line break has split this tag: half of it is on the line below, and the XML cannot be used like that. I would like to remove the line breaks and the spaces at the start of each line so that the tag is whole again, looking like: </objectSeq>
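One way to rejoin the wrapped XML is to drop the blank lines, strip the padding space from each line and print the pieces without newlines. This is a minimal sketch based only on the sample above; it assumes every wrapped line starts with exactly one padding space and that the blank lines carry no data. The function name `join_wrapped_xml` is hypothetical.

```shell
#!/bin/sh
# Sketch: rejoin XML that was hard-wrapped into fixed-width lines.
# Assumptions (taken from the sample, not guaranteed for every file):
#   - each wrapped line starts with exactly one padding space,
#   - blank lines between the pieces are padding, not data,
#   - the pieces must be glued back together with no separator.
join_wrapped_xml() {
    awk '
        /Message sending ended/ { inside = 0 }
        inside && NF {
            sub(/^ /, "")        # drop the single padding space
            printf "%s", $0      # no newline, so a split tag is rejoined
        }
        /Sending XML/ { inside = 1 }
        END { print "" }         # end the joined output with one newline
    ' "$1"
}
```

The whole XML ends up on a single line, which XML parsers accept. Only one leading space is removed, so a space that really belongs to the data at the start of a wrapped piece is kept.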

Thank you so much for your help!!

Last edited by hugh86; 10-26-2010 at 12:17 PM.. Reason: code tags wrong
# 4  
10-26-2010
Code:
nawk '/Sending XML/,/Message sending ended/' "${infile}" | nawk 'NF && /^ *</{printf("%s%s", $0, (/ *<.*>$/)?ORS:"");next}NF' > "${outfile}"
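For readers puzzling over the second awk program in that pipeline, here is a commented, multi-line version of the same logic, with `%s` used to append either ORS or an empty string. The wrapper name `join_tag_lines` is hypothetical.

```shell
#!/bin/sh
# Commented, multi-line form of the second awk program in the pipeline.
# Hypothetical wrapper name; reads the file given as $1.
join_tag_lines() {
    awk '
        # Non-blank line that starts (after optional spaces) with "<":
        # print it, and end it with ORS only when the line ends in ">";
        # otherwise print no newline, so the next piece is appended to it.
        NF && /^ *</ {
            printf("%s%s", $0, (/ *<.*>$/) ? ORS : "")
            next
        }
        # Any other non-blank line (e.g. the tail of a split tag):
        # print it unchanged, followed by a newline.
        NF
    ' "$1"
}
```

Traced on the sample, the pieces are joined, but the padding space at the start of each continuation line is kept (giving `<docStylesh eet>`). A `sub(/^ /, "")` before printing may therefore also be needed.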

This User Gave Thanks to vgersh99 For This Post:
# 5  
10-26-2010
Thank you so much, I will try it shortly! I don't have to install nawk, do I? Is it standard?

---------- Post updated at 07:43 PM ---------- Previous update was at 05:11 PM ----------

I tried running the code you gave me and I get an error saying that it cannot revert the file: "Unexpected error: Invalid UTF-8 sequence in input."

I'm having a look online to see what that means. If you have any ideas, let me know.

cheers
# 6  
10-26-2010
Quote:
Originally Posted by hugh86
Thank you so much, i will try it shortly! I dont have to install nawk do i? is it standard tech?

I tried running that code you gave me and i get an error saying that it cannot revert the file... Unexpected error: Invalid UTF-8 sequence in input.

What OS are you on?
If you have nawk/gawk - use either one.
# 7  
10-26-2010
I'm on Ubuntu. I tried using nawk and gawk, and neither worked!

---------- Post updated at 09:01 PM ---------- Previous update was at 08:48 PM ----------

I have tried using nawk and gawk with no luck. I am running Ubuntu, and awk is version 3.1.6.

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extract a particular xml only from an xml jar file

Hi..need help on how to extract a particular xml file only from an xml jar file... thanks! (2 Replies)
Discussion started by: qwerty000

2. Shell Programming and Scripting

sed - extract text from xml file

hi, please help, i have an xml file, e.g: ... <tag> test text asdas="${abc}" xvxvbs:asdas${222}sdad asasa="${aa_bb_22}" </tag> ... i want to extract all "${...}", e.g: ${abc} ${222} ${aa_bb_22} thank you. (2 Replies)
Discussion started by: gioni

3. Shell Programming and Scripting

Use grep sed or awk to extract string from log file and put into CSV

I'd like to copy strings from a log file and put them into a CSV. The strings could be on different line numbers, depending on size of log. Example Log File: File = foo.bat Date = 11/11/11 User = Foo Bar Size = 1024 ... CSV should look like: "foo.bat","11/11/11","Foo Bar","1024" (7 Replies)
Discussion started by: chipperuga

4. Shell Programming and Scripting

Extract XML message from a log file using awk

Dear all I have a log file and the content like this file name: temp.log <?xml version="1.0" encoding="cp850"?> <!DOCTYPE aaabbb SYSTEM '/dtdpath'> <aaabbb> <tranDtl> <msgId>000001</msgId> </tranDtl> ..... </aaabbb> ... ... (1 Reply)
Discussion started by: on9west

5. Shell Programming and Scripting

sed extract from xml

I have an xml file that generally looks like this: "<row><dnorpattern>02788920</dnorpattern><description/></row><row><dnorpattern>\+ 44146322XXXX</dnorpattern><description/></row><row><dnorpattern>40XXX</dnorpattern><description/></row><row><dnorpattern>11</dn... (4 Replies)
Discussion started by: garboon

6. Shell Programming and Scripting

reformatting xml file, sed or awk I think (possibly perl)

I have some xml files that cannot be read using a standard parser, or I am using the wrong parser. The issues seems to be spaces in some of the tags. Here is a sample,<UgUn 2 > <Un> -0.426753 </Un> </UgUn>The parser isn't able to find the number 2, so that information is lost, etc. It seems... (16 Replies)
Discussion started by: LMHmedchem

7. UNIX for Dummies Questions & Answers

Extract a specific number from an XML file based on the start and end tags

Hello People, I have the following contents in an XML file ........... ........... .......... ........... <Details = "Sample Details"> <Name>Bob</Name> <Age>34</Age> <Address>CA</Address> <ContactNumber>1234</ContactNumber> </Details> ........... ............. .............. (4 Replies)
Discussion started by: sushant172

8. Shell Programming and Scripting

SED extract XML value

I have the following string: <min-pool-size>2</min-pool-size> When I pipe the string into the following code I am expcting for it to return just the value "2", but its just reurning the whole string. Why?? sed -n '/<min-pool-size>/,/<\/min-pool-size>/p' Outputting:... (13 Replies)
Discussion started by: ArterialTool

9. UNIX for Dummies Questions & Answers

Using sed to extract a substring at end of line

This is the line that I am using: sed 's/^*\({3}*$\)/\1 /' <test.txt >results.txt and suppose that test.txt contains the following lines: http://www.example.com/200904/AUS.txt http://www.example.com/200903/_RUS.txt http://www.example.com/200902/.FRA.txt What I expected to see in results.txt... (6 Replies)
Discussion started by: figaro

10. Shell Programming and Scripting

sed or awk to extract data from Xml file

Hi, I want to get data from Xml file by using sed or awk command. I want to get the following result : mon titre 1;Createur1;Dossier1 mon titre 1;Createur1;Dossier1 and save it in cvs file (fichier.cvs). FROM this Xml file (test.xml): <playlist version="1"> <trackList> <track>... (1 Reply)
Discussion started by: yeclota