Using SED/AWK to extract xml at end of file


 
# 8  
10-27-2010
Can you check whether there are any non-printable characters in the XML portion, especially around the line break that splits the XML tag?

tyler_durden
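One quick way to make such characters visible is to pipe the XML section through `cat -v`, which shows a carriage return as `^M` and bytes 0x80-0xFF as `M-` sequences. This is a minimal sketch, assuming the XML sits between the "Sending XML" and "Message sending ended" lines used elsewhere in this thread; `show_xml_section` is a hypothetical helper name, and `cat -v` is not POSIX, although GNU and BSD cat both support it.

Code:
```sh
# show_xml_section FILE (hypothetical helper): print the marked section with
# non-printing characters made visible. cat -v shows a carriage return as ^M
# and bytes 0x80-0xFF as M- sequences; tabs and newlines are left alone.
show_xml_section() {
  sed -n '/Sending XML/,/Message sending ended/p' "$1" | cat -v
}
```

Usage: `show_xml_section trace.txt | less`. A `^M` in the middle of a tag name would point straight at the broken line.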
# 9  
10-27-2010
Quote:
Originally Posted by durden_tyler
Can you check whether there are any non-printable characters in the XML portion, especially around the line break that splits the XML tag?

I don't think there are... just whitespace separating the tags in some instances, not all.

Cheers
# 10  
10-27-2010
What's the output of this command?

Code:
sed -n '/Sending XML/,/Message sending ended/p' your_file | od -bc

tyler_durden
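`od -bc` prints every byte twice, once as a three-digit octal number (`-b`) and once as a character or escape such as `\r` or `\n` (`-c`), so on a long log it scrolls off the screen quickly. A minimal sketch that limits the dump with od's POSIX `-N` option (the helper name `dump_xml_head` and the 512-byte default are arbitrary choices, not part of the command above):

Code:
```sh
# dump_xml_head FILE [BYTES] (hypothetical helper): octal (-b) and character
# (-c) dump of only the first BYTES bytes (default 512) of the marked section.
dump_xml_head() {
  sed -n '/Sending XML/,/Message sending ended/p' "$1" | od -N "${2:-512}" -bc
}
```

Usage: `dump_xml_head trace.txt 1024`. To page through the whole dump instead, append `| less` to the original command.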
# 11  
10-27-2010
Hello, I added your code to my script. I'm not sure which file you were referring to, so I have attached what I used.

Code:
#!/bin/bash
echo "getXML"

echo -n "Enter the source file name WITH extension : "
read -r infile
echo "Processing... : "
sleep 1
echo -n "Enter output file name (extension not applicable) : "
read -r outfile
# Dump the marked section as octal bytes (-b) and characters (-c).
# Note: this reads "$outfile", not the source file "$infile".
sed -n '/Sending XML/,/Message sending ended/p' "$outfile" | od -bc
echo "Processing XML... : "
sleep 1
echo "Success..Data should be in '$outfile' if compiled correctly"

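As posted, the script hands `$outfile` to sed, so sed reads the file named at the second prompt instead of the source file, and nothing is ever written to `$outfile`. If the goal is to copy the XML section of the source file into the output file, a minimal sketch might look like this. The function name `get_xml` is hypothetical, and it assumes the section is bounded by the same two marker lines as the sed command above.

Code:
```sh
# get_xml SOURCE OUTPUT (hypothetical helper): copy the lines from
# "Sending XML" through "Message sending ended" in SOURCE into OUTPUT.
get_xml() {
  if [ ! -r "$1" ]; then
    echo "Cannot read source file '$1'" >&2
    return 1
  fi
  # Read the SOURCE file; the redirection creates or overwrites OUTPUT.
  sed -n '/Sending XML/,/Message sending ended/p' "$1" > "$2"
}
```

In the script, the sed line could then become `get_xml "$infile" "$outfile" && echo "Success..Data should be in '$outfile'"`.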

The outcome: when I open the output file that was created, I get "Unexpected error: Incomplete multibyte sequence in input".

On the terminal, loads of numbers flew across the screen. I'm not sure whether they are even related to my source file. An excerpt is attached below:

Code:
e   l   d   I   D   >   <   f   i   e   l   d   N   a   m
0031640 145 076 144 141 164 145 117 015 012 040 146 102 151 162 164 150
          e   >   d   a   t   e   O  \r  \n       f   B   i   r   t   h
0031660 074 057 146 151 145 154 144 116 141 155 145 076 074 146 151 145
          <   /   f   i   e   l   d   N   a   m   e   >   <   f   i   e
0031700 154 144 126 141 154 165 145 057 076 074 057 157 142 152 145 143
          l   d   V   a   l   u   e   /   >   <   /   o   b   j   e   c
0031720 164 106 151 145 154 144 076 074 157 142 152 145 143 164 106 151
          t   F   i   e   l   d   >   <   o   b   j   e   c   t   F   i
0031740 145 154 144 076 040 074 146 151 145 154 144 111 104 076 061 065
          e   l   d   >       <   f   i   e   l   d   I   D   >   1   5
0031760 061 067 074 057 146 151 145 015 012 040 154 144 111 104 076 074
          1   7   <   /   f   i   e  \r  \n       l   d   I   D   >   <
0032000 146 151 145 154 144 116 141 155 145 076 154 151 146 145 164 151
          f   i   e   l   d   N   a   m   e   >   l   i   f   e   t   i
0032020 155 145 123 154 141 101 155 157 165 156 164 074 057 146 151 145
          m   e   S   l   a   A   m   o   u   n   t   <   /   f   i   e
0032040 154 144 116 141 155 145 076 074 146 151 145 154 144 126 141 154
          l   d   N   a   m   e   >   <   f   i   e   l   d   V   a   l
0032060 165 145 076 061 070 060 060 060 060 060 074 057 146 151 145 154

Thanks,

H

Last edited by hugh86; 10-27-2010 at 12:13 PM. Reason: code tags
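In the od dump above, some tags are split by a carriage return, a newline and one space (`d a t e O \r \n   f B i r t h` and `< / f i e \r \n   l d I D >`). That looks like the logger wrapping long lines, but this is an assumption drawn only from this excerpt. A minimal sh/awk sketch, assuming every CR LF followed by a single space is such a fold and not real data (`unfold_xml` is a hypothetical helper name), extracts the marked section and rejoins the pieces:

Code:
```sh
# unfold_xml FILE (hypothetical helper): print the "Sending XML" ...
# "Message sending ended" section, rejoining lines that were folded as
# CR LF + one space. Assumption: every such sequence is a fold, not data.
unfold_xml() {
  sed -n '/Sending XML/,/Message sending ended/p' "$1" |
  awk '
    { cr = sub(/\r$/, "") }       # strip a trailing CR; cr is 1 if one was removed
    NR > 1 && prevcr && /^ / {    # previous line ended in CR, this one starts with a space
      buf = buf substr($0, 2)     # append it without the leading space
      prevcr = cr
      next
    }
    NR > 1 { print buf }          # previous logical line is complete
    { buf = $0; prevcr = cr }     # start a new logical line
    END { if (NR > 0) print buf }
  '
}
```

Usage: `unfold_xml trace.txt > trace.xml`. Trailing CRs are dropped as well, so the result has plain LF line endings. If some genuine lines in the file start with a space after a CRLF-terminated line, they would be joined too, so check the output against the original.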
# 12  
10-27-2010
Quote:
Originally Posted by hugh86
Hello, I added your code to my script. I'm not sure which file you were referring to, so I have attached what I used.
What I wanted was for you to run my command at your command prompt (the Linux dollar prompt).

The file I was referring to is the source file, that is, the one being read by your Bash script.

Quote:
Code:
#!/bin/bash
echo "getXML"

echo -n "Enter the source file name WITH extension : "
read infile 
...

Since you are going to test your Bash script, I am sure you know the name of the source file that you'll enter at the prompt above. That file name will be assigned to the variable "infile" in your script.

Now, let's say the source file name you have in mind is "abc.txt".

This file has some XML embedded in it. My hunch is that the XML contains non-ASCII (multibyte Unicode) characters.

Try this on your Linux dollar-prompt -

Code:
perl -CSD -lne 'while(/(.)/g){print $.,"\t",$1,"\t",ord($1) if ord($1) > 127}' abc.txt

Replace "abc.txt" with the actual name of your source file.

tyler_durden
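A byte-level alternative, which makes no assumption about the file's encoding, is to let grep look for any byte from 0x80 to 0xFF, that is, anything outside 7-bit ASCII. `LC_ALL=C` makes grep compare raw bytes instead of decoding them. This is a sketch: `non_ascii_lines` is a hypothetical helper, and `cat -v` is not POSIX, though GNU and BSD cat both support it.

Code:
```sh
# non_ascii_lines FILE (hypothetical helper): print "LINE:TEXT" for every line
# containing a byte in the range 0x80-0xFF. LC_ALL=C makes grep match raw
# bytes; cat -v then shows those bytes as M- sequences so the terminal stays readable.
non_ascii_lines() {
  LC_ALL=C grep -n "$(printf '[\200-\377]')" "$1" | cat -v
}
```

Usage: `non_ascii_lines trace.txt`. No output means every byte in the file is plain ASCII.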
# 13  
10-27-2010
I tried that, replacing the file name with my source file (in my case trace.txt), but I'm not sure where the output file is. I checked trace.txt and it was the same document. Do I not need to specify where the output goes?

Sorry if I'm being slow; I only started learning three weeks ago.
# 14  
10-27-2010
Quote:
Originally Posted by hugh86
I tried that, replacing the file name with my source file (in my case trace.txt), but I'm not sure where the output file is. I checked trace.txt and it was the same document. Do I not need to specify where the output goes?
...
No, you do not need to specify the output file name. The output will be displayed right after your command.

(A) If you have Ubuntu Gnome, then open up "Gnome Terminal" or "Terminal".

(B) If you have Ubuntu KDE (Kubuntu?), then open up "Konsole".

You'll see a dollar prompt in the terminal window.

Type the following command at the prompt, on a single line.

Code:
$ perl -CSD -lne 'while(/(.)/g){print $.,"\t",$1,"\t",ord($1) if ord($1) > 127}' trace.txt

Don't type the $ symbol. It only shows that everything from "perl" onward is typed at the $ prompt.

You could, alternatively, copy+paste the perl command from this webpage.

When you press Enter (or Return) after "trace.txt", the output will be displayed right there in the terminal window, just below your command.

Copy your command and the output from the terminal window and post them over here.

(Put that Bash script aside for the time being. You'd want to investigate the contents of the trace.txt source file first.)

tyler_durden