Removing spaces between XML tags<XX XX> -> <XXXX>


 
# 1  
Old 04-09-2008

Hey guys, I have an XML file like this:

<documents>
<document>
<Object ID>100114699999</Object ID>
<Object Create Date Time>2008-04-07T00:00:00</Object Create Date Time>
</document>
</documents>

I need all the tags in the XML to be free of spaces, i.e. everything between the < and > of a tag like <t a g>, throughout the whole doc, should have its spaces removed.

My file is already being altered with this command:

for line in `cat $fileName |nawk '{gsub(/></,">\n<");print}'` (new line between all tags..)

Can it be done?
# 2  
Old 04-09-2008
Tricky one indeed

So you want to take out the spaces inside the XML tags. What platform are you on, and how far have you gotten?
# 3  
Old 04-09-2008
This flies in the face of any proper use of XML but I guess you already knew that.

Are you sure you don't have any attributes anywhere? You can't squeeze out the space in something like <img src="fnord.gif"> without wrecking it.

Code:
perl -pe 's/<\s+/</g; s/\s+>/>\n/g; 1 while s/(<[^\s<>]+)\s+/$1/' filename

This adds the newlines between tags as well, in a slightly different fashion.

Neither perl nor nawk needs to be spoon-fed with a cat -- just specify the file name as an argument.
# 4  
Old 04-10-2008
Hey guys! Thanks for the replies!

era: Thanks for the solution! I won't wreck anything, because all the downstream process does is extract 17 arguments from tags it already knows. The tags with spaces are useless to me, but they are causing an error. I can't use an XML parser because I don't have the correct Perl package. The solution you proposed works great, but I am having a small problem now:

Here's part of my script:

Code:
for line in `perl -pe 's/<\s+/</g; s/\s+>/>\n/g; 1 while s/(<[^\s<>]+)\s+/$1/' $readyDir/$fileName`
    do

case "$line" in
    "</document>")

          if [ $currentNdocs -gt $maxDocs ]
               then
                 strFile="<DMS><documents>"$strFile"</documents></DMS>";
                 cd ${CLARIFY_DIR}/rulemanager
                 ./cbbatch -f ../jobs/dms/DMS_Integration.cbs -r ParseXML -as ${strFile} >> ${CLARIFY_DIR}/jobs/dms/OUT
                 strFile=""
                 currentNdocs=0
               else
                 strFile=$strFile$line;
               fi
           ;;

        *)
               strFile=$strFile$line;

Basically, when the script detects the end of an XML document, it sends whatever it has accumulated in strFile concatenated with 'line' (i.e. everything up to and including the </document>). My problem arises when there are spaces in the text. When the loop hits a space, the text on either side still gets concatenated into strFile, but the space itself is lost (strFile=$strFile$line drops it). Now that the <ta g> spaces are solved, I need the loop to leave the space in <tag>My Comments</tag> alone so that strFile keeps it. The batch that receives the document cannot handle spaces between one tag and the next, but it CAN handle spaces in the text inside an element.

Any suggestions as to what I can do? A replacement for line?

A typical XML doc I receive is:

Code:
<?xml  version = "1.0" encoding = "UTF-8"?><DMS xmlns:xsi="http://www.w3.com/XSD/DMSMessage.xsd"><documents>
<document><DMSObjectID>10011468999</DMSObjectID><DMSObjectCreateDateTime>2008-04-08T18:00:00</DMSObjectCreateDateTime>
<DMSObjectFileType>pdf</DMSObjectFileType><DMSObjectLink>http://aa.tie.ch.n1=10011468734&amp</DMSObjectLink><DMSObjectType>Contract</DMSObjectType>
<DMSObjectSubType>Audio Contract</DMSObjectSubType><ClarifyCustomerID>0703203</ClarifyCustomerID><ClarifyActionCode>0</ClarifyActionCode>
<WebOrderID>99933</WebOrderID><ClarifyPartRequestID/><POAPhoneNumber/><POAPartialPorting/><POAPortingWishDate></POAPortingWishDate><SignatureDate/>
<DMSObjectSubject></DMSObjectSubject><DMSObjectProductLine></DMSObjectProductLine><DMSObjectLanguage>de</DMSObjectLanguage>
<DMSAdditionalComment>My comments</DMSAdditionalComment></document></documents></DMS>

When the loop reaches the DMSObjectSubType tag, Audio Contract turns into AudioContract in strFile.

Another problem is that the script should also be prepared to receive XML files with spaces between the tags (</document> </documents>) and with line terminators between them (</document>
</documents>)

Any suggestions will help!

Last edited by sharoff; 04-10-2008 at 05:59 AM..
# 5  
Old 04-10-2008
Basically, the XML shown is the ideal format. I could pass it directly to my parser and it would work, provided the spaces inside the <> were removed (which the perl script era provided does). The problem comes, as I mentioned before, when I have spaces or \n between tags.
# 6  
Old 04-10-2008
Could you rephrase what the problem is, and please edit your posting to add markup so that the code fits on one screen -- it's quite unreadable as it is now.

You are not quoting your arguments properly. I'm not sure if this will help but try adding double quotes around every variable reference. Like replace $strFile with "$strFile" throughout, and ditto for other variables.

Your case statement appears to have invalid syntax. That should be something like "case $line in " rather than just "case".
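
To illustrate the quoting point, here is a minimal sketch, assuming a POSIX-style shell; the variable sample is a made-up stand-in for the output of the perl filter, and joined_for and joined_read are hypothetical names. An unquoted command substitution in a for loop is split on every space and newline, whereas a while loop with read hands over one complete line at a time.

```shell
# Stand-in for the filter output: one element per line (hypothetical data).
sample='<DMSObjectSubType>Audio Contract</DMSObjectSubType>
</document>'

# for + unquoted expansion: the shell splits on spaces as well as newlines,
# so "Audio" and "Contract" arrive as two separate words.
joined_for=""
for line in $sample; do
    joined_for="$joined_for$line"
done

# while + read: each iteration receives one whole line, spaces included.
joined_read=""
while IFS= read -r line; do
    joined_read="$joined_read$line"
done <<EOF
$sample
EOF

printf '%s\n' "$joined_for"
printf '%s\n' "$joined_read"
```

The first line printed should contain AudioContract and the second Audio Contract. With a real command in place of the here-document, one option is to write the filter output to a temporary file and redirect the loop from that file, so that the variables set inside the loop are still visible after it.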
# 7  
Old 04-10-2008
Quote:
Originally Posted by era
Could you rephrase what the problem is, ..

Hey era,

My problem is my batch file needs the format as follows:

1) No spaces within XML tags (your perl script did the job)
2) No spaces between one XML tag and the next (</DMSObject> <DMSType> is not accepted, but </DMSObject><DMSType> is.)
3) Spaces allowed in the text between an opening and a closing tag (<DMSObject>Hello there</DMSObject>)

Right now 1 and 2 are being done, but 3 keeps failing, and if I solve 3 then 1 and 2 are not solved.

The XML file that I receive has multiple <document> elements in it. The batch process will handle all of them provided there is nothing between the tags, but if they have tabs, spaces or \n's between them it fails.

Last edited by sharoff; 04-10-2008 at 06:16 AM..
 