Help using SED to comment XML elements


 
# 1  
Old 06-02-2009

I'm trying to write a script to help automate some VERY tedious manual tasks.

I have groups of fairly large XML files (3 MB and up) that I need to edit.

I need to look through the files and parse the XML looking for a certain flag contained in a field. If I find this flag (an integer value) I need to insert XML comments around the entire element (in the <!-- --> style) so that another XML parser will skip over them. After doing that, I later need to remove all the comments from the file (which I think I have).

I found this thread:
https://www.unix.com/shell-programmin...ile-lines.html

That thread explains how to insert the comments using sed based on finding the element tag in the file. This is helpful, but I only need to comment elements that contain the "flag" (an int value). Unfortunately, the elements have various names and aren't in any sort of order.

I was thinking about using PHP (what I'm most familiar with) or maybe Ruby to parse the XML and find matching flags, which I'm comfortable with. My problem is how to invoke sed once I find an element that needs commenting.

This might be something easy, but at the moment I'm having a hard time figuring out which direction to go in. Does anyone have any guidance they'd share with me? Does it sound like I'm heading in the right direction, or am I totally off? Am I overlooking some obvious answer?

I'd much appreciate any help.

- Jeremy
# 2  
Old 06-03-2009
If you post your input and the expected output, you can expect more responses.
# 3  
Old 06-03-2009
Quote:
Originally Posted by J-Hon
I was thinking about using PHP (what I'm most familiar with) - Jeremy
If you know PHP, all the better. There are XML parsers for PHP you can use; just search for "PHP XML parser". Or, if you want to do it by hand, use fopen(), fgets(), and fclose() to read the files and the preg_* (or str*) functions for string manipulation. See the PHP documentation site for examples.
# 4  
Old 06-03-2009
My apologies if my original example wasn't very clear. Here's some sample data pulled from the XML files:

Code:
<analysisMessages.js>
        <Source_Eng_Old />
        <Source_Eng_New>Time Graph Base Properties - Analyze activities relative to time.</Source_Eng_New>
        <Source_Trans_Old />
        <Target_Trans_New>Time Graph Base Properties - Analyze activities relative to time.</Target_Trans_New>
        <NumOfKeys>1</NumOfKeys>
        <Translation_Type>1</Translation_Type>
        <ResourceId>timeGraph€€€common_props_title</ResourceId>
</analysisMessages.js>
<ganttChartMessages.js>
        <Source_Eng_Old />
        <Source_Eng_New>Show</Source_Eng_New>
        <Source_Trans_Old />
        <Target_Trans_New>Show</Target_Trans_New>
        <NumOfKeys>1</NumOfKeys>
        <Translation_Type>1</Translation_Type>
        <ResourceId>drawGanttChartButtonCaption</ResourceId>
</ganttChartMessages.js>
<peMessages.js>
        <Source_Eng_Old />
        <Source_Eng_New>Rerun the selected job</Source_Eng_New>
        <Source_Trans_Old />
        <Target_Trans_New>Rerun the selected job</Target_Trans_New>
        <NumOfKeys>1</NumOfKeys>
        <Translation_Type>9</Translation_Type>
        <ResourceId>jobGrid€€€tooltip€€€RERUN</ResourceId>
</peMessages.js>


I'm looking specifically at the <Translation_Type> attribute, and deciding whether to comment the entire element or not based upon what that integer is.

In the sample above, I'd want the last element to be commented out in this format:

Code:
<!--<peMessages.js>
        <Source_Eng_Old />
        <Source_Eng_New>Rerun the selected job</Source_Eng_New>
        <Source_Trans_Old />
        <Target_Trans_New>Rerun the selected job</Target_Trans_New>
        <NumOfKeys>1</NumOfKeys>
        <Translation_Type>9</Translation_Type>
        <ResourceId>jobGrid€€€tooltip€€€RERUN</ResourceId>
 </peMessages.js>-->

I'm familiar with PHP, but I've only done a little XML parsing with it. My impression was that while I could parse the XML and easily match and apply logic to the Translation_Type values, I wouldn't be able to drop in those comments before and after the element. I'll certainly go back through my notes, but I remember working with a multi-dimensional, array-like data structure to access the various elements, not editing the raw file line by line (that part is hidden by the PHP class, although you can of course add, drop, or change the XML attributes).

Last edited by J-Hon; 06-03-2009 at 10:17 AM..
# 5  
Old 06-03-2009
Sorry, but sed and awk are not appropriate tools for massaging XML data (documents) except for very simple files. Instead, you need to transform the document using an XSLT stylesheet processor.

BTW Translation_Type is an element and not an attribute. Assuming that your document is valid and well-formed, here is a stylesheet which will do the transformation that you want.

The only change I made was to add a top-level node called "root" for well-formedness. You should change "root" to the name of your top-level element.
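To see why the wrapper is needed, here is a minimal Python sketch using only the standard library. The shortened fragment and the helper name is_well_formed are made up for illustration; they are not part of your data. It shows that two sibling top-level elements are rejected by a parser, while the same text inside a single "root" element is accepted:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Two sibling top-level elements, shaped like the sample data in this thread
# (shortened; hypothetical test input).
fragment = (
    "<ganttChartMessages.js><Translation_Type>1</Translation_Type></ganttChartMessages.js>\n"
    "<peMessages.js><Translation_Type>9</Translation_Type></peMessages.js>\n"
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


# Without a single enclosing element the parser stops at the second sibling.
bare_ok = is_well_formed(fragment)
# Wrapped in one top-level element, the same content parses.
wrapped_ok = is_well_formed("<root>\n" + fragment + "</root>")
print(bare_ok, wrapped_ok)
```

If your real files already have a single top-level element, no wrapper is needed.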

Here is the stylesheet:
Code:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:template match="node()">
        <xsl:if test="./Translation_Type != '9'" >
            <xsl:copy-of select="."/>
        </xsl:if>
        <xsl:if test="./Translation_Type = '9'" >
           <xsl:text disable-output-escaping="yes">&lt;!-- </xsl:text>
           <xsl:copy-of select="." />
           <xsl:text disable-output-escaping="yes"> --></xsl:text>
        </xsl:if>
        <xsl:text>&#xa;</xsl:text>
    </xsl:template>

    <xsl:template match="root">
       <xsl:element name="{ name() }" >
           <xsl:text>&#xa;</xsl:text>
           <xsl:apply-templates select="*"/>
       </xsl:element>
    </xsl:template>

</xsl:stylesheet>

Note you may have to change the encoding to suit your data set.

Assuming your document is called test.xml and your stylesheet is called test.xsl, invoking xsltproc test.xsl test.xml gives the following output:
Code:
<?xml version="1.0"?>
<root>
<analysisMessages.js>
        <Source_Eng_Old/>
        <Source_Eng_New>Time Graph Base Properties - Analyze activities relative to time.</Source_Eng_New>
        <Source_Trans_Old/>
        <Target_Trans_New>Time Graph Base Properties - Analyze activities relative to time.</Target_Trans_New>
        <NumOfKeys>1</NumOfKeys>
        <Translation_Type>1</Translation_Type>
        <ResourceId>timeGraph???common_props_title</ResourceId>
</analysisMessages.js>
<ganttChartMessages.js>
        <Source_Eng_Old/>
        <Source_Eng_New>Show</Source_Eng_New>
        <Source_Trans_Old/>
        <Target_Trans_New>Show</Target_Trans_New>
        <NumOfKeys>1</NumOfKeys>
        <Translation_Type>1</Translation_Type>
        <ResourceId>drawGanttChartButtonCaption</ResourceId>
</ganttChartMessages.js>
<!-- <peMessages.js>
        <Source_Eng_Old/>
        <Source_Eng_New>Rerun the selected job</Source_Eng_New>
        <Source_Trans_Old/>
        <Target_Trans_New>Rerun the selected job</Target_Trans_New>
        <NumOfKeys>1</NumOfKeys>
        <Translation_Type>9</Translation_Type>
        <ResourceId>jobGrid???tooltip???RERUN</ResourceId>
</peMessages.js> -->
</root>

Note there are some question marks in the output elements. That is because I did not bother setting up the correct locale and code-set on my system to suit your sample data.
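If an XSLT processor is not available, roughly the same result can be sketched with a DOM parser, which also addresses the worry about not being able to drop comments around an element. The following is only a sketch in Python with the standard library, not a tested solution for your files. It assumes a single top-level element (here "root"), that Translation_Type is a direct child of each element to be checked, and that the flag value is 9. The names SAMPLE, flag_of, and comment_out are made up for illustration:

```python
from xml.dom import minidom

# Shortened, hypothetical sample with the same shape as the data in this thread.
SAMPLE = """<root>
<ganttChartMessages.js>
        <Translation_Type>1</Translation_Type>
        <ResourceId>drawGanttChartButtonCaption</ResourceId>
</ganttChartMessages.js>
<peMessages.js>
        <Translation_Type>9</Translation_Type>
        <ResourceId>RERUN</ResourceId>
</peMessages.js>
</root>"""


def flag_of(element):
    """Return the text of the element's direct <Translation_Type> child, or None."""
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == "Translation_Type":
            return "".join(
                n.data for n in child.childNodes if n.nodeType == n.TEXT_NODE
            ).strip()
    return None


def comment_out(xml_text, flag="9"):
    """Replace each child of the top-level element whose flag matches with a comment."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    for node in list(root.childNodes):  # copy the list: the loop mutates the tree
        if node.nodeType == node.ELEMENT_NODE and flag_of(node) == flag:
            markup = node.toxml()
            # XML comments may not contain "--", so refuse rather than emit bad XML.
            if "--" in markup:
                raise ValueError("element contains '--' and cannot be commented out")
            root.replaceChild(doc.createComment(markup), node)
    return root.toxml()


result = comment_out(SAMPLE)
print(result)
```

The element with Translation_Type 9 comes back wrapped in <!-- and -->, and the others are left alone. Note that an element which already contains a comment, or any text with "--" in it, cannot be wrapped this way, which is why the sketch raises an error in that case.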