Delete a record in an XML file using shell scripting

06-16-2012

Find a pattern, then delete the line containing it along with the 3 lines above and the 8 lines below it. The pattern is "isup". In other words, each entire record (from the opening <record> tag to the closing </record> tag) that contains the pattern is to be deleted, and the rest of the file kept.

Code:

  <record>
    <signallingstandard>ITU-N</signallingstandard>
    <linkid>16068</linkid>
    <si>sccp</si>
    <mtp>
      <opc>3020</opc>
      <dpc>8034</dpc>
    </mtp>
    <sccp>
    </sccp>
    <map>
      <opcode>36</opcode>
    </map>
    <msucount>1</msucount>
    <octcount>83</octcount>
  </record>

  <record>
    <signallingstandard>ITU-N</signallingstandard>
    <linkid>37</linkid>
    <si>isup</si>
    <mtp>
      <opc>8469</opc>
      <dpc>10336</dpc>
    </mtp>
    <msucount>168</msucount>
    <octcount>3069</octcount>
  </record>

  <record>
    <signallingstandard>ITU-N</signallingstandard>
    <linkid>46</linkid>
    <si>sccp</si>
    <mtp>
      <opc>287</opc>
      <dpc>24</dpc>
    </mtp>
    <sccp>
      <cgpadigits>966540142007</cgpadigits>
      <cdpadigits>919434099997</cdpadigits>
    </sccp>
    <msucount>1</msucount>
    <octcount>53</octcount>
  </record>

Last edited by pludi; 06-16-2012 at 06:07 PM.. Reason: code tags

sdesstp

06-16-2012

First of all, what have you tried, and where are you stuck?
Second, since that's XML, the tag containing isup isn't guaranteed to be at the same line position in every record, so simply removing "3 lines before and 8 lines after" might not always give the desired result.
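
To make the second point concrete, here is a minimal sketch of the literal request (delete every isup line plus the 3 lines above and the 8 lines below it), done with awk reading the same file twice. The temporary file and its contents are made up for the demo; the offsets only land on the record boundaries because the isup record has exactly the layout shown in the first post.

```shell
#!/bin/sh
# Made-up sample input: the isup record has the layout from the first post.
in=$(mktemp) || exit 1
cat > "$in" <<'EOF'
  <record>
    <si>sccp</si>
  </record>

  <record>
    <signallingstandard>ITU-N</signallingstandard>
    <linkid>37</linkid>
    <si>isup</si>
    <mtp>
      <opc>8469</opc>
      <dpc>10336</dpc>
    </mtp>
    <msucount>168</msucount>
    <octcount>3069</octcount>
  </record>

  <record>
    <si>sccp</si>
  </record>
EOF

# First pass (NR == FNR): remember the line numbers that contain "isup".
# Second pass: skip a line if some hit lies up to "after" lines above it or
# up to "before" lines below it, i.e. the line is in [hit-before, hit+after].
out=$(awk -v before=3 -v after=8 '
  NR == FNR { if (/isup/) hit[FNR] = 1; next }
  {
    for (i = FNR - after; i <= FNR + before; i++)
      if (i in hit) next
    print
  }
' "$in" "$in")
rm -f "$in"
printf '%s\n' "$out"
```

If an isup record had even one line more or fewer than this layout, the same fixed offsets would either leave part of it behind or cut into a neighbouring record.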

pludi

06-17-2012

I would suggest the following approach:

Code:

sed 's:</record>:&*:g' file1 | awk 'index($0,"isup")==0{print $0}' RS='*'

This would not work with nested tags and the like.

You may want to surround isup with > and < (i.e. search for ">isup<") if you do not want to match it inside tag names or as a substring of some other value.
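
A runnable sketch of this pipeline on a made-up two-record file, with the ">isup<" refinement applied. sed appends a "*" after every "</record>", and awk then uses "*" as its record separator (RS), so each chunk of text ending in "</record>" becomes one awk record that is printed only if it contains no ">isup<". The sketch assumes the data never contains a literal "*".

```shell
#!/bin/sh
# Made-up sample: one sccp record and one isup record.
in=$(mktemp) || exit 1
cat > "$in" <<'EOF'
  <record>
    <linkid>16068</linkid>
    <si>sccp</si>
  </record>

  <record>
    <linkid>37</linkid>
    <si>isup</si>
  </record>
EOF

# sed appends "*" after each </record>; awk, with RS set to "*", sees each
# chunk ending in </record> as one record and prints it only if it does not
# contain an element whose entire value is "isup".
out=$(sed 's:</record>:&*:g' "$in" | awk 'index($0, ">isup<") == 0 { print $0 }' RS='*')
rm -f "$in"
printf '%s\n' "$out"
```

Because everything before the first "*" belongs to the first awk record, any lines in front of the first <record> (an XML declaration or a root element, for instance) are printed or dropped together with that first record.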

jawsnnn

06-21-2012

The script works fine, but when the first record itself contains isup, the initial tags are deleted as well. How can I avoid this? The script should also work on a batch of XML files in a folder.
Kindly help. Thanks in advance.

The script runs fine, but when the isup pattern appears in the first record, the initial tags are deleted too. The script also has to run on a batch of XML files. Kindly help.
The initial tags look like this:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ecapreport SYSTEM "ecapreport.dtd">
<ecapreport>
  <stp>bsnlstpekm</stp>
  <collector>ekmecap1A</collector>
  <startdate>07082011</startdate>
  <starttime>235500</starttime>
  <enddate>08082011</enddate>
  <endtime>000000</endtime>

  <record>
    <signallingstandard>ITU-I</signallingstandard>
    <linkid>1225</linkid>
    <si>isup</si>
    <mtp>
      <opc>004-009-004</opc>
      <dpc>004-048-000</dpc>
    </mtp>
    <msucount>2</msucount>
    <octcount>72</octcount>
  </record>

  <record>
    <signallingstandard>ITU-I</signallingstandard>
    <linkid>1225</linkid>
    <si>isup</si>
    <mtp>
      <opc>004-009-004</opc>
      <dpc>002-056-000</dpc>
    </mtp>
    <msucount>56</msucount>
    <octcount>1009</octcount>
  </record>

Last edited by pludi; 06-21-2012 at 05:32 AM..

sdesstp

06-21-2012

Actually, sed is the perfect tool for this sort of thing, but its workings are a little tricky to understand. Let's look at how sed works first and write the program afterwards:

sed is line-oriented: it reads the first line of the input and then applies one command after the other to it, until it reaches the end of its script. Then the next line of input is read and the process starts over, until the last line of input has been processed, at which point it stops.

You can see from this explanation that "delete the x lines before" is difficult to do, because by the time sed decides whether a line should be deleted, the lines before it are no longer in its scope: they have already been processed and printed. The solution is to somehow turn the part we want to examine into "one line".

When I said sed "reads a line" I was not completely correct: there is a data structure called the "pattern space", which actually holds this line. Every change sed makes is done on this pattern space. If sed is called without the "-n" command line option, the resulting content of the pattern space is printed automatically at the end of the script. The "d" command deletes the pattern space and immediately starts the next cycle: the rest of the script is skipped, nothing is printed, and the process starts over with the next line of input. There is a special command ("N") that appends the next line of input to the pattern space, separated by a linefeed character ("\n").
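
A tiny illustration of "N" on made-up input: each cycle starts with one line in the pattern space, "N" appends the next line after a "\n", and the substitution replaces that embedded newline, so the lines come out joined in pairs.

```shell
#!/bin/sh
# Cycle 1: pattern space "a", N makes it "a\nb", s/// turns the embedded
# newline into "+", and the result is printed automatically. Cycle 2 does
# the same with "c" and "d".
out=$(printf '%s\n' a b c d | sed 'N; s/\n/+/')
printf '%s\n' "$out"   # prints a+b and c+d on two lines
```

The input deliberately has an even number of lines: when "N" is run on the last line of input, GNU sed prints the pattern space before exiting, while many other sed implementations exit without printing it.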

With this device it is simple to construct your filter: upon finding a line with "<record>", we append all following lines to the pattern space until we reach a line with "</record>". Because we only ever append to the pattern space, the condition "pattern space contains <record>" stays true on every pass through this loop, so "N" keeps being executed. (Here the line-oriented nature of sed is showing.) This gives us the whole XML "record" in our pattern space.
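
To make the accumulation visible, this sketch runs the same "<record>"/"N"/"b start" loop on made-up input, but instead of filtering, it replaces the embedded newlines with "|" once "</record>" has arrived, so the whole record shows up as a single pattern space on one output line.

```shell
#!/bin/sh
# Lines outside <record>...</record> are printed automatically. Once a line
# matches <record>, N keeps appending lines (b jumps back to :start) until
# the pattern space also contains </record>; then the record is printed on
# one line with "|" marking the former line breaks, and d ends the cycle.
out=$(printf '%s\n' 'header' '<record>' '<si>isup</si>' '</record>' 'footer' |
  sed ':start
       /<\/record>/ {
            s/\n/|/g
            p
            d
       }
       /<record>/ {
            N
            b start
       }')
printf '%s\n' "$out"
```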

When we find "</record>" in the pattern space, we know we have read the whole record. Now we search it for the search term and, if we find it, do NOT print this record; otherwise it gets printed (the "!" is a logical NOT). Afterwards the pattern space is deleted and the cycle starts over again. Because we haven't switched off the default "print" action with the "-n" command line switch, all pattern spaces (i.e. lines) containing neither "<record>" nor "</record>" (that is, all lines outside of <record>..</record> structures) are printed automatically.
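
The print-or-drop step in isolation, on three made-up lines: "/isup/!p" prints the pattern space only when it does not match, and the "d" that follows discards it, so a kept line is printed exactly once. The second command leaves out the "d" for comparison: every kept line then appears twice, once from "p" and once from the automatic print.

```shell
#!/bin/sh
# With "d": non-matching lines are printed once by p, then deleted, so the
# automatic print never sees them; the isup line is deleted unprinted.
out=$(printf '%s\n' 'sccp' 'isup' 'map' | sed -e '/isup/!p' -e 'd')
# Without "d": p prints non-matching lines, then the automatic print
# prints every line again, including the isup line.
twice=$(printf '%s\n' 'sccp' 'isup' 'map' | sed -e '/isup/!p')
printf '%s\n' "$out" '---' "$twice"
```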

Code:

sed ':start
     /<\/record>/ {
          /isup/!p
          d
     }
     /<record>/ {
          N
          b start
    }' /path/to/inputfile
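
A self-contained check of this script against a shortened version of the file from the follow-up post, in which the very first record contains isup. The temporary file, the trimmed records and the closing </ecapreport> line are assumptions for the demo. The header lines are outside any <record>..</record> pair, so they are printed automatically and survive.

```shell
#!/bin/sh
# Made-up, trimmed version of the follow-up file; the first record is isup.
in=$(mktemp) || exit 1
cat > "$in" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ecapreport SYSTEM "ecapreport.dtd">
<ecapreport>
  <stp>bsnlstpekm</stp>
  <collector>ekmecap1A</collector>

  <record>
    <linkid>1225</linkid>
    <si>isup</si>
  </record>

  <record>
    <linkid>46</linkid>
    <si>sccp</si>
  </record>
</ecapreport>
EOF

# bakunin's filter, unchanged: collect each <record>...</record> into the
# pattern space, print it only if it contains no "isup", then delete it.
out=$(sed ':start
     /<\/record>/ {
          /isup/!p
          d
     }
     /<record>/ {
          N
          b start
    }' "$in")
rm -f "$in"
printf '%s\n' "$out"
```

The "/isup/" test matches anywhere in the record; if isup could also appear inside some other value, a tighter pattern such as "/<si>isup<\/si>/" avoids dropping those records.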

Regarding the "batch of input files": construct a loop over a list of filenames, or use "find" if there is some file mask you can apply:

Code:

#!/bin/ksh

typeset file=""

# Process every file named in "/path/to/list" (one path per line) and write the
# result next to each input file, with ".processed" appended to its name.
# IFS= and -r keep each path intact; quoting "$file" protects names with spaces.
while IFS= read -r file ; do
     sed ':start
          /<\/record>/ {
               /isup/!p
               d
          }
          /<record>/ {
               N
               b start
         }' "$file" > "${file}.processed"
done < /path/to/list
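
A self-contained run of the list-driven loop in a scratch directory. The directory, the file names and their contents are made up; "my file.xml" deliberately contains a space to show why "IFS=", "read -r" and the quoted "$file" matter. It runs under plain sh, so the ksh-specific "typeset" is left out.

```shell
#!/bin/sh
# Made-up setup: a scratch directory with two input files and a list file
# naming them, one path per line ("my file.xml" contains a space).
dir=$(mktemp -d) || exit 1
printf '%s\n' '<record>' '<si>isup</si>' '</record>' \
              '<record>' '<si>sccp</si>' '</record>' > "$dir/a.xml"
printf '%s\n' '<header/>' '<record>' '<si>isup</si>' '</record>' > "$dir/my file.xml"
printf '%s\n' "$dir/a.xml" "$dir/my file.xml" > "$dir/list"

# IFS= and -r keep each path exactly as written in the list; quoting
# "$file" keeps names with spaces in one piece.
while IFS= read -r file ; do
     sed ':start
          /<\/record>/ {
               /isup/!p
               d
          }
          /<record>/ {
               N
               b start
          }' "$file" > "${file}.processed"
done < "$dir/list"

a_out=$(cat "$dir/a.xml.processed")
b_out=$(cat "$dir/my file.xml.processed")
rm -rf "$dir"
printf '%s\n' "$a_out" "$b_out"
```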

If you can construct a file mask (like "*.file", etc.) to find the files you can use "find" to do the work. Save the sed-script in a file "script.sed" and:

Code:

find /path/to/input/files -type f -name "*mask*" -exec sh -c 'sed -f script.sed "$1" > "$1.processed"' sh {} \;
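
A self-contained sketch of the find variant. The scratch directory, the "*.xml" mask and the file contents are assumptions. Since find's -exec runs a single command without a shell, the output redirection has to happen inside an explicit "sh -c" script, which receives the matched file as "$1" and the path of the sed script as "$2".

```shell
#!/bin/sh
# Made-up setup: script.sed holds the record filter; the "in" directory has
# two files matching the "*.xml" mask and one that does not match it.
dir=$(mktemp -d) || exit 1
cat > "$dir/script.sed" <<'EOF'
:start
/<\/record>/ {
     /isup/!p
     d
}
/<record>/ {
     N
     b start
}
EOF
mkdir "$dir/in"
printf '%s\n' '<record>' '<si>isup</si>' '</record>' \
              '<record>' '<si>sccp</si>' '</record>' > "$dir/in/one.xml"
printf '%s\n' '<record>' '<si>map</si>' '</record>' > "$dir/in/two.xml"
printf '%s\n' 'not an input file' > "$dir/in/notes.txt"

# -exec cannot redirect by itself, so each match is passed to a small
# "sh -c" script as $1 (the "sh" word becomes its $0) with the sed script as $2.
find "$dir/in" -type f -name '*.xml' \
     -exec sh -c 'sed -f "$2" "$1" > "$1.processed"' sh {} "$dir/script.sed" \;

one_out=$(cat "$dir/in/one.xml.processed")
two_out=$(cat "$dir/in/two.xml.processed")
if [ -e "$dir/in/notes.txt.processed" ]; then extra=yes; else extra=no; fi
rm -rf "$dir"
printf '%s\n' "$one_out" "$two_out" "extra output for notes.txt: $extra"
```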

I hope this helps.

bakunin

Last edited by bakunin; 06-21-2012 at 07:12 AM..

UNIX for Dummies Questions & Answers
