UNIX for Dummies Questions & Answers
#1
06-16-2012
Registered User
Join Date: Jun 2012
Delete a record in an XML file using shell scripting

Find a pattern, then delete the line containing it together with the 3 lines above and the 8 lines below it. The pattern is "isup". In other words, the entire record (from the starting tag <record> to the ending tag </record>) that contains the pattern is to be deleted, and the rest retained.


Code:
<record>
    <signallingstandard>ITU-N</signallingstandard>
    <linkid>16068</linkid>
    <si>sccp</si>
    <mtp>
      <opc>3020</opc>
      <dpc>8034</dpc>
    </mtp>
    <sccp>
    </sccp>
    <map>
      <opcode>36</opcode>
    </map>
    <msucount>1</msucount>
    <octcount>83</octcount>
  </record>

  <record>
    <signallingstandard>ITU-N</signallingstandard>
    <linkid>37</linkid>
    <si>isup</si>
    <mtp>
      <opc>8469</opc>
      <dpc>10336</dpc>
    </mtp>
    <msucount>168</msucount>
    <octcount>3069</octcount>
  </record>

<record>
    <signallingstandard>ITU-N</signallingstandard>
    <linkid>46</linkid>
    <si>sccp</si>
    <mtp>
      <opc>287</opc>
      <dpc>24</dpc>
    </mtp>
    <sccp>
      <cgpadigits>966540142007</cgpadigits>
      <cdpadigits>919434099997</cdpadigits>
    </sccp>
    <msucount>1</msucount>
    <octcount>53</octcount>
  </record>


Last edited by pludi; 06-16-2012 at 05:07 PM.. Reason: code tags
#2
06-16-2012
pludi (Forum Staff)
Cat herder
Join Date: Dec 2008
Location: Vienna, Austria, Earth
First of all, what have you tried, and where are you stuck?
Second, that's XML, so there is probably no guarantee that the tag containing isup is always at the same line position within a record (the sample records above already differ in length), so a simple "remove 3 lines before and 8 lines after" might not give the desired result every time.
#3
06-17-2012
Registered User
Join Date: May 2012
I would suggest the following approach:


Code:
sed 's:</record>:&*:g' file1 | awk 'index($0,"isup")==0{print $0}' RS='*'

This would not work with nested tags and such.

You may want to surround isup with > and < (that is, search for ">isup<") if you do not want to match it inside a tag or as a substring of some other value.
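For illustration, here is the same pipeline run on a small two-record sample, using ">isup<" as suggested so that only an element whose whole value is isup matches. This is a sketch under two assumptions: the data never contains a literal "*" (it is used as the awk record separator), and the scratch directory, the file sample.xml and the function name drop_isup are made up for the demo. Note that the first awk record runs from the start of the file to the first </record>, so any header lines before the first record are dropped together with it when that record matches.

Code:
```shell
#!/bin/sh
# Hypothetical scratch directory and sample file, for illustration only.
tmp=$(mktemp -d)
cat > "$tmp/sample.xml" <<'EOF'
<record>
    <si>sccp</si>
    <octcount>83</octcount>
  </record>

  <record>
    <si>isup</si>
    <octcount>3069</octcount>
  </record>
EOF

# drop_isup (hypothetical name): mark the end of every record with "*",
# then let awk use "*" as its record separator and print only the
# records that do not contain the exact element value ">isup<".
drop_isup() {
    sed 's:</record>:&*:g' "$1" | awk 'index($0,">isup<")==0{print $0}' RS='*'
}

drop_isup "$tmp/sample.xml"
```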
The Following User Says Thank You to jawsnnn For This Useful Post:
sdesstp (06-18-2012)
#4
06-21-2012
Registered User
Join Date: Jun 2012
The script is working fine, but when the first record itself contains isup, the initial tags are also deleted. How can I overcome this? The script should also work for a batch of XML files in a folder.
Kindly help. Thanks in advance.

---------- Post updated at 11:30 AM ---------- Previous update was at 11:02 AM ----------

The script runs fine, but when the isup pattern appears in the first record itself, the initial tags are also deleted. The script also has to run on a batch of XML files. Kindly help.
The initial tags look like this:

Code:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ecapreport SYSTEM "ecapreport.dtd">
<ecapreport>
  <stp>bsnlstpekm</stp>
  <collector>ekmecap1A</collector>
  <startdate>07082011</startdate>
  <starttime>235500</starttime>
  <enddate>08082011</enddate>
  <endtime>000000</endtime>

  <record>
    <signallingstandard>ITU-I</signallingstandard>
    <linkid>1225</linkid>
    <si>isup</si>
    <mtp>
      <opc>004-009-004</opc>
      <dpc>004-048-000</dpc>
    </mtp>
    <msucount>2</msucount>
    <octcount>72</octcount>
  </record>

  <record>
    <signallingstandard>ITU-I</signallingstandard>
    <linkid>1225</linkid>
    <si>isup</si>
    <mtp>
      <opc>004-009-004</opc>
      <dpc>002-056-000</dpc>
    </mtp>
    <msucount>56</msucount>
    <octcount>1009</octcount>
  </record>


Last edited by pludi; 06-21-2012 at 04:32 AM..
#5
06-21-2012
bakunin (Forum Staff)
Bughunter Extraordinaire
Join Date: May 2005
Location: In the leftmost byte of /dev/kmem
Actually, sed is the perfect tool for this sort of thing, but its workings are a little tricky to understand. Let's start with those first and write the program afterwards:

sed works line by line: it reads the first line of input and then applies one command after the other to it, until it reaches the end of its script. Then the next line of input is read and the process starts over, until the last line of input, after which it stops.

You can see from this explanation that "delete the x lines before" is difficult to do: by the time sed gets to decide whether a line is to be deleted, the lines before it have already been processed and are no longer in its scope. The solution is to somehow make the part we want to examine "one line".

When I said sed "reads a line" I was not completely correct: there is a data structure called the "pattern space", which actually holds this line. Every change sed makes is done on this pattern space. If sed is called without the "-n" command line option, the resulting content of the pattern space is printed automatically at the end of the script. The "d" command deletes the pattern space and immediately starts the next cycle with the next line of input, skipping the rest of the script. There is a special command ("N") to add the next line of input to the contents of the pattern space, separated by a linefeed character ("\n").
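A quick way to watch "N" at work: each cycle starts with one input line in the pattern space, "N" appends the next line after a "\n", and a substitution then replaces that embedded newline with "+", so each pair of lines is printed as one. The function name join_pairs is hypothetical, and the sample deliberately has an even number of lines, because sed implementations differ on whether "N" prints the pattern space when there is no next line to append.

Code:
```shell
#!/bin/sh
# join_pairs (hypothetical name): N appends the next input line to the
# pattern space after a "\n"; s/\n/+/ turns that embedded newline into
# "+", and the joined pattern space is printed at the end of the cycle.
join_pairs() {
    sed 'N;s/\n/+/'
}

printf 'one\ntwo\nthree\nfour\n' | join_pairs
```

This prints "one+two" and "three+four" on two lines.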

With this device it is simple to construct your filter: upon finding a line with "<record>", we add all following lines to the pattern space until we encounter a line with "</record>". Because we keep adding to the pattern space, the test "pattern space contains <record>" stays true from then on, so the "N" / "b start" loop keeps running. (Here the line-oriented nature of sed is showing.) This gives us the whole XML "record" in our pattern space.

When the pattern space contains "</record>", we know we have read the whole record. Now we search it for the search term and, if we find it, do NOT print the record; otherwise it gets printed (the "!" is a logical NOT). Afterwards the pattern space is deleted and the cycle starts over. Because we haven't switched off the default "print" action with the "-n" command line switch, all pattern spaces (~lines) containing neither "<record>" nor "</record>" (that is, all the lines outside of <record>..</record> structures) are printed automatically.


Code:
sed ':start
     /<\/record>/ {
          /isup/!p
          d
     }
     /<record>/ {
          N
          b start
    }' /path/to/inputfile
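To check that the header survives even when the first record matches, here is the script above, unchanged, run on a shortened copy of the report from post #4. The closing </ecapreport> tag is an assumption (the posted excerpt stops before the end of the file), and filter_records is only a hypothetical wrapper name.

Code:
```shell
#!/bin/sh
# Shortened copy of the report from post #4; the closing </ecapreport>
# is assumed, since the posted excerpt stops before the end of the file.
tmp=$(mktemp -d)
cat > "$tmp/report.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<ecapreport>
  <stp>bsnlstpekm</stp>

  <record>
    <si>isup</si>
    <linkid>1225</linkid>
  </record>

  <record>
    <si>sccp</si>
    <linkid>46</linkid>
  </record>
</ecapreport>
EOF

# filter_records (hypothetical name) wraps the sed script above unchanged.
filter_records() {
    sed ':start
         /<\/record>/ {
              /isup/!p
              d
         }
         /<record>/ {
              N
              b start
         }' "$1"
}

filter_records "$tmp/report.xml"
```

Lines outside <record>..</record> never enter the "N" loop, so they are printed by sed's default output. That is why this version keeps the header lines that the awk approach loses.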


Regarding the "batch of input files": either loop over a list of filenames, or use "find" if there is some file mask you can apply:


Code:
#!/bin/ksh

typeset file=""

# this will process all files named in "/path/to/list" (one per line) and put the results in files named like the input files but with an appended ".processed"
while IFS= read -r file ; do
     sed ':start
          /<\/record>/ {
               /isup/!p
               d
          }
          /<record>/ {
               N
               b start
         }' "$file" > "${file}.processed"
done < /path/to/list
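If the input files all live in one directory, a shell glob can stand in for the list file. This is a minimal sketch, assuming the files end in ".xml". The scratch directory and the files a.xml and b.xml are made up for the demo, and the sed program is the one shown above, unchanged. Because the outputs end in ".processed", running the loop a second time will not pick them up as inputs.

Code:
```shell
#!/bin/sh
# Hypothetical demo directory with two small input files.
dir=$(mktemp -d)
printf '<r>\n  <record>\n    <si>isup</si>\n  </record>\n</r>\n' > "$dir/a.xml"
printf '<r>\n  <record>\n    <si>sccp</si>\n  </record>\n</r>\n' > "$dir/b.xml"

# Process every *.xml file in the directory; quoting "$file" keeps
# names with spaces intact when they are passed to sed.
for file in "$dir"/*.xml; do
    [ -f "$file" ] || continue   # skip if the glob matched nothing
    sed ':start
         /<\/record>/ {
              /isup/!p
              d
         }
         /<record>/ {
              N
              b start
         }' "$file" > "$file.processed"
done
```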

If you can construct a file mask (like "*.file", etc.) to find the files, you can let "find" do the work. Save the sed script (just the program, without the surrounding single quotes) in a file "script.sed" and run:


Code:
find /path/to/input/files -type f -name "*mask*" -exec sh -c 'sed -f script.sed "$1" > "$1.processed"' sh {} \;
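For reference, here is a sketch of what "script.sed" would contain, run through "find" in a throwaway directory. The directory, the file name report1.xml and the "*.xml" mask are stand-ins for the real path and mask. The output redirection is wrapped in "sh -c" because it has to happen once per file found, which a bare "-exec" cannot do on its own.

Code:
```shell
#!/bin/sh
dir=$(mktemp -d)
# script.sed holds only the sed program, without the shell quotes around it.
cat > "$dir/script.sed" <<'EOF'
:start
/<\/record>/ {
     /isup/!p
     d
}
/<record>/ {
     N
     b start
}
EOF
printf '<r>\n  <record>\n    <si>isup</si>\n  </record>\n  <record>\n    <si>sccp</si>\n  </record>\n</r>\n' > "$dir/report1.xml"

# For each matching file, sh -c gets $1 = script path, $2 = input file,
# so the redirection to "<file>.processed" is done per file.
find "$dir" -type f -name '*.xml' \
    -exec sh -c 'sed -f "$1" "$2" > "$2.processed"' sh "$dir/script.sed" {} \;
```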

I hope this helps.

bakunin

Last edited by bakunin; 06-21-2012 at 06:12 AM..
All times are GMT -4.