Delete a record in an XML file using shell scripting


 
# 1  
Old 06-16-2012

I need to find a pattern and delete the line containing it, together with the 3 lines above and the 8 lines below it. The pattern is "isup". In other words, the entire record containing the pattern, from the starting tag <record> to the ending tag </record>, is to be deleted and everything else retained.

Code:
<record>
    <signallingstandard>ITU-N</signallingstandard>
    <linkid>16068</linkid>
    <si>sccp</si>
    <mtp>
      <opc>3020</opc>
      <dpc>8034</dpc>
    </mtp>
    <sccp>
    </sccp>
    <map>
      <opcode>36</opcode>
    </map>
    <msucount>1</msucount>
    <octcount>83</octcount>
  </record>

  <record>
    <signallingstandard>ITU-N</signallingstandard>
    <linkid>37</linkid>
    <si>isup</si>
    <mtp>
      <opc>8469</opc>
      <dpc>10336</dpc>
    </mtp>
    <msucount>168</msucount>
    <octcount>3069</octcount>
  </record>

<record>
    <signallingstandard>ITU-N</signallingstandard>
    <linkid>46</linkid>
    <si>sccp</si>
    <mtp>
      <opc>287</opc>
      <dpc>24</dpc>
    </mtp>
    <sccp>
      <cgpadigits>966540142007</cgpadigits>
      <cdpadigits>919434099997</cdpadigits>
    </sccp>
    <msucount>1</msucount>
    <octcount>53</octcount>
  </record>


Last edited by pludi; 06-16-2012 at 06:07 PM.. Reason: code tags
# 2  
Old 06-16-2012
First of all, what have you tried, and where are you stuck?
Second, that's XML, so it's probably not guaranteed to have the tag containing isup at the same line position every time, so a simple "remove 3 lines before, and 8 lines after" might not yield the desired result every time.
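One way to check whether the fixed "3 above, 8 below" assumption holds for a given file is to measure it. The following is only a sketch: the function name record_offsets is made up here, and it assumes every <record> and </record> tag sits on a line of its own, as in the sample.

```shell
# record_offsets TERM FILE
# For each record containing TERM, print two numbers: how many lines lie
# between the <record> line and the matching line, and how many lines lie
# between the matching line and the </record> line (inclusive).
# Assumption: <record> and </record> each stand on their own line.
record_offsets() {
    awk -v pat="$1" '
        /<record>/              { start = NR; hit = 0 }
        index($0, pat) && start { hit = NR }
        /<\/record>/            { if (hit) print hit - start, NR - hit; start = 0 }
    ' "$2"
}
```

If "record_offsets isup file1" prints the same pair of numbers on every line, counting lines would happen to work for that file; as soon as two lines differ, it would not.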
# 3  
Old 06-17-2012
I would suggest the following approach:

Code:
sed 's:</record>:&*:g' file1 | awk 'index($0,"isup")==0{print $0}' RS='*'

This would not work with nested tags and such.

You may want to surround isup with > and < (that is, search for ">isup<") if you do not want it to match inside tag names or as a substring of some other value.
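
A sketch of that tightening, keeping the same sed/awk pipeline and only changing the search string. The function name drop_records is invented for illustration, and the approach still assumes that the data never contains a literal "*", because that character is borrowed as the record separator.

```shell
# drop_records TERM FILE
# Print FILE without the records whose text contains ">TERM<".
# Assumption: "*" never occurs in the data (it is used as awk's RS).
drop_records() {
    sed 's:</record>:&*:g' "$2" |
        awk -v term=">$1<" 'index($0, term) == 0 { print $0 }' RS='*'
}
```

Called as "drop_records isup file1", this would keep a record that merely holds a value such as isup9 and drop only those with the exact element content isup. Whatever precedes the first <record> is still treated as part of the first record.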
# 4  
Old 06-21-2012
The script is working fine, but when the first record itself contains isup, the initial tags are deleted as well. How can I overcome this? The script should also work for a batch of XML files in a folder.
Kindly help. Thanks in advance.

---------- Post updated at 11:30 AM ---------- Previous update was at 11:02 AM ----------

The script is running fine, but when the isup pattern appears in the very first record, the initial tags get deleted as well. The script also has to run on a batch of XML files. Kindly help.
The initial tags look like this:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ecapreport SYSTEM "ecapreport.dtd">
<ecapreport>
  <stp>bsnlstpekm</stp>
  <collector>ekmecap1A</collector>
  <startdate>07082011</startdate>
  <starttime>235500</starttime>
  <enddate>08082011</enddate>
  <endtime>000000</endtime>

  <record>
    <signallingstandard>ITU-I</signallingstandard>
    <linkid>1225</linkid>
    <si>isup</si>
    <mtp>
      <opc>004-009-004</opc>
      <dpc>004-048-000</dpc>
    </mtp>
    <msucount>2</msucount>
    <octcount>72</octcount>
  </record>

  <record>
    <signallingstandard>ITU-I</signallingstandard>
    <linkid>1225</linkid>
    <si>isup</si>
    <mtp>
      <opc>004-009-004</opc>
      <dpc>002-056-000</dpc>
    </mtp>
    <msucount>56</msucount>
    <octcount>1009</octcount>
  </record>


Last edited by pludi; 06-21-2012 at 05:32 AM..
# 5  
Old 06-21-2012
Actually sed is the perfect tool for this sort of thing, but its workings are a little tricky to understand. Let's start with those first and write the program afterwards:

sed works line-oriented: it reads the first line of the input and then applies one command after the other to it, until it reaches the end of its script. Then the next line of input is read and this process starts over, until the last line of input, upon which it stops.

You can see from this explanation that "delete the x lines before" is difficult to do, because by the time sed gets to decide whether a line is to be deleted, that line isn't in its scope any more. The solution is to somehow turn the part we want to examine into "one line".

When I said sed "reads a line" I was not completely correct: there is a data structure called the "pattern space", which actually holds this line. Every change sed makes is made to this pattern space. If sed is called without the "-n" command line option, the resulting content of the pattern space is printed automatically at the end of the script. If the pattern space is deleted with the "d" command, the rest of the script is skipped and the process starts over with the next line of input. There is a special command ("N") to add the next line of input to the contents of the pattern space, separated by a linefeed character ("\n").
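
To watch the pattern space at work, here is a small sketch that has nothing to do with XML yet. "N" pulls the next line into the pattern space, and the embedded linefeed can then be matched like any other character. The function name join_pairs is invented for this illustration.

```shell
# join_pairs: read stdin and join every two input lines with a "+".
# "N" appends the next input line to the pattern space, separated by "\n";
# the "s" command then replaces that embedded linefeed.
join_pairs() {
    sed 'N
s/\n/+/'
}

printf 'a\nb\nc\nd\n' | join_pairs
```

This prints "a+b" and "c+d" on two lines. With an odd number of input lines the last line has no partner, and what happens when "N" runs out of input differs between sed implementations.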

With this device it is simple to construct your filter: upon finding a line with "<record>" we add all following lines to the pattern space until we encounter a line with "</record>". Because we keep adding to the same pattern space, the condition "pattern space contains <record>" stays true from now on. (Here the line-oriented nature of sed is showing.) This gives us the whole XML "record" in our pattern space.

When we find "</record>" in the pattern space we know we have read the whole record. Now we look for the search term in it and, if we find it, do NOT print the record; otherwise it gets printed (the "!" is a logical NOT). Afterwards the pattern space is deleted and the cycle starts over again. Because we haven't switched off the default "print" action with the "-n" command line switch, all the pattern spaces (~lines) containing neither a "<record>" nor a "</record>" (that is: all the lines outside of <record>..</record> structures) are printed automatically.

Code:
sed ':start
     /<\/record>/ {
          /isup/!p
          d
     }
     /<record>/ {
          N
          b start
    }' /path/to/inputfile
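
For anyone who wants to check this against the header problem from post #4, here is a small test run. The scratch file name and its shortened content are made up for the test; only the sed script is the one from above.

```shell
# Hypothetical scratch file: header lines, one isup record, one sccp record.
sample="${TMPDIR:-/tmp}/sample.$$.xml"
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<ecapreport>
  <record>
    <linkid>1225</linkid>
    <si>isup</si>
  </record>
  <record>
    <linkid>46</linkid>
    <si>sccp</si>
  </record>
</ecapreport>
EOF

# Same script as above: collect each record in the pattern space and
# print it only if it does not contain "isup".
sed ':start
     /<\/record>/ {
          /isup/!p
          d
     }
     /<record>/ {
          N
          b start
     }' "$sample" > "$sample.filtered"

cat "$sample.filtered"
```

The two header lines and the closing </ecapreport> pass through untouched, because they never match either address; only the isup record disappears.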


Regarding the "batch of input files": construct a loop over a list of filenames, or use "find" if there is some file mask you can apply:

Code:
#!/bin/ksh

typeset file=""

# this will process all files listed in "/path/to/list" (one filename per line) and put the results in files named like the input files but with an appended ".processed"
while IFS= read -r file ; do
     sed ':start
          /<\/record>/ {
               /isup/!p
               d
          }
          /<record>/ {
               N
               b start
          }' "$file" > "${file}.processed"
done < /path/to/list
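
If the files all sit in one folder, as asked in post #4, a plain shell glob is another option and needs no list file. This is only a sketch: the function name filter_dir and the "*.xml" naming are assumptions, not something stated in the thread.

```shell
# filter_dir DIR
# Run the sed filter from above over every *.xml file in DIR and write
# the result next to the input as NAME.processed.
filter_dir() {
    for file in "$1"/*.xml; do
        [ -f "$file" ] || continue   # skip when the glob matched nothing
        sed ':start
             /<\/record>/ {
                  /isup/!p
                  d
             }
             /<record>/ {
                  N
                  b start
             }' "$file" > "$file.processed"
    done
}
```

Because the output names end in ".processed", a second run does not pick up its own results.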

If you can construct a file mask (like "*.file", etc.) to find the files you can use "find" to do the work. Save the sed script (without the surrounding single quotes) in a file "script.sed" and:

Code:
find /path/to/input/files -type f -name "*mask*" -exec sh -c 'sed -f script.sed "$1" > "$1.processed"' sh {} \;

I hope this helps.

bakunin

Last edited by bakunin; 06-21-2012 at 07:12 AM..
 