Help with missing XML tag


 
# 1  
Old 11-30-2015

Hello All,

I am struggling with many huge XML files. Each file holds a lot of Account elements, and every Account has at least one Membership tag. Each Membership tag is supposed to contain a MembershipIdentifier tag, like this:

Code:
......
 <Account>
    <AccountIdentifier>23123</AccountIdentifier>
    <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
    <MembershipInfos>
      <Membership>
        <ParticipationStatus>1</ParticipationStatus>
        <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
        <EnrollmentDate>2015-11-26T13:01:22-07:00</EnrollmentDate>
        <MembershipIdentifier>PB00000000212799753</MembershipIdentifier>
      </Membership>
      <Aliases>
	  ....
      </Aliases>
    </MembershipInfos>
  </Account>
  <Account>
......

But in some Membership tags the MembershipIdentifier tag is completely missing, and the Account then looks like this:

Code:
......
 <Account>
    <AccountIdentifier>23123</AccountIdentifier>
    <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
    <MembershipInfos>
      <Membership>
        <ParticipationStatus>1</ParticipationStatus>
        <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
        <EnrollmentDate>2015-11-26T13:01:22-07:00</EnrollmentDate>
      </Membership>
      <Aliases>
	  ....
      </Aliases>
    </MembershipInfos>
  </Account>
  <Account>
......

How can I find which AccountIdentifiers are missing a MembershipIdentifier? If possible, I also need to insert a default MembershipIdentifier such as PB00000000123456 where it is missing.

So far I have tried this to find the missing MembershipIdentifiers, but it didn't work:

Code:
awk '{if ($0 ~ /<//EnrollmentDate>/) {triggered=1;}if (triggered) {print; if ($0 ~ /<//Membership>/) { exit;}}}' filenames

Can somebody help me?

Thanks in advance...

Last edited by VasuKukkapalli; 11-30-2015 at 07:06 PM..
# 2  
Old 11-30-2015
This seems to work:
Code:
awk -v DMI="PB00000000123456789" '
/<Membership>/ {
	MIfound = 0
}
/<MembershipIdentifier>/ {
	MIfound = 1
}
/<\/Membership>/ && !MIfound {
	print "        <MembershipIdentifier>" DMI "</MembershipIdentifier>"
}
1' filenames

If a file named filenames contains:
Code:
......
 <Account>
    <AccountIdentifier>23123</AccountIdentifier>
    <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
    <MembershipInfos>
      <Membership>
        <ParticipationStatus>1</ParticipationStatus>
        <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
        <EnrollmentDate>2015-11-26T13:01:22-07:00</EnrollmentDate>
        <MembershipIdentifier>PB00000000212799753</MembershipIdentifier>
      </Membership>
      <Aliases>
	  ....
      </Aliases>
    </MembershipInfos>
  </Account>
  <Account>
......
......
 <Account>
    <AccountIdentifier>23123</AccountIdentifier>
    <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
    <MembershipInfos>
      <Membership>
        <ParticipationStatus>1</ParticipationStatus>
        <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
        <EnrollmentDate>2015-11-26T13:01:22-07:00</EnrollmentDate>
      </Membership>
      <Aliases>
	  ....
      </Aliases>
    </MembershipInfos>
  </Account>
  <Account>
......

the above script produces the output:
Code:
......
 <Account>
    <AccountIdentifier>23123</AccountIdentifier>
    <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
    <MembershipInfos>
      <Membership>
        <ParticipationStatus>1</ParticipationStatus>
        <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
        <EnrollmentDate>2015-11-26T13:01:22-07:00</EnrollmentDate>
        <MembershipIdentifier>PB00000000212799753</MembershipIdentifier>
      </Membership>
      <Aliases>
	  ....
      </Aliases>
    </MembershipInfos>
  </Account>
  <Account>
......
......
 <Account>
    <AccountIdentifier>23123</AccountIdentifier>
    <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
    <MembershipInfos>
      <Membership>
        <ParticipationStatus>1</ParticipationStatus>
        <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
        <EnrollmentDate>2015-11-26T13:01:22-07:00</EnrollmentDate>
        <MembershipIdentifier>PB00000000123456789</MembershipIdentifier>
      </Membership>
      <Aliases>
	  ....
      </Aliases>
    </MembershipInfos>
  </Account>
  <Account>
......

adding the MembershipIdentifier line with the default value PB00000000123456789 to the second Membership block.

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
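
For the first half of the question (only finding the accounts, without changing anything), the same flag technique can be turned into a report. This is a minimal sketch, not part of the script above: the function name report_missing is made up for illustration, and it assumes one tag per line as in the sample data.

```shell
#!/bin/sh
# report_missing FILE...  (hypothetical helper name)
# Print "file: AccountIdentifier" for every Membership block that has no
# MembershipIdentifier line. The input files are only read, never modified.
# Assumes one tag per line, as in the sample data.
report_missing() {
    awk '
    /<AccountIdentifier>/ {
        # Remember the most recent account id (the text between the tags).
        id = $0
        sub(/.*<AccountIdentifier>/, "", id)
        sub(/<\/AccountIdentifier>.*/, "", id)
    }
    /<Membership>/             { found = 0 }
    /<MembershipIdentifier>/   { found = 1 }
    /<\/Membership>/ && !found { print FILENAME ": " id }
    ' "$@"
}
```

It could be called as report_missing *.xml > missing.txt to collect the list in a file. On Solaris the same awk substitution mentioned above would apply.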
This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 12-01-2015
Thank you so much, Don, for your time...

I am dealing with large files (each file contains more than 10K Accounts, and there are more than 1000 files). This script only adds the MI tag to the output displayed on the screen, not to the actual files, and I am not sure how to update the files themselves.

I also need to know which file and which AccountIdentifier (at least the file name) is missing the MembershipIdentifier tag.

Could you please help me get this done?
Thanks in advance...
# 4  
Old 12-02-2015
Making the wild assumption that the exec family of functions on your system can handle more than 1000 filenames in an argument list, the following should do what you want:
Code:
#!/bin/ksh
IAm=${0##*/}
tmpf="$IAm.$$"
awk -v DMI="PB00000000123456789" -v tmpf="$tmpf" '
function copyback() {
	if(oldf) {
		close(tmpf)
		if(cc) {
			cc = 0
			cmd = "cp \"" tmpf "\" \"" oldf "\""
			print "Running: " cmd
			if(system(cmd))
				failed++
		}
	}
	oldf = FILENAME
}
FNR == 1 {
	copyback()
}
/<AccountIdentifier>/ {
	split($0, AI, /<|>/)
}
/<Membership>/ {
	MIfound = 0
}
/<MembershipIdentifier>/ {
	MIfound = 1
}
/<\/Membership>/ && !MIfound {
	cc++
	print "MembershipIdentifier missing for AccountIdentifier " AI[3] \
	    " in file \"" oldf "\"."
	print "        <MembershipIdentifier>" DMI \
	    "</MembershipIdentifier>" > tmpf
}
{	print > tmpf
}
END {	copyback()
	if(failed) {
		print "*** Updating contents of " failed " files failed."
		exit 1
	}
}' "$@"
exit_code=$?
rm -rf "$tmpf"
exit $exit_code

Call this script with a list of files to be processed as operands. If your system can't handle an arg list that long, use xargs to invoke this script multiple times with subsets of the argument list.
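
A minimal sketch of that batching follows. Note that it swaps xargs for the "-exec ... {} +" form of find, which splits the file list into suitably sized invocations in the same way and also copes with spaces in file names. The function name run_batched, the script name ./fixmi and the directory are assumptions made up for illustration.

```shell
#!/bin/sh
# run_batched CMD DIR  (hypothetical helper name)
# Run CMD on every *.xml file under DIR (subdirectories included). The "+"
# form of -exec makes find pack as many file names into each invocation as
# the system's argument-length limit allows, starting more invocations as needed.
run_batched() {
    cmd=$1
    dir=$2
    find "$dir" -type f -name '*.xml' -exec "$cmd" {} +
}

# Example (hypothetical script name and directory):
# run_batched ./fixmi /data/accounts
```

The invocations run one after another, so each run of the script gets its own process ID and therefore its own temporary file name.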

This was written and tested using the Korn shell. It will work with any shell that understands basic POSIX shell parameter expansions (including ash, bash, dash, ksh, and zsh), but not with a legacy Bourne shell or with shells based on csh syntax.

And, as always, if you want to try this on a Solaris system, change awk to /usr/xpg4/bin/awk or nawk.
This User Gave Thanks to Don Cragun For This Post:
# 5  
Old 12-02-2015
This snippet saves each .xml file as .xml.rebuilt, adding a default MembershipIdentifier where one is missing. It logs the details in a file named rebuilt.log in the current directory, reporting the file name, the line number, and the account missing the tag. Updates will be printed to your screen.

Save as VasuKukkapalli.pl and run as perl VasuKukkapalli.pl file1.xml file2.xml file3.xml ...
or
perl VasuKukkapalli.pl *.xml

Code:
#!/usr/bin/perl

use strict;
use warnings;

sub account_id
{
    my $account_line = shift;
    my ($id) = $account_line =~ /<AccountIdentifier>(\d+)</;
    return $id;
}

sub writefile
{
    my $filename = shift || die;
    print "Creating $filename\n";
    # "or" (not "||") so that die runs when open fails, not when $filename is false
    open(my $fh, '>', $filename) or die "Could not create $filename: $!\n";
    return $fh;
}

my @account = ();
my $membership =
    "<MembershipIdentifier>PB00000000123456789</MembershipIdentifier>\n";

my $current_file = $ARGV[0];
my $log = writefile("rebuilt.log");
my $tmp = writefile("$current_file.rebuilt");

while(<>){
    if($current_file ne $ARGV){
        close $tmp;
        $current_file = $ARGV;
        $tmp = writefile("$current_file.rebuilt");
        $. = 1;
    }
    push @account, [] if /<Account>/;
    if(exists $account[0]){
        push @{$account[0]}, $_;
        push @{$account[1]}, $.;
    }
    else{
        print $tmp "$_";
    }
    if(/<\/Account>/){
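        # Relies on the fixed layout shown in the question: element 8 of the
        # buffered account is where the MembershipIdentifier line belongs.
        if(!(@{$account[0]}[8] =~ /<MembershipIdentifier>/)){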
            my ($spaces) = @{$account[0]}[7] =~ /(^\s+)/;
            splice @{$account[0]}, 8, 0, "$spaces$membership";
            my $id = account_id(@{$account[0]}[1]);
            print $log "File $ARGV: ",
                       "Line @{$account[1]}[8]: ",
                       "Account $id missing MembershipIdentifier\n";
        }
        # Print the list unquoted: "@{...}" would join the lines with spaces.
        print $tmp @{$account[0]};
        @account = ();
    }
}
print "Your files have been saved with the extension .rebuilt\n";
print "For details of the missing MembershipIdentifier tags, please ",
      "look into rebuilt.log\n";

close $log;
close $tmp;

The file rebuilt.log will have something similar to:

Code:
File v2.xml: Line 26: Account 23123 missing MembershipIdentifier
File v2.xml: Line 41: Account 23125 missing MembershipIdentifier
File v3.xml: Line 26: Account 23123 missing MembershipIdentifier
File v3.xml: Line 41: Account 23125 missing MembershipIdentifier

This User Gave Thanks to Aia For This Post: