Perl: update lastmod in xml file


 
# 1  
04-12-2010

I'm trying to write a Perl script, run as a cron job in the root of my web server, that finds the .shtml files, gets each one's last-modified date, and writes it into the matching <lastmod> element of the sitemap_test.xml file. The problem is that the substitution doesn't work, and when I print to MYFILE the lastmod line ends up at the end of the file instead of replacing the old one. Any help would be appreciated.

The Script:
Code:
#!/usr/bin/perl

use warnings;
use strict;
use POSIX;

my @shtml_files = <*\.shtml>;

my $filename = "sitemap_test.xml";


foreach my $shtml_files (@shtml_files)
{
    my $lastmod = strftime("%Y-%m-%dT%H:%M:%S-06:00", localtime((stat($shtml_files))[9]));

        open (MYFILE, "+<$filename") or die "Cannot open file: $!";
        while (<MYFILE>)
            {
            if (m/$shtml_files/) 
                {
                
                my $nextLine = <MYFILE>;
            
                my $nextLine1 = <MYFILE>;
           
                my $sub = $nextLine1;
                $sub =~ s/$nextLine1/<lastmod>$lastmod<\/lastmod>/;
                print MYFILE "$sub\n";
                
                }

            }
        
        close (MYFILE);
        
}

The sitemap_test.xml:
Code:
<?xml version="1.0" encoding="UTF-8"?> 

<urlset 

xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9

http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" 

> 

<url> 

<loc>http://www.saildallas.com/index.shtml</loc> 

<lastmod>2010-03-15T14:27:00-05:00</lastmod> 

<priority>1.0</priority> 

<changefreq>weekly</changefreq> 

</url> 

<url> 

<loc>http://www.saildallas.com/events.shtml</loc> 

<lastmod>2010-02-15T14:27:00-05:00</lastmod> 

<priority>0.5</priority> 

<changefreq>daily</changefreq> 

</url>

<url> 

<loc>http://www.saildallas.com/yachtsales.shtml</loc> 

<lastmod>2010-02-15T14:27:00-05:00</lastmod> 

<priority>0.6</priority> 

<changefreq>daily</changefreq> 

</url>

</urlset>

# 2  
04-12-2010
A few thoughts here -

(a) Your program is quite inefficient since it populates the array "@shtml_files" and then parses the entire file for each array element. If your array has, say, 100 elements then you'll be opening, parsing and closing the "sitemap_test.xml" file 100 times.

(b) A more efficient approach is to populate the array and then walk through the XML file just once. Whenever you encounter a <loc> tag in the file, look up its file name in the array; the "grep" operator can do that. (Perl 6 will have an operator to emulate the "element x in <array>" test that you find in awk, Pascal, SQL, etc.)

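(c) Modifying the XML file in place through update mode ("+<") is fragile. Reads and writes share one file position, so a print goes wherever that position happens to be rather than over the line you just read, and a replacement line of a different length would overwrite the bytes that follow it. Instead, write to a temporary file and rename it over the original; this also leaves you a backup of the original file.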
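To make point (b) concrete, here is a minimal sketch of just the lookup step, assuming each <loc> element sits on a single line as in the sample sitemap. The helper name shtml_for_loc is hypothetical. It compares only the last path segment of the URL, because <loc> holds a full address such as http://www.saildallas.com/index.shtml while the glob returns bare names like index.shtml.

```perl
use strict;
use warnings;

# Return the .shtml file name referenced by a sitemap <loc> line, or undef when
# the line holds no <loc> element or the file is not in @files.
sub shtml_for_loc {
    my ($line, @files) = @_;
    # <loc> holds a full URL, so capture only the part after the last "/"
    my ($name) = $line =~ m{<loc>.*/([^/<]+)</loc>} or return undef;
    # grep in scalar context returns the number of matching elements
    my $found = grep { $_ eq $name } @files;
    return $found ? $name : undef;
}

# Usage inside the single pass over the sitemap:
# my @shtml_files = glob('*.shtml');
# while (<$in>) { if (my $name = shtml_for_loc($_, @shtml_files)) { ... } }
```

For a long file list, a hash keyed by file name (exists $mtime{$name}) would replace the linear grep scan with a constant-time lookup.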

Here's the updated script that implements all the thoughts above -

Code:
#!/usr/bin/perl
use warnings;
use strict;
use POSIX qw(strftime);

my @shtml_files = glob("*.shtml");

my $old      = "sitemap_test.xml";
my $old_orig = "sitemap_test_orig.xml";
my $new      = "sitemap_test_new.xml";

open(my $in,  '<', $old) or die "Can't open $old: $!";
open(my $out, '>', $new) or die "Can't open $new: $!";
my $lastmod = "";   # timestamp waiting for the next <lastmod> line
while (<$in>) {
  # process line, change $_ if required and then print it to the new file
  # <loc> holds a full URL, so compare only the file name after the last "/"
  if (m{<loc>.*/([^/<]+)</loc>}) {
    my $name = $1;
    $lastmod = "";
    if (grep { $_ eq $name } @shtml_files) {
      $lastmod = strftime("%Y-%m-%dT%H:%M:%S-06:00", localtime((stat($name))[9]));
    }
  } elsif ($lastmod ne "" and s{<lastmod>.*?</lastmod>}{<lastmod>$lastmod</lastmod>}) {
    $lastmod = "";
  }
  print $out $_;
}
close($in)  or die "Can't close $old: $!";
close($out) or die "Can't close $new: $!";
rename($old, $old_orig) or die "Can't rename $old to $old_orig: $!";
rename($new, $old)      or die "Can't rename $new to $old: $!";
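Perl can also manage the temporary file and the backup itself through in-place editing: setting $^I (the variable behind the -i switch) makes <> write each printed line to a new file under the original name and keep the old contents under that name plus the given extension. Below is a minimal sketch, assuming a %mtime hash that maps file names to already formatted timestamps. The name update_sitemap is hypothetical, and the backup becomes sitemap_test.xml.orig rather than sitemap_test_orig.xml.

```perl
use strict;
use warnings;

# Rewrite the <lastmod> that follows each matching <loc>, using Perl's
# in-place editing. %$mtime maps a file name (e.g. "index.shtml") to the
# timestamp string that should appear in its <lastmod>.
sub update_sitemap {
    my ($file, $mtime) = @_;
    local ($^I, @ARGV) = ('.orig', $file);   # original is kept as "$file.orig"
    my $pending = '';                        # timestamp waiting for its <lastmod>
    while (<>) {
        if (m{<loc>.*/([^/<]+)</loc>}) {
            $pending = exists $mtime->{$1} ? $mtime->{$1} : '';
        } elsif ($pending ne '' and s{<lastmod>.*?</lastmod>}{<lastmod>$pending</lastmod>}) {
            $pending = '';
        }
        print;   # goes to the new copy of $file, not to STDOUT
    }
}

# Usage: update_sitemap('sitemap_test.xml', \%mtime);
```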

HTH,
tyler_durden
# 3  
04-13-2010
Code:
#! /usr/bin/perl

use strict;
use warnings;
use POSIX qw~strftime~;

my @shtmls = glob ( '*.shtml' );
my %filemtime;

for ( @shtmls ) {

   my $lastmod = strftime("%Y-%m-%dT%H:%M:%S-06:00", localtime((stat($_))[9]));
   $filemtime{$_} = $lastmod;

}

open ( F, "sitemap.xml" ) || die "$!\n";
my @xml = <F>;
close F;

my ($new, $ON);

for ( @xml ) {

    for my $fn ( keys %filemtime ) { if ( /<loc>.*\/\Q$fn\E<\/loc>/ ) { $ON = $fn; }}
    if ( $ON && s/(<lastmod>).*?(<\/lastmod>)/$1$filemtime{$ON}$2/ ) { $ON = (); }
    $new .= $_;

}

open ( G, ">new.xml" ) || die "$!\n";
print G $new;
close G;

Submitted for your perusal. Tyler's is more stable. I was working on this while he posted his....
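All three scripts in this thread hardcode the offset as "-06:00" in the strftime format, while the existing entries use "-05:00"; a fixed string is wrong for part of the year wherever daylight saving time applies. Below is a minimal sketch that derives the offset with the core Time::Local module instead, assuming the cron job's local time zone is the one the sitemap should report. The helper name w3c_datetime is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# Format an epoch time (such as a file's mtime) as a W3C datetime in local
# time, computing the UTC offset instead of hardcoding "-06:00".
sub w3c_datetime {
    my ($epoch) = @_;
    my @lt = localtime($epoch);
    # Re-reading the local wall-clock time as if it were UTC and subtracting
    # the real epoch gives the local offset in seconds east of UTC. The year
    # is passed as four digits so timegm does not guess the century.
    my $offset = timegm(@lt[0 .. 4], $lt[5] + 1900) - $epoch;
    my $sign   = $offset < 0 ? '-' : '+';
    $offset = abs $offset;
    return strftime('%Y-%m-%dT%H:%M:%S', @lt)
         . sprintf('%s%02d:%02d', $sign, int($offset / 3600), int(($offset % 3600) / 60));
}

# Usage with one of the site's pages:
# my $lastmod = w3c_datetime((stat('index.shtml'))[9]);
```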
# 4  
04-13-2010
Thank you, Tyler and Deindorfer. I knew there was a more efficient solution than my amateur effort. I'll try both of these, and I've already learned a lot about scripting in Perl. Thanks again for your time and effort helping me out.

Last edited by skilodge; 04-13-2010 at 12:46 PM..