Copying the Header & footer Information to the Outfile.

08-25-2011


Hi

I am writing a Perl script that checks specific column values in a feed file and writes each record to one of two output files.

The feed file has header information and footer information.

The header is around 107 lines long and starts like this:

Code:

START-OF-FILE
....... 
so on ....

TIMESTARTED=Thu Aug 25 01:03:50 BST 2011
START-OF-DATA
# PRODUCT=Corp/Pfd

The actual data starts after the last header line, "# PRODUCT=Corp/Pfd".

The footer is 4 lines:

Code:

END-OF-DATA
DATARECORDS=1275983
TIMEFINISHED=Thu Aug 25 02:27:02 BST 2011
END-OF-FILE

My Perl script so far is below:

Code:

#!/usr/bin/perl
use strict;

my $file = 'file';
open(my $in,   '<', $file)      or die "could not open file $file: $!";
open(my $bad,  '>', 'badfile')  or die "could not open badfile: $!";
open(my $good, '>', 'goodfile') or die "could not open goodfile: $!";

while (my $line = <$in>) {
    my @fields = split /\|/, $line;

    # 1) Here, before checking the column values, I need to write the
    #    HEADER and FOOTER information to the goodfile.

    if (   $fields[32] eq "N.A."
        && $fields[33] eq "N.A."
        && $fields[34] eq "N.A."
        && $fields[38] eq "N.A."
        && ($fields[62] eq "N.A." || $fields[62] eq " "))
    {
        print $bad $line;     # badfile
    }
    else {
        print $good $line;    # goodfile
    }
}

close $in;
close $bad;
close $good;

1) Before checking the column values, I need to write the header and footer information to the goodfile.

2) I also need to count the number of records in the goodfile and then change the footer information to:

Code:

END-OF-DATA
DATARECORDS=1275983   --> New Rowcount from the Goodfile
TIMEFINISHED=Thu Aug 25 02:27:02 BST 2011
END-OF-FILE

Could anyone please help me solve this? Any help would be really appreciated.

filter


08-25-2011


Hi!

The simplest way is to read the whole file into an array, split it into four parts, process them, and write the result to the output file. Because it's really simple and quick, perhaps you should do it that way. There are plenty of other things in the world you can do, improve, or learn.

But... there is always a "but", you know. :-) It is definitely not the "Unix way". Why?

Well. From the famous "The UNIX Time-Sharing System": "... there have always been fairly severe size constraints on the system and its software. Given the partially antagonistic desires for reasonable efficiency and expressive power, the size constraint has encouraged not only economy, but also a certain elegance of design."

You wouldn't believe me if I told you what resources the first Unix hosts had. So I won't, but the word "severe" speaks for itself. That was the time when the famous "Unix philosophy" was born.

Doug McIlroy summarized it in this way: "This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."

You can read more in The Art of Unix Programming, a good book that is free to read online.

And what does all this have to do with your question? Just look:

1. You need the header:

Code:

sed -n '/^START-OF-FILE/,/^START-OF-DATA/p' INPUTFILE >/tmp/header.$$

2. The footer:

Code:

sed -n '/^END-OF-DATA/,$p' INPUTFILE >/tmp/footer.$$

3. Process your file with your Perl script, but have the script print the name of the good file at the end:

Code:

goodfile=$(perl process.pl)

Or you can print both names, good and bad, and then split them. Or you can pass the name to the script as an argument. You just need to know the name.

4. What is the number of records (lines) in the goodfile?

Code:

goodrecs=$(wc -l <"$goodfile")  # read from stdin so wc prints only the number, not the file name

5. The new footer:

Code:

sed 's/^DATARECORDS=.*$/DATARECORDS='"$goodrecs"'/' /tmp/footer.$$ >/tmp/newfooter.$$

6. Now put the pieces together:

Code:

cat /tmp/header.$$ "$goodfile" /tmp/newfooter.$$ >OUTPUTFILE

7. And don't forget to clean up after yourself:

Code:

rm /tmp/header.$$ /tmp/*footer.$$ # maybe the goodfile too

The beauty of shell programming is that you can work incrementally, in small pieces. You can test and debug each step separately. Then, when you get the result, you just combine your steps into a small, elegant, and truly Unix program: a shell script.
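As a rough sketch, steps 1 to 7 might be combined like this. The miniature feed file, the scratch directory from `mktemp -d`, and the `grep` filter standing in for `process.pl` are assumptions made for illustration only; the real filter would be the Perl script from the question.

```shell
#!/bin/sh
# Sketch only: steps 1-7 in one script, run against a tiny made-up feed.
# The grep filter below is a stand-in for the real Perl script.

work=$(mktemp -d)    # private scratch directory instead of /tmp/*.$$
feed=$work/feed

cat >"$feed" <<'EOF'
START-OF-FILE
TIMESTARTED=Thu Aug 25 01:03:50 BST 2011
START-OF-DATA
a
1
b
3
END-OF-DATA
DATARECORDS=1275983
TIMEFINISHED=Thu Aug 25 02:27:02 BST 2011
END-OF-FILE
EOF

# 1. Header: everything up to and including START-OF-DATA.
sed -n '/^START-OF-FILE/,/^START-OF-DATA/p' "$feed" >"$work/header"

# 2. Footer: END-OF-DATA through the last line.
sed -n '/^END-OF-DATA/,$p' "$feed" >"$work/footer"

# 3. Data records only, then the stand-in filter: "good" = contains a digit.
sed -e '1,/^START-OF-DATA/d' -e '/^END-OF-DATA/,$d' "$feed" >"$work/data"
grep '[0-9]' "$work/data" >"$work/good"
grep -v '[0-9]' "$work/data" >"$work/bad"

# 4. Count the good records; tr strips the padding some wc versions add.
goodrecs=$(wc -l <"$work/good" | tr -d '[:space:]')

# 5. Footer with the new record count.
sed "s/^DATARECORDS=.*/DATARECORDS=$goodrecs/" "$work/footer" >"$work/newfooter"

# 6. Assemble the output and show it.
result=$(cat "$work/header" "$work/good" "$work/newfooter")
printf '%s\n' "$result"

# 7. Clean up.
rm -rf "$work"
```

Each numbered comment matches one step of the list, so any step can still be run and checked on its own before the whole script is trusted.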

Regards,
Andrey (yazu)

===

Well. Sorry for my English. This post was really my English exercise. :-)

Last edited by yazu; 08-25-2011 at 11:29 PM..


yazu


08-25-2011


@Andrey (yazu), thanks for the link to the book. I appreciate it.

- GP

g.pi


08-26-2011


Hi yazu,

I really appreciate your post. Thanks a lot for your answer and thoughts.

Code:

The simplest way is to read the whole file into an array, split it into four parts, process them, and write the result to the output file. Because it's really simple and quick, perhaps you should do it that way. There are plenty of other things in the world you can do, improve, or learn.

Yes, you are correct. I did try reading the entire file into an array and then dividing it into parts.

But I got stuck on the following points inside the script:
1) How to write the footer information into the goodfile inside the Perl script.
2) I thought of using a counter to count the lines, but how do I substitute that number into the footer information?

I really appreciate your thoughts on the Unix approach, and I learned a lot from your post.

Is there any way to do the same within the Perl script itself?

Thanks a lot for your replies.

filter


08-26-2011


OK. Let's take an example file like this:

Code:

cat INPUTFILE
START-OF-FILE
....... 
so on ....

TIMESTARTED=Thu Aug 25 01:03:50 BST 2011
START-OF-DATA
# PRODUCT=Corp/Pfd
a
b
1
c
d
3
END-OF-DATA
DATARECORDS=1275983
TIMEFINISHED=Thu Aug 25 02:27:02 BST 2011
END-OF-FILE

Good lines are those that contain a digit; all others are bad lines. Here is a sketch:

Code:

perl -e '
use warnings;
use strict;

my $goodfile = "goodfile";
my $footer_len = 4;
my $datarec_line = 1;

my (@whole, @header, @footer, @goodlines, @badlines);
my $line;

@whole = <>;

do {
  $line = shift @whole;
  push @header, $line;
} while $line !~ /^START-OF-DATA/;

@footer = splice @whole, -$footer_len;

for (@whole) {
  if (/\d/) {
    push @goodlines, $_;
  } else {
    push @badlines, $_;
  }
}

$footer[$datarec_line] =~ s/\d+/scalar @goodlines/e;

open my $fh, ">", $goodfile or die "could not open $goodfile: $!";
print $fh @header, @footer, @goodlines;
close $fh;

print @badlines
' INPUTFILE

Good records go to the goodfile and bad ones to stdout. Note that this sketch writes the footer before the good records.
You can change the sketch (the definition of good lines, the order of output, the output of bad lines) as you want.
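If holding every record in memory is a concern, the same split can also be done in one streaming pass: the footer arrives after all the data, so the count is already known by the time the DATARECORDS line is read. The following is only a sketch in awk rather than Perl; the made-up feed, the file names, and the "contains a digit" rule are assumptions carried over from the toy example.

```shell
#!/bin/sh
# Sketch only: one streaming pass with awk instead of slurping the file.
# "Good" means "contains a digit", mirroring the toy example.

work=$(mktemp -d)
cat >"$work/feed" <<'EOF'
START-OF-FILE
START-OF-DATA
a
1
b
3
END-OF-DATA
DATARECORDS=1275983
TIMEFINISHED=Thu Aug 25 02:27:02 BST 2011
END-OF-FILE
EOF

awk -v good="$work/goodfile" -v bad="$work/badfile" '
  /^END-OF-DATA/ { in_footer = 1 }          # the footer starts here
  in_footer {                               # footer: patch the count
      if ($0 ~ /^DATARECORDS=/) $0 = "DATARECORDS=" (n + 0)
      print > good
      next
  }
  !in_data {                                # header: copy it through
      print > good
      if ($0 ~ /^START-OF-DATA/) in_data = 1
      next
  }
  /[0-9]/ { n++; print > good; next }       # good record
  { print > bad }                           # everything else is bad
' "$work/feed"

good_out=$(cat "$work/goodfile")
bad_out=$(cat "$work/badfile")
printf '%s\n' "$good_out"
rm -rf "$work"
```

Here the goodfile comes out in the order header, good records, footer, and nothing but the running count `n` is kept in memory.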


yazu


08-26-2011


Hi Yazu,

The logic in your code is really excellent. Thank you very much for your time and thoughts.

I have modified the logic accordingly; the code is below:

Code:

#!/usr/bin/perl

$file='feedfile';
open(FILE,$file)|| die ("could not open file $file: $!");


my $goodfile = "goodfile";
my $badfile = "badfile";
my $footer_len = 4;
my $datarec_line = 1;

my (@whole, @header, @footer, @goodlines, @badlines, @fields);
my $line;

@whole = <FILE>;

do {
  $line = shift @whole;
  push @header, $line;
} while $line !~ /^# PRODUCT/;

@footer = splice @whole, -$footer_len;


foreach (@whole) {
$line = $_;
@fields = split (/\|/, $line);

if( $fields[57] eq " ")
{
 push @badlines, $line;
}

elsif( $fields[32] eq "N.A."  && $fields[33] eq "N.A." && $fields[34] eq "N.A." && $fields[38] eq "N.A." && ($fields[62] eq "N.A." ||  $fields[62] eq " "))
{
push @badlines, $line;
}

else
{
push @goodlines, $line;
}

}

$footer[$datarec_line] =~ s/\d+/scalar @goodlines/e;

open my $fh, ">", $goodfile or die "could not open $goodfile: $!";
print $fh @header, @goodlines, @footer;
close $fh;

open my $fh1, ">", $badfile or die "could not open $badfile: $!";
print $fh1 @badlines;
close $fh1;

After running the code, I found that there are 4 lines in between the data records that separate the data into product sections:

Code:

grep -n "# PRODUCT" feedfile
1206675:# PRODUCT=Convertible 
1261566:# PRODUCT=Nationals
1270395:# PRODUCT= Agencies
1274335:# PRODUCT=Regionals

As shown above, these 4 lines are not valid data records.

So when calculating the row count, these 4 lines need to be ignored, i.e. in this statement:

Code:

$footer[$datarec_line] =~ s/\d+/scalar @goodlines/e;

When calculating the row count and substituting the new count here, the 4 lines above have to be ignored.

Maybe by reducing the array by 4? I am not sure, though.

How can we reduce the row count by 4 so that we get the actual count?

Really appreciate your time and thoughts.

---------- Post updated at 01:40 PM ---------- Previous update was at 01:24 PM ----------

Finally, I did the following:

Code:

$footer[$datarec_line] =~ s/\d+/(scalar @goodlines - 4)/e;

Thanks a lot, yazu. I am very thankful to you.

filter


08-26-2011


Code:

my $n = @goodlines;
$n -= grep {/^# PRODUCT/} @goodlines; # or just $n -= 4, but a hardcoded count breaks when the feed changes
$footer[$datarec_line] =~ s/\d+/$n/;
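For comparison, the same count can be taken from the shell once the records have been written out: `grep -c` prints the number of selected lines and `-v` inverts the match. This is only a sketch; the sample records are made up, and the file is assumed to hold just the data portion (no header or footer).

```shell
#!/bin/sh
# Sketch only: count records while skipping the "# PRODUCT" separator lines.
goodfile=$(mktemp)
cat >"$goodfile" <<'EOF'
rec1|x
# PRODUCT=Convertible
rec2|y
# PRODUCT=Nationals
rec3|z
EOF

# -v inverts the match, -c prints the number of selected lines.
n=$(grep -vc '^# PRODUCT' "$goodfile")
echo "DATARECORDS=$n"
rm -f "$goodfile"
```

As with the Perl `grep`, the count follows the data, so it stays correct if the feed gains or loses a product section.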


yazu

