The UNIX and Linux Forums  

  #1  
Old 01-29-2008
Registered User
 

Join Date: Jan 2008
Posts: 14
Removing Headers and a Column

I have a text file in unix with a layout like this

Column 1 - 1-12
Column 2 - 13-39
Column 3 - 40-58
Column 4 - 59-85
Column 5 - 86-120
Column 6 - 121-131
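
The layout above maps onto a fixed-width `unpack` template. A minimal Perl sketch, assuming the ranges are 1-based, inclusive character positions; the name `split_columns` is hypothetical and only for illustration:

```perl
use strict;
use warnings;

# Field widths derived from the ranges above: 12, 27, 19, 27, 35, 11
# (131 characters in total). 'A' strips trailing spaces from each field.
my $TEMPLATE = 'A12 A27 A19 A27 A35 A11';

# Split one report line into its six fixed-width fields.
# Fields missing from a short line come back as empty strings.
sub split_columns {
    my ($line) = @_;
    chomp $line;
    return unpack $TEMPLATE, $line;
}
```

Dropping column 4 then amounts to skipping the field at index 3 when the line is rebuilt.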

The file also has a header on the first 6 lines of each page. Each page is 51 lines long. So first I want to remove the header from each page and rewrite the file (which I have mostly figured out; it just isn't very clean yet).

All I have been able to do is rewrite the header with blank spaces, which just gives me a mess in the first place.

But here is the ultimate goal. Remove the first 6 lines of each page and remove column 4 from the entire report. Rewrite the report with a new header and just put all of the data in a new report, excluding column 4.

Honestly I haven't really done anything quite this complex (at least it seems complex to me), so I am not really sure where to get started. The most I have really done is just rewriting strings in a text file or removing specific words.

Any help would be appreciated with this.

One thing I forgot to mention, and I am not even sure that this is possible. At the very end of the report is a totals section. It has a specific word identifying where it starts, and I'd need to reprint everything from that point down at the end of the new report. I am not sure this is possible, because removing column 4 would obviously cut into that section. So it seems the only way to save it would be to write that one area into a new file and append it to the end of the new report once it has been completed. I understand the concept; the matter is figuring out how to do it.

Thinking about it a little more, if it was possible to leave the headers in place and just skip removing column 4 from that section on each page, that would work as well. I actually do not mind the headers being there, and it may create printing problems if I do not keep each page 51 lines long. I imagine I'll deal with it, but if there was another way I am sure I could handle that as well. The same approach may work for the final report at the very end: just telling it to ignore the last 12 lines of the report, somehow. Just a thought.
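
The "leave the headers and the last 12 lines alone" idea can be sketched in Perl as below. This is only a sketch under stated assumptions: pages are counted from the top of the file, the totals are exactly the last 12 lines, and the lines have already had their newlines removed. The name `strip_column4` is hypothetical.

```perl
use strict;
use warnings;

# Values taken from the post: 51 lines per page, 6 header lines,
# 12 trailing lines that must not be touched.
my ($PAGESIZE, $HEADERSIZE, $TAILSIZE) = (51, 6, 12);

# Takes the report as a list of chomped lines and returns a copy with
# column 4 (characters 59-85) cut from data lines only.
sub strip_column4 {
    my @lines = @_;
    my $last_data = $#lines - $TAILSIZE;       # last index before the tail
    for my $i (0 .. $last_data) {
        next if $i % $PAGESIZE < $HEADERSIZE;  # page header: leave as-is
        next if length($lines[$i]) <= 58;      # line ends before column 4
        substr($lines[$i], 58, 27, '');        # cut up to 27 characters
    }
    return @lines;
}
```

One way to use it would be `chomp(my @report = <>); print "$_\n" for strip_column4(@report);`. Because headers are kept, every page stays 51 lines long.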


Thank you

Last edited by DerangedNick; 01-29-2008 at 04:04 PM. Reason: Forgot something
  #2  
Old 01-29-2008
Disorganised User
 
Join Date: Nov 2007
Location: New Zealand
Posts: 734

Code:
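#!/usr/bin/perl
use strict;
use warnings;

my $PAGESIZE   = 51;  # lines per page
my $HEADERSIZE = 6;   # header lines at the top of each page
my $linenumber = 0;
my $intotals   = 0;

while (<>) {
  $linenumber++;
  # Once the totals section starts, pass everything through untouched.
  if (/^whatever line indicates the start of the "totals" section$/) {
    $intotals = 1;
  }
  if ($intotals) {
    print $_;
  } elsif (($linenumber - 1) % $PAGESIZE >= $HEADERSIZE) {
    # Data line (not one of the first 6 lines of its page):
    # drop column 4, i.e. characters 59-85.
    if (/^(.{58}).{27}(.*)$/) {
      print "$1$2\n";
    } elsif (/^(.{58}).{1,27}$/) {
      # The line ends inside column 4: keep only columns 1-3.
      print "$1\n";
    } else {
      print $_;
    }
  }
}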
Untested and you'll have to replace "whatever line indicates the start of the "totals" section" with something sensible.

Last edited by Smiling Dragon; 01-29-2008 at 06:26 PM. Reason: Fixed a few bugs
  #3  
Old 01-29-2008
Registered User
 

Join Date: Jan 2008
Posts: 14
I am currently trying to use the script you provided. However, my knowledge of running this against the file is rather slim, since most of the commands I have run in the past do not call a script. If you wouldn't mind providing some more information on how to get this to run against the file, I'd appreciate it. In the meantime I will continue messing with it to see if I can get anything. Thanks for the help. (Ignore above)


I seem to have gotten it to run OK, but I am currently getting these errors.


syntax error at testscript line 12, near "<>"
syntax error at testscript line 15, near "} else"
Execution of testscript aborted due to compilation errors.

Last edited by DerangedNick; 01-29-2008 at 05:10 PM. Reason: Running
  #4  
Old 01-29-2008
Disorganised User
 
Join Date: Nov 2007
Location: New Zealand
Posts: 734
Oops, my bad, have fixed it in the original post
(change the <> to !=)
  #5  
Old 01-29-2008
Registered User
 

Join Date: Jan 2008
Posts: 14
The script ran through the file and gave me an output, however it didn't remove the 4th column. Everything that was there originally still seems to be there, but it is scattered all over the place instead of in columns. Not really sure.

What part of the request was the script addressing? I will keep playing with it for the time being to see if I can get different results. Thanks for the help

Looking over the file again, it does seem to have removed something, but I am not quite sure at which point yet. Will update once I know. I do know that a lot of the data that I wanted removed is still in place, however.

::Update::
OK, what it appears to be doing is this: once it removes the columns on the first line, it pulls the second line up onto the first line, then removes that same section from the second line, and so on down the entire document. The totals section appears to be intact, however it did lose its formatting, so it is rather hard to tell since it is scattered.

Thanks

Last edited by DerangedNick; 01-29-2008 at 05:25 PM. Reason: Findings
  #6  
Old 01-29-2008
Disorganised User
 
Join Date: Nov 2007
Location: New Zealand
Posts: 734
Yeah, I had some bugs :/ It should do everything you are after (I hope)

Fixed more bugs in the original:
Added \n to the print $1$2 line
Replaced the / symbol in the pagebreak calculation with % (modulo arithmetic)

Edit: Whoops, didn't read your request right - I've been removing the first line of each page, not the first 6... Will fix...
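
The page-position test discussed in this post can be sketched on its own. This assumes line numbers start at 1 and that every page is exactly 51 lines with a 6-line header, as described in the first post; the name `is_header_line` is hypothetical.

```perl
use strict;
use warnings;

my ($PAGESIZE, $HEADERSIZE) = (51, 6);

# True when the 1-based line number falls within the first $HEADERSIZE
# lines of its page. Subtracting 1 first maps line 1 to offset 0, so the
# offsets 0..5 are the header and 6..50 are data.
sub is_header_line {
    my ($linenumber) = @_;
    return (($linenumber - 1) % $PAGESIZE) < $HEADERSIZE;
}
```

Without the subtraction, `$linenumber % $PAGESIZE` gives 6 for line 6 and 0 for line 51, so the last header line would be treated as data and the last data line of each page as a header.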
  #7  
Old 01-29-2008
Registered User
 

Join Date: Jan 2008
Posts: 14
OK, this one looks a lot better. Totals are intact, however it needs to start cutting off 1 character earlier (which I think I may be able to change).

The problem now, however, is that some lines do not have data at the beginning of the line but do have data in column 4 (so columns 1, 2, 3, 5 and 6 are blank). That data is still being printed; it is just moving over into what was column 5.

The next part is that it is cutting sections out of the header. I don't know if this can be fixed or not.

I will try to fix the width issue. I am not sure where to start on getting it to cut out the other parts of column 4 though
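
Both the width adjustment and the short or partly blank lines can be handled by cutting a character range with `substr` instead of a regex. A sketch only: the name `cut_range` is hypothetical, and whether the cut should start at position 58 or 59 depends on the actual file.

```perl
use strict;
use warnings;

# Remove the 1-based, inclusive character range [$first, $last] from a
# line of any length. Lines that end before $first come back unchanged.
sub cut_range {
    my ($line, $first, $last) = @_;
    # Set the line ending aside so it is never counted or cut.
    my $eol = ($line =~ s/(\r?\n)\z//) ? $1 : '';
    if (length($line) >= $first) {
        # substr stops at the end of the string if the line is short.
        substr($line, $first - 1, $last - $first + 1, '');
    }
    return $line . $eol;
}
```

Calling `cut_range($_, 59, 85)` removes column 4 as laid out in the first post; `cut_range($_, 58, 85)` would start one character earlier. A line that is blank up to column 4 is cut the same way as any other line.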

Thanks a lot for all the help.

I'd rather not remove the first 6 lines if we can just ignore those lines somehow. Each header line starts the same way on every page (although each of the 6 lines starts differently from the others).


This is how the first 6 lines of each page look
Line 1: (this has a square control character) I imagine it is used as the page sep
Line 2: XXXXXX (always the same, different word obviously)
Line 3: ALL
Line 4: (blank line)
Line 5: ACCOUNT (4 blank spaces before this)
Line 6: --------- (4 blank spaces before this)

Line 7 is blank and data starts under that. That is how the header begins on each page. If it was possible to ignore that the entire way down that would be ideal.
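
Given the header layout above, the header could be recognised by its first line instead of by counting pages. A sketch, assuming the "square control character" is a form feed (`\f`); that is a guess and should be checked against the real file. The name `make_header_detector` is hypothetical.

```perl
use strict;
use warnings;

my $HEADERSIZE = 6;   # header lines per page, from the post

# Returns a closure that is called once per line, in order, and reports
# whether that line belongs to a page header.
# ASSUMPTION: each page starts with a line beginning with a form feed.
sub make_header_detector {
    my $remaining = 0;                    # header lines still expected
    return sub {
        my ($line) = @_;
        $remaining = $HEADERSIZE if $line =~ /^\f/;   # new page starts
        return 0 if $remaining == 0;      # ordinary data line
        $remaining--;
        return 1;                         # still inside the header
    };
}
```

Lines for which the closure returns true would be printed unchanged, and column 4 would be cut only from the rest. This keeps working even if a page is not exactly 51 lines long.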


Last error:

Name "main::HEADERSIZE" used only once: possible typo at testfile line 3.

Last edited by DerangedNick; 01-29-2008 at 05:53 PM. Reason: other thoughts