The UNIX and Linux Forums  


#1  mkastin  10-30-2009
Dropping Records for unknown reason in awk script

Hi,

I have written the following script. It is pretty sloppy, but I don't see any reason why I should be losing 54 records from a 3.5-million-line file after running it.

What I am doing:

I have a 3.5 million record file in which about 80,000 records need a correction. They are missing the last data from an append because they didn't have a match. I need to insert default data on these records. My script worked as intended; however, I have 54 fewer output records than input records and I don't know why they were dropped.



Code:
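#!/bin/ksh

myFile="${1}"
myOutput="${2}"

awk '{
  # Two blanks in columns 63-64 mark a record that needs the default data.
  match_flag = substr($0, 63, 2)
  # Take the 22-character default data from the first record.
  if (NR == 1) insert_data = substr($0, 41, 22)
  if (match_flag == "  ") {
    # Keep columns 1-40, then add the default data and the "NM X" trailer.
    # Note: "\ " is not a standard awk escape; plain spaces are safer.
    strt = substr($0, 1, 40)
    print strt insert_data "\ \ \ \ \ \ \ \ \ \ \ NM\ X"
  } else
    print $0
}' "${myFile}" >> "${myOutput}"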

Basically, what I am doing is appending a long string of data to any records that are missing a value in positions 3064-3065 (columns 63-64 in the scaled-down script and sample shown here).
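One way to see where records go is to have awk count what it reads and what it prints. The following is only a sketch of the same padding rule with counters added; the function name `pad_with_counts` and the counter names are made up for illustration, and the column positions are the ones from the scaled-down script.

```shell
pad_with_counts() {
  # $1 = input file; corrected records go to stdout, counters go to stderr.
  awk '
    # Default data comes from columns 41-62 of the first record.
    NR == 1 { insert_data = substr($0, 41, 22) }
    # Blank match flag: keep columns 1-40, add default data, 11 blanks and "NM X".
    substr($0, 63, 2) == "  " {
      print substr($0, 1, 40) insert_data sprintf("%11s", "") "NM X"
      padded++
      next
    }
    # Everything else passes through unchanged.
    { print; passed++ }
    END { printf "read=%d padded=%d passed=%d\n", NR, padded, passed | "cat 1>&2" }
  ' "$1"
}
```

Running `pad_with_counts "$myFile" > "$myOutput"` and comparing `read` with `wc -l < "$myFile"` should show whether awk itself sees fewer records than the file has lines, or whether the records go missing somewhere after awk prints them.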

Since this file is so large, I can't really provide sample data, but I'll attempt to reproduce a short version below.


Code:

INPUT:
0001  Ronald   McDonald  01 H81 0001256 0100111               V VEEEFKFS SP X
0002  Elmo     St. Elmo  02 H82 0089621  001  10 11 01 1      0000WWDFCWWSP X
0003  Cookie   Monster   01 H81 0887141    1  .  0   0  .  1  BBB000 QWFJSP X
0004  Tfer     Harris    04 H84 0985512 0000000000000000000000BBE00122933NM X
0005  Oscar    Grouche   03 H83 0364471                   110.VVMWEWGODWFDA X
0006  Dumb     Name      02 H82 0000233   111 00 1111 00000000F23202233FFDA X
0007  Butter   Face      04 H84 0014666 1111111111111111111111M012291122FDA X
0008  Ford     F150      01 H81 0000001 00111 110 110  0011 ..S1102234SSMSP X
0009  Bar      Foo       03 H83 7741668 0 1 0 1 0 1 0 1 0 1 0 P019441MEWEDA X
0010  ChoCho   Train     04 H84 0014669 1111111111111111111111POWA1224023OB X
0011  Stone    Stone     04 H84 0014566 1111111111111111111111M12301MANWEOB X
0012  Problem  Record    04 H84 0000000 

OUTPUT:
0001  Ronald   McDonald  01 H81 0001256 0100111               V VEEEFKFS SP X
0002  Elmo     St. Elmo  02 H82 0089621  001  10 11 01 1      0000WWDFCWWSP X
0003  Cookie   Monster   01 H81 0887141    1  .  0   0  .  1  BBB000 QWFJSP X
0004  Tfer     Harris    04 H84 0985512 0000000000000000000000BBE00122933NM X
0005  Oscar    Grouche   03 H83 0364471                   110.VVMWEWGODWFDA X
0006  Dumb     Name      02 H82 0000233   111 00 1111 00000000F23202233FFDA X
0007  Butter   Face      04 H84 0014666 1111111111111111111111M012291122FDA X
0008  Ford     F150      01 H81 0000001 00111 110 110  0011 ..S1102234SSMSP X
0009  Bar      Foo       03 H83 7741668 0 1 0 1 0 1 0 1 0 1 0 P019441MEWEDA X
0010  ChoCho   Train     04 H84 0014669 1111111111111111111111POWA1224023OB X
0011  Stone    Stone     04 H84 0014566 1111111111111111111111M12301MANWEOB X
0012  Problem  Record    04 H84 0000000 0000000000000000000000           NM X

The file is fixed-length with no delimiters.
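The fixed-length assumption can be checked directly. This is a minimal sketch, not part of the original script, and the function name `length_histogram` is hypothetical: it prints each distinct record length together with the number of records that have it.

```shell
length_histogram() {
  # $1 = file to inspect.
  # Prints "<length> <count>" for every distinct record length, shortest first.
  awk '{ n[length($0)]++ } END { for (len in n) print len, n[len] }' "$1" | sort -n
}
```

For a truly fixed-length file, `length_histogram "$myFile"` should print a single row. More than one row points at truncated or over-long records, which would be worth looking at before anything else, although it does not by itself explain dropped records.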

Last edited by mkastin; 4 Weeks Ago at 11:28 AM. Reason: Fixing all examples and adjusting code to fit examples properly.
#2  mkastin  4 Weeks Ago
Please Help!
#3  steadyonabix  4 Weeks Ago
It would be helpful if you rewrote your example code to work on the sample input you provide. At the moment there is no way of knowing what you are expecting in position 3064, although your assumption that it is two empty spaces may be at the root of your problem.
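To illustrate why the two-space assumption matters: in awk, substr() past the end of a record returns an empty string rather than blanks, so a record that stops short of the flag column does not compare equal to two spaces. A minimal sketch, using the 40-character problem record from the sample and a hypothetical helper `flag_test` (columns 63-64 as in the scaled-down script):

```shell
# The truncated problem record from the sample (40 characters).
short='0012  Problem  Record    04 H84 0000000 '
# The same record padded with blanks out to 80 columns.
padded=$(printf '%-80s' "$short")

flag_test() {
  # Prints "blank" if columns 63-64 are exactly two spaces, otherwise "other".
  printf '%s\n' "$1" | awk '{ print ((substr($0, 63, 2) == "  ") ? "blank" : "other") }'
}

flag_test "$short"    # prints: other
flag_test "$padded"   # prints: blank
```

A record like the first one would fall through to the `else` branch and be printed unchanged, so this explains records that are not corrected rather than records that disappear.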

I also don't understand your print statement: -


Code:
print strt insert_data "\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0NM\ X"

When I try: -


Code:
nawk ' BEGIN{
  print "\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0NM\ X"
} '

I get: -


Code:
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0NM\ X

as my output not: -


Code:
0000000000000000000000NMX

My advice is to scale down your example awk to work with your sample file, and maybe someone will reply. Simply reposting the same request without changing it at all seems to be getting you nowhere.

Good luck
#4  mkastin  4 Weeks Ago
Haha, wow, just realized how horrible my question was. Okay, I adjusted everything and it should hopefully be clearer now.


Code:
$ awk ' BEGIN{
  print "\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0NM\ X"
} '
awk: cmd. line:1: warning: escape sequence `\ ' treated as plain ` '
                                    0NM X

This statement works fine for me, although the escape sequence isn't necessary.
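Since the two awk implementations above disagree about `\ `, a filler built without backslashes avoids the question entirely. A small sketch, assuming the 11-blank filler and "NM X" trailer from the print statement in the first post; the function name `filler` is made up:

```shell
filler() {
  # "%11s" with an empty argument yields 11 blanks, so no escapes are needed.
  awk 'BEGIN { printf "%s\n", sprintf("%11s", "") "NM X" }'
}

filler
```

Typing the blanks literally inside the quoted string works just as well; sprintf only makes the width explicit.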
#5  binlib  4 Weeks Ago
Is there a pattern for the missing records, e.g. at the end?
Since your output format is the same as the input, do

Code:
cmp -l infile outfile

Look for differences that don't look like the intended ones. The expected difference is that blanks in the input are replaced with fixed values in the output. Try to spot the unintended differences visually (or mechanically).
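One mechanical way to find the first unintended difference is to walk both files in parallel and compare only the part of each record the script is not supposed to touch. This is a sketch under the assumption that columns 1-40 are never changed (as in the scaled-down script); the function name `first_mismatch` is hypothetical.

```shell
first_mismatch() {
  # $1 = input file, $2 = output file.
  awk -v out="$2" '{
    # Read the next output record alongside the current input record.
    if ((getline o < out) <= 0) {
      print "output ended before input line " NR
      exit
    }
    # Columns 1-40 should be identical in both files.
    if (substr($0, 1, 40) != substr(o, 1, 40)) {
      print "first mismatch at input line " NR
      exit
    }
  }' "$1"
}
```

`first_mismatch infile outfile` reports roughly where the two files fall out of step. If a dropped record happens to share its first 40 columns with the record after it, the reported line will be somewhat later than the real one.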
#6  steadyonabix  4 Weeks Ago
Another approach is to diff the input and output files and redirect the differences to a file. Then open that file and look to see why the matches in your awk fail for those lines. You can go to the character positions and confirm whether the patterns you are trying to match are what you expect. Good luck
#7  mkastin  4 Weeks Ago
Quote:
Originally Posted by binlib View Post
Is there a pattern for the missing records, e.g. at the end?
Since your output format is the same as the input, do

Code:
cmp -l infile outfile

Look for differences that don't look like the intended ones. The expected difference is that blanks in the input are replaced with fixed values in the output. Try to spot the unintended differences visually (or mechanically).
I ran a diff on the files and got over 160,000 lines back. I couldn't tell from this which lines went missing or whether there was a discernible pattern to them. What I could tell from the diff was that appending the data onto the records I wanted did work. I don't know whether some of those records disappeared or whether it was other, fully intact lines.
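The diff is so large because every corrected record shows up as a change. One way around that, sketched here under the assumption that columns 1-40 are left untouched by the script, is to diff only that unchanged prefix so that the intended corrections drop out of the comparison. The function name `dropped_records` is made up, and `mktemp` is assumed to be available on the system.

```shell
dropped_records() {
  # $1 = input file, $2 = output file.
  in_keys=$(mktemp) && out_keys=$(mktemp) || return 1
  # Keep only the columns the script does not modify.
  cut -c1-40 "$1" > "$in_keys"
  cut -c1-40 "$2" > "$out_keys"
  # Lines starting with "<" exist only in the input, i.e. they were dropped.
  diff "$in_keys" "$out_keys" | grep '^<'
  rm -f "$in_keys" "$out_keys"
}
```

With 54 records missing, `dropped_records infile outfile` should print on the order of 54 lines, which is small enough to check by eye for a common pattern.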