The UNIX and Linux Forums  

  #1 (permalink)  
Old 04-21-2009
ppat7046 ppat7046 is offline
Registered User
  
 

Join Date: Jul 2007
Posts: 24
Parsing 286 length Character string

Hi Friends,

I have a .txt file with 13000 records.
Each record is 278 characters long.

I am using the code below to extract the fields, and it takes almost 10 minutes.
Any suggestions, please?

cat filename.txt|while read line
do

f1=`echo $line|awk '{print substr($1,1,9)}'`
f2=`echo $line|awk '{print substr($1,10,20)}'`
f3=`echo $line|awk '{print substr($1,30,50)}'`
f4=`echo $line|awk '{print substr($1,80,10)}'`
f5=`echo $line|awk '{print substr($1,90,50)}'`
f6=`echo $line|awk '{print substr($1,140,10)}'`
f7=`echo $line|awk '{print substr($1,150,50)}'`
f8=`echo $line|awk '{print substr($1,200,10)}'`
f9=`echo $line|awk '{print substr($1,210,50)}'`
f10=`echo $line|awk '{print substr($1,260,10)}'`
f11=`echo $line|awk '{print substr($1,270,8)}'`
f12=`echo $line|awk '{print substr($1,278,8)}'`

s1=`echo $f1"|"$f2"|"$f3"|"$f4"|"$f5"|"`
s2=`echo $f6"|"$f7"|"$f8"|"`
s3=`echo $f9"|"$f10"|"`
s4=`echo $f11"|"$f12`

echo $s1$s2$s3$s4 >> FinalResult.txt
done
  #2 (permalink)  
Old 04-21-2009
vgersh99 vgersh99 is online now Forum Staff  
Moderator
  
 

Join Date: Feb 2005
Location: Boston, MA
Posts: 5,122
nawk -f fieldwidth.awk filename.txt > FinalResult.txt

fieldwidth.awk:
Code:
function setFieldsByWidth(   i,n,FWS,start,copyd0) {
  # Licensed under GPL Peter S Tillier, 2003
  # NB corrupts $0
  copyd0 = $0                             # make copy of $0 to work on
  if (length(FIELDWIDTHS) == 0) {
    print "You need to set the width of the fields that you require" > "/dev/stderr"
    print "in the variable FIELDWIDTHS (NB: Upper case!)" > "/dev/stderr"
    exit(1)
  }

  if (!match(FIELDWIDTHS,/^[0-9 ]+$/)) {
    print "The variable FIELDWIDTHS must contain digits, separated" > "/dev/stderr"
    print "by spaces." > "/dev/stderr"
    exit(1)
  }

  n = split(FIELDWIDTHS,FWS)

  if (n == 1) {
    print "Warning: FIELDWIDTHS contains only one field width." > "/dev/stderr"
    print "Attempting to continue." > "/dev/stderr"
  }

  start = 1
  for (i=1; i <= n; i++) {
    $i = substr(copyd0,start,FWS[i])
    start = start + FWS[i]
  }
}

#Note that the "/dev/stderr" entries in some lines have wrapped.

#I then call setFieldsByWidth() in my main awk code as follows:
BEGIN {
  #FIELDWIDTHS="7 6 5 4 3 2 1" # for example
  # adjust the FIELDWIDTHS values as you see fit.
  FIELDWIDTHS="9 20 50 10 50 10 50 10 50 10 8 8" # widths taken from the substr() calls in post #1
  OFS="|"
}
!/^[ \t]*$/ {
  saveDollarZero = $0 # if you want it later
  setFieldsByWidth()
  # now we can manipulate $0, NF and $1 .. $NF as we wish
  # print $0 OFS
  print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12
  next
}
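If you would rather keep the widths in one place without reassigning $1..$NF, the same width-driven idea can be sketched as a small shell function. This is only a sketch under a few assumptions: split_fixed is a name made up for illustration, the widths are passed as a space-separated string, and the separator is hard-coded to "|". It uses only split() and substr(), so it should run under nawk, gawk or mawk. (gawk also has a built-in FIELDWIDTHS variable that does fixed-width splitting natively.)

```shell
# split_fixed: hypothetical helper, not from the thread.
# $1 = space-separated field widths; records are read from stdin.
split_fixed() {
  awk -v widths="$1" '
    BEGIN { n = split(widths, w, " ") }   # w[1..n] holds the widths
    {
      out = ""
      pos = 1                             # substr() offsets are 1-based
      for (i = 1; i <= n; i++) {
        out = out (i > 1 ? "|" : "") substr($0, pos, w[i])
        pos += w[i]
      }
      print out
    }'
}

# Small demonstration on a made-up 9-character record:
printf 'AAABBCCCC\n' | split_fixed "3 2 4"
```

For the file in this thread the call would look like split_fixed "9 20 50 10 50 10 50 10 50 10 8 8" < filename.txt > FinalResult.txt, assuming the widths from post #1 are the right ones.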
  #3 (permalink)  
Old 04-21-2009
JerryHone JerryHone is offline
Registered User
  
 

Join Date: Nov 2006
Location: UK
Posts: 178
A simpler method is to create a script, parse.awk:

Code:
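BEGIN { OFS="|" }   # set the output separator once, before any record is read
{
# NB: $1 is only the first whitespace-delimited field. If a record can
# contain blanks, use $0 (the whole record) instead.
f1=substr($1,1,9);
f2=substr($1,10,20);
f3=substr($1,30,50);
f4=substr($1,80,10);
f5=substr($1,90,50);
f6=substr($1,140,10);
f7=substr($1,150,50);
f8=substr($1,200,10);
f9=substr($1,210,50);
f10=substr($1,260,10);
f11=substr($1,270,8);
f12=substr($1,278,8);

# print already ends each line with a newline, so no extra "\n" is needed
print f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12;
}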
Then run

Code:
awk -f parse.awk filename.txt > FinalResult.txt
I believe your original code takes a long time because each backtick, echo and awk spawns a new process, and that happens a dozen times or more for every record.
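To put a rough number on that, here is a small sketch using the figures from post #1 (13000 records, 12 awk calls and 16 backtick substitutions per record). The variable names are made up for illustration, and the totals are lower bounds, since each pipeline inside the backticks also forks.

```shell
records=13000          # record count quoted in post #1
awk_per_record=12      # f1..f12 each run one awk
subst_per_record=16    # f1..f12 plus s1..s4 each use backticks

awk_runs=$((records * awk_per_record))
substitutions=$((records * subst_per_record))

echo "awk processes started: $awk_runs"        # 156000
echo "command substitutions: $substitutions"   # 208000
```

A single awk that reads the whole file starts one process in total, which is where the speed-up comes from.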
  #4 (permalink)  
Old 04-22-2009
cfajohnson cfajohnson is offline Forum Advisor  
Shell programmer, author
  
 

Join Date: Mar 2007
Location: Toronto, Canada
Posts: 2,361
Code:
awk '{
 print  substr($1,1,9) "|" \
        substr($1,10,20) "|" \
        substr($1,30,50) "|" \
        substr($1,80,10) "|" \
        substr($1,90,50) "|" \
        substr($1,140,10) "|" \
        substr($1,150,50) "|" \
        substr($1,200,10) "|" \
        substr($1,210,50) "|" \
        substr($1,260,10) "|" \
        substr($1,270,8) "|" \
        substr($1,278,8)
}' filename.txt > FinalResult.txt
  #5 (permalink)  
Old 04-22-2009
amitmathapati amitmathapati is offline
Registered User
  
 

Join Date: Apr 2009
Posts: 1
Hi all,

What if I have a line like this?
A      BCD

Here the first field is f1=A, the second field f2 is six blank characters, and f3=BCD.
If I use the script above, I am not able to get the fields in that case.
Can you please suggest how to handle it?

Cheers Amit
  #6 (permalink)  
Old 04-22-2009
cfajohnson cfajohnson is offline Forum Advisor  
Shell programmer, author
  
 

Join Date: Mar 2007
Location: Toronto, Canada
Posts: 2,361

Use $0 instead of $1 (which is what I should have used):

Code:
awk '{
 print  substr($0,1,9) "|" \
        substr($0,10,20) "|" \
        substr($0,30,50) "|" \
        substr($0,80,10) "|" \
        substr($0,90,50) "|" \
        substr($0,140,10) "|" \
        substr($0,150,50) "|" \
        substr($0,200,10) "|" \
        substr($0,210,50) "|" \
        substr($0,260,10) "|" \
        substr($0,270,8) "|" \
        substr($0,278,8)
}' filename.txt > FinalResult.txt
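A quick way to see the difference, using the sample line from post #5 (assumed here to be A, six blanks, then BCD, with widths 1, 6 and 3): $1 holds only the text up to the first blank, so every offset past it comes back empty, while $0 holds the whole record. The variable names are just for illustration.

```shell
line='A      BCD'   # 1 character, 6 blanks, 3 characters

# Slicing $1: awk has already split the record on whitespace, so $1 is just "A"
with_field1=$(printf '%s\n' "$line" |
  awk '{ print substr($1,1,1) "|" substr($1,2,6) "|" substr($1,8,3) }')

# Slicing $0: the whole record, blanks included
with_record=$(printf '%s\n' "$line" |
  awk '{ print substr($0,1,1) "|" substr($0,2,6) "|" substr($0,8,3) }')

printf '[%s]\n' "$with_field1"   # [A||]
printf '[%s]\n' "$with_record"   # [A|      |BCD]
```

Note that the loop in post #1 has a related problem: an unquoted echo $line collapses runs of blanks before awk ever sees them, so the blanks are lost there regardless of $0 or $1.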
  #7 (permalink)  
Old 04-22-2009
ppat7046 ppat7046 is offline
Registered User
  
 

Join Date: Jul 2007
Posts: 24
Thank you all for your replies.

I used the suggestion provided by cfajohnson, and now it takes only 20 seconds to parse the 800,000 records.

Thank you very much,
Prashant