Parsing 286 length Character string


 
# 1  
Old 04-21-2009

Hi Friends,

I have a .txt file which has 13,000 records.
Each record is 278 characters long.

The code below extracts the fields, but it takes almost 10 minutes to run.
Any suggestions, please?

cat filename.txt|while read line
do

f1=`echo $line|awk '{print substr($1,1,9)}'`
f2=`echo $line|awk '{print substr($1,10,20)}'`
f3=`echo $line|awk '{print substr($1,30,50)}'`
f4=`echo $line|awk '{print substr($1,80,10)}'`
f5=`echo $line|awk '{print substr($1,90,50)}'`
f6=`echo $line|awk '{print substr($1,140,10)}'`
f7=`echo $line|awk '{print substr($1,150,50)}'`
f8=`echo $line|awk '{print substr($1,200,10)}'`
f9=`echo $line|awk '{print substr($1,210,50)}'`
f10=`echo $line|awk '{print substr($1,260,10)}'`
f11=`echo $line|awk '{print substr($1,270,8)}'`
f12=`echo $line|awk '{print substr($1,278,8)}'`

s1=`echo $f1"|"$f2"|"$f3"|"$f4"|"$f5"|"`
s2=`echo $f6"|"$f7"|"$f8"|"`
s3=`echo $f9"|"$f10"|"`
s4=`echo $f11"|"$f12`

echo $s1$s2$s3$s4 >> FinalResult.txt
done
# 2  
Old 04-21-2009
nawk -f fieldwidth.awk filename.txt > FinalResult.txt

fieldwidth.awk:
Code:
function setFieldsByWidth(   i,n,FWS,start,copyd0) {
  # Licensed under GPL Peter S Tillier, 2003
  # NB corrupts $0
  copyd0 = $0                             # make copy of $0 to work on
  if (length(FIELDWIDTHS) == 0) {
    print "You need to set the width of the fields that you require" > "/dev/stderr"
    print "in the variable FIELDWIDTHS (NB: Upper case!)" > "/dev/stderr"
    exit(1)
  }

  if (!match(FIELDWIDTHS,/^[0-9 ]+$/)) {
    print "The variable FIELDWIDTHS must contain digits, separated" > "/dev/stderr"
    print "by spaces." > "/dev/stderr"
    exit(1)
  }

  n = split(FIELDWIDTHS,FWS)

  if (n == 1) {
    print "Warning: FIELDWIDTHS contains only one field width." > "/dev/stderr"
    print "Attempting to continue." > "/dev/stderr"
  }

  start = 1
  for (i=1; i <= n; i++) {
    $i = substr(copyd0,start,FWS[i])
    start = start + FWS[i]
  }
}

#Note that the "/dev/stderr" entries in some lines have wrapped.

#I then call setFieldsByWidth() in my main awk code as follows:
BEGIN {
  #FIELDWIDTHS="7 6 5 4 3 2 1" # for example
  # adjust the FIELDWIDTHS values as you see fit.
  FIELDWIDTHS="9 20 50 10 50 10 50 10 50 10 8 8" # widths matching the substr() calls in the original post
  OFS="|"
}
!/^[ \t]*$/ {
  saveDollarZero = $0 # if you want it later
  setFieldsByWidth()
  # now we can manipulate $0, NF and $1 .. $NF as we wish
  # print $0 OFS
  print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12
  next
}
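
As a quick sanity check (a sketch, not part of the script above), the start offset of each field can be derived from a list of widths and compared with the substr() offsets used in the first post. The helper name field_starts is made up for illustration, and the widths passed to it are the ones implied by those substr() calls.

```shell
# Hypothetical helper: print the 1-based start offset of each field
# for a space-separated list of field widths.
field_starts() {
  awk -v widths="$1" 'BEGIN {
    n = split(widths, w, " ")
    start = 1
    for (i = 1; i <= n; i++) {
      # separate offsets with a single space, no trailing space
      printf "%s%d", (i > 1 ? " " : ""), start
      start += w[i]
    }
    print ""
  }'
}

field_starts "9 20 50 10 50 10 50 10 50 10 8 8"
# prints: 1 10 30 80 90 140 150 200 210 260 270 278
```

If the printed offsets differ from the second argument of each substr() call, the widths string and the record layout disagree.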

# 3  
Old 04-21-2009
A simpler method is to create a script, parse.awk:

Code:
BEGIN { OFS="|" }   # set the output separator once, not for every record
{
f1=substr($1,1,9);
f2=substr($1,10,20);
f3=substr($1,30,50);
f4=substr($1,80,10);
f5=substr($1,90,50);
f6=substr($1,140,10);
f7=substr($1,150,50);
f8=substr($1,200,10);
f9=substr($1,210,50);
f10=substr($1,260,10);
f11=substr($1,270,8);
f12=substr($1,278,8);

# print already ends the line with a newline, so no extra print "\n" is needed
print f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12;
}

Then run

Code:
awk -f parse.awk filename.txt > FinalResult.txt

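I believe your original code takes so long because every backtick substitution and every awk call spawns a new process. With twelve awk calls per record, 13,000 records means roughly 156,000 awk processes, plus the subshells that run them.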
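
To illustrate the process-spawning cost, here is a minimal sketch that runs the same extraction two ways. The function names per_line and single_pass and the two toy field widths (3 and 2) are made up for the example; only the structure mirrors the thread.

```shell
# Slow pattern: the loop starts a new awk process for every input line
# (the original script starts twelve per line).
per_line() {
  while IFS= read -r line; do
    printf '%s\n' "$line" | awk '{ print substr($0, 1, 3) "|" substr($0, 4, 2) }'
  done < "$1"
}

# Fast pattern: one awk process reads the whole file.
single_pass() {
  awk '{ print substr($0, 1, 3) "|" substr($0, 4, 2) }' "$1"
}

# Small demonstration on a throwaway file
sample=$(mktemp)
printf 'ABCDE\nFGHIJ\n' > "$sample"
per_line "$sample"
single_pass "$sample"
rm -f "$sample"
```

Both functions print the same lines. On a file with thousands of records, running each one under the shell's time keyword (in bash, for instance) should show the gap.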
# 4  
Old 04-22-2009
Code:
awk '{
 print  substr($1,1,9) "|" \
        substr($1,10,20) "|" \
        substr($1,30,50) "|" \
        substr($1,80,10) "|" \
        substr($1,90,50) "|" \
        substr($1,140,10) "|" \
        substr($1,150,50) "|" \
        substr($1,200,10) "|" \
        substr($1,210,50) "|" \
        substr($1,260,10) "|" \
        substr($1,270,8) "|" \
        substr($1,278,8)
}' filename.txt > FinalResult.txt

# 5  
Old 04-22-2009
Hi ppl..

What if I have a line like this:
A      BCD

Here the first field is f1=A, the second field f2 is 6 blank characters, and f3=BCD.
If I use the above script, I don't get the fields in that case.
Can you please suggest how to go about it?

Cheers Amit
# 6  
Old 04-22-2009

Use $0 instead of $1 (which is what I should have used):

Code:
awk '{
 print  substr($0,1,9) "|" \
        substr($0,10,20) "|" \
        substr($0,30,50) "|" \
        substr($0,80,10) "|" \
        substr($0,90,50) "|" \
        substr($0,140,10) "|" \
        substr($0,150,50) "|" \
        substr($0,200,10) "|" \
        substr($0,210,50) "|" \
        substr($0,260,10) "|" \
        substr($0,270,8) "|" \
        substr($0,278,8)
}' filename.txt > FinalResult.txt
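
The reason is that awk splits each record on whitespace, so $1 holds only the text before the first blank, while $0 holds the entire record. A small sketch of the difference, assuming a toy layout of widths 1, 6 and 3 (the function names are made up for illustration):

```shell
# Sample record: "A", six blanks, then "BCD"
record='A      BCD'

# Using $1: awk has already split on blanks, so $1 is just "A"
with_first_field() {
  printf '%s\n' "$1" | awk '{ print substr($1,1,1) "|" substr($1,2,6) "|" substr($1,8,3) }'
}

# Using $0: the whole record, blanks included
with_whole_record() {
  printf '%s\n' "$1" | awk '{ print substr($0,1,1) "|" substr($0,2,6) "|" substr($0,8,3) }'
}

with_first_field "$record"
# prints: A||
with_whole_record "$record"
# prints: A|      |BCD
```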

# 7  
Old 04-22-2009
Thank you all for your reply.

I used the suggestion provided by cfajohnson and now it takes only 20 seconds to parse the 800,000 records.

Thank you very much,
Prashant