The UNIX and Linux Forums  

  #1  
Old 05-18-2006
Registered User
 

Join Date: Nov 2005
Posts: 91
Fast way to split a tab delimited file

I have searched the forum and tried different options. One of them works but is very slow. The file has millions of records.

It is a TAB delimited file which contains two types of records: metadata and detail records.

M PARTNER 8 LAST_BOOKED_DATE D YYYYMMDD
M PARTNER 8 TRIPS_YTD A 11 TRIPS_TOTAL
D NAME FIRST LAST 209 N SANBORN AVE
D NAME FIRST LAST 6997 COUNTY ROAD D

I need to split the file into two files by looking at the first character. All records that start with 'M' go into one file and all records that start with 'D' go into another file.

The following code works, but it is too slow. Is there a faster way of accomplishing this?


#!/usr/bin/ksh

while read line
do
    char=`echo "$line" | cut -c1`
    if [ "$char" = "M" ]; then
        echo "$line" >> M.txt
    else
        echo "$line" >> D.txt
    fi
done < head10000.out

exit 0


Any help would be appreciated.

Thank You,
Madhu
  #2  
Old 05-18-2006
Registered User
 

Join Date: Apr 2002
Location: Chesterfield, UK
Posts: 124
'Awk' will be the quick way to do this. Unfortunately I'm not an expert in the syntax, but from experience it's much faster than a standard shell script.

Cheers
Helen
  #3  
Old 05-18-2006
Registered User
 

Join Date: Nov 2005
Posts: 91
Thank you Helen...

I believe something like this would work..

awk -v logfile=${1:-"stdin"} '{ print > logfile"-"$1 }' "$1"

But it is throwing an error...
awk: syntax error near line 1
awk: bailing out near line 1


Are there any awk experts out there who can resolve this?
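
One possible cause, offered as a guess rather than a diagnosis: this error text matches what the old /usr/bin/awk on Solaris prints, and that awk does not accept -v. Separately, an unparenthesized concatenation after `>` is ambiguous in awk, so the output file name is safer in parentheses. A minimal sketch, assuming a POSIX awk (nawk, gawk, or /usr/xpg4/bin/awk) and a made-up sample file named sample.out:

```shell
# Hypothetical sample input (tab-delimited); stands in for the real data file.
printf 'M\tPARTNER\t8\nD\tNAME\tFIRST\nM\tPARTNER\t11\n' > sample.out

# Parenthesize the target so "logfile", "-" and $1 are joined before the
# redirection is applied. Output goes to sample.out-M and sample.out-D.
awk -v logfile="sample.out" '{ print > (logfile "-" $1) }' sample.out
```

This names each output file after the first field of the record, so it only stays manageable when the first field takes a handful of values, as it does here (M and D).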
  #4  
Old 05-18-2006
Registered User
 

Join Date: May 2006
Posts: 3
How fast does it need to be? Is a simple grep faster than looping within a script?

grep "^M" head10000.out > M.txt
grep "^D" head10000.out > D.txt
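
One difference from the original loop is worth noting: the loop sends every line that does not start with M to D.txt, while `grep "^D"` keeps only lines that start with D. If the loop's behavior is what is wanted, `grep -v` gives the same split. A small sketch, using a made-up sample file (sample.out) with one stray line that starts with X:

```shell
# Hypothetical sample input; the X line starts with neither M nor D.
printf 'M\ta\nD\tb\nX\tc\n' > sample.out

# Lines starting with M go to M.txt.
grep '^M' sample.out > M.txt

# Everything else goes to D.txt, matching the else branch of the shell loop.
grep -v '^M' sample.out > D.txt
```

Either way the input is read twice, once per grep.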
  #5  
Old 05-18-2006
Registered User
 

Join Date: Jan 2005
Posts: 682
Code:
awk '{
    if ( $0 ~ /^M/) print >"M.txt"
    else print >"D.txt"
}' head10000.out
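
The same single-pass split can be written more compactly by choosing the file name with a conditional expression. This is only a sketch of an equivalent form, run here against a made-up sample file (sample.out) rather than head10000.out:

```shell
# Hypothetical sample input (tab-delimited).
printf 'M\tPARTNER\t8\nD\tNAME\tFIRST\nD\tNAME\tLAST\n' > sample.out

# Pick the output file per record; the parentheses keep the whole
# expression as the redirection target.
awk '{ print > (/^M/ ? "M.txt" : "D.txt") }' sample.out

# Running it again does not accumulate lines: awk truncates a file opened
# with ">" the first time it writes to it in a run, unlike the ">>" used
# in the original shell loop.
awk '{ print > (/^M/ ? "M.txt" : "D.txt") }' sample.out
```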
  #6  
Old 05-18-2006
Registered User
 

Join Date: Nov 2005
Posts: 91
Thank you very much....

There are 13019984 records in the file.

5 M records and the rest of them are D records.

It took 10-11 minutes to run the awk program. Is that reasonable performance?

Thanks again for all the help!
  #7  
Old 05-18-2006
Registered User
 

Join Date: Jan 2005
Posts: 682
21,700 records per second (13,019,984 records in about 10 minutes) seems good to me.

You may be able to split the work into two tasks and improve performance.
Code:
awk '/^M/' head10000.out > M.txt &
awk '/^D/' head10000.out > D.txt &
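
If those two lines run inside a script, both awk processes are still in the background when the next command starts. A small sketch, assuming a made-up sample file (sample.out) in place of head10000.out, that adds `wait` so later steps see complete output files:

```shell
# Hypothetical sample input (tab-delimited).
printf 'M\tPARTNER\t8\nD\tNAME\tFIRST\nD\tNAME\tLAST\n' > sample.out

# Start one awk per record type; each reads the whole input on its own.
awk '/^M/' sample.out > M.txt &
awk '/^D/' sample.out > D.txt &

# Block until both background jobs have finished writing.
wait
```

Whether this beats the single-pass version depends on the machine: the file is read twice, so it is likely to help only when the CPU, not the disk, is the bottleneck.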