Shell Programming and Scripting: Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts here.
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| How to split pipe delimited file | njgirl | Shell Programming and Scripting | 4 | 06-18-2008 02:15 PM |
| Split a file with no pattern -- Split, Csplit, Awk | madhunk | UNIX for Dummies Questions & Answers | 10 | 12-17-2007 09:57 AM |
| To split a string to obtain the words delimited by whitespaces | Sudhakar333 | Shell Programming and Scripting | 4 | 08-06-2007 11:26 AM |
| Converting a Delimited File to Fixed width file | raghavan.aero | Shell Programming and Scripting | 2 | 06-06-2007 11:44 AM |
| Converting Tab delimited file to Comma delimited file in Unix | charan81 | Shell Programming and Scripting | 22 | 01-20-2006 06:24 AM |
#1
Fast way to split a tab delimited file
I have searched the forum and tried different options. One of them works, but it is very slow. The file has millions and millions of records.
It is a TAB-delimited file that contains two types of records: Metadata and Detail records.
Code:
M PARTNER 8 LAST_BOOKED_DATE D YYYYMMDD
M PARTNER 8 TRIPS_YTD A 11 TRIPS_TOTAL
D NAME FIRST LAST 209 N SANBORN AVE
D NAME FIRST LAST 6997 COUNTY ROAD D
I need to split the file into two files based on the first character: all records that start with 'M' go into one file, and all records that start with 'D' go into another. The following code works, but it is too slow. Is there a faster way to do this?
Code:
#!/usr/bin/ksh
# IFS= and -r keep tabs and backslashes in each record intact
while IFS= read -r line
do
    char=`echo "$line" | cut -c1`
    if [ "$char" = "M" ]; then
        echo "$line" >> M.txt
    else
        echo "$line" >> D.txt
    fi
done < head10000.out
exit 0
Any help would be appreciated.
Thank You,
Madhu
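The loop in the question starts a subshell plus a `cut` process for every line (the backquoted `echo "$line" | cut -c1`) and reopens M.txt or D.txt on every `>>`; that per-line overhead is the likely reason it crawls on millions of records. A minimal pure-shell sketch that avoids both, assuming a POSIX-style shell such as ksh or bash; the function name `split_mt` is hypothetical:

```shell
#!/bin/sh
# split_mt is a hypothetical helper; the logic mirrors the loop in the question.
split_mt() {
    # case tests the first character inside the shell, so no subshell or
    # cut process is started per line. IFS= and -r keep tabs and backslashes;
    # the || test also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            M*) printf '%s\n' "$line" >&3 ;;
            *)  printf '%s\n' "$line" >&4 ;;  # same as the original else branch
        esac
    done < "$1" 3> M.txt 4> D.txt  # each output file is opened only once
}
```

Usage: `split_mt head10000.out`. Unlike the original `>>`, the `3>` and `4>` redirections truncate M.txt and D.txt first, so repeated runs do not accumulate old output. A shell `read` loop is still typically much slower than a single awk pass.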
#2
awk would be the quick way to do this. Unfortunately I'm not an expert in its syntax, but in my experience it's much faster than a standard shell script.
Cheers,
Helen
#3
Thank you, Helen.
I believe something like this would work:
Code:
awk -v logfile=${1:-"stdin"} '{ print > logfile"-"$1 }' "$1"
But it throws an error:
awk: syntax error near line 1
awk: bailing out near line 1
Are there any awk experts out there who can resolve this?
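Two things may be behind that error; both are guesses about the poster's system. The "bailing out near line 1" message looks like the old Solaris /usr/bin/awk, which does not appear to support `-v`; nawk or /usr/xpg4/bin/awk do. Separately, `print > logfile"-"$1` is ambiguous, since awk implementations parse an unparenthesised concatenation after `>` differently, so parenthesising the file name is the portable form. A sketch with both fixes, wrapped in a hypothetical function `split_by_field1`:

```shell
#!/bin/sh
# split_by_field1 is a hypothetical wrapper around the one-liner above.
# Each input line goes to "<logfile>-<first field>", e.g. data.out-M, data.out-D.
# Assumption: on Solaris, replace awk with nawk or /usr/xpg4/bin/awk.
split_by_field1() {
    # The parentheses make (logfile "-" $1) a single file-name expression.
    # With no argument, "-" makes awk read standard input instead of "".
    awk -v logfile="${1:-stdin}" '{ print > (logfile "-" $1) }' "${1:--}"
}
```

Usage: `split_by_field1 head10000.out` writes head10000.out-M and head10000.out-D. Here `$1` inside awk is the first whitespace-separated field, which is just the M or D marker for this data.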
#4
How fast does it need to be? Is a simple grep faster than looping within a script?
Code:
grep "^M" head10000.out > M.txt
grep "^D" head10000.out > D.txt
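One difference from the original loop: `grep "^D"` keeps only lines that start with D, while the loop's `else` branch sends every non-M line (blank or malformed ones included) to D.txt. A sketch that keeps the loop's behaviour, using a hypothetical helper name `split_grep`, inverts the first pattern with `grep -v` instead:

```shell
#!/bin/sh
# split_grep is a hypothetical helper. Like the grep answer it reads the
# input twice, but the second pass selects every line NOT starting with M,
# which matches the else branch of the original loop.
split_grep() {
    grep '^M' "$1" > M.txt
    grep -v '^M' "$1" > D.txt
}
```

Because the file is scanned twice, this trades a second sequential read for avoiding any per-line shell work.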
#5
Code:
awk '{
if ( $0 ~ /^M/) print >"M.txt"
else print >"D.txt"
}' head10000.out
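This awk version is fast because it reads the file once and keeps each output file open after its first `print >`, instead of forking processes per line. If record types other than M and D ever appear, a variation can derive the output name from the first character so each type gets its own file; this is a sketch, and `split_by_first_char` is a hypothetical name:

```shell
#!/bin/sh
# split_by_first_char is a hypothetical helper. substr($0, 1, 1) is the
# record's first character, so M records go to M.txt, D records to D.txt,
# and any other type to its own <char>.txt file.
split_by_first_char() {
    awk '{ print > (substr($0, 1, 1) ".txt") }' "$1"
}
```

Note that an empty line would be written to a file literally named `.txt`.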
#6
Thank you very much.
There are 13019984 records in the file: 5 M records, and the rest are D records. It took 10 to 11 minutes to run the awk program. Is that a reasonable speed? Thanks again for all the help!
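For reference, the throughput for 13019984 records over 10 minutes (600 s) and 11 minutes (660 s) can be computed directly; `rate` is a hypothetical helper that divides a record count by elapsed seconds:

```shell
#!/bin/sh
# rate is a hypothetical helper: records per second, rounded to an integer.
rate() {
    awk -v n="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", n / secs }'
}
```

Usage: `rate 13019984 600` prints 21700 and `rate 13019984 660` prints 19727, so the run processed roughly 19,700 to 21,700 records per second.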
#7
21,700 records per second seems good to me.
You may be able to split the work into two tasks that run in parallel and improve performance.
Code:
awk '/^M/' head10000.out > M.txt &
awk '/^D/' head10000.out > D.txt &
wait   # in a script, block until both background jobs have finished
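Whether the two-job version actually helps depends on whether the run is CPU-bound or disk-bound: both jobs read the whole file, so on a single disk they may compete for I/O. A sketch for measuring it, with hypothetical function names `one_pass` and `two_jobs` wrapping the two approaches from this thread:

```shell
#!/bin/sh
# one_pass: the single awk pass from the earlier answer.
one_pass() {
    awk '{ if ($0 ~ /^M/) print > "M.txt"; else print > "D.txt" }' "$1"
}

# two_jobs: two awk processes in parallel, each reading the whole file.
two_jobs() {
    awk '/^M/' "$1" > M.txt &
    awk '/^D/' "$1" > D.txt &
    wait   # do not return until both background jobs have finished
}
```

In ksh or bash, `time one_pass head10000.out` and `time two_jobs head10000.out` compare the two; there `time` is a shell keyword, so it can time a function.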
The UNIX and Linux Forums