Fast way to split a tab delimited file


 
# 1  
Old 05-18-2006
I have searched the forum and tried different options. One of them works but is very slow. The file has millions of records.

It is a TAB delimited file which contains two types of records: metadata and detail records.

M PARTNER 8 LAST_BOOKED_DATE D YYYYMMDD
M PARTNER 8 TRIPS_YTD A 11 TRIPS_TOTAL
D NAME FIRST LAST 209 N SANBORN AVE
D NAME FIRST LAST 6997 COUNTY ROAD D

I need to split the file into two files by looking at the first character. All records that start with 'M' go into one file and all records that start with 'D' go into another file.

The following code works but it is too slow. Is there a faster way of accomplishing it?


#!/usr/bin/ksh

while read line
do
    char=`echo "$line" | cut -c1`
    if [ "$char" = "M" ]; then
        echo "$line" >> M.txt
    else
        echo "$line" >> D.txt
    fi
done < head10000.out

exit 0


Any help would be appreciated.

Thank You,
Madhu
# 2  
Old 05-18-2006
'Awk' will be the quick way to do this. Unfortunately I'm not an expert in the syntax, but from experience it's much faster than a standard shell script.

Cheers
Helen
# 3  
Old 05-18-2006
Thank you Helen...

I believe something like this would work..

awk -v logfile=${1:-"stdin"} '{ print > logfile"-"$1 }' "$1"

But it is throwing an error...
awk: syntax error near line 1
awk: bailing out near line 1


Are there any awk experts out there who can resolve this?
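
For reference, here is a sketch of how the one-liner above might be written so that it runs. The thread does not say which awk or platform is in use, so the cause of the error is a guess: some older awk implementations do not accept `-v` (on such systems `nawk` is the usual substitute), and an unparenthesized concatenation after `>` is ambiguous in awk. The file name `sample.tsv`, the variable `prefix`, and the `.txt` suffix are made up for illustration.

```shell
# Hypothetical sample input: tab delimited, first field is M or D.
printf 'M\tPARTNER\t8\nD\tNAME\tFIRST\nM\tPARTNER\t8\n' > sample.tsv

# -F '\t' splits on tabs, so $1 is the record type.
# The parentheses make the output file name expression unambiguous.
awk -F '\t' -v prefix="sample" '{ print > (prefix "-" $1 ".txt") }' sample.tsv
```

This writes one file per distinct first field (`sample-M.txt` and `sample-D.txt` here), which also works if more record types show up later.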
# 4  
Old 05-18-2006
How fast does it need to be? Is a simple grep faster than looping within a script?

grep "^M" head10000.out > M.txt
grep "^D" head10000.out > D.txt
# 5  
Old 05-18-2006
Code:
awk '{
    if ( $0 ~ /^M/) print >"M.txt"
    else print >"D.txt"
}' head10000.out
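
The same single-pass split can also be written in awk's pattern-action style. This is only a sketch of an equivalent form, not a claim that it is faster; `sample.tsv` is a made-up stand-in for the real input file.

```shell
# Hypothetical sample input: tab delimited, first character is M or D.
printf 'M\tPARTNER\t8\nD\tNAME\tFIRST\nD\tNAME\tLAST\n' > sample.tsv

awk '
    /^M/ { print > "M.txt"; next }   # metadata records
         { print > "D.txt" }         # every other record, as in the original script
' sample.tsv
```

As with the `if`/`else` version, any record that does not start with `M` lands in `D.txt`, so stray lines would be counted as detail records.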

# 6  
Old 05-18-2006
Thank you very much....

There are 13019984 records in the file.

5 M records and the rest of them are D records.

It took 10-11 minutes to run the awk program. Is that reasonable performance?

Thanks again for all the help!
# 7  
Old 05-18-2006
About 21,700 records per second (13,019,984 records in roughly 10 minutes) seems good to me.

You may be able to split the work into two tasks and improve performance.
Code:
awk '/^M/' head10000.out > M.txt &
awk '/^D/' head10000.out > D.txt &
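
If the two background jobs are run from a script, a `wait` is needed before anything reads the output files. Below is a minimal sketch with a made-up `sample.tsv` in place of the real input. Whether this beats the single-pass version is an open question, since the input is read twice and the result likely depends on disk speed and the number of CPUs.

```shell
# Hypothetical sample input: tab delimited, first character is M or D.
printf 'M\tPARTNER\t8\nD\tNAME\tFIRST\nD\tNAME\tLAST\n' > sample.tsv

# Run one filter per record type in the background.
awk '/^M/' sample.tsv > M.txt &
awk '/^D/' sample.tsv > D.txt &

# Block until both background jobs have finished.
wait

# Report how many records ended up in each file.
echo "M records: $(grep -c '' M.txt)"
echo "D records: $(grep -c '' D.txt)"
```

Unlike the `if`/`else` version, a record that starts with neither `M` nor `D` is dropped here instead of going to `D.txt`.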
