Fast way to split a tab delimited file


 
# 8  
Old 05-18-2006
I believe grep would do it faster than awk. Try johnywilkins' suggestion and compare the time taken.
# 9  
Old 05-18-2006
awk seems to be faster on my system.

Test with roughly 500,000 records (the 4-way test below splits the file at record 250000):

Code:
#! /usr/bin/ksh

print "Single task awk"
time {
    > M.txt
    > D.txt
    nawk '{
        if ($0 ~ /^M/) print $0 >"M.txt"
        else print $0 >"D.txt"
    }' test.dat
}
ls -altr M.txt D.txt

print "Two task awk"
time {
    > M.txt
    > D.txt
    nawk '/^M/' test.dat >> M.txt &
    nawk '/^D/' test.dat >> D.txt &
    wait
}
ls -altr M.txt D.txt

print "4-way awk"
time {
    > M.txt
    > D.txt
    nawk 'NR <  250000 && /^M/' test.dat >> M.txt &
    nawk 'NR >= 250000 && /^M/' test.dat >> M.txt &
    nawk 'NR <  250000 && /^D/' test.dat >> D.txt &
    nawk 'NR >= 250000 && /^D/' test.dat >> D.txt &
    wait
}
ls -altr M.txt D.txt

print "Grep"
time {
    > M.txt
    > D.txt
    grep "^M" test.dat > M.txt &
    grep "^D" test.dat > D.txt &
    wait
}
ls -altr M.txt D.txt

results:
Code:
Single task awk

real    3m12.40s
user    0m4.69s
sys     0m9.63s
-rw-r--r--   1 ... 34770850 ... D.txt
-rw-r--r--   1 ... 46222065 ... M.txt

Two task awk

real    0m14.12s
user    0m5.93s
sys     0m1.55s
-rw-r--r--   1 ... 34770850 ... D.txt
-rw-r--r--   1 ... 46222065 ... M.txt

4-way awk

real    0m16.14s
user    0m10.52s
sys     0m2.48s
-rw-r--r--   1 ... 34770850 ... D.txt
-rw-r--r--   1 ... 46222065 ... M.txt

Grep

real    0m22.70s
user    0m1.50s
sys     0m3.24s
-rw-r--r--   1 ... 34770850 ... D.txt
-rw-r--r--   1 ... 46222065 ... M.txt

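A smaller, self-contained version of this comparison can be reproduced on any system. This is only a sketch under assumptions: it uses plain `awk` and `sh` instead of `nawk` and `ksh`, it generates a hypothetical `test.dat` with alternating `M` and `D` records, and the per-method output names (`M_awk.txt`, `M_grep.txt`, and so on) are made up for the illustration. It checks that the single-pass awk and the two-grep approach produce identical files; it says nothing about which one is faster.

```shell
#!/bin/sh
# Build a small hypothetical tab-delimited test file:
# odd records start with M, even records start with D.
awk 'BEGIN {
    for (i = 1; i <= 1000; i++)
        printf "%s\tfield%d\tvalue%d\n", (i % 2 ? "M" : "D"), i, i * 10
}' > test.dat

# Single pass: one awk process writes both output files.
awk '{ if ($0 ~ /^M/) print > "M_awk.txt"; else print > "D_awk.txt" }' test.dat

# Two passes in parallel: one grep per prefix, as in the "Grep" test.
grep '^M' test.dat > M_grep.txt &
grep '^D' test.dat > D_grep.txt &
wait

# Both methods should yield byte-identical files.
cmp M_awk.txt M_grep.txt && cmp D_awk.txt D_grep.txt && echo "outputs match"
```

Wrapping either method in `time` on a realistically sized file then gives numbers comparable to the ones above.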
# 10  
Old 05-18-2006
Quote:
Originally Posted by tmarikle
awk seems to be faster on my system.

How do you find the time taken to execute the script?
# 11  
Old 05-18-2006
I got the following results running the same tests. This is a Sun Solaris machine.

Single task awk

real 2m21.59s
user 1m11.76s
sys 0m58.18s
-rw-r--r-- 1 develop 722 May 18 13:48 M.txt
-rw-r--r-- 1 develop 3055795862 May 18 13:50 D.txt

Two task awk

real 1m4.71s
user 1m25.91s
sys 0m33.18s
-rw-r--r-- 1 develop 3055795623 May 18 13:51 D.txt
-rw-r--r-- 1 develop 722 May 18 13:51 M.txt

4-way awk

real 1m10.57s
user 2m30.03s
sys 0m45.67s
-rw-r--r-- 1 develop 722 May 18 13:52 M.txt
-rw-r--r-- 1 develop 3055795623 May 18 13:52 D.txt

Grep

real 0m29.55s
user 0m9.62s
sys 0m30.49s
-rw-r--r-- 1 develop 722 May 18 13:53 M.txt
-rw-r--r-- 1 develop 3055795623 May 18 13:53 D.txt
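
One detail in these sizes: D.txt is 239 bytes larger after the single-task awk test (3055795862 vs. 3055795623). That test writes every line that does not start with M to D.txt, while the other tests keep only lines that start with D, so the input probably contains a few lines with neither prefix. A quick way to check for such lines, sketched on a hypothetical sample file (`sample.dat`, `other.txt`, and the sample contents are made up for illustration):

```shell
#!/bin/sh
# Hypothetical sample: one header line with neither prefix,
# followed by two M records and two D records.
printf 'HDR\tbatch1\nM\ta\t1\nD\tb\t2\nM\tc\t3\nD\td\t4\n' > sample.dat

# Lines that start with neither M nor D. These end up in D.txt
# only when the split uses an if/else instead of two patterns.
grep -v '^[MD]' sample.dat > other.txt

# Count them and report how many bytes they account for.
echo "unmatched lines: $(grep -c -v '^[MD]' sample.dat)"
echo "unmatched bytes: $(wc -c < other.txt)"
```

If the byte count reported for the real file equals the size difference between the two D.txt files, the outputs of the different methods are otherwise equivalent.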
# 12  
Old 05-18-2006
Your grep is faster than my Sun Solaris grep by a long shot (comparatively speaking of course).
# 13  
Old 05-18-2006
This ksh script should be faster than the original...
Code:
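#! /usr/bin/ksh
# Read inputfile on stdin; keep both output files open on descriptors 3 and 4.
exec < inputfile
IFS=""
exec 3>M.out 4>D.out
# -r on read and print keeps backslashes in the data intact.
while read -r line ; do
        if [[ $line = M* ]] ; then
                print -r -u3 -- "$line"
        else
                print -r -u4 -- "$line"
        fi
done
exit 0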

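To confirm that a split like this loses no records, the line counts of the two outputs should add up to the line count of the input. The following is a sketch in POSIX sh rather than ksh (so `printf` stands in for `print -u`), with a hypothetical `inputfile` generated on the spot:

```shell
#!/bin/sh
# Hypothetical input: three M records and two D records, tab-delimited.
printf 'M\t1\nD\t2\nM\t3\nD\t4\nM\t5\n' > inputfile

# Same idea as the ksh loop: keep both outputs open on descriptors 3 and 4.
exec 3> M.out 4> D.out
while IFS= read -r line; do
    case $line in
        M*) printf '%s\n' "$line" >&3 ;;
        *)  printf '%s\n' "$line" >&4 ;;
    esac
done < inputfile
exec 3>&- 4>&-

# The two outputs together should contain exactly as many lines as the input.
total=$(( $(wc -l < inputfile) ))
split=$(( $(wc -l < M.out) + $(wc -l < D.out) ))
[ "$total" -eq "$split" ] && echo "no records lost"
```

The same count check works for the awk and grep variants; with the two-pattern versions it will also reveal any lines that match neither prefix.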
# 14  
Old 05-18-2006
Thank you very much guys...I did learn a lot today!!