The builtin split function in AWK is too slow


 
# 1  
Old 05-20-2010

I have a text file that contains 4 million lines. Each line contains 2 fields, with a colon as the field separator, as shown:
Code:
123:444,555,666,777,888,345
233:5444,555,666,777,888,345
623:454,585,664,773,888,345
......

I have to split the second field (which can contain up to 40,000 comma-separated values) into an array for analysis, but I find the "split" function too slow.

I tried to find an alternative to the split function. I think I found one, but there is still one part I cannot get working without your help. So far I have this code:

Code:
awk -F: -v cmd="awk 'BEGIN { RS=\",\" } { print \$1 }'" '{ print $2 | cmd; close(cmd) }' data.txt

This code splits the second field fast enough, but I don't know how to store the split data in an array in the *first* AWK program shown above.

How can I solve this problem? Or do you have other alternatives to the split function?

Thanks.

Last edited by kevintse; 05-20-2010 at 06:59 AM..
# 2  
Old 05-20-2010
Since the file is so big, it may be better to solve the problem with a C program.
# 3  
Old 05-20-2010
Quote:
Originally Posted by kevintse
Code:
awk -F: -v cmd="awk 'BEGIN { RS=\",\" } { print \$1 }'" '{ print $2 | cmd; close(cmd) }' data.txt

I'm not sure if this is what you want, but it's odd to call awk from within awk like that.
You can split the 2nd field like this:
Code:
awk -F: '{
  split($2,a,",")
  # Do your stuff with a[1], a[2] ...
}' file
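
As a minimal sketch of the "do your stuff" step: split returns the number of elements it stored, which can drive a loop over the array. The function name sum_values and the summing are hypothetical, chosen only for illustration; the thread never says what analysis is needed.

```shell
# Hypothetical helper: for each line print the key, the number of
# comma-separated values, and their sum.
sum_values() {
  awk -F: '{
    n = split($2, a, ",")      # n = number of elements stored in a[1..n]
    sum = 0
    for (i = 1; i <= n; i++)
      sum += a[i]
    print $1, n, sum
  }' "$@"
}

printf '%s\n' '123:444,555,666' '233:5444,555' | sum_values
```

With the two sample lines this prints "123 3 1665" and "233 2 5999".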

# 4  
Old 05-20-2010
I did use the split function, but it was too slow, which is why I started this thread.
# 5  
Old 05-20-2010
What are you trying to achieve? What is the desired output from your input file?

Have you tried something with the split function above?
# 6  
Old 05-20-2010
What is your desired final output? You are not using awk to your advantage.
# 7  
Old 05-20-2010
Quote:
Originally Posted by kevintse
I have a text file that contains 4 million lines, each line contains 2 fields(colon as field separator).
....
Here I have to split the second field(can be up to 40,000 fields) by comma into an array for analysis,
Code:
print $2 | cmd; close(cmd)} ' data.txt

.....
How to solve this problem? or you have other alternatives to replacing the split function?
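Calling the external command will be too slow: the script starts a new process for each of the 4,000,000 lines, and together those processes have to handle up to 4,000,000 x 40,000 = 160,000,000,000 values.
Please answer Franklin52's previous question.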
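
One alternative that avoids both split and the external command, offered only as a sketch: let awk's own field splitting do the work by using a field separator that matches a colon or a comma. Then $1 is the key and $2 through $NF are the values. The function name max_value and the "largest value" analysis are hypothetical. Whether this is actually faster than split depends on the awk implementation and has not been measured here, and some awk implementations cap the number of fields per record, so lines with 40,000 values may need checking.

```shell
# Hypothetical helper: for each line print the key, the number of
# values, and the largest value, without calling split().
max_value() {
  awk -F'[:,]' '{
    # $1 is the key; $2..$NF are the comma-separated values
    max = $2 + 0
    for (i = 3; i <= NF; i++)
      if ($i + 0 > max)
        max = $i + 0
    print $1, NF - 1, max
  }' "$@"
}

printf '%s\n' '123:444,555,666,777,888,345' '233:5444,555,666,777,888,345' | max_value
```

With these two lines from the sample data this prints "123 6 888" and "233 6 5444".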