Sponsored Content
Top Forums Shell Programming and Scripting The builtin split function in AWK is too slow Post 302423102 by danmero on Thursday 20th of May 2010 07:21:59 AM
Old 05-20-2010
Quote:
Originally Posted by kevintse
I have a text file that contains 4 million lines, each line contains 2 fields(colon as field separator).
....
Here I have to split the second field(can be up to 40,000 fields) by comma into an array for analysis,
Code:
print $2 | cmd; close(cmd)} ' data.txt

.....
How to solve this problem? or you have other alternatives to replacing the split function?
Calling the external command will be to slow (4,000,000 x 40,000 = 160,000,000,000 calls).
Please answer Franklin52 previous question.
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

split function

Hi all! I am relatively new to UNIX staff, and I have come across a problem: I have a big directory, which contains 100 smaller ones. Each of the 100 contains a file ending in .txt , so there are 100 files ending in .txt I want to split each of the 100 files in smaller ones, which will contain... (4 Replies)
Discussion started by: ktsirig
4 Replies

2. Shell Programming and Scripting

perl split function

$mystring = "name:blk:house::"; print "$mystring\n"; @s_format = split(/:/, $mystring); for ($i=0; $i <= $#s_format; $i++) { print "index is $i,field is $s_format"; print "\n"; } $size = $#s_format + 1; print "total size of array is $size\n"; i am expecting my size to be 5, why is it... (5 Replies)
Discussion started by: new2ss
5 Replies

3. UNIX for Dummies Questions & Answers

Split a file with no pattern -- Split, Csplit, Awk

I have gone through all the threads in the forum and tested out different things. I am trying to split a 3GB file into multiple files. Some files are even larger than this. For example: split -l 3000000 filename.txt This is very slow and it splits the file with 3 million records in each... (10 Replies)
Discussion started by: madhunk
10 Replies

4. Shell Programming and Scripting

awk - split function

Hi, I have some output in the form of: #output: abc123 def567 hij890 ghi324 the above is in one column, stored in the variable x ( and if you wana know about x... x=sprintf(tolower(substr(someArray,1,1)substr(userArray,3,1)substr(userArray,2,1))) when i simply print x (print x) I get... (7 Replies)
Discussion started by: fusionX
7 Replies

5. Shell Programming and Scripting

Use split function in perl

Hello, if i have file like this: 010000890306932455804 05306977653873 0520080417010520ISMS SMT ZZZZZZZZZZZZZOC30693599000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202011302942311 010000890306946317387 05306977313623 0520080417010520ISMS SMT ZZZZZZZZZZZZZOC306942190000 30971360000... (5 Replies)
Discussion started by: chriss_58
5 Replies

6. Homework & Coursework Questions

PERL split function

Hi... I have a question regarding the split function in PERL. I have a very huge csv file (more than 80 million records). I need to extract a particular position(eg : 50th position) of each line from the csv file. I tried using split function. But I realized split takes a very long time. Also... (1 Reply)
Discussion started by: castle
1 Replies

7. Homework & Coursework Questions

PERL split function

Hi... I have a question regarding the split function in PERL. I have a very huge csv file (more than 80 million records). I need to extract a particular position(eg : 50th position) of each line from the csv file. I tried using split function. But I realized split takes a very long time. Also... (0 Replies)
Discussion started by: castle
0 Replies

8. Shell Programming and Scripting

PERL split function

Hi... I have a question regarding the split function in PERL. I have a very huge csv file (more than 80 million records). I need to extract a particular position(eg : 50th position) of each line from the csv file. I tried using split function. But I realized split takes a very long time. Also... (1 Reply)
Discussion started by: castle
1 Replies

9. Shell Programming and Scripting

Perl split function

my @d =split('\|', $_); west|ACH|3|Y|LuV|N||N|| Qt|UWST|57|Y|LSV|Y|Bng|N|KT| It Returns d as 8 for First Line, and 9 as for Second Line . I want to Process Both the Files, How to Handle It. (3 Replies)
Discussion started by: vishwakar
3 Replies

10. Shell Programming and Scripting

awk to split one field and print the last two fields within the split part.

Hello; I have a file consists of 4 columns separated by tab. The problem is the third fields. Some of the them are very long but can be split by the vertical bar "|". Also some of them do not contain the string "UniProt", but I could ignore it at this moment, and sort the file afterwards. Here is... (5 Replies)
Discussion started by: yifangt
5 Replies
funtbl(1)							SAORD Documentation							 funtbl(1)

NAME
funtbl - extract a table from Funtools ASCII output SYNOPSIS
funtable [-c cols] [-h] [-n table] [-p prog] [-s sep] <iname> DESCRIPTION
[NB: This program has been deprecated in favor of the ASCII text processing support in funtools. You can now perform fundisp on funtools ASCII output files (specifying the table using bracket notation) to extract tables and columns.] The funtbl script extracts a specified table (without the header and comments) from a funtools ASCII output file and writes the result to the standard output. The first non-switch argument is the ASCII input file name (i.e. the saved output from funcnts, fundisp, funhist, etc.). If no filename is specified, stdin is read. The -n switch specifies which table (starting from 1) to extract. The default is to extract the first table. The -c switch is a space-delimited list of column numbers to output, e.g. -c "1 3 5" will extract the first three odd-numbered columns. The default is to extract all columns. The -s switch specifies the separator string to put between columns. The default is a single space. The -h switch specifies that column names should be added in a header line before the data is output. With- out the switch, no header is prepended. The -p program switch allows you to specify an awk-like program to run instead of the default (which is host-specific and is determined at build time). The -T switch will output the data in rdb format (i.e., with a 2-row header of column names and dashes, and with data columns separated by tabs). The -help switch will print out a message describing program usage. For example, consider the output from the following funcnts command: [sh] funcnts -sr snr.ev "ann 512 512 0 9 n=3" # source # data file: /proj/rd/data/snr.ev # arcsec/pixel: 8 # background # constant value: 0.000000 # column units # area: arcsec**2 # surf_bri: cnts/arcsec**2 # surf_err: cnts/arcsec**2 # summed background-subtracted results upto net_counts error background berror area surf_bri surf_err ---- ------------ --------- ------------ --------- --------- --------- --------- 1 147.000 12.124 0.000 0.000 1600.00 0.092 0.008 2 625.000 25.000 0.000 0.000 6976.00 0.090 0.004 3 1442.000 37.974 0.000 0.000 15936.00 0.090 0.002 # background-subtracted results reg net_counts error background berror area surf_bri surf_err ---- ------------ --------- ------------ --------- --------- --------- --------- 1 147.000 12.124 0.000 0.000 1600.00 0.092 0.008 2 478.000 21.863 0.000 0.000 5376.00 0.089 0.004 3 817.000 28.583 0.000 0.000 8960.00 0.091 0.003 # the following source and background components were used: source_region(s) ---------------- ann 512 512 0 9 n=3 reg counts pixels sumcnts sumpix ---- ------------ --------- ------------ --------- 1 147.000 25 147.000 25 2 478.000 84 625.000 109 3 817.000 140 1442.000 249 There are four tables in this output. To extract the last one, you can execute: [sh] funcnts -s snr.ev "ann 512 512 0 9 n=3" | funtbl -n 4 1 147.000 25 147.000 25 2 478.000 84 625.000 109 3 817.000 140 1442.000 249 Note that the output has been re-formatted so that only a single space separates each column, with no extraneous header or comment informa- tion. To extract only columns 1,2, and 4 from the last example (but with a header prepended and tabs between columns), you can execute: [sh] funcnts -s snr.ev "ann 512 512 0 9 n=3" | funtbl -c "1 2 4" -h -n 4 -s " " #reg counts sumcnts 1 147.000 147.000 2 478.000 625.000 3 817.000 1442.000 Of course, if the output has previously been saved in a file named foo.out, the same result can be obtained by executing: [sh] funtbl -c "1 2 4" -h -n 4 -s " " foo.out #reg counts sumcnts 1 147.000 147.000 2 478.000 625.000 3 817.000 1442.000 SEE ALSO
See funtools(7) for a list of Funtools help pages version 1.4.2 January 2, 2008 funtbl(1)
All times are GMT -4. The time now is 05:32 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy