Shell Programming and Scripting: The builtin split function in AWK is too slow
Post 302423798 by alister on Saturday 22nd of May 2010 12:56:43 PM
It seems to me that both files contain the same information, though in different formats. A simpler solution is to use a different algorithm: make a single pass over one data file, counting how many lists each pair of books shares, then sort and regroup those counts:
Code:
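#!/bin/sh
# Each input line is "listname:book1,book2,...", so splitting on ':' and ','
# makes $1 the list name and $2..$NF the books.

# Stage 1: count how many lists each ordered pair of books shares.
awk -F'[:,]' '
    { for (i = 2; i <= NF; i++) for (j = 2; j <= NF; j++) if (i != j) a[$i " " $j]++ }
    END { for (k in a) print k, a[k] }' "$1" |
# Group by book, most frequent partner first, ties broken alphabetically.
sort -k1,1 -k3,3nr -k2,2 |
# Stage 2: join each book's partners into one "book:p1,p2,..." line.
awk '{ if ($1 != ob) { if (NR > 1) print s; s = $1 ":" $2; ob = $1; next }; s = s "," $2 }
     END { if (NR > 0) print s }'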

Test run:
Code:
$ cat data
list1:A,B,C
list2:A,B,C,F,H
list3:A,B,D
list4:A,B,F
list5:H,F
list6:C
list7:G
$ ./books.sh data
A:B,C,F,D,H
B:A,C,F,D,H
C:A,B,F,H
D:A,B
F:A,B,H,C
H:F,A,B,C
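To see what the pipeline passes between its stages, the two awk programs can be wrapped in shell functions and the sorted stage-1 records printed for a single book. This is a minimal sketch that assumes the same sample data as the test run above (recreated here with printf); the function names pair_counts and join_partners are hypothetical, chosen only for this illustration:

```sh
#!/bin/sh
# Recreate the sample input from the test run above.
printf '%s\n' list1:A,B,C list2:A,B,C,F,H list3:A,B,D list4:A,B,F \
    list5:H,F list6:C list7:G > data

# Hypothetical helper: stage 1 plus sort. Prints "book partner count" records,
# grouped by book, most frequent partner first, ties broken alphabetically.
pair_counts() {
    awk -F'[:,]' '
        { for (i = 2; i <= NF; i++) for (j = 2; j <= NF; j++) if (i != j) a[$i " " $j]++ }
        END { for (k in a) print k, a[k] }' "$1" |
    sort -k1,1 -k3,3nr -k2,2
}

# Hypothetical helper: stage 2. Joins each book's partners into "book:p1,p2,...".
join_partners() {
    awk '{ if ($1 != ob) { if (NR > 1) print s; s = $1 ":" $2; ob = $1; next }; s = s "," $2 }
         END { if (NR > 0) print s }'
}

# Intermediate records for book A only.
pair_counts data | grep '^A '

# Stage 2 turns those records into the final line for A.
pair_counts data | grep '^A ' | join_partners
```

The first five lines printed are A B 4, A C 2, A F 2, A D 1 and A H 1 (B shares four lists with A, C and F share two, D and H one), and the last line is A:B,C,F,D,H, the same as the first line of the test run. Because sort has already put the partners in their final order, stage 2 only has to concatenate them.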



A Perl solution, which is probably faster:
Code:
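# Run as: perl -lan -F'[:,]' books.pl data
# $F[0] is the list name; $F[1] .. $F[$#F] are the books on that list.
for my $i (1 .. $#F) {
    for my $j (1 .. $#F) {
        # Count every ordered pair of different positions on the same list.
        $books{$F[$i]}{$F[$j]}++ if $i != $j;
    }
}

END {
    for my $k (sort keys %books) {
        my $partners = $books{$k};
        # Most frequent partners first; ties broken alphabetically.
        my @v = sort { $partners->{$b} <=> $partners->{$a} || $a cmp $b }
                keys %$partners;
        print "$k:" . join(",", @v);
    }
}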

Test run, using the same data file as with the sh/awk/sort solution:
Code:
$ perl -lan -F'[:,]' books.pl data
A:B,C,F,D,H
B:A,C,F,D,H
C:A,B,F,H
D:A,B
F:A,B,H,C
H:F,A,B,C

Note: It's been about 10 years since I've written anything more than a one-liner in Perl, so perhaps a Perl guru can slash that to a couple of lines. :)

Regards,
Alister

8 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

split function

Hi all! I am relatively new to UNIX staff, and I have come across a problem: I have a big directory, which contains 100 smaller ones. Each of the 100 contains a file ending in .txt , so there are 100 files ending in .txt I want to split each of the 100 files in smaller ones, which will contain... (4 Replies)
Discussion started by: ktsirig

2. Shell Programming and Scripting

perl split function

$mystring = "name:blk:house::"; print "$mystring\n"; @s_format = split(/:/, $mystring); for ($i=0; $i <= $#s_format; $i++) { print "index is $i,field is $s_format"; print "\n"; } $size = $#s_format + 1; print "total size of array is $size\n"; i am expecting my size to be 5, why is it... (5 Replies)
Discussion started by: new2ss

3. UNIX for Dummies Questions & Answers

Split a file with no pattern -- Split, Csplit, Awk

I have gone through all the threads in the forum and tested out different things. I am trying to split a 3GB file into multiple files. Some files are even larger than this. For example: split -l 3000000 filename.txt This is very slow and it splits the file with 3 million records in each... (10 Replies)
Discussion started by: madhunk

4. Shell Programming and Scripting

awk - split function

Hi, I have some output in the form of: #output: abc123 def567 hij890 ghi324 the above is in one column, stored in the variable x ( and if you wana know about x... x=sprintf(tolower(substr(someArray,1,1)substr(userArray,3,1)substr(userArray,2,1))) when i simply print x (print x) I get... (7 Replies)
Discussion started by: fusionX

5. Shell Programming and Scripting

Use split function in perl

Hello, if i have file like this: 010000890306932455804 05306977653873 0520080417010520ISMS SMT ZZZZZZZZZZZZZOC30693599000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202011302942311 010000890306946317387 05306977313623 0520080417010520ISMS SMT ZZZZZZZZZZZZZOC306942190000 30971360000... (5 Replies)
Discussion started by: chriss_58

6. Homework & Coursework Questions

PERL split function

Hi... I have a question regarding the split function in PERL. I have a very huge csv file (more than 80 million records). I need to extract a particular position(eg : 50th position) of each line from the csv file. I tried using split function. But I realized split takes a very long time. Also... (1 Reply)
Discussion started by: castle

7. Shell Programming and Scripting

Perl split function

my @d =split('\|', $_); west|ACH|3|Y|LuV|N||N|| Qt|UWST|57|Y|LSV|Y|Bng|N|KT| It Returns d as 8 for First Line, and 9 as for Second Line . I want to Process Both the Files, How to Handle It. (3 Replies)
Discussion started by: vishwakar

8. Shell Programming and Scripting

awk to split one field and print the last two fields within the split part.

Hello; I have a file consists of 4 columns separated by tab. The problem is the third fields. Some of the them are very long but can be split by the vertical bar "|". Also some of them do not contain the string "UniProt", but I could ignore it at this moment, and sort the file afterwards. Here is... (5 Replies)
Discussion started by: yifangt
EBOOK-META(1)                      calibre                      EBOOK-META(1)

NAME
       ebook-meta - part of calibre

SYNOPSIS
       ebook-meta ebook_file [options]

DESCRIPTION
       Read/Write metadata from/to ebook files.

       Supported formats for reading metadata: azw, azw1, azw3, azw4, cbr, cbz, chm, epub, fb2, html, htmlz, imp, lit, lrf, lrx, mobi, odt, oebzip, opf, pdb, pdf, pml, pmlz, pobi, prc, rar, rb, rtf, snb, tpz, txt, txtz, updb, zip

       Supported formats for writing metadata: azw, azw1, azw3, azw4, epub, htmlz, lrf, mobi, pdb, pdf, prc, rtf, tpz, txtz

       Different file types support different kinds of metadata. If you try to set some metadata on a file type that does not support it, the metadata will be silently ignored.

       Whenever you pass arguments to ebook-meta that have spaces in them, enclose the arguments in quotation marks.

OPTIONS
       --version
              show program's version number and exit
       -h, --help
              show this help message and exit
       -t, --title
              Set the title.
       -a, --authors
              Set the authors. Multiple authors should be separated by the & character. Author names should be in the order Firstname Lastname.
       --title-sort
              The version of the title to be used for sorting. If unspecified, and the title is specified, it will be auto-generated from the title.
       --author-sort
              String to be used when sorting by author. If unspecified, and the author(s) are specified, it will be auto-generated from the author(s).
       --cover
              Set the cover to the specified file.
       -c, --comments
              Set the ebook description.
       -p, --publisher
              Set the ebook publisher.
       --category
              Set the book category.
       -s, --series
              Set the series this ebook belongs to.
       -i, --index
              Set the index of the book in this series.
       -r, --rating
              Set the rating. Should be a number between 1 and 5.
       --isbn
              Set the ISBN of the book.
       --tags
              Set the tags for the book. Should be a comma separated list.
       -k, --book-producer
              Set the book producer.
       -l, --language
              Set the language.
       -d, --date
              Set the published date.
       --get-cover
              Get the cover from the ebook and save it as the specified file.
       --to-opf
              Specify the name of an OPF file. The metadata will be written to the OPF file.
       --from-opf
              Read metadata from the specified OPF file and use it to set metadata in the ebook. Metadata specified on the command line will override metadata read from the OPF file.
       --lrf-bookid
              Set the BookID in LRF files.

SEE ALSO
       The User Manual is available at http://manual.calibre-ebook.com

       Created by Kovid Goyal <kovid@kovidgoyal.net>

ebook-meta (calibre 0.8.51)           January 2013           EBOOK-META(1)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.