$ time ./kevin.sh
A:B,C,F,D,H
B:A,C,F,D,H
C:A,B,F,H
D:A,B
F:A,B,H,C
H:F,A,B,C
real 0m1.142s
user 0m0.590s
sys 0m0.580s
$
The problem with that script is that we run a sort command for every book.
The following solution uses only one sort command:
Code:
awk -v Q="'" -F'[:,]' '
NR==FNR {
    list = $1;
    all = "";
    for (i=2; i<=NF; i++) {
        all = all SUBSEP $i;
        books[list, i-1] = $i;
    }
    books[list, "all"]   = all SUBSEP;
    books[list, "count"] = NF-1;
    next;
}
{
    book = $1;
    delete bookCount;
    for (i=2; i<=NF; i++) {
        list = $i;
        if (books[list, "all"] ~ SUBSEP book SUBSEP) {
            for (ib=1; ib<=books[list, "count"]; ib++) {
                bookCount[books[list, ib]]++;
            }
        }
    }
    for (b in bookCount) {
        if (b != book) {
            print book, b, bookCount[b];
        }
    }
}
' kevin2.dat kevin1.dat |
sort -k1,1 -k3,3nr -k2,2 |
awk '
{
    book = $1;
    if (book == prev) {
        out = out "," $2;
    } else {
        if (out) print prev ":" out;
        out = $2;
        prev = book;
    }
}
END { if (out) print prev ":" out; }
'
With the same input files, the output is the same but the times are better:
Code:
$ time ./kevin2.sh
A:B,C,F,D,H
B:A,C,F,D,H
C:A,B,F,H
D:A,B
F:A,B,H,C
H:F,A,B,C
real 0m0.419s
user 0m0.152s
sys 0m0.169s
$
Jean-Pierre.
Hi, aigles
Thank you so much! Your script is much faster than mine.
It helps me a lot, and I have learned many things about AWK from your script (it surprises me that AWK can be written this way).
Thank you!
Hi all!
I am relatively new to UNIX stuff, and I have come across a problem:
I have a big directory, which contains 100 smaller ones. Each of the 100 contains a file ending in .txt, so there are 100 files ending in .txt.
I want to split each of the 100 files in smaller ones, which will contain... (4 Replies)
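The question is cut off, but one common way to do this is a loop over a glob. This is a minimal sketch only: the directory name `bigdir` and the chunk size of 1000 lines are made-up placeholders standing in for the poster's actual layout.

```shell
#!/bin/sh
# Build a toy stand-in for the layout: a big directory containing
# subdirectories, each holding one .txt file.
mkdir -p bigdir/sub1 bigdir/sub2
seq 1 2500 > bigdir/sub1/a.txt
seq 1 500  > bigdir/sub2/b.txt

# Split every .txt file one level down into 1000-line pieces,
# written next to the original (a.part_aa, a.part_ab, ...).
for f in bigdir/*/*.txt; do
    split -l 1000 "$f" "${f%.txt}.part_"
done

ls bigdir/sub1
```

Here `bigdir/sub1/a.txt` (2500 lines) ends up as `a.part_aa` and `a.part_ab` with 1000 lines each, plus `a.part_ac` with the remaining 500.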
$mystring = "name:blk:house::";
print "$mystring\n";
@s_format = split(/:/, $mystring);
for ($i=0; $i <= $#s_format; $i++) {
    print "index is $i, field is $s_format[$i]";
    print "\n";
}
$size = $#s_format + 1;
print "total size of array is $size\n";
I am expecting my size to be 5; why is it... (5 Replies)
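The answer to this one is documented behavior: Perl's split discards trailing empty fields by default, so `"name:blk:house::"` yields only 3 fields. Passing a negative limit as the third argument keeps them. A quick demonstration (assuming perl is on the PATH):

```shell
# By default split strips trailing empty fields: 3 elements.
# With a limit of -1 the two trailing empties survive: 5 elements.
perl -le '
    my @a = split /:/, "name:blk:house::";
    my @b = split /:/, "name:blk:house::", -1;
    print scalar(@a), " ", scalar(@b);      # prints "3 5"
'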
I have gone through all the threads in the forum and tested out different things. I am trying to split a 3GB file into multiple files. Some files are even larger than this.
For example:
split -l 3000000 filename.txt
This is very slow and it splits the file with 3 million records in each... (10 Replies)
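One option worth trying, if GNU coreutils is available (this is a GNU-specific flag, not POSIX): `split -n l/N` cuts the file into a fixed number of pieces at line boundaries by size, instead of counting off lines one by one, which can be noticeably cheaper on multi-gigabyte files. A small stand-in example:

```shell
# big.txt here is a toy stand-in for the 3GB file.
seq 1 100000 > big.txt

# Cut into 4 pieces at line boundaries: chunk.aa .. chunk.ad.
split -n l/4 big.txt chunk.

# The pieces concatenate back to the original, byte for byte.
cat chunk.a* | cmp -s - big.txt && echo "no data lost"
```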
Hi,
I have some output in the form of:
#output:
abc123
def567
hij890
ghi324
the above is in one column, stored in the variable x ( and if you wana know about x... x=sprintf(tolower(substr(someArray,1,1)substr(userArray,3,1)substr(userArray,2,1)))
when i simply print x (print x) I get... (7 Replies)
Hi... I have a question regarding the split function in PERL.
I have a huge CSV file (more than 80 million records). I need to extract a particular field (e.g. the 50th) from each line of the file. I tried using the split function, but I realized split takes a very long time.
Also... (1 Reply)
my @d =split('\|', $_);
west|ACH|3|Y|LuV|N||N||
Qt|UWST|57|Y|LSV|Y|Bng|N|KT|
It returns 8 fields for the first line and 9 for the second. I want to process both lines the same way; how do I handle it? (3 Replies)
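This is the same trailing-empty-field behavior again: both lines really have 10 fields, and split silently drops the empty ones at the end of each line. A limit of -1 makes the counts come out equal (assuming perl is on the PATH):

```shell
# With the -1 limit, both sample lines report 10 fields.
perl -le '
    for ("west|ACH|3|Y|LuV|N||N||", "Qt|UWST|57|Y|LSV|Y|Bng|N|KT|") {
        my @d = split /\|/, $_, -1;
        print scalar @d;                # prints 10 for each line
    }
'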
Hello;
I have a file consisting of 4 columns separated by tabs. The problem is the third field. Some of them are very long but can be split on the vertical bar "|". Also, some of them do not contain the string "UniProt", but I can ignore that for the moment and sort the file afterwards. Here is... (5 Replies)