The UNIX and Linux Forums  


07-04-2009
anjas
Registered User
Join Date: Mar 2009
Location: Bali, Indonesia
Posts: 17
Making Large Connection nodes for Graph

Hi power users,

This thread is a continuation of my previous one:

Making Connection nodes for Graph

However, I will explain the problem again.

I have the following data:

file1
aa A
aa B
aa C
bb X
bb Y
bb Z
cc O
cc P
cc Q
. .
. .
. .
. .

and I want to turn it into connection nodes like this:
file2

A aa A
A aa B
A aa C
B aa C
B aa B
C aa C
X bb X
X bb Y
X bb Z
Y bb Z
Y bb Y
Z bb Z
. . .
. . .
. . .
. . .

I need this relation to create a graph. The file has more than 6,000,000 lines.
For smaller files (100,000 lines), I have used the following scripts from the previous thread:

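# Script 1: self-join on the key, then drop pairs already seen in reverse order
join -o 1.2 0 2.2 -1 1 -2 1 file1 file1 | nawk '!a[$3$2$1];{a[$1$2$3]++}'

# Script 2: self-join, put each pair in sorted order, then remove duplicates
join -o 1.2 0 2.2 -1 1 -2 1 file1 file1 | nawk '$1<$3{print;next}{print$3,$2,$1}' | sort -u

# Script 3: pure nawk, reading file1 twice
nawk '
# First pass: build a space-separated list of nodes for each key
NR==FNR { c = a[$1]; a[$1] = c?c" "$2:$2; next }
# Second pass: pair the current node with every node of its key
{
    c = a[$1]
    if (c) {
        split(c,b)
        for (k in b) {
            # Order the pair so that each combination is printed only once
            p = $2<b[k]?$2" "$1" "b[k]:b[k]" "$1" "$2
            if (!d[p]++) print p
        }
    }
}
' file1 file1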
For small files, each of these three scripts creates the network in less than 10 minutes. However, for files with more than 6,000,000 lines, there was no result at all even after one day. Is there a faster way to do it?


Any suggestions on how to create file2 using perl or awk? Thanks.
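
One possible direction is sketched below. It is not benchmarked on the 6,000,000-line file and rests on several assumptions that are not stated in the post: file1 is grouped by its first column (the join-based scripts already need it sorted), each key/node line appears only once (run sort -u first if not), and plain awk is available in place of nawk. The function name emit_group is made up for illustration. Instead of keeping every printed pair in an array, it holds only the nodes of the current key and prints all pairs when the key changes.

```shell
# Small sample taken from the post; assumed to be grouped by column 1.
cat > file1 <<'EOF'
aa A
aa B
aa C
bb X
bb Y
bb Z
EOF

awk '
# emit_group (hypothetical name): print every unordered pair of the
# finished group, including each node paired with itself, then reset.
function emit_group(   i, j) {
    for (i = 1; i <= n; i++)
        for (j = i; j <= n; j++)
            print node[i], key, node[j]
    n = 0
}
NF < 2 { next }                                   # skip blank or short lines
($1 "") != (key "") { emit_group(); key = $1 }    # key changed: flush old group
{ node[++n] = $2 }                                # collect nodes of current key
END { emit_group() }                              # flush the last group
' file1 > file2

cat file2
```

With this layout, memory use depends on the largest group rather than on the whole file, and no global duplicate check is needed. The output still grows with the square of the group size, so a few very large groups would dominate the run time. Pairs come out in input order rather than in the string-compared order used by the scripts above.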