Hi power user,
Basically, this thread is a continuation of the previous one
:
https://www.unix.com/shell-programmin...#post302326483
However, I'm going to explain it again.
I have this following data:
file1
aa A
aa B
aa C
bb X
bb Y
bb Z
cc O
cc P
cc Q
. .
. .
. .
. .
and I want to turn them into a connection nodes like this:
file2
A aa A
A aa B
A aa C
B aa C
B aa B
C aa C
X bb X
X bb Y
X bb Z
Y bb Z
Y bb Y
Z bb Z
. . .
. . .
. . .
. . .
I made this relation, to create a graph. The file have more than 6.000.000 lines.
For smaller files (100.000 lines), I have used this following script in the previous thread:
join -o 1.2 0 2.2 -1 1 -2 1 file1 file1 | nawk '!a[$3$2$1];{a[$1$2$3]++}'
join -o 1.2 0 2.2 -1 1 -2 1 file1 file1 | nawk '$1<$3{print;next}{print$3,$2,$1}' | sort -u
nawk '
NR==FNR { c = a[$1]; a[$1] = c?c" "$2:$2; next }
{ c = a[$1]
if (c) {
split(c,b)
for (k in b) {
p = $2<b[k]?$2" "$1" "b[k]:b[k]" "$1" "$2
if (!d[p]++) print p
}
}
}
' file1 file1
For small file, those three kind of scripts could create the network only in less than 10 minutes. However, for files with more than 6.000.000 lines, even after one days, there was no results at all
. Is there any faster way to do it?
Any suggestion, how to create file2 by using perl or awk? Tx