Making Large Connection nodes for Graph

07-04-2009

Hi power users,

Basically, this thread is a continuation of the previous one:

https://www.unix.com/shell-programmin...#post302326483

However, I'm going to explain it again.

I have the following data:

file1
aa A
aa B
aa C
bb X
bb Y
bb Z
cc O
cc P
cc Q
. .
. .
. .
. .

and I want to turn it into connection nodes like this:

file2
A aa A
A aa B
A aa C
B aa C
B aa B
C aa C
X bb X
X bb Y
X bb Z
Y bb Z
Y bb Y
Z bb Z
. . .
. . .
. . .
. . .

I am building this relation to create a graph. The file has more than 6,000,000 lines.
For smaller files (100,000 lines), I used the following scripts from the previous thread:

Code:

join -o 1.2 0 2.2 -1 1 -2 1 file1 file1 | nawk '!a[$3$2$1];{a[$1$2$3]++}'

Code:

join -o 1.2 0 2.2 -1 1 -2 1 file1 file1 | nawk '$1<$3{print;next}{print$3,$2,$1}' | sort -u

Code:

nawk '
NR==FNR { c = a[$1]; a[$1] = c?c" "$2:$2; next }
{ c = a[$1]
  if (c) {
    split(c,b)
    for (k in b) {
      p = $2<b[k]?$2" "$1" "b[k]:b[k]" "$1" "$2
      if (!d[p]++) print p
    }
  }
}
' file1 file1

For the small file, each of these three scripts built the network in less than 10 minutes. However, for files with more than 6,000,000 lines, there were no results at all even after one day. Is there a faster way to do it?

Any suggestions on how to create file2 using perl or awk? Thanks.

anjas

07-06-2009

To keep the forums high quality for all users, please take the time to format your posts correctly.

Most notably, please use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)

Thank You.

The UNIX and Linux Forums

otheus

07-13-2009

Sorry for the mistakes. I have now repaired the posting.

anjas

07-13-2009

Try...

Code:

awk '{n=++a[$1];b[$1,n]=$2}END{for(c in a)for(n=1;n<=a[c];n++)for(z=1;z<=n;z++)print b[c,z],c,b[c,n]}' file1|sort -k 2,2 -k 1,1 -k 3,3

Result...

Code:

A aa A
A aa B
A aa C
B aa B
B aa C
C aa C
X bb X
X bb Y
X bb Z
Y bb Y
Y bb Z
Z bb Z
O cc O
O cc P
O cc Q
P cc P
P cc Q
Q cc Q
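
For input in the millions of lines, one alternative that may be worth trying is to let sort group the keys first, so that awk only ever holds one group in memory instead of collecting every value in arrays until END as the one-liner above does. This is only a sketch under some assumptions: the input has two whitespace-separated columns as in file1, and the function names make_pairs and count_pairs are made up for illustration. count_pairs just reports how many lines the result would have (n(n+1)/2 for a key with n distinct values), which is a cheap way to check whether the output itself is too large before generating it.

```shell
#!/bin/sh
# Hypothetical helpers; assumes two whitespace-separated columns (key, value).

# make_pairs FILE: print "value1 key value2" for every unordered pair of
# distinct values that share a key, self-pairs included.
make_pairs() {
    # sort groups equal keys together and -u drops repeated (key, value) rows;
    # LC_ALL=C gives plain byte ordering.
    LC_ALL=C sort -k1,1 -k2,2 -u "$1" |
    awk '
        # Print all pairs of the group collected so far, then reset it.
        function flush(   i, j) {
            for (i = 1; i <= n; i++)
                for (j = i; j <= n; j++)
                    print v[i], key, v[j]
            n = 0
        }
        BEGIN     { n = 0 }
        NF < 2    { next }                 # skip blank or short lines
        $1 != key { flush(); key = $1 }    # key changed: emit previous group
                  { v[++n] = $2 }          # remember this value for the group
        END       { flush() }
    '
}

# count_pairs FILE: print the number of lines make_pairs would produce.
count_pairs() {
    LC_ALL=C sort -k1,1 -k2,2 -u "$1" |
    awk '
        NF >= 2 { c[$1]++ }
        END {
            for (k in c) t += c[k] * (c[k] + 1) / 2
            printf "%.0f\n", t
        }
    '
}
```

Usage would be along the lines of `count_pairs file1` to see the expected size, then `make_pairs file1 > file2`. Because the values are sorted before pairing, the smaller value always comes first, so no separate de-duplication pass over the output should be needed. If a single key has a very large number of values, the output is still quadratic in that number, and no script can avoid that.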

Ygor
