The UNIX and Linux Forums  
#1  07-04-2009
anjas
Registered User
Join Date: Mar 2009
Location: Bali, Indonesia
Posts: 17
Making Large Connection nodes for Graph

Hi power users,

This thread is a continuation of my previous one:

Making Connection nodes for Graph

I'll explain the problem again here.

I have the following data:

file1
aa A
aa B
aa C
bb X
bb Y
bb Z
cc O
cc P
cc Q
...

and I want to turn it into connection nodes like this, linking every pair of values that share a key (including each value with itself):
file2

A aa A
A aa B
A aa C
B aa C
B aa B
C aa C
X bb X
X bb Y
X bb Z
Y bb Z
Y bb Y
Z bb Z
...
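The shape of file2 also shows why a large input is hard: a key with n values produces n(n+1)/2 lines (every unordered pair plus each value paired with itself), so aa with 3 values gives 3 × 4 / 2 = 6 lines. As a purely hypothetical illustration, if a 6,000,000-line file1 consisted of 60,000 keys with 100 values each, file2 would have 60,000 × 5,050 = 303,000,000 lines. Below is a small sketch for predicting the output size from file1 before generating anything; the function name `count_pairs` is hypothetical, and it assumes no "key value" line is repeated.

```shell
# count_pairs (hypothetical helper): predict how many lines file2 will have.
# A key with n values yields n*(n+1)/2 lines, so sum that over all keys.
# Only one counter per distinct key is kept in memory.
count_pairs() {
  awk '
    { c[$1]++ }                      # count values per key
    END {
      for (k in c) total += c[k] * (c[k] + 1) / 2
      printf "%.0f\n", total         # %.0f prints large totals without relying on integer width
    }
  ' "$@"
}
```

Usage would be `count_pairs file1`; a very large number here means any method will spend most of its time writing output.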

I build these relations to create a graph. The real file has more than 6,000,000 lines.
For smaller files (100,000 lines), I used the following scripts from the previous thread:

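Code:
# join pairs every value with every value that shares its key ("x key y");
# join expects file1 to be sorted on the first field.
join -o 1.2,0,2.2 -1 1 -2 1 file1 file1 | nawk '!a[$3,$2,$1]; {a[$1,$2,$3]++}'
Code:
join -o 1.2,0,2.2 -1 1 -2 1 file1 file1 | nawk '$1 < $3 {print; next} {print $3, $2, $1}' | sort -u
Code:
nawk '
# first pass over file1: collect all values of each key
NR == FNR { a[$1] = ($1 in a) ? a[$1] " " $2 : $2; next }
# second pass: pair this value with every value of the same key
{
  n = split(a[$1], b)
  for (k = 1; k <= n; k++) {
    # put the smaller value first so that "A aa B" and "B aa A" match
    p = ($2 < b[k]) ? $2 " " $1 " " b[k] : b[k] " " $1 " " $2
    if (!d[p]++) print p   # print each pair only once
  }
}
' file1 file1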
For the small files, each of these three scripts built the network in less than 10 minutes. However, for the file with more than 6,000,000 lines, there was still no result even after a full day. Is there a faster way to do it?


Any suggestions on how to create file2 using Perl or awk? Thanks.
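The three scripts above all keep state that grows with the output: the first remembers every pair in the awk array `a`, the second pipes the complete output through `sort -u`, and the third stores every pair in `d`. One streaming alternative, offered as a sketch and not measured on the 6,000,000-line file: because `join` of file1 with itself already emits each pair of values under a key in both orders, keeping only the lines whose first value is not greater than the third leaves each unordered pair exactly once, with no pair table and no sort over the output. The function name `pairs_by_join` is hypothetical; it assumes one blank-separated "key value" pair per line, values without embedded blanks, and that `mktemp` is available. `LC_ALL=C` is used so that sort, join and awk agree on ordering.

```shell
# pairs_by_join (hypothetical helper): build file2 from a "key value" file
# with sort + join + a streaming awk filter, so no pair is stored in memory.
pairs_by_join() {
  tmp=$(mktemp) || return 1
  # sorted, de-duplicated copy: join requires input sorted on the key,
  # and dropping repeated "key value" lines prevents duplicate pairs
  LC_ALL=C sort -u -k1,1 -k2,2 "$1" > "$tmp"
  # join emits "x key y" for every x and y that share a key (both orders);
  # keep x <= y, compared as strings, so each unordered pair appears once
  LC_ALL=C join -o 1.2,0,2.2 "$tmp" "$tmp" |
    LC_ALL=C awk '($1 "") <= ($3 "")'
  rm -f "$tmp"
}
```

Usage would be `pairs_by_join file1 > file2`. join still generates n × n lines for a key with n values before awk discards the mirrored half, so very large groups stay expensive; what goes away is the in-memory table of every pair.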
#2  07-06-2009
otheus
Forum Staff, Moderator ala Mode
Join Date: Feb 2007
Location: Innsbruck, Austria
Posts: 1,864
To keep the forums high quality for all users, please take the time to format your posts correctly.

Most notably, please use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)

Thank You.

#3  07-13-2009
anjas
Registered User
Join Date: Mar 2009
Location: Bali, Indonesia
Posts: 17
Sorry for the mistakes. I have now repaired the post.

#4  07-13-2009
Ygor
Forum Staff, Moderator
Join Date: Oct 2003
Location: -31.96,115.84
Posts: 1,402
Try...
Code:
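# collect the values of each key, then print every pair with z <= n; sort puts the keys back in order
awk '{n=++a[$1];b[$1,n]=$2}END{for(c in a)for(n=1;n<=a[c];n++)for(z=1;z<=n;z++)print b[c,z],c,b[c,n]}' file1 | sort -k 2,2 -k 1,1 -k 3,3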
Result...
Code:
A aa A
A aa B
A aa C
B aa B
B aa C
C aa C
X bb X
X bb Y
X bb Z
Y bb Y
Y bb Z
Z bb Z
O cc O
O cc P
O cc Q
P cc P
P cc Q
Q cc Q
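Ygor's script holds every value of file1 in memory and sorts the complete output at the end, because awk visits the keys of `for (c in a)` in an unspecified order. If file1 is already grouped by key (which join needs anyway), a variant can print each group as soon as the key changes, so only the current group is in memory and no sort over the output is needed. A minimal sketch under that assumption; the function name `pairs_by_group` is hypothetical, and it also skips a value repeated under the same key, giving the same pair set as the de-duplicating scripts in the first post. It has not been timed on the 6,000,000-line file.

```shell
# pairs_by_group (hypothetical helper): reads "key value" lines in which
# all lines for one key are adjacent (e.g. the output of sort -k1,1) and
# prints every pair "x key y" for that key, including x paired with itself.
# Only the current group is held in memory.
pairs_by_group() {
  awk '
    # print all pairs i <= j for the buffered group, then reset it
    function flush(   i, j) {
      for (i = 1; i <= n; i++)
        for (j = i; j <= n; j++)
          print v[i], key, v[j]
      n = 0
      split("", seen)                # empty the per-group duplicate filter
    }
    { k = $1 "" }                    # compare keys as strings
    NR > 1 && k != key { flush() }   # key changed: emit the previous group
    { key = k }
    !($2 in seen) { seen[$2] = 1; v[++n] = $2 }   # skip repeated values
    END { if (n) flush() }
  ' "$@"
}
```

For example, `sort -k1,1 -k2,2 file1 | pairs_by_group > file2`, or simply `pairs_by_group file1 > file2` if file1 is already grouped. Pairs come out in input order within each key, so sorted input gives the same ordering as the result above.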
The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.