Venn Data Maker


 
Thread Tools Search this Thread
Homework and Emergencies Emergency UNIX and Linux Support Venn Data Maker
# 15  
Old 08-19-2016
Quote:
Originally Posted by jacobs.smith
You are a legend R.Singh. Thank you!
Thank you jsscob.smith, it is all Almigthy's grace on me. Glad it helped you, did you test all permutations and cominations here? As I haven't done that much testing in it, please confirm.

Also thank you for asking this question, please keep asking questions(trying by yourself too), wherever is WILL there is a PATH.

Thanks,
R. Singh

Last edited by RavinderSingh13; 08-19-2016 at 04:52 PM..
# 16  
Old 08-19-2016
Quote:
Originally Posted by jacobs.smith
.
.
.
Also - the number of lines in the intersectionlist.txt should be equal to = (2^(number of sets))-1
.
.
.
So, with 7 sets there should be 127 lines, no? And the sum of individual set counts should be equal to the No. of lines?

Should g2,1,1,0,1,1,1,1,1 from RavinderSingh13's example be in Set1245678 or in Set12, Set14, Set15, ..., Set78?

Last edited by RudiC; 08-19-2016 at 04:47 PM..
# 17  
Old 08-19-2016
Quote:
Originally Posted by RudiC
So, with 7 sets there should be 127 lines, no? And the sum of individual set counts should be equal to the No. of lines?

Should g2,1,1,0,1,1,1,1,1 from RavinderSingh13's example be in Set1245678 or in Set12, Set14, Set15, ..., Set78?
Hi R.Singh,

I checked it with the input file.

But the number of lines in the output.txt doesn't reach to be 127.

I guess, it is printing only the values where there is a common or unique set.

However, I would like to see all combination values.

Thanks
# 18  
Old 08-19-2016
OK, try this:
Code:
awk '
NR==1   {print "Name", $0
         CC = NF
         for (i=1; i<2^CC; i++) Set[i] = 0
         next
        }
        {for (i=1; i<=CC; i++)  {T[$i]
                                 R[$i,i] = 1
                                }
        }
END     {delete T[""]
         for (t in T)   {printf "%s", t
                         for (i=1; i<=CC; i++)  printf ",%s", R[t,i]+0
                         printf RS
                         TMP = 0
                         for (i=1; i<=CC; i++)  TMP = TMP + 2^(i-1)*R[t,i]
                         Set[TMP]++
                        }
         for (i=1; i<2^CC; i++) {printf "Set"; for (j=0; j<CC; j++) if (int(i/2^j)%2) printf "%d", j+1; printf "=%d%s", Set[i], RS }
        }
' FS=,  file
Name,Set1,Set2,Set3,Set4,S5,S6
g1,1,1,1,1,1,1
g2,1,1,0,0,1,1
g3,0,0,1,0,1,1
g4,1,0,0,1,0,0
g5,1,1,1,1,1,1
g6,1,1,0,0,1,1
g7,0,1,0,1,1,0
g8,0,0,1,1,0,0
Set1=0
Set2=0
Set12=0
Set3=0
Set13=0
Set23=0
Set123=0
Set4=0
Set14=1
Set24=0
Set124=0
Set34=1
.
.
.
Set145=0
Set245=1
Set1245=0
.
.
.
Set256=0
Set1256=2
Set356=1
Set1356=0
.
.
.
Set23456=0
Set123456=2

Lines: 63 ( = 2^6 -1 )
Sum (set-values): 8 (8 different genes)

A bit complicated as awk doesn't provide binary operations nor print formats.

Last edited by RudiC; 08-19-2016 at 07:37 PM..
These 2 Users Gave Thanks to RudiC For This Post:
# 19  
Old 08-21-2016
Another small refinement:
Code:
awk '
NR==1   {print "Name," $0
         CC = NF
         for (i=1; i<2^CC; i++) Set[i] = 0
         next
        }

        {for (i=1; i<=CC; i++)  {T[$i]
                                 R[$i,i] = 1
                                }
        }

END     {delete T[""]
         for (t in T)   {TMP = 0
                         for (i=1; i<=CC; i++)  {printf "%s,%s%s", i==1?t:_, R[t,i]+0, i==CC?RS:_
                                                 TMP = TMP + 2^(i-1)*R[t,i]
                                                }
                         Set[TMP]++
                        }
         for (i=1; i<2^CC; i++) {TMP = 0
                                 for (j=0; j<CC; j++) if (int(i/2^j)%2) TMP = TMP * 10 + j+1
                                 printf "Set%d_%s=%d" RS, TMP, TMP<10?"unique":"common", Set[i] |  "sort -k1.4,1n"
                                }
        }
' FS=, file
Name,Set1,Set2,Set3,S4,S5
g1,1,1,1,1,1
g2,1,1,0,1,0
g3,0,0,1,1,0
g4,1,0,0,1,1
g5,0,1,1,1,1
g6,1,0,0,1,1
g7,0,1,0,1,1
g8,0,0,1,0,1
Set1_unique=0
Set2_unique=0
Set3_unique=0
Set4_unique=0
Set5_unique=0
Set12_common=0
Set13_common=0
Set14_common=0
Set15_common=0
Set23_common=0
Set24_common=0
Set25_common=0
Set34_common=1
Set35_common=1
Set45_common=0
Set123_common=0
Set124_common=1
Set125_common=0
Set134_common=0
Set135_common=0
Set145_common=2
Set234_common=0
Set235_common=0
Set245_common=1
Set345_common=0
Set1234_common=0
Set1235_common=0
Set1245_common=0
Set1345_common=0
Set2345_common=1
Set12345_common=1

This User Gave Thanks to RudiC For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

2 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Venn diagram results using awk

Hi, I have the following files 1.txt a 10 b 11 c 12 d 13 e 14 f 15 g 16 h 17 i 18 j 19 k 20 2.txt a 21 b 22 (15 Replies)
Discussion started by: jacobs.smith
15 Replies

2. Programming

maker

how can i remake a program to crash a harddrive using unix:rolleyes: (2 Replies)
Discussion started by: flomper
2 Replies
Login or Register to Ask a Question