Venn Data Maker


 
# 1  
08-19-2016
Quote:
Originally Posted by jacobs.smith
...
Also, the number of lines in intersectionlist.txt should be equal to 2^(number of sets) - 1.
...
So, with 7 sets there should be 127 lines, no? And should the sum of the individual set counts equal the number of lines?
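Where 2^n - 1 comes from: every non-empty combination of n sets corresponds to one n-bit mask between 1 and 2^n - 1. The sketch below is a hypothetical shell function `venn_labels` (not from the thread) that prints one label per mask in the thread's SetXY naming. It assumes POSIX awk, which has no bitwise operators, so each bit is tested arithmetically.

```shell
# Sketch: print one label per non-empty combination of n sets (2^n - 1 lines).
# Labels follow the thread's style: "Set" + the member set numbers (Set12, Set123).
# Caveat: from 10 sets on this naming is ambiguous ({Set1,Set12} and
# {Set11,Set2} both become Set112), so a separator would be needed there.
venn_labels() {
  awk -v n="$1" 'BEGIN {
    total = 2^n - 1                   # number of non-empty subsets
    for (m = 1; m <= total; m++) {    # each m is a bitmask over the n sets
      label = "Set"
      for (k = 1; k <= n; k++)
        if (int(m / 2^(k - 1)) % 2)   # is bit k-1 of m set?
          label = label k
      print label
    }
  }'
}
```

For example, `venn_labels 3` prints Set1, Set2, Set12, Set3, Set13, Set23 and Set123 (7 lines), and `venn_labels 7 | wc -l` gives 127.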

Should g2,1,1,0,1,1,1,1,1 from RavinderSingh13's example be in Set1245678 or in Set12, Set14, Set15, ..., Set78?
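One way to read this question is exclusive membership: every row goes into exactly one combination, namely the exact set of columns that hold a 1. Under that assumption, the hypothetical function below labels each row of a 0/1 matrix (header `Name,Set1,...,SetN`, as in RavinderSingh13's Input_file). For g2,1,1,0,1,1,1,1,1 it yields Set1245678 rather than contributing to Set12, Set14 and so on.

```shell
# Sketch: assign every row to exactly one combination (exclusive membership).
# Assumes a CSV header "Name,Set1,...,SetN" followed by 0/1 rows.
venn_classify() {
  awk -F, 'NR == 1 { next }              # skip the header
  {
    label = "Set"
    for (i = 2; i <= NF; i++)
      if ($i == 1) label = label (i - 1) # column i holds Set(i-1)
    if (label != "Set") print $1, label  # all-zero rows belong to no set
  }' "$@"
}
```

Usage: `venn_classify Input_file` prints lines such as `g2 Set1245678`.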

Last edited by RudiC; 08-19-2016 at 04:47 PM.
# 2  
08-19-2016
Quote:
Originally Posted by RudiC
So, with 7 sets there should be 127 lines, no? And the sum of individual set counts should be equal to the No. of lines?

Should g2,1,1,0,1,1,1,1,1 from RavinderSingh13's example be in Set1245678 or in Set12, Set14, Set15, ..., Set78?
Hi R.Singh,

I checked it with the input file.

But the number of lines in output.txt does not reach 127.

I guess it prints only the combinations that are actually present, either common or unique.

However, I would like to see the values for all combinations.

Thanks
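A minimal sketch of listing every combination, zeros included, assuming the 0/1 matrix format (header `Name,Set1,...,SetN`) and exclusive membership (each row counted once, under the exact combination of sets it is 1 in). Each row becomes a bitmask, and the END block walks all masks from 1 to 2^n - 1, so n sets always produce 2^n - 1 lines, and the counts add up to the number of rows that belong to at least one set. The function name `venn_counts` and the `type,label,count` layout are illustrative choices, not from the thread.

```shell
# Sketch: count rows for every non-empty combination, printing zeros too.
# Assumes a header "Name,Set1,...,SetN" followed by 0/1 rows.
venn_counts() {
  awk -F, -v OFS=, '
  NR == 1 { n = NF - 1; next }        # number of sets, taken from the header
  {
    m = 0
    for (i = 2; i <= NF; i++)
      if ($i == 1) m += 2^(i - 2)     # Set(i-1) -> bit (i-2)
    count[m]++                        # all-zero rows land in m = 0 (not printed)
  }
  END {
    total = 2^n - 1
    for (m = 1; m <= total; m++) {
      label = "Set"; bits = 0
      for (k = 1; k <= n; k++)
        if (int(m / 2^(k - 1)) % 2) { label = label k; bits++ }
      type = (bits == 1) ? "unique" : "common"
      print type, label, count[m] + 0
    }
  }' "$@"
}
```

For the 3-set matrix shown in post #5, this prints unique,Set1,2 / unique,Set2,1 / common,Set12,1 / unique,Set3,2 / common,Set13,0 / common,Set23,1 / common,Set123,1.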
# 3  
08-19-2016
And the second question?
# 4  
08-19-2016
Quote:
Originally Posted by RudiC
And the second question?
I am loving your questions.

Glad to learn.

Here is an approach I tried, but it has two disadvantages.

One: it only handles three sets, but my actual input has 7 or even more.

Two: I cannot write a first column saying unique or common.

Code:
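for i in 100 010 001 110 101 011 111; do
  # -x: whole-line match only, -c: count matching lines; print the pattern next to its count
  printf '%s %s\n' "$i" "$(awk -F"," 'NR>1 {print $2$3$4}' 1 | grep -cx "$i")"
done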

Thanks
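The loop above can be made independent of the number of sets by letting awk concatenate all the 0/1 columns instead of hard-coding `$2$3$4`, and then counting identical patterns with `sort | uniq -c`. A sketch, assuming the same header-plus-0/1 layout; the hypothetical `venn_patterns` function also adds a type column (one 1 means unique, more means common). Unlike a fixed list of patterns, it only reports patterns that actually occur.

```shell
# Sketch: count identical 0/1 patterns for any number of set columns.
# Assumes a header line followed by rows "name,0/1,...".
venn_patterns() {
  awk -F, 'NR > 1 {
    p = ""; ones = 0
    for (i = 2; i <= NF; i++) { p = p $i; if ($i == 1) ones++ }
    if (ones == 0) next                  # row belongs to no set
    type = (ones == 1) ? "unique" : "common"
    print type, p
  }' "$@" | LC_ALL=C sort | uniq -c
}
```

Each output line is a count followed by the type and the pattern, e.g. `2 unique 100` for two rows found only in Set1.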
# 5  
08-19-2016
How about
Code:
awk '
NR==1   {print "Name", $0
         next
        }
        {for (i=1; i<=3; i++)   {T[$i]
                                 R[$i,i] = 1
                                }
        }
END     {delete T[""]
         for (t in T)   {print t, R[t,1]+0, R[t,2]+0, R[t,3]+0
                         TMP = R[t,1] * 100 + R[t,2] * 10 + R[t,3]
                         if (TMP == 111) Set123++
                         if (TMP == 110) Set12++
                         if (TMP == 101) Set13++
                         if (TMP == 11)  Set23++
                         if (TMP == 100) Set1++
                         if (TMP == 10)  Set2++
                         if (TMP == 1)   Set3++
                        }
         print "Set1_unique="   0+Set1
         print "Set2_unique="   0+Set2
         print "Set3_unique="   0+Set3
         print "Set12_common="  0+Set12
         print "Set13_common="  0+Set13
         print "Set23_common="  0+Set23
         print "Set123_common=" 0+Set123
        }
' FS=, OFS=, file
Name,Set1,Set2,Set3
g1,1,1,1
g2,1,1,0
g3,0,0,1
g4,1,0,0
g5,0,1,1
g6,1,0,0
g7,0,1,0
g8,0,0,1
Set1_unique=2
Set2_unique=1
Set3_unique=2
Set12_common=1
Set13_common=0
Set23_common=1
Set123_common=1
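The first step of this script (turning the per-set name lists into a 0/1 matrix) can be generalized by replacing the hard-coded 3 with the number of header columns. A sketch as a hypothetical function `venn_matrix`, assuming the input layout the script above reads: a header of set names, then rows whose columns hold member names, with empty cells allowed. Output row order follows awk's `for (g in names)` order, which is unspecified, so pipe through `sort` if a stable order matters.

```shell
# Sketch: name lists (one column per set) -> 0/1 membership matrix, any column count.
venn_matrix() {
  awk -F, -v OFS=, '
  NR == 1 { print "Name", $0; n = NF; next }   # header: Name,Set1,...,SetN
  {
    for (i = 1; i <= NF; i++)
      if ($i != "") { names[$i]; member[$i, i] = 1 }
  }
  END {
    for (g in names) {                         # order is unspecified
      row = g
      for (i = 1; i <= n; i++) row = row OFS (member[g, i] + 0)
      print row
    }
  }' "$@"
}
```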

# 6  
08-19-2016
Thanks RudiC.

But I would like to make it dynamic.

The example in the post has 3 sets, but my actual input file has many more.

Could you please share some pointers on how to adapt your solution?
# 7  
08-19-2016
Quote:
Originally Posted by jacobs.smith
Thanks RudiC.
But I would like to make it dynamic.
The example in the post has 3 sets, but my actual input file has many more.
Could you please share some pointers on how to adapt your solution?
Hello jacobs.smith,

Let's say we have the following Input_file (which would be created by your first requirement; I have edited it to test more cases):
Code:
cat Input_file
Name,Set1,Set2,Set3,Set4,Set5,Set6,Set7,Set8
g5,0,1,1,1,0,1,1,0
g6,1,0,0,0,0,0,0,0
g7,0,1,0,0,0,0,0,1
g8,0,0,1,1,1,1,0,1
g1,1,1,1,0,1,1,1,0
g2,1,1,0,1,1,1,1,1
g3,0,0,1,0,0,0,1,1
g4,1,0,0,1,0,0,1,1

Then the following code may help you with this:
Code:
awk -F, 'NR==1{
           next                          # skip the header line
         }
         {
           # pairwise: count rows where both Set(i-1) and Set(j-1) are 1
           for(i=2;i<=NF;i++){
             for(j=i+1;j<=NF;j++){
               if($i==$j && $i!=0 && $j!=0){
                 S["Set"(i-1)(j-1)"_common"]++
               }
             }
           }
           # unique: count rows that contain exactly one 1
           E=0
           for(q=2;q<=NF;q++){
             if($q==1){
               num=q-1
               E++
             }
           }
           if(E==1){
             Y["Set"num"_unique"]++
           }
         }
    END  {
           for(i in S){
             print i OFS S[i]
           }
           for(u in Y){
             print u OFS Y[u]
           }
         }
        '   Input_file

Then the output will be as follows:
Code:
Set28_common 2
Set27_common 3
Set18_common 2
Set45_common 2
Set35_common 2
Set36_common 3
Set34_common 2
Set78_common 3
Set25_common 2
Set26_common 3
Set24_common 2
Set17_common 3
Set16_common 2
Set15_common 2
Set68_common 2
Set58_common 2
Set23_common 2
Set14_common 2
Set13_common 1
Set12_common 2
Set67_common 3
Set57_common 2
Set56_common 3
Set48_common 3
Set47_common 3
Set46_common 3
Set38_common 2
Set37_common 3
Set1_unique 1
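Note that these SetXY_common counts are inclusive: a row that is 1 in Set1, Set2 and Set3 is counted for Set12, Set13 and Set23 alike, which is why they do not add up to the number of rows. To spot-check any of them, a small hypothetical helper `venn_inclusive` can count the rows that are 1 in every requested set. It assumes set k sits in column k+1 and reads the matrix on standard input.

```shell
# Sketch: count rows that belong to ALL of the given sets (inclusive count).
# Usage: venn_inclusive 1 3 < Input_file
venn_inclusive() {
  awk -F, -v sets="$*" '
  BEGIN { k = split(sets, want, " ") }  # requested set numbers
  NR == 1 { next }                      # skip the header
  {
    for (j = 1; j <= k; j++)
      if ($(want[j] + 1) != 1) next     # missing one of the requested sets
    c++
  }
  END { print c + 0 }'
}
```

With the Input_file above, `venn_inclusive 1 3 < Input_file` prints 1 and `venn_inclusive 7 8 < Input_file` prints 3, matching Set13_common and Set78_common.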

Now you could combine both steps into one all-in-one command as follows, which you can run on the original Input_file (posted in post #1):
Code:
awk -F, 'NR==1{
           print "Name," $0              # header: Name,Set1,Set2,...
           R=NF                          # number of sets
         }
         NR>1 {
           for(i=1;i<=NF;i++){
             A[$i,i]++                   # name $i occurs in set i
             if($i){
               C[$i]                     # collect every non-empty name
             }
           }
         }
    END  {
           # one 0/1 row per name
           for(i in C){
             Q=i
             for(j=1;j<=R;j++){
               Q=Q FS (A[i,j]>=1?1:0)
             }
             print Q
           }
         }
        ' Input_file   |   awk -F, 'NR==1{
           next                          # skip the header line
         }
         {
           # pairwise: count rows where both Set(i-1) and Set(j-1) are 1
           for(i=2;i<=NF;i++){
             for(j=i+1;j<=NF;j++){
               if($i==$j && $i!=0 && $j!=0){
                 S["Set"(i-1)(j-1)"_common"]++
               }
             }
           }
           # unique: count rows that contain exactly one 1
           E=0
           for(q=2;q<=NF;q++){
             if($q==1){
               num=q-1
               E++
             }
           }
           if(E==1){
             Y["Set"num"_unique"]++
           }
         }
    END  {
           for(i in S){
             print i OFS S[i]
           }
           for(u in Y){
             print u OFS Y[u]
           }
         }
        '

The output will be as follows (for the Input_file in your post #1):
Code:
Set23_common 2
Set13_common 1
Set12_common 2
Set1_unique 2
Set3_unique 2
Set2_unique 1
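Two details of the combined script may matter for larger inputs: `for (i in S)` returns keys in an unspecified order, and labels built as `"Set"(i-1)(j-1)` become ambiguous from 10 sets on (Set112 could be Set1 with Set12, or Set11 with Set2). A sketch of the pairwise step as a hypothetical function `venn_pairs`, looping over set numbers instead, so the order is fixed and zero counts are printed too. It assumes the 0/1 matrix with a header; the `Set1_2` separator style is a hypothetical choice.

```shell
# Sketch: pairwise (inclusive) common counts in fixed order, zeros included.
venn_pairs() {
  awk -F, '
  NR == 1 { n = NF - 1; next }          # number of sets from the header
  {
    for (a = 1; a <= n; a++)
      for (b = a + 1; b <= n; b++)
        if ($(a + 1) == 1 && $(b + 1) == 1) pair[a, b]++
  }
  END {
    for (a = 1; a <= n; a++)
      for (b = a + 1; b <= n; b++)
        print "Set" a "_" b "_common", pair[a, b] + 0
  }' "$@"
}
```

On the 3-set matrix from post #5 this prints Set1_2_common 2, Set1_3_common 1 and Set2_3_common 2, the same numbers as above.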

Please let me know if this helps; I will be glad.

Thanks,
R. Singh