Venn Data Maker

08-19-2016

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

Quote:

Originally Posted by jacobs.smith

.
.
.
Also, the number of lines in intersectionlist.txt should be equal to (2^(number of sets)) - 1
.
.
.

So, with 7 sets there should be 127 lines, no? And the sum of individual set counts should be equal to the No. of lines?

Should g2,1,1,0,1,1,1,1,1 from RavinderSingh13's example be in Set1245678 or in Set12, Set14, Set15, ..., Set78?
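To make the two readings of that question concrete, here is a minimal sketch. It assumes the comma-separated 0/1 layout of RavinderSingh13's example row; the variable names `region`, `k` and `pairs` are only illustrative. It prints the single exclusive region the row falls into, and the number of set pairs the same row would be counted in under a pairwise reading.

```shell
row='g2,1,1,0,1,1,1,1,1'
result=$(printf '%s\n' "$row" | awk -F, '
{
    region = "Set"; k = 0
    for (i = 2; i <= NF; i++)            # field i holds the flag for set i-1
        if ($i == 1) { region = region (i - 1); k++ }
    pairs = k * (k - 1) / 2              # number of 2-element subsets of k sets
    print region, pairs
}')
echo "$result"
```

For this row the exclusive reading gives one region (Set1245678), while the pairwise reading counts the same name in 21 different pairs, so the two approaches cannot produce the same totals.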

Last edited by RudiC; 08-19-2016 at 04:47 PM..

RudiC

08-19-2016

Banned

363, 7

Join Date: Jan 2012

Last Activity: 24 June 2017, 6:25 PM EDT

Posts: 363

Thanks Given: 318

Thanked 7 Times in 7 Posts

Quote:

Originally Posted by RudiC

So, with 7 sets there should be 127 lines, no? And the sum of individual set counts should be equal to the No. of lines?

Should g2,1,1,0,1,1,1,1,1 from RavinderSingh13's example be in Set1245678 or in Set12, Set14, Set15, ..., Set78?

Hi R.Singh,

I checked it with the input file.

But the number of lines in output.txt does not reach 127.

I guess it is printing only the combinations that actually occur as a common or unique set.

However, I would like to see the values for all combinations.

Thanks

jacobs.smith

08-19-2016

And the second question?

RudiC

08-19-2016

Quote:

Originally Posted by RudiC

And the second question?

I am loving your questions.

Glad to learn.

Here is a way I tried. But it has two disadvantages.

One - I can only do three sets. But my actual input has 7 or even more.

Two - I cannot write the first column saying unique or common.

Code:

for i in 100 010 001 110 101 011 111; do awk -F, 'NR>1 {print $2$3$4}' 1 | grep -c "$i"; done
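Both limitations can be avoided by letting awk build the pattern from however many columns the file has. The following is a sketch, not a drop-in replacement: it assumes the comma-separated 0/1 layout with a header line, and the sample file name `sets.csv` and the array name `count` are hypothetical.

```shell
cat > sets.csv <<'EOF'
Name,Set1,Set2,Set3
g1,1,1,1
g2,1,1,0
g3,0,0,1
g4,1,0,0
g5,0,1,1
g6,1,0,0
g7,0,1,0
g8,0,0,1
EOF

counts=$(awk -F, '
NR > 1 {
    name = "Set"; k = 0
    for (i = 2; i <= NF; i++)            # works for any number of set columns
        if ($i == 1) { name = name (i - 1); k++ }
    if (k > 0) {
        suffix = (k == 1) ? "_unique" : "_common"
        count[name suffix]++
    }
}
END {
    for (n in count) print n "=" count[n]
}' sets.csv | LC_ALL=C sort)
echo "$counts"
```

Only combinations that actually occur are printed, so a combination with no members (Set13 in this sample) does not appear. With ten or more sets the concatenated digits become ambiguous, so a separator between set numbers would be needed.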

Thanks

jacobs.smith

08-19-2016

How about

Code:

awk '
NR==1   {print "Name", $0
         next
        }
        {for (i=1; i<=3; i++)   {T[$i]
                                 R[$i,i] = 1
                                }
        }
END     {delete T[""]
         for (t in T)   {print t, R[t,1]+0, R[t,2]+0, R[t,3]+0
                         TMP = R[t,1] * 100 + R[t,2] * 10 + R[t,3]
                         if (TMP == 111) Set123++
                         if (TMP == 110) Set12++
                         if (TMP == 101) Set13++
                         if (TMP == 11)  Set23++
                         if (TMP == 100) Set1++
                         if (TMP == 10)  Set2++
                         if (TMP == 1)   Set3++
                        }
         print "Set1_unique="   0+Set1
         print "Set2_unique="   0+Set2
         print "Set3_unique="   0+Set3
         print "Set12_common="  0+Set12
         print "Set13_common="  0+Set13
         print "Set23_common="  0+Set23
         print "Set123_common=" 0+Set123
        }
' FS=, OFS=, file
Name,Set1,Set2,Set3
g1,1,1,1
g2,1,1,0
g3,0,0,1
g4,1,0,0
g5,0,1,1
g6,1,0,0
g7,0,1,0
g8,0,0,1
Set1_unique=2
Set2_unique=1
Set3_unique=2
Set12_common=1
Set13_common=0
Set23_common=1
Set123_common=1

RudiC

08-19-2016

Thanks RudiC.

But I would like to make it dynamic.

The post example has 3 sets. But my actual input file has numerous sets.

Can you please share any comments on how I can edit your solution?

jacobs.smith

08-19-2016

Moderator

3,105, 1,603

Join Date: May 2013

Last Activity: 31 August 2020, 1:46 AM EDT

Location: Chennai

Posts: 3,105

Thanks Given: 1,269

Thanked 1,603 Times in 1,369 Posts

Quote:

Originally Posted by jacobs.smith

Thanks RudiC.
But I would like to make it dynamic.
The post example has 3 sets. But my actual input file has numerous sets.
Can you please share any comments on how I can edit your solution?

Hello jacobs.smith,

Let's say we have the following Input_file (which will be created by your first requirement; I have edited it to test more cases).

Code:

cat Input_file
Name,Set1,Set2,Set3,Set4,Set5,Set6,Set7,Set8
g5,0,1,1,1,0,1,1,0
g6,1,0,0,0,0,0,0,0
g7,0,1,0,0,0,0,0,1
g8,0,0,1,1,1,1,0,1
g1,1,1,1,0,1,1,1,0
g2,1,1,0,1,1,1,1,1
g3,0,0,1,0,0,0,1,1
g4,1,0,0,1,0,0,1,1

Then the following code may help you with this.

Code:

awk -F, '
NR == 1 { next }                        # skip the header line
{
    # Count every pair of sets that both contain this name.
    for (i = 2; i <= NF; i++)
        for (j = i + 1; j <= NF; j++)
            if ($i == $j && $i != 0)
                S["Set" (i-1) (j-1) "_common"]++

    # Count names that occur in exactly one set.
    E = 0
    for (q = 2; q <= NF; q++)
        if ($q == 1) {
            num = q - 1
            E++
        }
    if (E == 1)
        Y["Set" num "_unique"]++
}
END {
    for (i in S) print i OFS S[i]
    for (u in Y) print u OFS Y[u]
}
' Input_file

Then output will be as follows:

Code:

Set28_common 2
Set27_common 3
Set18_common 2
Set45_common 2
Set35_common 2
Set36_common 3
Set34_common 2
Set78_common 3
Set25_common 2
Set26_common 3
Set24_common 2
Set17_common 3
Set16_common 2
Set15_common 2
Set68_common 2
Set58_common 2
Set23_common 2
Set14_common 2
Set13_common 1
Set12_common 2
Set67_common 3
Set57_common 2
Set56_common 3
Set48_common 3
Set47_common 3
Set46_common 3
Set38_common 2
Set37_common 3
Set1_unique 1

Now you could combine both steps into a single command as follows, which you can run on the original Input_file (posted in POST#1).

Code:

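awk -F, '
NR == 1 {
    print "Name," $0                    # new header: Name,Set1,Set2,...
    R = NF                              # number of sets
}
NR > 1 {
    for (i = 1; i <= NF; i++) {
        A[$i, i]++                      # name $i occurs in set i
        if ($i) C[$i]                   # remember each non-empty name
    }
}
END {
    # Print one 0/1 membership row per name.
    for (i in C) {
        Q = i
        for (j = 1; j <= R; j++)
            Q = Q FS (A[i, j] >= 1 ? 1 : 0)
        print Q
    }
}
' Input_file | awk -F, '
NR == 1 { next }                        # skip the header line
{
    # Count every pair of sets that both contain this name.
    for (i = 2; i <= NF; i++)
        for (j = i + 1; j <= NF; j++)
            if ($i == $j && $i != 0)
                S["Set" (i-1) (j-1) "_common"]++

    # Count names that occur in exactly one set.
    E = 0
    for (q = 2; q <= NF; q++)
        if ($q == 1) {
            num = q - 1
            E++
        }
    if (E == 1)
        Y["Set" num "_unique"]++
}
END {
    for (i in S) print i OFS S[i]
    for (u in Y) print u OFS Y[u]
}
'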

The output will be as follows (for the Input_file in your POST#1).

Code:

Set23_common 2
Set13_common 1
Set12_common 2
Set1_unique 2
Set3_unique 2
Set2_unique 1
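Note that these pairwise counts are not the same thing as the (2^(number of sets)) - 1 exclusive regions asked about earlier in the thread. As a hedged sketch of the latter (assuming the 0/1 matrix layout with a header line; the file name `matrix.csv` and the names `mask` and `count` are made up for illustration), each row can be turned into a bitmask and every mask from 1 to 2^n - 1 printed, including regions with a count of zero:

```shell
cat > matrix.csv <<'EOF'
Name,Set1,Set2,Set3
g1,1,1,1
g2,1,1,0
g3,0,0,1
g4,1,0,0
g5,0,1,1
g6,1,0,0
g7,0,1,0
g8,0,0,1
EOF

regions=$(awk -F, '
NR == 1 { n = NF - 1; next }                 # n = number of sets
{
    mask = 0
    for (i = 2; i <= NF; i++)
        if ($i == 1) mask += 2 ^ (i - 2)     # bit i-2 stands for set i-1
    count[mask]++
}
END {
    for (m = 1; m < 2 ^ n; m++) {            # every non-empty combination
        name = "Set"
        for (k = 1; k <= n; k++)
            if (int(m / 2 ^ (k - 1)) % 2) name = name k
        print name, count[m] + 0             # + 0 prints 0 for empty regions
    }
}' matrix.csv)
echo "$regions"
```

With three sets this prints 7 lines, one per region, and the line count grows as 2^n - 1, so 7 sets give 127 lines. Each name is counted in exactly one region, so the counts add up to the number of names that occur in at least one set.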

Please let me know if this helps you; I will be glad to hear it.

Thanks,
R. Singh

RavinderSingh13
