Impute values within groups


 
# 1  
Old 12-09-2014

Hello all,

This is quite complex, so please allow me some space to make the problem clear.

I have several groups, each of which can have three types of rows: T1, T2 and T1*T2.

The T1 (and T2) values are always doubles, i.e. the same letter twice (aa, cc, gg, tt, dd, ii).

A T1*T2 value can be equal to T1, equal to T2, or a mixture of T1 and T2.

For example, if within a group T1 is aa and T2 is tt, then T1*T2 can be aa, tt or at.

Since T1*T2 values are derived from the T1 and/or T2 values, a missing T1 or T2 can be imputed from a T1*T2 value together with the T2 (or T1) value that is present.

So if T1 is aa, T2 is missing, and one of the T1*T2 values is gg (a double different from T1), then T2 has to be gg.

Code:
Group     Type     Value    Serial_Number
Group4    T1       aa       1
Group4    T1*T2    gg       3
Group4    T1*T2    aa       4
Group4    T1*T2    ag       5

If no such double exists in T1*T2, then the missing value is determined from a mixed value (like ag, gt, at).

As another example, in the following data the T2 value is missing in Group5, but from the mixed value gt, T2 can be inferred as tt since T1 is gg.



Code:
Group     Type     Value    Serial_Number
Group5    T1       gg       1
Group5    T1*T2    gg       3
Group5    T1*T2    gt       4
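
The single-missing case can be sketched as a small helper. This is only an illustration and not part of the original post: the function name impute_one and its calling convention (the known T1 or T2 value first, then the group's T1*T2 values) are assumptions made here. It prints the first letter found in the T1*T2 values that differs from the known letter, doubled, which covers both a differing double (gg) and a mixed value (gt).

```shell
# Hypothetical helper: impute_one KNOWN VALUE...
# KNOWN is the existing T1 or T2 double; each VALUE is one T1*T2 value of the group.
impute_one() {
  known=$1; shift
  printf '%s\n' "$@" | awk -v k="$(printf '%.1s' "$known")" '
    {
      # scan each T1*T2 value letter by letter
      for (i = 1; i <= length($0); i++) {
        s = substr($0, i, 1)
        if (s != k) { print s s; found = 1; exit }
      }
    }
    END { if (!found) exit 1 }   # no differing letter: nothing can be inferred
  '
}

impute_one aa gg aa ag   # Group4: prints gg
impute_one gg gg gt      # Group5: prints tt
```

The helper returns a non-zero status when every T1*T2 letter equals the known one, so a caller can tell "nothing to impute from" apart from a real result.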

Similarly, in the following example, where T1 is missing, it can be inferred as aa.


Code:
Group     Type     Value    Serial_Number
Group6    T2       gg       2
Group6    T1*T2    gg       6
Group6    T1*T2    ag       3
Group6    T1*T2    gg       4
Group6    T1*T2    ag       5

So the idea is: if exactly one of T1 and T2 is missing for a group, the missing value is imputed for that group. If both T1 and T2 are missing, but T1*T2 contains two different doubles like aa and tt (or a mixed value like at), then both T1 and T2 can be inferred for the group.

If both T1 and T2 are missing and are inferred from two different doubles, the order doesn't matter: if T1 and T2 are inferred to be aa and tt but it can't be determined which is which, either value can be assigned to T1 and the other to T2.

In Group7, since there is at least one mixed value ga, T1 is gg and T2 is aa.

Code:
Group     Type     Value    Serial_Number
Group7    T1*T2    ga       7
Group7    T1*T2    gg       8
Group7    T1*T2    ga       9
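
The both-missing case can be sketched in the same way. Again this is an illustration under assumptions, not part of the original post: the function name impute_both is made up, and it simply takes the first two distinct letters in order of appearance, which is allowed because the order does not matter when it cannot be determined.

```shell
# Hypothetical helper: impute_both VALUE...
# Each VALUE is one T1*T2 value of a group that has neither a T1 nor a T2 row.
impute_both() {
  printf '%s\n' "$@" | awk '
    {
      # record each distinct letter once, in order of first appearance
      for (i = 1; i <= length($0); i++) {
        s = substr($0, i, 1)
        if (!(s in seen)) { seen[s] = 1; letter[++n] = s }
      }
    }
    END {
      if (n < 2) exit 1              # fewer than two letters: cannot infer both
      print "T1", letter[1] letter[1]
      print "T2", letter[2] letter[2]
    }
  '
}

impute_both ga gg ga   # Group7: prints "T1 gg" then "T2 aa"
```

With the mixed value ga appearing first, g goes to T1 and a to T2, which matches the Group7 result described above.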


If T1 and T2 both exist for a group, then we leave that group as it is, since there is nothing to impute.


Example input
Code:
Group     Type     Value    Serial_Number
Group1    T1       aa       1
Group1    T2       tt       2
Group1    T1*T2    tt       3
Group1    T1*T2    tt       4
Group1    T1*T2    at       5
Group4    T1       aa       1
Group4    T1*T2    gg       3
Group4    T1*T2    aa       4
Group4    T1*T2    ag       5
Group5    T1       gg       1
Group5    T1*T2    gg       3
Group5    T1*T2    gt       4
Group5    T1*T2    gg       5
Group6    T2       gg       2
Group6    T1*T2    gg       6
Group6    T1*T2    ag       3
Group6    T1*T2    gg       4
Group6    T1*T2    ag       5
Group7    T1*T2    ga       7
Group7    T1*T2    gg       8
Group7    T1*T2    ga       9

This is what the imputed output should look like:

Code:
Group     Type     Value    Serial_Number
Group1    T1       aa       1
Group1    T2       tt       2
Group1    T1*T2    tt       3
Group1    T1*T2    tt       4
Group1    T1*T2    at       5
Group4    T1       aa       1
Group4    T2       gg       imputed
Group4    T1*T2    gg       3
Group4    T1*T2    aa       4
Group4    T1*T2    ag       5
Group5    T1       gg       1
Group5    T2       tt       imputed
Group5    T1*T2    gg       3
Group5    T1*T2    gt       4
Group5    T1*T2    gg       5
Group6    T1       aa       imputed
Group6    T2       gg       2
Group6    T1*T2    gg       6
Group6    T1*T2    ag       3
Group6    T1*T2    gg       4
Group6    T1*T2    ag       5
Group7    T1       gg       imputed
Group7    T2       aa       imputed
Group7    T1*T2    ga       7
Group7    T1*T2    gg       8
Group7    T1*T2    ga       9

# 2  
Old 12-09-2014
What's your question?
# 3  
Old 12-09-2014
Quote:
Originally Posted by durden_tyler
What's your question?
Given the example input with missing data, I would like to have an output with imputed data.
# 4  
Old 12-11-2014
This is a challenge, but it looks like coursework / homework?
What have you tried already?
# 5  
Old 12-11-2014
Wow!
Whatever it is, it looks like the wife helped.

Last edited by ongoto; 12-11-2014 at 07:12 PM..
# 6  
Old 12-12-2014
Well, this again is far from elegant, but try
Code:
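awk     'function prlast( ) {
                         # Testing T[y] creates an empty entry for a missing type, so after
                         # this loop T holds exactly the types that still need a value.
                         for (y in Y)
                                 if (T[y])      {sub (T[y], "", C[LAST])        # drop the known letter
                                                 delete T[y]
                                                }
                         # leftover letters; splitting on "" is an extension, not guaranteed by POSIX
                         split (C[LAST], M, "")
                         CNT=0
                         # fetch the letter first: evaluation order inside a concatenation is undefined
                         for (t in T)   {c=M[++CNT]
                                         print LAST, t, c c, "\timputed"  # extra tab leaves Serial_Number empty
                                        }
                         delete T
                         LAST=$1
                        }

         # header line; Y lists the types every group should have
         NR == 1        {print; Y["T1"]; Y["T2"]; next}
         NR == 2        {LAST=$1}
         # group changed: print imputed rows for the previous group
         $1 != LAST     {prlast()}

         # T1*T2 row: collect the distinct letters of the group in C
         $2 ~ /\*/      {split ($3, M, "")
                         for (i in M) if (!index(C[$1], M[i])) C[$1]=C[$1] M[i]
                         print
                         next
                        }

                        # T1 or T2 row: remember its letter
                        {T[$2]=substr($3,1,1)
                         print
                        }

         END            {prlast()}
        ' SUBSEP="," FS="\t" OFS="\t" file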
Group    Type    Value    Serial_Number
Group1    T1    aa    1
Group1    T2    tt    2
Group1    T1*T2    tt    3
Group1    T1*T2    tt    4
Group1    T1*T2    at    5
Group4    T1    aa    1
Group4    T1*T2    gg    3
Group4    T1*T2    aa    4
Group4    T1*T2    ag    5
Group4    T2    gg        imputed
Group5    T1    gg    1
Group5    T1*T2    gg    3
Group5    T1*T2    gt    4
Group5    T1*T2    gg    5
Group5    T2    tt        imputed
Group6    T2    gg    2
Group6    T1*T2    gg    6
Group6    T1*T2    ag    3
Group6    T1*T2    gg    4
Group6    T1*T2    ag    5
Group6    T1    aa        imputed
Group7    T1*T2    ga    7
Group7    T1*T2    gg    8
Group7    T1*T2    ga    9
Group7    T1    gg        imputed
Group7    T2    aa        imputed

As you can see, the "imputed" values are printed at the end of each group. Pipe it through a sort step if that's not acceptable.
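
One possible sort step, as a sketch only: the helper name reorder_imputed is made up here, and it assumes the first line is the header and that group name and type are the first two whitespace-separated fields. It prefixes each line with three numeric keys (group in first-seen order, type rank, original line number), sorts on them, and strips the keys again.

```shell
# Hypothetical helper: reads the awk output on stdin and moves each group's
# T1 and T2 rows (imputed or not) ahead of its T1*T2 rows.
reorder_imputed() {
  awk '
    NR == 1 { print "0\t0\t0\t" $0; next }       # header gets the lowest keys
    {
      if (!($1 in grp)) grp[$1] = ++ng           # groups keep their first-seen order
      rank = ($2 == "T1") ? 1 : (($2 == "T2") ? 2 : 3)
      print grp[$1] "\t" rank "\t" NR "\t" $0    # prefix the three sort keys
    }
  ' | sort -t "$(printf '\t')" -k1,1n -k2,2n -k3,3n | cut -f4-
}

printf '%s\n' \
  'Group Type Value Serial_Number' \
  'Group4 T1 aa 1' \
  'Group4 T1*T2 gg 3' \
  'Group4 T2 gg imputed' | reorder_imputed
```

The original line number as the last key keeps the T1*T2 rows of a group in their input order.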
This User Gave Thanks to RudiC For This Post:
# 7  
Old 12-12-2014
Another one, perhaps even more clumsy, but it works with any awk except the Solaris /usr/bin/awk (oawk).
Code:
awk '
BEGIN {OFS="\t"}
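function prlast(){
  # print imputed rows for the group just finished (its name is in p)
  f=""
  if (T1=="") {
    # T1 missing: take the first T1*T2 letter that differs from the T2 letter
    t=substr(T2,1,1)
    for (i=1;i<=length(T1T2);i++) {
      s=substr (T1T2,i,1)
      if (s!=t) {
        print p,"T1",s s,"imputed"
        f=s    # remember it so that T2 cannot get the same letter
        break
      }
    }
  }
  if (T2=="") {
    # T2 missing: take the first letter that differs from T1 and from an imputed T1
    t=substr(T1,1,1)
    for (i=1;i<=length(T1T2);i++) {
      s=substr (T1T2,i,1)
      if (s!=t && s!=f) {
        print p,"T2",s s,"imputed"
        break
      }
    }
  }
}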
p!=$1 && NR>2 {
  # next group begins
  prlast()
  T1=T2=T1T2=""
}
{
  if ($2=="T1") T1=T1 $3
  else if ($2=="T2") T2=T2 $3
  else if ($2=="T1*T2") T1T2=T1T2 $3
  p=$1
  print
} END {
  prlast()
}
' file


Last edited by MadeInGermany; 12-12-2014 at 12:12 PM..
This User Gave Thanks to MadeInGermany For This Post: