Shell Programming and Scripting: Post 302933629 by jalaj841 on Saturday 31st of January 2015 11:30:37 PM
Match child with parents and form matrix

Thank you for letting me join this forum; it looks like there are lots of learning opportunities here.
I am a biologist and very new to Unix, so please excuse any incorrect terminology. I am using Cygwin on Windows, which can run Perl, awk, sed, etc.

I have 2 files. The first, a sample sheet, tells which parent or child line is in which sample. Parents are represented as p1, p2, p3, and the corresponding groups of children are represented as p1/p2, p1/p3, p2/p3.

Code:
index,line,sample
1,p1,s1
2,p2,s2
3,p1/p2,s3
4,p1/p2,s4
5,p1/p2,s5
6,p1/p2,s6
7,p1/p3,s7
8,p1/p3,s8
9,p1/p3,s9
10,p1/p3,s10
11,p2/p3,s11
12,p2/p3,s12
13,p2/p3,s13
14,p2/p3,s14
15,p3,s15

The second file contains the data: sample number, variable name, and value. A parent's value is always one of aa, tt, gg, cc (the same character repeated twice).

Code:
sample,var,value
s1,v1,aa
s1,v2,tt
s1,v3,aa
s1,v4,gg
s2,v1,tt
s2,v2,aa
s2,v3,aa
s2,v4,gg
s3,v1,at
s3,v3,aa
s3,v4,tt
s4,v1,tt
s4,v2,at
s4,v3,aa
s4,v4,gt
s5,v1,aa
s5,v2,tt
s5,v3,aa
s5,v4,gt
s6,v1,aa
s6,v2,aa
s6,v3,aa
s6,v4,tt
s7,v1,aa
s7,v2,aa
s7,v3,at
s7,v4,ag
s8,v1,aa
s8,v2,tt
s8,v3,at
s8,v4,ag
s9,v1,aa
s9,v2,at
s9,v3,tt
s9,v4,gg
s10,v1,aa
s10,v2,at
s10,v3,aa
s10,v4,ag
s11,v1,aa
s11,v2,aa
s11,v3,tt
s11,v4,gg
s12,v1,tt
s12,v2,tt
s12,v3,tt
s12,v4,ag
s13,v1,aa
s13,v2,at
s13,v3,aa
s13,v4,ag
s14,v1,at
s14,v2,aa
s14,v3,at
s14,v4,aa
s15,v1,aa
s15,v2,aa
s15,v3,tt
s15,v4,aa

I am only interested in variables for which the two parents of a pair do not match. If both parents have the same value, that variable is left out of the output. Also, if one or both parents have no value for a variable, I do not want to consider that variable.
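
The filter described above could be sketched as follows. This is only an illustration in Python (an assumption: Python is not mentioned in this post, although it can be installed under Cygwin), and the function name `informative_vars` and the dictionary layout are hypothetical. Each parent is held as a mapping from variable name to value, filled in here by hand from samples s1, s2 and s15.

```python
def informative_vars(parent_a, parent_b):
    """Return the variables present in BOTH parents whose values differ.

    Order follows parent_a, i.e. the order in which its variables were read.
    """
    return [
        var
        for var, value in parent_a.items()
        if var in parent_b and parent_b[var] != value
    ]


# Parent values copied from the data file above (s1 = p1, s2 = p2, s15 = p3).
p1 = {"v1": "aa", "v2": "tt", "v3": "aa", "v4": "gg"}
p2 = {"v1": "tt", "v2": "aa", "v3": "aa", "v4": "gg"}
p3 = {"v1": "aa", "v2": "aa", "v3": "tt", "v4": "aa"}

print(informative_vars(p1, p2))  # ['v1', 'v2']
print(informative_vars(p1, p3))  # ['v2', 'v3', 'v4']
print(informative_vars(p2, p3))  # ['v1', 'v3', 'v4']
```

A variable missing from either parent never passes the `var in parent_b` check or never appears in the loop, so it is dropped without any special handling.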

What I need to do is create a new file for each set of children with the same parents, and code each variable's value as a (matches the first parent), b (matches the second parent), or m (a mixture of both). If the child has no data for a variable, a hyphen (-) can be used.
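
The coding rule above might look like this in Python (an assumption, as Python is not one of the tools named in this post). The function name `code_child` is hypothetical. "Mixture" is interpreted here as "the child's characters are exactly the characters of the two parents, in any order", so `ag` counts as a mixture of `gg` and `aa`. The post does not say what to do with a value that fits none of the cases; returning `?` for it is purely an assumption of this sketch.

```python
def code_child(child, parent_a, parent_b):
    """Code one child value relative to its two (differing) parent values."""
    if not child:
        return "-"  # no data for this variable in the child
    if child == parent_a:
        return "a"  # matches the first parent
    if child == parent_b:
        return "b"  # matches the second parent
    if set(child) == set(parent_a) | set(parent_b):
        return "m"  # one character from each parent, order ignored
    return "?"      # assumption: flag values the post does not cover


print(code_child("at", "aa", "tt"))  # m
print(code_child("tt", "aa", "tt"))  # b
print(code_child(None, "tt", "aa"))  # -
print(code_child("ag", "gg", "aa"))  # m
```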

So my desired output is 3 files, all in matrix form.


Code:
file p1_p2

    s3  s4  s5  s6
v1  m   b   a   a
v2  -   m   a   b

file p1_p3

    s7  s8  s9  s10
v2  b   a   m   m
v3  m   m   b   a
v4  m   m   a   m

file p2_p3

    s11  s12  s13  s14
v1  b    a    b    m
v3  b    b    a    m
v4  a    m    m    b
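
For illustration, here is one way the whole task could be put together. This is a sketch in Python (an assumption; the post only names Perl, awk and sed), not a definitive solution. The function name `build_matrices` is hypothetical, the `?` code for values that fit none of a/b/m is an assumption, and the inline demo uses only a small subset of the rows shown above (samples s1 to s4, variables v1 to v3) so that it stays short.

```python
import csv
import io
from collections import defaultdict


def build_matrices(sheet_file, data_file):
    """Return {"p1_p2": [header_row, data_row, ...]} from the two CSV inputs."""
    parents = {}                # line name -> sample id, e.g. "p1" -> "s1"
    groups = defaultdict(list)  # ("p1", "p2") -> child sample ids, in file order
    for row in csv.DictReader(sheet_file):
        if "/" in row["line"]:
            groups[tuple(row["line"].split("/"))].append(row["sample"])
        else:
            parents[row["line"]] = row["sample"]

    values = defaultdict(dict)  # sample id -> {variable: value}
    for row in csv.DictReader(data_file):
        values[row["sample"]][row["var"]] = row["value"]

    def code(child, a, b):
        if child is None:
            return "-"
        if child == a:
            return "a"
        if child == b:
            return "b"
        # Mixture: characters of both parents, order ignored.
        # "?" is an assumption for values the post does not cover.
        return "m" if set(child) == set(a) | set(b) else "?"

    matrices = {}
    for (pa, pb), children in groups.items():
        if pa not in parents or pb not in parents:
            continue  # assumption: skip groups whose parent has no sample
        a_vals, b_vals = values[parents[pa]], values[parents[pb]]
        rows = [[""] + children]  # header row: empty corner, then sample ids
        for var, a in a_vals.items():
            b = b_vals.get(var)
            if b is None or a == b:
                continue  # parent value missing, or parents identical
            rows.append([var] + [code(values[c].get(var), a, b) for c in children])
        matrices[f"{pa}_{pb}"] = rows
    return matrices


# Demo on a subset of the rows from this post.
sheet = io.StringIO("index,line,sample\n1,p1,s1\n2,p2,s2\n3,p1/p2,s3\n4,p1/p2,s4\n")
data = io.StringIO(
    "sample,var,value\n"
    "s1,v1,aa\ns1,v2,tt\ns1,v3,aa\n"
    "s2,v1,tt\ns2,v2,aa\ns2,v3,aa\n"
    "s3,v1,at\ns3,v3,aa\n"
    "s4,v1,tt\ns4,v2,at\ns4,v3,aa\n"
)
matrices = build_matrices(sheet, data)
for name, rows in matrices.items():
    print("file", name)
    for row in rows:
        print("\t".join(row))
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`, and each matrix could be written to a file named after its key instead of being printed. Rows are tab-separated here, which is a layout choice of this sketch rather than something the post specifies.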


I'm happy to answer any questions you may have. Please guide me toward producing this output.

Last edited by jalaj841; 02-01-2015 at 12:48 PM..
 
