To Align clinical data files using awk
Hello, I am new to shell scripting and in my first week with awk. So far I have managed to format my data into File 1 and File 2 as shown below. The File 3 output is the result I am looking for. Thanks.

File 1:
0633-009_001200008:225065338468009:CMBRTRM:albuterol
0633-009_001200008:225065338468009:CMCLAS1:respiratory system
0633-009_001200008:225065338468009:CMCLAS2:drugs for obstructive
0633-009_001200008:225065338468009:CMCLAS3:adrenergics
0633-009_001200009:225065338468008:CMBRTRM:albuterol
0633-009_001200009:225065338468008:CMCLAS1:respiratory system
0633-009_001200009:225065338468008:CMCLAS2:drugs for obstructive
0633-009_001200009:225065338468008:CMCLSCD3:R03C

File 2:
USUBJID|CMSEQ|CMBRTRM|CMCLAS1|CMCLAS2|CMCLAS3|CMCLSCD1|CMCLSCD2|CMCLSCD3|CMROUTE|
0633-009_001200008|225065338468009
0633-009_001200009|225065338468008

Conditions for output File 3:
Each Field 4 value of File 1 (fields are separated by ":", for example albuterol) is appended to File 2 in the column whose header (headers are separated by "|") equals Field 3 of File 1 (for example CMBRTRM). The value goes on the row of File 2 whose Field 1 and Field 2 equal Field 1 and Field 2 of File 1.

Output File 3:
USUBJID|CMSEQ|CMBRTRM|CMCLAS1|CMCLAS2|CMCLAS3|CMCLSCD1|CMCLSCD2|CMCLSCD3|CMROUTE|
0633-009_001200008|225065338468009|albuterol|respiratory system|drugs for obstructive|adrenergics|||||
0633-009_001200009|225065338468008|albuterol|respiratory system|drugs for obstructive||||R03C||
Here is my humble work so far:
Code:
# Step 1: collect the distinct values of File 1's column 3;
# they become the remaining column headers of File 2.
awk -F':' '{ print $3 }' File1.txt | sort -u > tst.txt

# Step 2: turn those values (one per line in tst.txt) into a single
# "|"-separated header line. The first two columns, USUBJID and CMSEQ,
# are hard-coded for now.
awk '
BEGIN { printf "USUBJID|CMSEQ|" }
      { printf "%s|", $0 }
END   { printf "\n" }' tst.txt > temp1.txt
cp temp1.txt File2.txt

# Step 3: every distinct column 1 / column 2 pair in File1.txt becomes
# one row of File2.txt (the usubjid|cmseq combination).
# The key is built by concatenation so that it prints as plain text.
awk -F':' '
{
    key = $1 "|" $2
    if (!(key in seen)) {    # print each pair once, in input order
        seen[key] = 1
        print key
    }
}' File1.txt >> File2.txt
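A note on array keys for the step that collects the distinct USUBJID|CMSEQ pairs: in awk, a comma inside a subscript, as in k[$1,"|",$2], joins the parts with the built-in SUBSEP character (by default "\034"), so the stored key is not the plain text $1|$2. Concatenating the parts, as in k[$1 "|" $2], gives the literal text. The following is a small sketch of the difference using one line from File 1; the variable names line, comma_key and concat_key are chosen here for illustration only.

```shell
line='0633-009_001200008:225065338468009:CMBRTRM:albuterol'

# Comma-style subscript: the parts are joined with SUBSEP ("\034" by default).
comma_key=$(printf '%s\n' "$line" |
    awk -F':' '{ k[$1,"|",$2] = 1; for (i in k) print i }')

# Concatenated subscript: the key is exactly the text we want to print.
concat_key=$(printf '%s\n' "$line" |
    awk -F':' '{ k[$1 "|" $2] = 1; for (i in k) print i }')

# od -c makes the otherwise invisible separator characters visible.
printf 'comma:  %s\n' "$comma_key" | od -c
printf 'concat: %s\n' "$concat_key"
```

If the comma form is what produced the rows of File2.txt, those rows would carry the extra separator characters, which can make later string comparisons against them fail.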
Now I know the columns of File 2, and from File 1 I know how many rows File 2 has. What remains is to take column 4 of File 1 and paste it into File 2. My thinking so far: for each unique column 1 and column 2 combination in File 1, I need to collect the column 4 values into one string separated by '|' and append that string to the matching row of File 2. When I do so, each column 4 value has to be placed according to its column 3 name, following the header order of File 2. I hope I am not making my explanation too complicated :-(
Last edited by chowdhut; 05-28-2009 at 03:22 PM.
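The plan described above (look up each column 4 value by its subject/sequence pair and its column 3 name, then write the cells in the header order of File 2) could be sketched as a single awk call that reads both files. This is only a sketch under stated assumptions: it recreates the sample File1.txt and File2.txt from the first post in a scratch directory, writes the result to File3.txt (a file name chosen here), and assumes that no value contains ":" or "|" itself.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1

cat > File1.txt <<'EOF'
0633-009_001200008:225065338468009:CMBRTRM:albuterol
0633-009_001200008:225065338468009:CMCLAS1:respiratory system
0633-009_001200008:225065338468009:CMCLAS2:drugs for obstructive
0633-009_001200008:225065338468009:CMCLAS3:adrenergics
0633-009_001200009:225065338468008:CMBRTRM:albuterol
0633-009_001200009:225065338468008:CMCLAS1:respiratory system
0633-009_001200009:225065338468008:CMCLAS2:drugs for obstructive
0633-009_001200009:225065338468008:CMCLSCD3:R03C
EOF

cat > File2.txt <<'EOF'
USUBJID|CMSEQ|CMBRTRM|CMCLAS1|CMCLAS2|CMCLAS3|CMCLSCD1|CMCLSCD2|CMCLSCD3|CMROUTE|
0633-009_001200008|225065338468009
0633-009_001200009|225065338468008
EOF

awk '
# First file (File1.txt, ":"-separated): store each value under
# "subject|sequence" plus the column name.
NR == FNR {
    split($0, f, ":")
    val[f[1] "|" f[2], f[3]] = f[4]
    next
}
# Second file, header line: remember the column order and echo the header.
FNR == 1 {
    ncol = split($0, col, "|") - 1   # the trailing "|" yields one empty piece
    print
    next
}
# Second file, data lines: print the key, then one cell per header
# column from the third column on (empty when File 1 has no value).
{
    split($0, k, "|")
    key = k[1] "|" k[2]
    row = key "|"
    for (i = 3; i <= ncol; i++)
        row = row val[key, col[i]] "|"
    print row
}' File1.txt File2.txt > File3.txt

cat File3.txt
```

Because the header of File 2 drives the loop, columns that never occur in File 1 (CMCLSCD1, CMCLSCD2 and CMROUTE in this sample) simply come out as empty cells.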
I have made progress. In this phase I need help figuring out why a KSH variable from the outer for loop does not come through in AWK while the one from the inner for loop does. My code is below.
If I hard-code the variable SUBSEQ in AWK, it works. If I try to pass SUBSEQ from KSH, it does not, although passing the variable NAM from KSH works. All the files are listed below. I need the code to give the same output when I uncomment the line #SUBSEQ = "'$sub'"; and comment out the hard-coded SUBSEQ = "0633-009_001200008|225065338468009";
I do not follow why the if block does not match on SUBSEQ:
if ( arr[j,1] == SUBSEQ && arr[j,2] == NAM)
Code:
#! /bin/ksh
# read subject and sequence
set -A subseq_array $(<subseq.txt)
# read the applicable QVALS Order for the study ( can be from define)
set -A qnam_array $(<qnam.txt)
for sub in ${subseq_array[@]}
do
  for n in ${qnam_array[@]}
  do
    #print $sub ;
    `awk -F ':' 'BEGIN { }
    {
      {
        #SUBSEQ = "'$sub'";
        SUBSEQ = "0633-009_001200008|225065338468009";
        NAM = "'$n'";
        NULL = " |";
      }
      for(i=1;i<=NF;i++)
      {
        arr[NR,i]=$i;
      }
    }
    END{
      for(j=1;j<=NR;j++)
      {
        if ( arr[j,1] == SUBSEQ && arr[j,2] == NAM)
        {
          {printf("%s|",arr[j,3]);}
          exit;
        }
      }
      for(j=1;j<=NR;j++)
      {
        if ( arr[j,1] == SUBSEQ && arr[j,2] != NAM)
        {
          {printf("%s",NULL);}
          exit;
        }
      }
    }' supp_q.txt >> out1.txt`
  done
  print $sub >> out1.txt
  break;
done

The files:

subseq.txt
=========
0633-009_001200008|225065338468009
0633-009_001200008|225065338468010
0633-009_001200009|225065338468008
0633-009_001200018|225065338468009
0633-009_001200018|225065338468011

qnam.txt
=======
CMBRTRM
CMCLAS1
CMCLAS2
CMCLAS3
CMCLSCD1
CMCLSCD2
CMCLSCD3
CMROUTE

supp_q.txt (the source file that I am reading from)
==========
0633-009_001200008|225065338468009:CMBRTRM:albuterol
0633-009_001200008|225065338468009:CMCLAS1:respiratory system
0633-009_001200008|225065338468009:CMCLAS2:drugs for obstructive airway diseases
0633-009_001200008|225065338468009:CMCLAS3:adrenergics for systemic use
0633-009_001200008|225065338468009:CMCLSCD1:R
0633-009_001200008|225065338468009:CMCLSCD2:R03
0633-009_001200008|225065338468009:CMCLSCD3:R03C
0633-009_001200008|225065338468010:CMCLSCD2:R03
0633-009_001200008|225065338468010:CMCLSCD3:R03C
0633-009_001200009|225065338468008:CMBRTRM:albuterol
0633-009_001200009|225065338468008:CMCLAS1:respiratory system
0633-009_001200009|225065338468008:CMCLAS2:drugs for obstructive airway diseases
0633-009_001200009|225065338468008:CMCLSCD3:R03C
0633-009_001200018|225065338468009:CMCLAS2:drugs for obstructive airway diseases
0633-009_001200018|225065338468011:CMROUTE:RESPIR

Initial output file that I am writing to from AWK:
============
CMBRTRM|CMCLAS1|CMCLAS2|CMCLAS3|CMCLSCD1|CMCLSCD2|CMCLSCD3|CMROUTE|USUBJID|CMSEQ|

What the output needs to be:
========================
CMBRTRM|CMCLAS1|CMCLAS2|CMCLAS3|CMCLSCD1|CMCLSCD2|CMCLSCD3|CMROUTE|USUBJID|CMSEQ|
albuterol|respiratory system|drugs for obstructive airway diseases|adrenergics for systemic use|R|R03|R03C| |0633-009_001200008|225065338468009
Last edited by chowdhut; 06-18-2009 at 04:05 PM.
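One common way to hand shell values to awk is the -v option, which assigns the value to an ordinary awk variable before the program runs, so the awk program can stay inside a single pair of single quotes. The sketch below does not diagnose why the spliced "'$sub'" value failed to match in the script above; it only shows the -v form producing rows in the layout the post asks for. Assumptions made here: a plain while/read loop replaces the ksh arrays so the sketch also runs under sh or bash, only a subset of the posted subseq.txt and supp_q.txt is recreated in a scratch directory, and each subject is handled by one awk call that reads qnam.txt first.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1

printf '%s\n' CMBRTRM CMCLAS1 CMCLAS2 CMCLAS3 \
              CMCLSCD1 CMCLSCD2 CMCLSCD3 CMROUTE > qnam.txt
printf '%s\n' '0633-009_001200008|225065338468009' \
              '0633-009_001200018|225065338468011' > subseq.txt
cat > supp_q.txt <<'EOF'
0633-009_001200008|225065338468009:CMBRTRM:albuterol
0633-009_001200008|225065338468009:CMCLAS1:respiratory system
0633-009_001200008|225065338468009:CMCLAS2:drugs for obstructive airway diseases
0633-009_001200008|225065338468009:CMCLAS3:adrenergics for systemic use
0633-009_001200008|225065338468009:CMCLSCD1:R
0633-009_001200008|225065338468009:CMCLSCD2:R03
0633-009_001200008|225065338468009:CMCLSCD3:R03C
0633-009_001200018|225065338468011:CMROUTE:RESPIR
EOF

: > out1.txt
while IFS= read -r sub
do
    # -v passes the shell value in as the awk variable SUBSEQ.
    awk -F':' -v SUBSEQ="$sub" '
        # First file (qnam.txt): one column name per line, kept in order.
        NR == FNR { name[++n] = $1; next }
        # Second file (supp_q.txt): keep only the values of this subject.
        $1 == SUBSEQ { val[$2] = $3 }
        END {
            for (i = 1; i <= n; i++) {
                # A single blank stands in for a missing value, as in the post.
                cell = (name[i] in val) ? val[name[i]] : " "
                printf "%s|", cell
            }
            print SUBSEQ
        }' qnam.txt supp_q.txt >> out1.txt
done < subseq.txt

cat out1.txt
```

Note that -v interprets backslash escapes in the assigned value; the keys in these files contain none, so they are passed through unchanged.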