The UNIX and Linux Forums  
#1 | 05-27-2009
chowdhut, Registered User (Join Date: May 2009; Location: CT, USA; Posts: 4)
To Align clinical data files using awk

Hello, I am new to shell scripting and in my first week with awk. So far I have managed to produce the sample files File 1 and File 2 shown below; File 3 is the output I am looking for. Thanks!

File 1:
0633-009_001200008:225065338468009:CMBRTRM:albuterol
0633-009_001200008:225065338468009:CMCLAS1:respiratory system
0633-009_001200008:225065338468009:CMCLAS2:drugs for obstructive
0633-009_001200008:225065338468009:CMCLAS3:adrenergics
0633-009_001200009:225065338468008:CMBRTRM:albuterol
0633-009_001200009:225065338468008:CMCLAS1:respiratory system
0633-009_001200009:225065338468008:CMCLAS2:drugs for obstructive
0633-009_001200009:225065338468008:CMCLSCD3:R03C
File 2:
USUBJID|CMSEQ|CMBRTRM|CMCLAS1|CMCLAS2|CMCLAS3|CMCLSCD1|CMCLSCD2|CMCLSCD3|CMROUTE|
0633-009_001200008|225065338468009
0633-009_001200009|225065338468008

Output File 3 must satisfy these conditions:
Each field 4 value of File 1 (fields in File 1 are separated by ":"; for example albuterol) is appended to File 2 in the column whose header (headers in File 2 are separated by "|") equals field 3 of the same File 1 line (for example |CMBRTRM|), and on the File 2 row whose fields 1 and 2 equal fields 1 and 2 of that File 1 line.


Output File 3:
USUBJID|CMSEQ|CMBRTRM|CMCLAS1|CMCLAS2|CMCLAS3|CMCLSCD1|CMCLSCD2|CMCLSCD3|CMROUTE|
0633-009_001200008|225065338468009|albuterol|respiratory system|drugs for obstructive|adrenergics|||||
0633-009_001200009|225065338468008|albuterol|respiratory system|drugs for obstructive||||R03C||
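A minimal sketch of the join described above, assuming File 1 and File 2 are saved as File1.txt and File2.txt (hypothetical names) with exactly the sample contents. awk reads File1.txt first and stores each field 4 value under its subject|seq key and column name, then walks File2.txt and fills every row in header order. The `FS='|'` operand is an awk command-line assignment that takes effect just before File2.txt is read. This is one possible approach, not the only one.

```shell
#!/bin/sh
# Sample inputs copied from the post (file names are assumptions).
cat > File1.txt <<'EOF'
0633-009_001200008:225065338468009:CMBRTRM:albuterol
0633-009_001200008:225065338468009:CMCLAS1:respiratory system
0633-009_001200008:225065338468009:CMCLAS2:drugs for obstructive
0633-009_001200008:225065338468009:CMCLAS3:adrenergics
0633-009_001200009:225065338468008:CMBRTRM:albuterol
0633-009_001200009:225065338468008:CMCLAS1:respiratory system
0633-009_001200009:225065338468008:CMCLAS2:drugs for obstructive
0633-009_001200009:225065338468008:CMCLSCD3:R03C
EOF
cat > File2.txt <<'EOF'
USUBJID|CMSEQ|CMBRTRM|CMCLAS1|CMCLAS2|CMCLAS3|CMCLSCD1|CMCLSCD2|CMCLSCD3|CMROUTE|
0633-009_001200008|225065338468009
0633-009_001200009|225065338468008
EOF

awk -F':' '
NR == FNR {                         # 1st file: File1.txt, split on ":"
    val[$1 "|" $2, $3] = $4         # value keyed by subject|seq and column name
    next
}
FNR == 1 {                          # header line of File2.txt
    n = split($0, hdr, "|")         # trailing "|" leaves an empty hdr[n]
    print
    next
}
{                                   # data rows of File2.txt, split on "|"
    key = $1 "|" $2
    line = key "|"
    for (i = 3; i < n; i++)         # skip USUBJID, CMSEQ and the empty hdr[n]
        line = line val[key, hdr[i]] "|"
    print line
}' File1.txt FS='|' File2.txt > File3.txt

cat File3.txt
```

Columns with no matching File 1 line simply come out empty, which gives the `|||||` runs in the expected output.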
#2 | 05-27-2009
ghostdog74, Forum Advisor, Registered User (Join Date: Sep 2006; Posts: 2,538)
So what have you got so far with awk?
#3 | 05-28-2009
chowdhut, Registered User (Join Date: May 2009; Location: CT, USA; Posts: 4)
Here is what I have so far:

Code:
# For File 2, first create the unique values of File 1's column 3;
# these become the headers of File 2

awk -F':' '{print $3}' File1.txt | sort -u > tst.txt


# Transpose the distinct 3rd-column values (one per line in tst.txt) into a
# single "|"-separated header row; for now the first two columns, USUBJID and
# CMSEQ, are hard-coded

awk '
BEGIN { printf("USUBJID|CMSEQ|") }
      { printf("%s|", $0) }     # each line of tst.txt is one header name
END   { printf("\n") }
' tst.txt > File2.txt


# Each distinct 1st+2nd field combination in File1.txt becomes one
# USUBJID|CMSEQ row in File2.txt, in order of first appearance.
# Build the key by string concatenation ($1 "|" $2): a key written as
# SEQ[$1,"|",$2] joins the subscripts with SUBSEP (\034), and those
# control characters would end up in the printed rows.

awk -F':' '!seen[$1 "|" $2]++ { print $1 "|" $2 }' File1.txt >> File2.txt
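As an aside, the transpose step (one header name per line turned into one "|"-separated row) can also be done with the standard `paste -s` utility instead of awk. A sketch, assuming the sample File1.txt from the first post; `LC_ALL=C` is added only to make the sort order byte-wise and predictable.

```shell
#!/bin/sh
# Sample File1.txt from the first post (file name is an assumption).
cat > File1.txt <<'EOF'
0633-009_001200008:225065338468009:CMBRTRM:albuterol
0633-009_001200008:225065338468009:CMCLAS1:respiratory system
0633-009_001200008:225065338468009:CMCLAS2:drugs for obstructive
0633-009_001200008:225065338468009:CMCLAS3:adrenergics
0633-009_001200009:225065338468008:CMBRTRM:albuterol
0633-009_001200009:225065338468008:CMCLAS1:respiratory system
0633-009_001200009:225065338468008:CMCLAS2:drugs for obstructive
0633-009_001200009:225065338468008:CMCLSCD3:R03C
EOF

# Distinct column-3 names, one per line.
awk -F':' '{ print $3 }' File1.txt | LC_ALL=C sort -u > tst.txt

# paste -s joins all lines of tst.txt into one line with "|" between them;
# the printf calls add the fixed leading columns and the trailing "|".
{
    printf 'USUBJID|CMSEQ|'
    paste -s -d'|' tst.txt | tr -d '\n'
    printf '|\n'
} > File2.txt

cat File2.txt
```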

Now I know the columns of File 2, and from File 1 I know how many rows File 2 needs. Next I need to take column 4 of File 1 and paste it into File 2. My thinking:

For each unique column 1 + column 2 combination in File 1, build a string from its column 4 values, separated by '|', and append that string to the matching row of File 2. The values have to follow the header order of File 2, i.e. each column 4 value goes under the header that equals its column 3 value.


Hope I am not making my explanation too complicated :-(
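The plan above (one string per subject|seq combination, in header order) can also be sketched as a single awk pass over File1.txt that skips the intermediate File 2 entirely. This is an illustrative sketch assuming the File1.txt sample from the first post. Note that the header it builds contains only column names that actually occur in File1.txt, in order of first appearance, so fixed columns such as CMCLSCD1, CMCLSCD2 and CMROUTE are missing unless you join against a predefined File 2 header instead.

```shell
#!/bin/sh
# Sample File1.txt from the first post (file name is an assumption).
cat > File1.txt <<'EOF'
0633-009_001200008:225065338468009:CMBRTRM:albuterol
0633-009_001200008:225065338468009:CMCLAS1:respiratory system
0633-009_001200008:225065338468009:CMCLAS2:drugs for obstructive
0633-009_001200008:225065338468009:CMCLAS3:adrenergics
0633-009_001200009:225065338468008:CMBRTRM:albuterol
0633-009_001200009:225065338468008:CMCLAS1:respiratory system
0633-009_001200009:225065338468008:CMCLAS2:drugs for obstructive
0633-009_001200009:225065338468008:CMCLSCD3:R03C
EOF

awk -F':' '
{
    key = $1 "|" $2
    if (!(key in seen_key)) { seen_key[key] = 1; keys[++nk] = key }  # row order
    if (!($3 in seen_col))  { seen_col[$3] = 1;  cols[++nc] = $3 }   # column order
    val[key, $3] = $4
}
END {
    line = "USUBJID|CMSEQ|"
    for (c = 1; c <= nc; c++) line = line cols[c] "|"
    print line
    for (k = 1; k <= nk; k++) {
        line = keys[k] "|"
        for (c = 1; c <= nc; c++) line = line val[keys[k], cols[c]] "|"
        print line
    }
}' File1.txt > File3.txt

cat File3.txt
```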

Last edited by chowdhut; 05-28-2009 at 04:22 PM..
#4 | 06-18-2009
chowdhut, Registered User (Join Date: May 2009; Location: CT, USA; Posts: 4)
I have made progress, and at this stage I need help figuring out why a KSH variable from the outer for loop is not expanded in AWK while the one from the inner for loop is. My code is below. If I hard-code the variable SUBSEQ in AWK it works, but if I try to pass SUBSEQ in from KSH it does not; passing the variable NAM from KSH does work. All the files are listed below. I need help so that the code gives output when I uncomment the line
#SUBSEQ = "'$sub'"; and comment out the hard-coded
SUBSEQ = "0633-009_001200008|225065338468009";

I do not follow why the if block does not match SUBSEQ: if ( arr[j,1] == SUBSEQ && arr[j,2] == NAM)

#! /bin/ksh

# read subject and sequence
set -A subseq_array $(<subseq.txt)

# read the applicable QVALS Order for the study ( can be from define)
set -A qnam_array $(<qnam.txt)

for sub in ${subseq_array[@]}
do
    for n in ${qnam_array[@]}
    do
        #print $sub ;

        `awk -F ':' 'BEGIN { }
        {
            {
                #SUBSEQ = "'$sub'";
                SUBSEQ = "0633-009_001200008|225065338468009";
                NAM = "'$n'";
                NULL = " |";
            }

            for(i=1;i<=NF;i++)
            {
                arr[NR,i]=$i;
            }
        }

        END{
            for(j=1;j<=NR;j++)
            {
                if ( arr[j,1] == SUBSEQ && arr[j,2] == NAM) {
                    {printf("%s|",arr[j,3]);}
                    exit; }
            }

            for(j=1;j<=NR;j++)
            {
                if ( arr[j,1] == SUBSEQ && arr[j,2] != NAM) {
                    {printf("%s",NULL);}
                    exit; }
            }
        }' supp_q.txt >> out1.txt`

    done
    print $sub >> out1.txt
    break;
done



The files:
subseq.txt
=========

0633-009_001200008|225065338468009
0633-009_001200008|225065338468010
0633-009_001200009|225065338468008
0633-009_001200018|225065338468009
0633-009_001200018|225065338468011

qnam.txt
=======

CMBRTRM
CMCLAS1
CMCLAS2
CMCLAS3
CMCLSCD1
CMCLSCD2
CMCLSCD3
CMROUTE

Source file that I am reading from:
supp_q.txt
==========

0633-009_001200008|225065338468009:CMBRTRM:albuterol
0633-009_001200008|225065338468009:CMCLAS1:respiratory system
0633-009_001200008|225065338468009:CMCLAS2:drugs for obstructive airway diseases
0633-009_001200008|225065338468009:CMCLAS3:adrenergics for systemic use
0633-009_001200008|225065338468009:CMCLSCD1:R
0633-009_001200008|225065338468009:CMCLSCD2:R03
0633-009_001200008|225065338468009:CMCLSCD3:R03C
0633-009_001200008|225065338468010:CMCLSCD2:R03
0633-009_001200008|225065338468010:CMCLSCD3:R03C
0633-009_001200009|225065338468008:CMBRTRM:albuterol
0633-009_001200009|225065338468008:CMCLAS1:respiratory system
0633-009_001200009|225065338468008:CMCLAS2:drugs for obstructive airway diseases
0633-009_001200009|225065338468008:CMCLSCD3:R03C
0633-009_001200018|225065338468009:CMCLAS2:drugs for obstructive airway diseases
0633-009_001200018|225065338468011:CMROUTE:RESPIR
Initial output file that AWK writes to:
============
CMBRTRM|CMCLAS1|CMCLAS2|CMCLAS3|CMCLSCD1|CMCLSCD2|CMCLSCD3|CMROUTE|USUBJID|CMSEQ|

What the output needs to be:
========================

CMBRTRM|CMCLAS1|CMCLAS2|CMCLAS3|CMCLSCD1|CMCLSCD2|CMCLSCD3|CMROUTE|USUBJID|CMSEQ|
albuterol|respiratory system|drugs for obstructive airway diseases|adrenergics for systemic use|R|R03|R03C| |0633-009_001200008|225065338468009
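Rather than starting one awk process per subject and per QNAM, here is a sketch that reads qnam.txt, supp_q.txt and subseq.txt in a single awk run and writes the layout shown above. Assumptions: the three files contain exactly the lines listed in this post, the script writes the header itself (instead of it already being in out1.txt), and a missing value is written as a single space, as in the desired output.

```shell
#!/bin/sh
# Sample inputs copied from this post.
printf '%s\n' CMBRTRM CMCLAS1 CMCLAS2 CMCLAS3 CMCLSCD1 CMCLSCD2 CMCLSCD3 CMROUTE > qnam.txt
cat > subseq.txt <<'EOF'
0633-009_001200008|225065338468009
0633-009_001200008|225065338468010
0633-009_001200009|225065338468008
0633-009_001200018|225065338468009
0633-009_001200018|225065338468011
EOF
cat > supp_q.txt <<'EOF'
0633-009_001200008|225065338468009:CMBRTRM:albuterol
0633-009_001200008|225065338468009:CMCLAS1:respiratory system
0633-009_001200008|225065338468009:CMCLAS2:drugs for obstructive airway diseases
0633-009_001200008|225065338468009:CMCLAS3:adrenergics for systemic use
0633-009_001200008|225065338468009:CMCLSCD1:R
0633-009_001200008|225065338468009:CMCLSCD2:R03
0633-009_001200008|225065338468009:CMCLSCD3:R03C
0633-009_001200008|225065338468010:CMCLSCD2:R03
0633-009_001200008|225065338468010:CMCLSCD3:R03C
0633-009_001200009|225065338468008:CMBRTRM:albuterol
0633-009_001200009|225065338468008:CMCLAS1:respiratory system
0633-009_001200009|225065338468008:CMCLAS2:drugs for obstructive airway diseases
0633-009_001200009|225065338468008:CMCLSCD3:R03C
0633-009_001200018|225065338468009:CMCLAS2:drugs for obstructive airway diseases
0633-009_001200018|225065338468011:CMROUTE:RESPIR
EOF

# One awk run over all three files, each split on ":":
#   qnam.txt   -> column order
#   supp_q.txt -> val[subject|seq, QNAM] = value
#   subseq.txt -> one output row per subject|seq
awk -F':' '
FILENAME == "qnam.txt"   { if (NF) q[++nq] = $1; next }
FILENAME == "supp_q.txt" { val[$1, $2] = $3; next }
NF {
    if (!header_done) {                  # header before the first data row
        line = ""
        for (i = 1; i <= nq; i++) line = line q[i] "|"
        print line "USUBJID|CMSEQ|"
        header_done = 1
    }
    line = ""
    for (i = 1; i <= nq; i++) {
        k = $0 SUBSEP q[i]
        line = line ((k in val) ? val[k] : " ") "|"   # " " marks a missing value
    }
    print line $0
}' qnam.txt supp_q.txt subseq.txt > out1.txt

cat out1.txt
```

Unlike the per-cell loop, every row of subseq.txt gets a full set of columns here (even a subject with no lines in supp_q.txt), and there is no `break`, so all subjects are written.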

Last edited by chowdhut; 06-18-2009 at 05:05 PM..
#5 | 06-18-2009
vgersh99, Forum Staff, Moderator (Join Date: Feb 2005; Location: Boston, MA; Posts: 5,122)
Code:
awk -F':' -v SUBSEQ="${sub}" -v NAM="${n}" '.....' myFileName
OR
awk -F':' '.....' SUBSEQ="${sub}" NAM="${n}" myFileName
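Applied to the loop from post #4, the -v form might look like the sketch below. Assumptions: POSIX sh `while read` loops stand in for the ksh `set -A` arrays, the sample files are trimmed copies of the ones in post #4, and the `break` is dropped so every subject is processed. One difference between the two forms above: a -v assignment is already in effect in a BEGIN block, while a `VAR=value` operand is applied only when awk reaches it in the file list.

```shell
#!/bin/sh
# Trimmed sample data from post #4.
cat > supp_q.txt <<'EOF'
0633-009_001200008|225065338468009:CMBRTRM:albuterol
0633-009_001200008|225065338468009:CMCLAS1:respiratory system
0633-009_001200009|225065338468008:CMCLAS1:respiratory system
EOF
printf '%s\n' '0633-009_001200008|225065338468009' '0633-009_001200009|225065338468008' > subseq.txt
printf '%s\n' CMBRTRM CMCLAS1 > qnam.txt
: > out1.txt

while IFS= read -r sub; do
    while IFS= read -r n; do
        # -v hands the shell values to awk as awk variables, so nothing
        # has to be spliced into the quoted program text.
        awk -F':' -v SUBSEQ="$sub" -v NAM="$n" '
            $1 == SUBSEQ              { seen = 1 }
            $1 == SUBSEQ && $2 == NAM { printf("%s|", $3); found = 1; exit }
            END { if (seen && !found) printf(" |") }
        ' supp_q.txt >> out1.txt
    done < qnam.txt
    printf '%s\n' "$sub" >> out1.txt
done < subseq.txt

cat out1.txt
```

As in the original END blocks, " |" is written only when the subject appears in supp_q.txt but has no line for that QNAM.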
All times are GMT -4.

The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.