Format data by consolidating replicated obs


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Format data by consolidating replicated obs
# 1  
Old 02-28-2015
Format data by consolidating replicated obs

Please assist with the following.


I have readings R1, R2,..Rn from plants from different fields , sometimes replicated. I need to first consolidate the replicates by their mean value per plant per field. Then I need to reformat the data by partially transposing it.

The missing values (blanks) need to be reported as dot (.) in the output.

I have several files like the following format, where the readings start from a fixed column colX (col4 in this example) but number of readings is variable.


I want to go from

Code:
Plant Field Replicate R1 R2 R3 
N1 I1 1 1 2 3 
N2 I1 1 1 3 4
N1 I1 2 2 3 4
N3 I1 2 3 4 5
N1 I2 1 1 2 3 
N2 I2 1 1 3 4
N2 I2 2 2 3 4
N3 I2 1 3 4 5
N1 I3 1 1   4  
N2 I3 1 2 5   
N3 I3 1   6 
N3 I3 2     1


to

Code:
Reading Field N1 N2 N3
R1 I1 1.5 1 3
R2 I1 2.5 3 4
R3 I1 3.5 4 5
R1 I2 1 1.5 3  
R2 I2 2 3 4
R3 I2 3 4 5
R1 I3 1 2 .
R2 I3 . 5 6
R3 I3 4 . 1

I tried this.


Code:
awk '
    NR>1{
        arr[$1" "$2" "$3]   += $4
        count[$1" "$2" "$3] += 1
    }
    END{
        for (a in arr) {
            print a, arr[a] / count[a]
        }
    }
' file |  awk '
NR == 1 {
    n = NF
    for (i = 1; i <= NF; i++)
        row[i] = $i
    next
}
{
    if (NF > n)
        n = NF
    for (i = 1; i <= NF; i++)
        row[i] = row[i] " " $i
}
END {
    for (i = 1; i <= n; i++)
        print row[i]
}'

# 2  
Old 02-28-2015
There's a well tested solution for converting single column data to tables/matrices published several times in these fora; some upfront conditioning of your data and we can use that ideally:
Code:
awk     'NR==1          {READ1=$(NF-2)
                         READ2=$(NF-1)
                         READ3=$NF
                         next
                        }
                        {print $1, READ1"_"$2, $(NF-2)
                         print $1, READ2"_"$2, $(NF-1)
                         print $1, READ3"_"$2, $(NF)
                        }
        ' FS="\t" OFS="\t" file2 |
awk  '       {LN[$2]; HD[$1]; MX[$2,$1]+=$3; CNT[$2,$1]++}
         END    {               printf "%10s", ""; for (i in HD) printf "%10s", i; print "";
                 for (j in LN) {printf "%10s",j;   for (i in HD) printf "%10s", MX[j,i]/CNT[j,i]; print ""}
                }
        ' FS="\t"
                  N1        N2        N3
     R3_I1       3.5         4         5
     R3_I2         3         4         5
     R3_I3         4         0       3.5
     R1_I1       1.5         1         3
     R1_I2         1       1.5         3
     R1_I3         1         2         0
     R2_I1       2.5         3         4
     R2_I2         2         3         4
     R2_I3         0         5         0

Of course some good sport might want to combine these two into one single script... and get dots in place of the zeroes in the result.
This User Gave Thanks to RudiC For This Post:
# 3  
Old 02-28-2015
thank you, but the issue is you have fixed the reads, as I had mentioned it is quite variable (sometimes ranging upto 70), , can we modify this script to accommodate those types of conditions?

I tried this for the first part, would you please have a look and correct the syntax?
Code:
        awk     'NR==1          { for (i=(NF-2);i<=NF;i++)
        				
        			  j++ ;
        			   read[j]=$i;

	                        }
	              NR>1          { for (i=1;i<=j;i++)
	                        { print $1, read[j]"_"$2, $(NF-j) }

	                        }
        ' FS="\t" OFS="\t" file2


Last edited by jianp83; 03-01-2015 at 12:15 AM..
# 4  
Old 03-01-2015
Close. To make it really read n columns, try
Code:
awk     'NR==1          {for (i=4; i<=NF; i++) READ[i]=$i
                         COLS=NF
                         next
                        }
                        {for (i=4; i<=COLS; i++) print $1, READ[i] "_" $2, $i
                        }
        ' FS="\t" OFS="\t" file

This User Gave Thanks to RudiC For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Consolidating multiple outputs in one file

Dears, i am stuck here i have 3 scripts running at one time and all the three scripts finish at different time and each script generate 1 file with different name. so i will have three files. i dnt know which script finish first i want to have a script which check if all the there files are... (6 Replies)
Discussion started by: mirwasim
6 Replies

2. Shell Programming and Scripting

Script to generate Excel file or to SQL output data to Excel format/tabular format

Hi , i am generating some data by firing sql query with connecting to the database by my solaris box. The below one should be the header line of my excel ,here its coming in separate row. TO_CHAR(C. CURR_EMP_NO ---------- --------------- LST_NM... (6 Replies)
Discussion started by: dani1234
6 Replies

3. Shell Programming and Scripting

Help with consolidating mails sent by executing script

I have a script which checks for swap usage of the unix server. A cron job has been created to execute this script every hour and send the output via mail. This output is only required for a particular duration and it can range anywhere from 6 hours to 10 hours. So we are commenting out the... (1 Reply)
Discussion started by: kiran1112
1 Replies

4. Solaris

Working with disk sets from replicated LUNs

Hi everybody We have a Sun Cluster with two nodes, connected to a number of SAN disks over fiber cables. We made SVM disk sets on those disks for our application needs. Now we constructed another site in another metropolitan area, but with only one node (no cluster), and connected it to the main... (2 Replies)
Discussion started by: abohmeed
2 Replies

5. Shell Programming and Scripting

Preventing script from being replicated on a defined number of hosts

ok. i have an extensive program written predominantly in borne shell. i have to give an "evaluation" copy of this program to a user so she can test it out and see if she wants it. problem is, i dont have an evaluation copy. and even if i did, im worried the evaluation copy can be edited to... (8 Replies)
Discussion started by: SkySmart
8 Replies

6. Windows & DOS: Issues & Discussions

Consolidating Freespace to allow shrinking partition?

i have an "old" laptop with 84gb used space, 203gb free, running 32bit Windows Vista. i've tried all defragmenting programs i could find and though some offer Free Space Defrag, they don't seem to take into account where on the disk to consolidates the space to. what i am trying to achieve is... (4 Replies)
Discussion started by: Sterist
4 Replies

7. Shell Programming and Scripting

How to replicated records using sed

I need to generate exactly 102 duplicates in a file using sed. Here's an example of my input: Grid-ref = 1, 148 Grid-ref = 1, 311 Grid-ref = 1, 428 I need: Grid-ref = 1, 148 Grid-ref = 1, 148 Grid-ref = 1, 148 Grid-ref = 1, 148 ... repeated 102 times, then Grid-ref = 1, 311... (2 Replies)
Discussion started by: Russ
2 Replies

8. UNIX for Dummies Questions & Answers

converting a tabular format data to comma seperated data in KSH

Hi, Could anyone help me in changing a tabular format output to comma seperated file pls in K-sh. Its very urgent. E.g : username empid ------------------------ sri 123 to username,empid sri,123 Thanks, Hema:confused: (2 Replies)
Discussion started by: Hemamalini
2 Replies

9. UNIX for Advanced & Expert Users

Consolidating Pipes

This is something I've given a lot of thought to and come up with no answer. Say you have a data stream passing from a file, through process A, into process B. Process A only modifies a few bytes of the stream, then prints the rest of the stream unmodified. Is there any way to stream the file... (4 Replies)
Discussion started by: Corona688
4 Replies

10. Shell Programming and Scripting

Adding values then consolidating lines

I'm a begginer with scripting... Have a script that pulls check information data. There are multiple lines for each check. ALL the info is the same on the line EXCEPT the charges. ie: same check number, same payee, same patient, same check date, DIFFERENT CHARGE. I need to total the charges... (0 Replies)
Discussion started by: tdavenpo
0 Replies
Login or Register to Ask a Question