Help for a Perl newcomer! Transposing data from columns to rows


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Help for a Perl newcomer! Transposing data from columns to rows
# 8  
Old 11-30-2011
I used only one space to split the line. This might caused the problem. Here is an improved version with formatted output.

Code:
#!/usr/bin/perl

use strict;
use warnings;

my $line = <STDIN>;
chomp($line);
my @header_item = split(/[ ]+/, $line);
die("Invalid header line.\n") if (@header_item < 2);

my %new_header_item = ();
my %data = ();

while ($line = <STDIN>)
{
    chomp($line);
    $line =~ s/^[ ]*//g; # Strip leading spaces
    my @item = split(/[ ]+/, $line); # Use continuous spaces as delimiter
    next if (@item < 2);
    $data{$item[0]} = () if (!defined($data{$item[0]}));

    for (my $i = 2; $i < @item; $i++)
    {
        last if (!defined($header_item[$i]));
        my $name = $header_item[$i].$item[1];
        $new_header_item{$name} = '';
        $data{$item[0]}{$name} = $item[$i];
    }
}

my @h = sort keys(%new_header_item);
print sprintf('%-9s ', $header_item[0]);

foreach my $name (@h)
{
    print sprintf(' %12s', $name);
}

print "\n";

foreach my $subj (keys(%data))
{
    print sprintf(' %-9s', $subj);

    foreach my $key (@h)
    {
        my $value = defined($data{$subj}{$key}) ? $data{$subj}{$key} : 0;
        print sprintf(' %12.3f', $value);
    }

    print "\n";
}

exit(0);


Sample output:

Code:
[root@localhost temp]# ./parse_exp.pl < exp.txt 
Subject     Fp1mixedneg Fp1mixedneut   Fp1pureneg  Fp1pureneut  Fp2mixedneg Fp2mixedneut   Fp2pureneg  Fp2pureneut  FpZmixedneg FpZmixedneut   FpZpureneg  FpZpureneut
 S1             235.012      212.100     1578.220     4878.200      125.650      548.215       15.250      512.448      548.233     1548.200     1654.200      545.880
 S2            5465.000      545.000      587.000      874.000    87465.000     7498.000     8798.000     8730.390    46984.000     5136.000     6516.000      879.000

# 9  
Old 11-30-2011
Thank you, that one does work a lot better!

I think the issue is that the way my raw data has been exported, it is not properly alligned in collumns itself, so when i run the script it does not have properly alligned data to work with in the first place!

I dont suppose you know of any scripts that could allign my raw data first, before running the script?

Currently my raw data, has all the items along the top, but the numbers underneath are not fitting perfectly under the headings;

for example,
Code:
Subjects Conditions Fp1_Li..nce Fpz_Li..nce Fp2_Li..nce F7_Lin..nce F3_Lin..nce
S1 MixNeutRem    14.7695    21.5628    22.7087     0.0992     0.3253
S1 MixNeutFor     1.2255     3.5854     5.2456    -1.6241    -1.5311
S1  MixNegRem    -0.5305     2.1425     6.1005     0.8624
S1 MixNegForg     2.6340     4.6610     6.0398     2.3030     2.6258

So, as you can see the raw data itself is not alligned correctly.

Am I right in thinking, this is probably why the script is not working perfectly.

If you know of anyways to correct this I would be very happy!

Thanks

Last edited by Franklin52; 12-03-2011 at 11:38 AM.. Reason: Please use code tags for data and code samples, thank you
# 10  
Old 11-30-2011
Quote:
Originally Posted by Sarah_W
Thank you, that one does work a lot better!

I think the issue is that the way my raw data has been exported, it is not properly alligned in collumns itself, so when i run the script it does not have properly alligned data to work with in the first place!

I dont suppose you know of any scripts that could allign my raw data first, before running the script?

Currently my raw data, has all the items along the top, but the numbers underneath are not fitting perfectly under the headings;

...

So, as you can see the raw data itself is not alligned correctly.

Am I right in thinking, this is probably why the script is not working perfectly.

If you know of anyways to correct this I would be very happy!

Thanks
Still doesn't work? The later version should work with aligned or non-aligned data as it uses continuous spaces as delimiter, so the spaces between the columns don't a matter.

Last edited by MacMonster; 11-30-2011 at 04:06 PM.. Reason: Shorten the quote.
# 11  
Old 11-30-2011
Sorry I was a bit quick to say it doesn't work, having a play around on excel i have managed to allign the collumns and the items at the top are specified perfectly by the condition, which is what I needed.

However, it has rearranged the the data, so the subject numbers are now randomally arranged rather than being subject 1,2,3 etc. Likewise the items at the top are arranged in a alphabetical order, not the order how they originally were. consequently it appears that the corresponding number data within the table does not match the relevent condition and item.

I have cross checked a few of the conditions and items to what the number in the table should be and they do not match.

Also in the row where the item and condition is specified along the top, it is fine apart from there is a random insertion of the the condition collum and obviously that does not need to be there. it looks like this....

CPz_Li..ncePurNeutRem ConditionsMixNegForg ConditionsMixNegRem ConditionsMixNeutFor ConditionsMixNeutRem 7.906 2.634 -0.53 1.226 14.77
obvioulsy, i do not need the 'conditions' listed like that in the top row, as that is what the MixNegRem etc... are already doing along side the items!

Hope all this makes sense!

Thanks again

---------- Post updated at 09:08 PM ---------- Previous update was at 08:53 PM ----------

Weirdly however, I have just ran it on only subject 1's data and it works perfectly!
all the conditions are with the items as they should be along the top and there is no intrusion of a 'condition' item along the top.
Again though the items at the top are in alphabetical order, is there any way to keep them in the original format?
Crucially the data in the table matches the conditions, so it may just mean that I have to run the script on each subject individually and merge the data into excell by hand.

I think when I try to run it on the large data set, the collumn called 'Condition' may be the confusing factor.

If you can think of anything to make it work on the large data set that would be fantastic, but if not I may have to just result to doing it by hand.

thanks
# 12  
Old 11-30-2011
Here's a different take on this pivot problem -

Code:
$
$
$ # display the contents of the input data file
$
$ cat input
Subject  Condition     Fp1            Fp2           FpZ
 S1     mixedneg    235.012        125.65        548.233
 S1     mixedneut   212.1          548.215      1548.2
 S1     pureneg     1578.22         15.25       1654.2
 S1     pureneut    4878.2         512.448       545.88
 S2     mixedneg    5465         87465         46984
 S2     mixedneut    545          7498          5136
 S2     pureneg      587          8798          6516
 S2     pureneut     874          8730.39        879
$
$
$ # Perl script to pivot the input data
$
$ perl -ane 'BEGIN {$fmt = "%-15s"}
            if ($. == 1) {
              $sub = $F[0];
              @items = @F[2..$#F];
            } elsif ( grep { $F[0] eq $x[$_][0] } 0..$#x ) {
              push @{$x[$#x]}, [ $F[1], [ @F[2..$#F] ] ] ;
            } else {
              push @x, [ $F[0], [ $F[1], [ @F[2..$#F] ] ] ];
            }
            END {
              printf ($fmt, $sub);
              foreach $itm (@items) {
                foreach $i ( 1.. $#{$x[0]} ) {
                  printf ($fmt, "$itm$x[0][$i][0]");
                }
              }
              print "\n";
              foreach $i ( 0..$#x ) {
                printf ($fmt, $x[$i][0]);
                foreach $k ( 0.. $#{$x[0][1][1]} ) {
                  foreach $j ( 1..$#{$x[$i]} ) {
                    printf ($fmt, $x[$i][$j][1][$k]);
                  }
                }
                print "\n";
              }
            }
           ' input
Subject        Fp1mixedneg    Fp1mixedneut   Fp1pureneg     Fp1pureneut    Fp2mixedneg    Fp2mixedneut   Fp2pureneg     Fp2pureneut    FpZmixedneg    FpZmixedneut   FpZpureneg     FpZpureneut
S1             235.012        212.1          1578.22        4878.2         125.65         548.215        15.25          512.448        548.233        1548.2         1654.2         545.88
S2             5465           545            587            874            87465          7498           8798           8730.39        46984          5136           6516           879
$
$
$

Each column in the output has a width of 15 characters. You can control this width by changing the value of "$fmt" variable in the BEGIN section. For example, the output for "$fmt" value equal to "%-20s" is as follows -

Code:
$
$
$ perl -ane 'BEGIN {$fmt = "%-20s"}
            if ($. == 1) {
              $sub = $F[0];
              @items = @F[2..$#F];
            } elsif ( grep { $F[0] eq $x[$_][0] } 0..$#x ) {
              push @{$x[$#x]}, [ $F[1], [ @F[2..$#F] ] ] ;
            } else {
              push @x, [ $F[0], [ $F[1], [ @F[2..$#F] ] ] ];
            }
            END {
              printf ($fmt, $sub);
              foreach $itm (@items) {
                foreach $i ( 1.. $#{$x[0]} ) {
                  printf ($fmt, "$itm$x[0][$i][0]");
                }
              }
              print "\n";
              foreach $i ( 0..$#x ) {
                printf ($fmt, $x[$i][0]);
                foreach $k ( 0.. $#{$x[0][1][1]} ) {
                  foreach $j ( 1..$#{$x[$i]} ) {
                    printf ($fmt, $x[$i][$j][1][$k]);
                  }
                }
                print "\n";
              }
            }
           ' input
Subject             Fp1mixedneg         Fp1mixedneut        Fp1pureneg          Fp1pureneut         Fp2mixedneg         Fp2mixedneut        Fp2pureneg          Fp2pureneut         FpZmixedneg         FpZmixedneut        FpZpureneg          FpZpureneut
S1                  235.012             212.1               1578.22             4878.2              125.65              548.215             15.25               512.448             548.233             1548.2              1654.2              545.88
S2                  5465                545                 587                 874                 87465               7498                8798                8730.39             46984               5136                6516                879
$
$

The order of the "FP"+"Condition" columns at the top is preserved as in the input data file. The floating point numbers correspond to their pivoted row/column pairs.

tyler_durden
# 13  
Old 12-02-2011
something like this

Code:
while(<DATA>){
  my @tmp = split;
  if($.==1){
   @headers=@tmp[2..4];
   next;
  }
  $tt{$tmp[0]}++;
  for(my $i=0;$i<=$#headers;$i++){
    $hash{$tmp[0]}->{$headers[$i].$tmp[1]}->{VAL}=$tmp[$i+2];
    $hash{$tmp[0]}->{$headers[$i].$tmp[1]}->{SEQ}=($i*4)+$tt{$tmp[0]}-1;
  }
}
foreach my $key(sort {$a<=>$b} keys %hash){
  print $key;
  foreach my $k(sort {$hash{$key}->{$a}->{SEQ} <=> $hash{$key}->{$b}->{SEQ}} keys %{$hash{$key}}){
    print " ", $hash{$key}->{$k}->{VAL};
  }
  print "\n";
}
__DATA__
Subject  Condition      Fp1        Fp2       FpZ
S1          mixedneg    235.012   125.65   548.233
S1          mixedneut   212.1     548.215    1548.2
S1          pureneg     1578.22   15.25      1654.2
S1          pureneut    4878.2     512.448    545.88
S2          mixedneg    5465       87465     46984
S2          mixedneut    545       7498         5136
S2          pureneg        587        8798        6516
S2          pureneut      874       8730.39     879

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Transposing rows to columns with multiple similar lines

Hi, I am trying to transpose rows to columns for thousands of records. The problem is there are records that have the same lines that need to be separated. the input file as below:- ID 1A02_HUMAN AC P01892; O19619; P06338; P10313; P30444; P30445; P30446; P30514; AC Q29680; Q29837;... (2 Replies)
Discussion started by: redse171
2 Replies

2. Shell Programming and Scripting

Transposing rows and columns (pivoting) using shell scripting

Here is the contents of an input file. A,1,2,3,4 10,aaa,bbb,ccc,ddd 11,eee,fff,ggg,hhh 12,iii,jjj,lll,mmm 13,nnn,ooo,ppp I wanted the output to be A 10 1 aaa 10 2 bbb 10 3 ccc 10 4 ddd 11 1 eee 11 2 fff 11 3 ggg 11 4 hhh ..... and so on How to do it in ksh... (9 Replies)
Discussion started by: ksatish89
9 Replies

3. Shell Programming and Scripting

transposing columns into rows

Hi, I need to transpose columns of my files into rows and save it as individual files. sample contents of the file below. 0.9120 0.7782 0.6959 0.6904 0.6322 0.8068 0.9082 0.9290 0.7272 0.9870 0.7648 0.8053 0.8300 0.9520 0.8614 0.6734 0.7910 0.6413 0.7126 0.7364 0.8491 0.8868 0.7586 0.8949... (8 Replies)
Discussion started by: ida1215
8 Replies

4. Shell Programming and Scripting

Transpose Data from Columns to rows

Hello. very new to shell scripting and would like to know if anyone could help me. I have data thats being pulled into a txt file and currently have to manually transpose the data which is taking a long time to do. here is what the data looks like. Server1 -- Date -- Other -- value... (7 Replies)
Discussion started by: Mikes88
7 Replies

5. Shell Programming and Scripting

Transposing Repeated Rows to Columns.

I have 1000s of these rows that I would like to transpose to columns. However I would like the transpose every 3 consecutive rows to columns like below, sorted by column 3 and provide a total for each occurrences. Finally I would like a grand total of column 3. 21|FE|41|0B 50\65\78 15... (2 Replies)
Discussion started by: ravzter
2 Replies

6. Shell Programming and Scripting

awk, string as record separator, transposing rows into columns

I'm working on a different stage of a project that someone helped me address elsewhere in these threads. The .docs I'm cycling through look roughly like this: 1 of 26 DOCUMENTS Copyright 2010 The Age Company Limited All Rights Reserved The Age (Melbourne, Australia) November 27, 2010... (9 Replies)
Discussion started by: spindoctor
9 Replies

7. UNIX for Dummies Questions & Answers

Suggestion to convert data in rows to data in columns

Hello everyone! I have a huge dataset looking like this: nameX nameX 0 1 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2 2 2 ............... nameY nameY 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2 2 2 ..... nameB nameB 0 1 2 2 2 2 2 2 2 2 1 2 2 2 1 2 2 2 ..... (can be several thousands of codes) and I need... (8 Replies)
Discussion started by: kush
8 Replies

8. Shell Programming and Scripting

Transposing rows into columns

I have a file like the one given below P1|V1|V2 P1|V1|V3 P1V1|V2 P2|V1|V4 P2|V2|V6 P2|V1|V4 I want it convert to P1|V1|V2|V2|V3 P2|V1|V4|V2|V6 2nd and 3rd column should be considered as together and so the tird row is duplicate Any ideas? (3 Replies)
Discussion started by: prasperl
3 Replies

9. Shell Programming and Scripting

Data in Rows to Columns

Hi, I am a beginner in bash&perl. I have data in form of:- A 1 B 2 C 3 D 4 E 5 I would like your help to find a simple way to change it to :- A B C D E 1 2 3 4 5 Any help would be highly appreciated. (8 Replies)
Discussion started by: umaars
8 Replies

10. Shell Programming and Scripting

Rows to columns transposing and reformating.

----File attached. Input file =========== COL_1 <IP Add 1> COL_2 <Service1> COL_3 <ABCDEFG> COL_4 <IP ADD:PORT> COL_4 <IP ADD:PORT> COL_1 <IP Add 2> COL_2 <Service2> COL_2 <Service3> COL_2 <Service4> COL_3 <AAAABBB> COL_4 <IP ADD:PORT> COL_4 <IP ADD:PORT> COL_4 <IP... (27 Replies)
Discussion started by: bluethunder
27 Replies
Login or Register to Ask a Question