Help for a Perl newcomer! Transposing data from columns to rows

11-30-2011

Registered User

77, 24

Join Date: Oct 2001

Last Activity: 7 June 2013, 2:00 PM EDT

Location: Null

Posts: 77

Thanks Given: 1

Thanked 24 Times in 22 Posts

I used only one space to split the line. This might caused the problem. Here is an improved version with formatted output.

Code:

#!/usr/bin/perl

use strict;
use warnings;

my $line = <STDIN>;
chomp($line);
my @header_item = split(/[ ]+/, $line);
die("Invalid header line.\n") if (@header_item < 2);

my %new_header_item = ();
my %data = ();

while ($line = <STDIN>)
{
    chomp($line);
    $line =~ s/^[ ]*//g; # Strip leading spaces
    my @item = split(/[ ]+/, $line); # Use continuous spaces as delimiter
    next if (@item < 2);
    $data{$item[0]} = () if (!defined($data{$item[0]}));

    for (my $i = 2; $i < @item; $i++)
    {
        last if (!defined($header_item[$i]));
        my $name = $header_item[$i].$item[1];
        $new_header_item{$name} = '';
        $data{$item[0]}{$name} = $item[$i];
    }
}

my @h = sort keys(%new_header_item);
print sprintf('%-9s ', $header_item[0]);

foreach my $name (@h)
{
    print sprintf(' %12s', $name);
}

print "\n";

foreach my $subj (keys(%data))
{
    print sprintf(' %-9s', $subj);

    foreach my $key (@h)
    {
        my $value = defined($data{$subj}{$key}) ? $data{$subj}{$key} : 0;
        print sprintf(' %12.3f', $value);
    }

    print "\n";
}

exit(0);

Sample output:

Code:

[root@localhost temp]# ./parse_exp.pl < exp.txt 
Subject     Fp1mixedneg Fp1mixedneut   Fp1pureneg  Fp1pureneut  Fp2mixedneg Fp2mixedneut   Fp2pureneg  Fp2pureneut  FpZmixedneg FpZmixedneut   FpZpureneg  FpZpureneut
 S1             235.012      212.100     1578.220     4878.200      125.650      548.215       15.250      512.448      548.233     1548.200     1654.200      545.880
 S2            5465.000      545.000      587.000      874.000    87465.000     7498.000     8798.000     8730.390    46984.000     5136.000     6516.000      879.000

MacMonster

View Public Profile for MacMonster

Find all posts by MacMonster

11-30-2011

Registered User

6, 0

Join Date: Nov 2011

Last Activity: 1 December 2011, 8:57 AM EST

Posts: 6

Thanks Given: 1

Thanked 0 Times in 0 Posts

Thank you, that one does work a lot better!

I think the issue is that the way my raw data has been exported, it is not properly alligned in collumns itself, so when i run the script it does not have properly alligned data to work with in the first place!

I dont suppose you know of any scripts that could allign my raw data first, before running the script?

Currently my raw data, has all the items along the top, but the numbers underneath are not fitting perfectly under the headings;

for example,

Code:

Subjects Conditions Fp1_Li..nce Fpz_Li..nce Fp2_Li..nce F7_Lin..nce F3_Lin..nce
S1 MixNeutRem    14.7695    21.5628    22.7087     0.0992     0.3253
S1 MixNeutFor     1.2255     3.5854     5.2456    -1.6241    -1.5311
S1  MixNegRem    -0.5305     2.1425     6.1005     0.8624
S1 MixNegForg     2.6340     4.6610     6.0398     2.3030     2.6258

So, as you can see the raw data itself is not alligned correctly.

Am I right in thinking, this is probably why the script is not working perfectly.

If you know of anyways to correct this I would be very happy!

Thanks

Last edited by Franklin52; 12-03-2011 at 11:38 AM.. Reason: Please use code tags for data and code samples, thank you

Sarah_W

View Public Profile for Sarah_W

Find all posts by Sarah_W

11-30-2011

Registered User

77, 24

Join Date: Oct 2001

Last Activity: 7 June 2013, 2:00 PM EDT

Location: Null

Posts: 77

Thanks Given: 1

Thanked 24 Times in 22 Posts

Quote:

Originally Posted by Sarah_W

Still doesn't work? The later version should work with aligned or non-aligned data as it uses continuous spaces as delimiter, so the spaces between the columns don't a matter.

Last edited by MacMonster; 11-30-2011 at 04:06 PM.. Reason: Shorten the quote.

MacMonster

View Public Profile for MacMonster

Find all posts by MacMonster

11-30-2011

Registered User

6, 0

Join Date: Nov 2011

Last Activity: 1 December 2011, 8:57 AM EST

Posts: 6

Thanks Given: 1

Thanked 0 Times in 0 Posts

Sorry I was a bit quick to say it doesn't work, having a play around on excel i have managed to allign the collumns and the items at the top are specified perfectly by the condition, which is what I needed.

However, it has rearranged the the data, so the subject numbers are now randomally arranged rather than being subject 1,2,3 etc. Likewise the items at the top are arranged in a alphabetical order, not the order how they originally were. consequently it appears that the corresponding number data within the table does not match the relevent condition and item.

I have cross checked a few of the conditions and items to what the number in the table should be and they do not match.

Also in the row where the item and condition is specified along the top, it is fine apart from there is a random insertion of the the condition collum and obviously that does not need to be there. it looks like this....

CPz_Li..ncePurNeutRem ConditionsMixNegForg ConditionsMixNegRem ConditionsMixNeutFor ConditionsMixNeutRem 7.906 2.634 -0.53 1.226 14.77
obvioulsy, i do not need the 'conditions' listed like that in the top row, as that is what the MixNegRem etc... are already doing along side the items!

Hope all this makes sense!

Thanks again

---------- Post updated at 09:08 PM ---------- Previous update was at 08:53 PM ----------

Weirdly however, I have just ran it on only subject 1's data and it works perfectly!
all the conditions are with the items as they should be along the top and there is no intrusion of a 'condition' item along the top.
Again though the items at the top are in alphabetical order, is there any way to keep them in the original format?
Crucially the data in the table matches the conditions, so it may just mean that I have to run the script on each subject individually and merge the data into excell by hand.

I think when I try to run it on the large data set, the collumn called 'Condition' may be the confusing factor.

If you can think of anything to make it work on the large data set that would be fantastic, but if not I may have to just result to doing it by hand.

thanks

Sarah_W

View Public Profile for Sarah_W

Find all posts by Sarah_W

11-30-2011

Registered User

2,100, 402

Join Date: Apr 2009

Last Activity: 11 February 2020, 10:24 AM EST

Posts: 2,100

Thanks Given: 26

Thanked 402 Times in 360 Posts

Here's a different take on this pivot problem -

Code:

$
$
$ # display the contents of the input data file
$
$ cat input
Subject  Condition     Fp1            Fp2           FpZ
 S1     mixedneg    235.012        125.65        548.233
 S1     mixedneut   212.1          548.215      1548.2
 S1     pureneg     1578.22         15.25       1654.2
 S1     pureneut    4878.2         512.448       545.88
 S2     mixedneg    5465         87465         46984
 S2     mixedneut    545          7498          5136
 S2     pureneg      587          8798          6516
 S2     pureneut     874          8730.39        879
$
$
$ # Perl script to pivot the input data
$
$ perl -ane 'BEGIN {$fmt = "%-15s"}
            if ($. == 1) {
              $sub = $F[0];
              @items = @F[2..$#F];
            } elsif ( grep { $F[0] eq $x[$_][0] } 0..$#x ) {
              push @{$x[$#x]}, [ $F[1], [ @F[2..$#F] ] ] ;
            } else {
              push @x, [ $F[0], [ $F[1], [ @F[2..$#F] ] ] ];
            }
            END {
              printf ($fmt, $sub);
              foreach $itm (@items) {
                foreach $i ( 1.. $#{$x[0]} ) {
                  printf ($fmt, "$itm$x[0][$i][0]");
                }
              }
              print "\n";
              foreach $i ( 0..$#x ) {
                printf ($fmt, $x[$i][0]);
                foreach $k ( 0.. $#{$x[0][1][1]} ) {
                  foreach $j ( 1..$#{$x[$i]} ) {
                    printf ($fmt, $x[$i][$j][1][$k]);
                  }
                }
                print "\n";
              }
            }
           ' input
Subject        Fp1mixedneg    Fp1mixedneut   Fp1pureneg     Fp1pureneut    Fp2mixedneg    Fp2mixedneut   Fp2pureneg     Fp2pureneut    FpZmixedneg    FpZmixedneut   FpZpureneg     FpZpureneut
S1             235.012        212.1          1578.22        4878.2         125.65         548.215        15.25          512.448        548.233        1548.2         1654.2         545.88
S2             5465           545            587            874            87465          7498           8798           8730.39        46984          5136           6516           879
$
$
$

Each column in the output has a width of 15 characters. You can control this width by changing the value of "$fmt" variable in the BEGIN section. For example, the output for "$fmt" value equal to "%-20s" is as follows -

Code:

$
$
$ perl -ane 'BEGIN {$fmt = "%-20s"}
            if ($. == 1) {
              $sub = $F[0];
              @items = @F[2..$#F];
            } elsif ( grep { $F[0] eq $x[$_][0] } 0..$#x ) {
              push @{$x[$#x]}, [ $F[1], [ @F[2..$#F] ] ] ;
            } else {
              push @x, [ $F[0], [ $F[1], [ @F[2..$#F] ] ] ];
            }
            END {
              printf ($fmt, $sub);
              foreach $itm (@items) {
                foreach $i ( 1.. $#{$x[0]} ) {
                  printf ($fmt, "$itm$x[0][$i][0]");
                }
              }
              print "\n";
              foreach $i ( 0..$#x ) {
                printf ($fmt, $x[$i][0]);
                foreach $k ( 0.. $#{$x[0][1][1]} ) {
                  foreach $j ( 1..$#{$x[$i]} ) {
                    printf ($fmt, $x[$i][$j][1][$k]);
                  }
                }
                print "\n";
              }
            }
           ' input
Subject             Fp1mixedneg         Fp1mixedneut        Fp1pureneg          Fp1pureneut         Fp2mixedneg         Fp2mixedneut        Fp2pureneg          Fp2pureneut         FpZmixedneg         FpZmixedneut        FpZpureneg          FpZpureneut
S1                  235.012             212.1               1578.22             4878.2              125.65              548.215             15.25               512.448             548.233             1548.2              1654.2              545.88
S2                  5465                545                 587                 874                 87465               7498                8798                8730.39             46984               5136                6516                879
$
$

The order of the "FP"+"Condition" columns at the top is preserved as in the input data file. The floating point numbers correspond to their pivoted row/column pairs.

tyler_durden

durden_tyler

View Public Profile for durden_tyler

Find all posts by durden_tyler

12-02-2011

Registered User

1,305, 26

Join Date: Jun 2007

Last Activity: 11 November 2016, 3:44 AM EST

Location: Beijing China

Posts: 1,305

Thanks Given: 0

Thanked 26 Times in 26 Posts

something like this

Code:

while(<DATA>){
  my @tmp = split;
  if($.==1){
   @headers=@tmp[2..4];
   next;
  }
  $tt{$tmp[0]}++;
  for(my $i=0;$i<=$#headers;$i++){
    $hash{$tmp[0]}->{$headers[$i].$tmp[1]}->{VAL}=$tmp[$i+2];
    $hash{$tmp[0]}->{$headers[$i].$tmp[1]}->{SEQ}=($i*4)+$tt{$tmp[0]}-1;
  }
}
foreach my $key(sort {$a<=>$b} keys %hash){
  print $key;
  foreach my $k(sort {$hash{$key}->{$a}->{SEQ} <=> $hash{$key}->{$b}->{SEQ}} keys %{$hash{$key}}){
    print " ", $hash{$key}->{$k}->{VAL};
  }
  print "\n";
}
__DATA__
Subject  Condition      Fp1        Fp2       FpZ
S1          mixedneg    235.012   125.65   548.233
S1          mixedneut   212.1     548.215    1548.2
S1          pureneg     1578.22   15.25      1654.2
S1          pureneut    4878.2     512.448    545.88
S2          mixedneg    5465       87465     46984
S2          mixedneut    545       7498         5136
S2          pureneg        587        8798        6516
S2          pureneut      874       8730.39     879

summer_cherry

View Public Profile for summer_cherry

Find all posts by summer_cherry

Shell Programming and Scripting

Help for a Perl newcomer! Transposing data from columns to rows

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Transposing rows to columns with multiple similar lines

Discussion started by: redse171

2. Shell Programming and Scripting

Transposing rows and columns (pivoting) using shell scripting

Discussion started by: ksatish89

3. Shell Programming and Scripting

transposing columns into rows

Discussion started by: ida1215

4. Shell Programming and Scripting

Transpose Data from Columns to rows

Discussion started by: Mikes88

5. Shell Programming and Scripting

Transposing Repeated Rows to Columns.

Discussion started by: ravzter

6. Shell Programming and Scripting

awk, string as record separator, transposing rows into columns

Discussion started by: spindoctor

7. UNIX for Dummies Questions & Answers

Suggestion to convert data in rows to data in columns

Discussion started by: kush

8. Shell Programming and Scripting

Transposing rows into columns

Discussion started by: prasperl

9. Shell Programming and Scripting

Data in Rows to Columns

Discussion started by: umaars

10. Shell Programming and Scripting

Rows to columns transposing and reformating.

Discussion started by: bluethunder