how to rearrange a matrix with awk


 
# 1  
Old 10-14-2012

Hi, everyone. I have two files: one is a matrix like this, and the other is a list with the same data as the matrix.


HTML Code:
       AB   AE    AC    AD   AA    AF   
SA     3     4     5       6     4      6
SC     5     7     2      8     4      3
SD     4     6      5     3      8     3 
SE     45    34    5    12     09    1
SB      34   33    34    45    67    23
HTML Code:
AA   SA   4
AB   SA   3
AD   SC   8
AF    SB   23
.       .
.       .
.       .
.       .
How can I get the matrix in order like this? Can I do it using awk?

HTML Code:
       AA   AB    AC    AD   AE    AF   
SA     
SB     
SC     
SD     
SE      
Thank you!
# 2  
Old 10-14-2012
With gawk:
Code:
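gawk '
# Header line: it has no row heading, so heading i sits above data field i+1.
NR==1{for(i=1;i<=NF;i++) y[i+1]=$i;next}
# Data lines: save the row heading and store each value under (row, column).
{x[++n]=$1;for(i=2;i<=NF;i++) data[$1,y[i]]=$i}
END{
# asort() is a gawk extension: it sorts the values and renumbers them from 1.
nx=asort(x);ny=asort(y)
for(i=1;i<=nx;i++)
{
 if(i==1)
 {
  # j starts at 0: y[0] is unset, so it prints the empty top-left cell.
  for(j=0;j<=ny;j++)
   print y[j]
  printf "\n"
 }
 print x[i]
 for(j=1;j<=ny;j++)
  print data[x[i],y[j]]
 printf "\n"
}}' ORS='\t' matrix_file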

And, if you don't have gawk, it can also be done with other awks with a user-defined function to sort arrays. Let me know if that is the case with you.
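
For reference, here is a minimal sketch of that non-gawk route, assuming whitespace-separated input laid out like the matrix in the question. The names sort_matrix and isort are made up for this illustration and are not from the post above; isort is a plain insertion sort standing in for asort(), and the output is tab-separated rather than aligned.

```shell
#!/bin/sh
# Sketch only: sort_matrix and isort are hypothetical names.
# Reads a matrix (file operands or stdin) and prints it with row and
# column headings sorted as strings, one tab between cells.
sort_matrix() {
    awk '
    # Insertion sort of a[1..n] in place, comparing elements as strings.
    function isort(a, n,    i, j, t) {
        for (i = 2; i <= n; i++) {
            t = a[i]
            for (j = i - 1; j >= 1 && (a[j] "") > (t ""); j--)
                a[j + 1] = a[j]
            a[j + 1] = t
        }
    }
    # Header line: hdr[] keeps the input order, col[] will be sorted.
    NR == 1 { nc = NF; for (i = 1; i <= NF; i++) col[i] = hdr[i] = $i; next }
    {
        row[++nr] = $1
        for (i = 2; i <= NF; i++) data[$1, hdr[i - 1]] = $i
    }
    END {
        isort(row, nr); isort(col, nc)
        for (j = 1; j <= nc; j++) printf "\t%s", col[j]
        printf "\n"
        for (i = 1; i <= nr; i++) {
            printf "%s", row[i]
            for (j = 1; j <= nc; j++) printf "\t%s", data[row[i], col[j]]
            printf "\n"
        }
    }' "$@"
}
```

It would be called as sort_matrix matrix_file. Insertion sort is quadratic, which should be acceptable for a handful of headings; for very large matrices an external sort is probably the better choice.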

Last edited by elixir_sinari; 10-14-2012 at 11:57 AM..
This User Gave Thanks to elixir_sinari For This Post:
# 3  
Old 10-14-2012
Some versions of awk (gawk, for example) provide asort() and asorti() functions to sort data.
This example uses the sort utility instead, so it should work whether or not your version of awk has those functions. It would probably be easier to read the three-column input file rather than the unsorted matrix, but I didn't want to take the time to fill in the entries you listed as ... to test the program. This example also automatically adjusts for varying input field widths:
Code:
#!/bin/ksh
# Usage: sortmatrix [unsorted_matrix_file]
# The following awk program reads in a matrix with unsorted row and
# column headings and writes the same data rearranged with sorted row and
# column headings.  The output column widths automatically adjust for
# input data width.  If this script is invoked with no operands, it will
# read unsorted matrix information from a file named "input".
#
# This script is written using ksh and awk on OS X.  Other than changing
# the #! line above, any shell supporting basic POSIX shell syntax
# should work.  If you're using a Solaris system, change "awk" to
# "/usr/xpg4/bin/awk" (nawk might also work, but I don't remember if it
# was updated to support "%*s" to specify a runtime settable field
# width in awk printf() statements).
#
# awk variables name key:
#       c0w     row heading Column Width
#       cc      data Column Count (not counting row heading)
#       cw[]    data Column Width
#       dbg     if non-0/non-null log DeBuGging info into the file named by dlf
#       dlf     name of Debugging Log File
#       i       loop control
#       ich[]   unsorted Input Column Headers
#       och[]   sorted Output Column Headers
#       sc      Sort Command used to execute to sort column and row headings
#       tmpfile TeMPFILE to hold results of sorting row and column headings
tmpfile="sortmatrix.order"
awk -v dbg=0 -v tmpfile="$tmpfile" 'BEGIN{
        sc="sort -o " tmpfile
        if(dbg) dlf = "debug.out"
}
dbg{    printf("input: %s\n", $0) > dlf
}
NR == 1{# Read and sort data column headings from the 1st input line
        for(i = 1; i <= NF; i++) {
                if(dbg) print $i > dlf
                ich[i] = $i
                print $i | sc
        }
        close(sc)
        # Produce the list of sorted output column headings
        cc = 0
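        while((getline hdr < tmpfile) == 1) { # read the sorted headings
                # Read into a scalar first so that cc is incremented only
                # after a successful read, however the awk in use evaluates
                # the getline target at end of file.
                och[++cc] = hdr
                cw[och[cc]] = length(och[cc]) # set initial column width
                if(dbg) printf("och[%d]=%s, cw[%s]=%d\n", cc, och[cc], och[cc],
                        cw[och[cc]]) > dlf
        }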
        close(tmpfile)
        # Verify that the sort worked
        if(cc != NF) {
                printf("sortmatrix: %d columns read, but sort returned %d\n",
                        NF, cc)
                exit 1
        }
        next
}
{       # Process remaining input lines
        # Feed the row heading into sort
        print $1 | sc
        if(length($1) > c0w) {
                # Increase output column 0 width to match row heading length
                c0w = length($1)
                if(dbg) printf("c0w increased to %d\n", c0w) > dlf
        }
        for(i = 2; i <= NF; i++) {
                # Save the data in an array indexed by row heading and
                # column heading.
                data[$1,ich[i - 1]] = $i
                if(dbg) printf("data[%s,%s]=%s\n", $1, ich[i-1], $i) > dlf
                if(cw[ich[i - 1]] < length($i)) {
                        # A data field is wider than the column heading,
                        # adjust the column width.
                        cw[ich[i - 1]] = length($i)
                        if(dbg) printf("cw[%s] increased to %d\n",
                                ich[i - 1], length($i)) > dlf
                }
        }
}
END{    # Finish the row headings sort
        close(sc)
        # Print the column headings
        printf("%*s", c0w, "")
        for(i = 1; i <= cc; i++) {
                cw[och[i]]++ # Add room to put a space between output columns.
                printf("%*s", cw[och[i]], och[i])
        }
        printf("\n")
        # Read the sorted row headings and print the data
        rowcnt = 1
        while((getline row < tmpfile) == 1) {
                printf("%-*s", c0w, row)
                for(i = 1; i <= cc; i++)
                        printf("%*s", cw[och[i]], data[row,och[i]])
                printf("\n")
                rowcnt++
        }
        # Verify that the sort worked
        if(rowcnt != NR) {
                printf("sortmatrix: Read %d data rows, sort only returned %d\n",
                        NR - 1, rowcnt - 1)
                exit 2
        }
}' "${1:-input}"
rm "$tmpfile"

When the above is saved in a file named sortmatrix, made executable with chmod +x sortmatrix, and run with a file named input that contains:
Code:
       AB   AE    AC    AD   AA    AF   AC12 AClonger
SA     3     4     5       6     4      6 121 long1
SC     5     7     2      8     4      3 123 long3
SD     4     6      5     3      8     3 124 long4
SE     45    34    5    12     09    1 125 long5
SB      34   33    34    45    67    2 122 long2
more   m1 m2 m3 m4 m5 m6 m7-xxx m8

it produces the following output:
Code:
     AA AB AC   AC12 AClonger AD AE AF
SA    4  3  5    121    long1  6  4  6
SB   67 34 34    122    long2 45 33  2
SC    4  5  2    123    long3  8  7  3
SD    8  4  5    124    long4  3  6  3
SE   09 45  5    125    long5 12 34  1
more m5 m1 m3 m7-xxx       m8 m4 m2 m6

Note that if you change -v dbg=0 to -v dbg=1, this script will produce a debugging log file providing data that may be useful if you need to make changes to some part of the program's logic.
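
As noted earlier in this post, starting from the three-column list may be simpler than starting from the unsorted matrix. A rough sketch of that variant follows, assuming each list line is "column row value" as in the question. The function name list_to_matrix is hypothetical, the "-" placeholder for missing cells is an assumption of this sketch, and the output is tab-separated rather than width-adjusted. Lines with fewer than three fields (such as the ". ." placeholders) are skipped.

```shell
#!/bin/sh
# Sketch only: list_to_matrix is a hypothetical name.
# $1 is a file of "column row value" lines.
list_to_matrix() {
    # Distinct column names, sorted, joined with spaces for awk -v.
    cols=$(awk 'NF >= 3 { print $1 }' "$1" | LC_ALL=C sort -u | tr '\n' ' ')
    # Normalise to "row column value", group the lines by row, then
    # print one matrix line per row.
    awk 'NF >= 3 { print $2, $1, $3 }' "$1" | LC_ALL=C sort -k1,1 |
    awk -v cols="$cols" '
    # Print the row collected so far; "-" marks a cell with no list entry.
    function emit(    j) {
        printf "%s", row
        for (j = 1; j <= nc; j++)
            printf "\t%s", (((row, col[j]) in val) ? val[row, col[j]] : "-")
        printf "\n"
    }
    BEGIN {
        nc = split(cols, col, " ")
        for (j = 1; j <= nc; j++) printf "\t%s", col[j]
        printf "\n"
    }
    $1 != row { if (row != "") emit(); row = $1 }
    { val[$1, $2] = $3 }
    END { if (row != "") emit() }'
}
```

It would be called as list_to_matrix list_file. Because sort does all the ordering here, no temporary file and no awk sorting function are needed, at the cost of reading the list twice.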
These 3 Users Gave Thanks to Don Cragun For This Post:
# 4  
Old 10-15-2012
Many thanks!

Last edited by xshang; 10-16-2012 at 04:51 PM..
# 5  
Old 10-15-2012
Quote:
Originally Posted by xshang
NICE code! Thank you very much for your great work.

I faced another problem and don't know how to fix it.

I have some columns like this:

HTML Code:
SA1
SA2
SA11
SA3
SA5
SA27
SA10
SA15
SA25
It showed the following result after sorting:

HTML Code:
SA1
SA10
SA11
SA15
SA2
SA25
SA27
SA3
SA5
What should I do if I want the sorted result like this:
HTML Code:
SA1
SA2
SA3
SA5
SA10
SA11
SA15
SA25
SA27
Many thanks!
If all of your header names are exactly two alphabetic characters followed by one or more numeric characters, you could change the sort command for the column headings from:
Code:
sc="sort -o " tmpfile

to:
Code:
csc="sort -k1.1,1.2 -k1.3n,1 -o " tmpfile

and use csc (column sort command) instead of sc when sorting the column headers. But if only some of your names fit this pattern, the easiest thing to do is to rename your headers so they have leading zeros to make an alphabetic sort work (i.e., SA01, SA02, SA03, SA05, SA10, SA11, SA15, SA25, and SA27). This could be coded into the awk program, but I don't have the time to devote to adding the leading zeros before sorting them, stripping off the leading zeros after sorting them, and figuring out what to do if one of your input column names already has a 0 followed by other numeric characters.
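
To see what those two sort keys do on their own, the command can be tried outside the awk program. This is only an illustration: the wrapper name sort_headings is made up, and LC_ALL=C is added here as an assumption to keep the alphabetic comparison predictable.

```shell
#!/bin/sh
# Sketch only: sort_headings is a hypothetical wrapper around the sort
# keys suggested above.
#   -k1.1,1.2  compares characters 1-2 of the name as text
#   -k1.3n,1   compares the rest of the name as a number
sort_headings() {
    LC_ALL=C sort -k1.1,1.2 -k1.3n,1
}

# The headings from the question, in their original order.
printf '%s\n' SA1 SA2 SA11 SA3 SA5 SA27 SA10 SA15 SA25 | sort_headings
```

With the sample headings this prints SA1, SA2, SA3, SA5, SA10, SA11, SA15, SA25, SA27, one per line, which is the order asked for.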

Do you also have a problem with sort order for the first field in the rows in your matrix?
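
For names that only partly fit the two-letters-plus-digits pattern, one possible alternative to padding with zeros is to sort on a temporary key and keep the original name alongside it, so nothing has to be stripped afterwards. The sketch below is not part of the script above; natural_sort_headings is a hypothetical name, and it assumes one heading per line with no tabs in the names.

```shell
#!/bin/sh
# Sketch only: natural_sort_headings is a hypothetical name.
# Each name becomes "prefix<TAB>number<TAB>original"; sort orders by
# prefix as text, then by number, and cut restores the original name.
natural_sort_headings() {
    awk 'NF {
        name = $1
        # Split a trailing run of digits off the name. Names without one
        # get an empty numeric key, which sort -n treats as 0.
        if (match(name, /[0-9]+$/))
            printf "%s\t%s\t%s\n", substr(name, 1, RSTART - 1), substr(name, RSTART), name
        else
            printf "%s\t\t%s\n", name, name
    }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n |
    cut -f3
}
```

To try it inside the script above, the sort command string could be pointed at a pipeline like this one, provided the result still ends up in the temporary file. A name that already contains a leading zero, such as SA01, keeps it, because only the temporary key is compared numerically.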
This User Gave Thanks to Don Cragun For This Post:
# 6  
Old 10-15-2012
Quote:
Do you also have a problem with sort order for the first field in the rows in your matrix?
No problem! Thank you! I really appreciate it.