sorting


 
# 1  
08-16-2010
sorting

Hi all,

Can anyone help me with the following question? I would like to write an AWK script.
In the following input file, each number in column "Start" is paired with the number at the same position in column "End".

 
No Start End
A 22,222,33,22,1233,3232,44 555,333,222,55,1235,3235,66
B 33,333,22,44 66,340,44,55
C 66,55,555 75,58,560
 

For example, the first row has 7 pairs of numbers, in the following order:
22,555
222,333
33,222
22,55
1233,1235
3232,3235
44,66
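The pairing rule (the i-th Start value goes with the i-th End value) can be checked mechanically. A minimal sketch in POSIX awk, assuming whitespace-separated columns as in the input file above; the helper name list_pairs is hypothetical:

```shell
# list_pairs (hypothetical helper): print each start/end pair of a data row
# as "start,end", one pair per line, in the original column order.
list_pairs() {
  awk '{
    n = split($2, st, ",")       # column 2: Start values
    split($3, en, ",")           # column 3: End values
    for (i = 1; i <= n; i++)
      print st[i] "," en[i]      # i-th start goes with i-th end
  }'
}

# Usage: the first data row (A) of the input file.
echo 'A 22,222,33,22,1233,3232,44 555,333,222,55,1235,3235,66' | list_pairs
# 22,555
# 222,333
# 33,222
# 22,55
# 1233,1235
# 3232,3235
# 44,66
```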

For each row, I would like to sort those pairs of numbers by "Start" and then by "End".
The output after sorting is as follows:

 
 
No Start End
A 22,22,33,44,222,1233,3232 55,555,222,66,333,1235,3235
B 22,33,44,333 44,66,55,340
C 55,66,555 58,75,560
 
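Sorting a pair list "by start, then by end" is what sort(1) does with two numeric keys. A minimal sketch for the row A pairs listed above; the helper name sort_pairs is hypothetical:

```shell
# sort_pairs (hypothetical helper): numeric sort on field 1 (start),
# with ties broken by a numeric sort on field 2 (end).
sort_pairs() {
  sort -t, -k1,1n -k2,2n
}

printf '%s\n' 22,555 222,333 33,222 22,55 1233,1235 3232,3235 44,66 | sort_pairs
# 22,55
# 22,555
# 33,222
# 44,66
# 222,333
# 1233,1235
# 3232,3235
```

Reading the first column of that output gives the expected Start list for row A, and the second column gives the matching End list.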

Your advice is much appreciated.

Thanks,
Phoebe

Last edited by phoeberunner; 08-16-2010 at 04:48 PM.
# 2  
08-16-2010
Quote:
Originally Posted by phoeberunner
...
In the following input file, each number in column "Start" is paired with the number at the same position in column "End".

No Start End
A 22,222,33,22,1233,3232,44 555,333,222,55,1235,3235,66
B 33,333,22,44 66,340,44,55
C 66,55,555 75,58,560

...
For each row, I would like to sort those pairs of numbers by "Start" and then by "End".
The output after sorting is as follows:

 
No Start End
A 22,22,33,44,222,1233,3232 55,555,222,66,333,1235,3235
B 22,33,44,333 44,66,55,340
C 55,66,555 58,75,560

...

Here's one way to do it in Perl -

Code:
$
$
$ cat f4
No Start End
A 22,222,33,22,1233,3232,44 555,333,222,55,1235,3235,66
B 33,333,22,44 66,340,44,55
C 66,55,555 75,58,560
$
$
$
$ perl -lane 'if ($. > 1) {
              @x=split(/,/,$F[1]);
              @y=split(/,/,$F[2]);
              for $i (0..$#x) {push @z,"$x[$i]~$y[$i]"}
              @idx1 = (); @idx2 = ();
              for (@z) {
                ($key1, $key2) = /(\d+)~(\d+)/;
                push @idx1, $key1;
                push @idx2, $key2;
              }
              @sorted = @z[sort{$idx1[$a]<=>$idx1[$b] || $idx2[$a]<=>$idx2[$b]} 0..$#z];
              @x=(); @y=();
              foreach (@sorted) {/(\d+)~(\d+)/ && push(@x,$1) && push(@y,$2)}
              print "$F[0] ",join(",",@x)," ",join(",",@y);
              @x=(); @y=(); @z = ();    # reset the per-row arrays
            } else {print}
           ' f4
No Start End
A 22,22,33,44,222,1233,3232 55,555,222,66,333,1235,3235
B 22,33,44,333 44,66,55,340
C 55,66,555 58,75,560
$
$

tyler_durden
# 3  
08-16-2010
Can you do it with an awk script? Thanks!

Phoebe
# 4  
08-16-2010
Quote:
Originally Posted by phoeberunner
Can you do it with an awk script? ...
No, I cannot, sorry.
I've yet to find a language that gives me as much power, with as much terseness, as Perl. If I understand correctly, awk does not have a built-in array sort routine, but I may be mistaken.
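For reference, the standard (POSIX) awk language has no array sort function; gawk's asort() is an extension. Because each row holds only a few pairs, a hand-written insertion sort is enough, though. A minimal sketch, assuming a POSIX-compatible awk and the three-column layout of f4 with one header line; the wrapper name pairsort is hypothetical:

```shell
# pairsort (hypothetical wrapper): sort the start/end pairs of every data
# row by start, then by end, using plain POSIX awk (no asort needed).
pairsort() {
  awk '
    NR == 1 { print; next }                   # header line: print as is
    {
      n = split($2, st, ",")                  # Start values
      if (split($3, en, ",") != n) {          # need exactly one End per Start
        print "count mismatch on line " NR | "cat 1>&2"
        next
      }
      # insertion sort that moves start and end together,
      # comparing numerically on start first, then on end
      for (i = 2; i <= n; i++) {
        s = st[i]; e = en[i]; j = i - 1
        while (j >= 1 && (st[j] + 0 > s + 0 || (st[j] + 0 == s + 0 && en[j] + 0 > e + 0))) {
          st[j + 1] = st[j]; en[j + 1] = en[j]; j--
        }
        st[j + 1] = s; en[j + 1] = e
      }
      # rebuild the two comma-separated columns
      out1 = st[1]; out2 = en[1]
      for (i = 2; i <= n; i++) { out1 = out1 "," st[i]; out2 = out2 "," en[i] }
      print $1, out1, out2
    }
  ' "$@"
}
```

Usage, for example: `pairsort f4`. An insertion sort is quadratic in the number of pairs per row, which is harmless for rows this short.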

The only other non-Perl, non-Ruby method I can think of relies on the multiple-key sorting of the sort command. The algorithm would be something like this:

(1) Read a line and form the start-end pairs separated by a delimiter.
(2) Redirect the pairs, one per line, to a temporary file.
(3) Sort the temporary file on the multiple numeric keys.
(4) Read them back and print to stdout.
(5) Repeat (1) through (4) until EOF.

tyler_durden

---------- Post updated at 05:15 PM ---------- Previous update was at 04:16 PM ----------

Well, OK, here's an implementation of the algorithm posted earlier, as a Bash shell script:

Code:
$
$
$ cat f4
No Start End
A 22,222,33,22,1233,3232,44 555,333,222,55,1235,3235,66
B 33,333,22,44 66,340,44,55
C 66,55,555 75,58,560
$
$
$ cat f4.sh
#!/bin/bash
LNUM=1
while read -r NO START END; do
  if [ "$LNUM" -eq 1 ]; then
    echo "$NO $START $END"                      # header line: print as is
  else
    echo "$START" | tr -s "," "\n" >f1.tmp      # one start value per line
    echo "$END" | tr -s "," "\n" >f2.tmp        # one end value per line
    # join as start~end, then sort numerically on start, then on end
    paste -d~ f1.tmp f2.tmp | sort -t~ -nk1,1 -nk2,2 >f3.tmp
    INDX=1
    while IFS="~" read -r NUM1 NUM2; do         # IFS applies to this read only
      if [ "$INDX" -eq 1 ]; then
        LIST1=$NUM1
        LIST2=$NUM2
      else
        LIST1="$LIST1,$NUM1"
        LIST2="$LIST2,$NUM2"
      fi
      INDX=$((INDX + 1))
    done <f3.tmp
    echo "$NO $LIST1 $LIST2"
  fi
  LNUM=$((LNUM + 1))
done <f4
rm -f f1.tmp f2.tmp f3.tmp
$
$
$
$ . f4.sh
No Start End
A 22,22,33,44,222,1233,3232 55,555,222,66,333,1235,3235
B 22,33,44,333 44,66,55,340
C 55,66,555 58,75,560
$
$
$

I suspect it won't cope well with huge data, since it starts tr, paste and sort and writes three temp files for every input line. Using arrays instead of temp files should make it faster. It is definitely slower than the Perl script.
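A sketch of the array-based variant suggested above, assuming bash 4 or later (for mapfile and `local -a`). It drops the temporary files and the per-row tr and paste calls, though it still runs one sort per row; the function name sort_rows is hypothetical:

```shell
# sort_rows (hypothetical, bash 4+): read the header plus data rows on stdin
# and print each row with its start/end pairs sorted by start, then end.
sort_rows() {
  local line no start end p i
  local -a st en pairs sorted s e
  IFS= read -r line && printf '%s\n' "$line"      # header line: print as is
  while read -r no start end; do
    IFS=, read -r -a st <<< "$start"              # split the Start column
    IFS=, read -r -a en <<< "$end"                # split the End column
    pairs=()
    for i in "${!st[@]}"; do pairs+=("${st[i]},${en[i]}"); done
    # one sort call per row: numeric on start, then numeric on end
    mapfile -t sorted < <(printf '%s\n' "${pairs[@]}" | sort -t, -k1,1n -k2,2n)
    s=(); e=()
    for p in "${sorted[@]}"; do
      s+=("${p%%,*}")                             # part before the comma
      e+=("${p##*,}")                             # part after the comma
    done
    # join with commas in a subshell so IFS is not changed for the loop
    ( IFS=,; printf '%s %s %s\n' "$no" "${s[*]}" "${e[*]}" )
  done
}
```

Usage: `sort_rows < f4`.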

tyler_durden
# 5  
08-16-2010
Quote:
Originally Posted by durden_tyler
No I cannot, sorry.
I've yet to find a language that gives me as much power, with as much terseness, as Perl. If I understand correctly, awk does not have a built-in array sort routine, but I may be mistaken.
gawk has the function asort(), which sorts an array by its values.

String Functions - The GNU Awk User's Guide

Code:
asort(source [, dest])
asort is a gawk-specific extension, returning the number of elements in the array source. The contents of source are sorted using gawk's normal rules for comparing values (in particular, IGNORECASE affects the sorting) and the indices of the sorted values of source are replaced with sequential integers starting with one. If the optional array dest is specified, then source is duplicated into dest. dest is then sorted, leaving the indices of source unchanged. For example, if the contents of a are as follows:
          a["last"] = "de"
          a["first"] = "sac"
          a["middle"] = "cul"
A call to asort:

          asort(a)
results in the following contents of a:

          a[1] = "cul"
          a[2] = "de"
          a[3] = "sac"
The asort function is described in more detail in Array Sorting. asort is a gawk extension; it is not available in compatibility mode (see Options).

# 6  
08-16-2010
Solution using gawk (support for asort assumed).

Code:
gawk '
        /Start/ { print; next; }                # header line: print as is
        {
                ns = split( $2, st, "," );
                ne = split( $3, en, "," );

                if( ns != ne )
                {
                        printf( "line %d: count mismatch; skipped\n", NR ) > "/dev/stderr";
                        next;
                }

                delete joined;               # ditch array built for last record
                for( i = 1; i <= ns; i++ )
                        joined[i] = sprintf( "%05d %05d", st[i], en[i] );       # 0s force proper sort order (values below 100000)

                asort( joined );             # sort the array by value (gawk only)

                printf( "%s ", $1 );                    # A, B, C etc. from col 1 of input
                for( i = 1; i <= ns; i++ )
                {
                        printf( "%d%s", joined[i]+0, i == ns ? " " : "," );     # print start column
                        gsub( ".* ", " ", joined[i] );  # delete start number from the pair
                        oe[i] = joined[i]+0;            # save end column pair
                }

                for( i = 1; i <= ns; i++ )              # print end column
                        printf( "%d%s", oe[i]+0, i == ns ? "\n" : "," );
        }
' input-file
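The zero-padding in that script matters because asort() compares the joined "start end" strings as strings, character by character. A minimal POSIX awk sketch (no gawk needed; the function name padcmp is hypothetical) that compares two pairs from row A both ways, assuming every number fits in five digits:

```shell
# padcmp (hypothetical): show that unpadded "start end" strings sort by
# character, while zero-padded ones sort in numeric order.
padcmp() {
  awk 'BEGIN {
    a = "3232 3235"; b = "33 222"                        # unpadded pairs
    print ((a < b) ? "unpadded: 3232 before 33" : "unpadded: 33 before 3232")
    a = sprintf("%05d %05d", 3232, 3235)                 # "03232 03235"
    b = sprintf("%05d %05d", 33, 222)                    # "00033 00222"
    print ((a < b) ? "padded: 3232 before 33" : "padded: 33 before 3232")
  }'
}

padcmp
# unpadded: 3232 before 33
# padded: 33 before 3232
```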

# 7  
Old 08-17-2010
Code:
$ cat urfile
No       Start End
A       22,22,33,44,222,1233,3232       55,555,222,66,333,1235,3235
B       22,33,44,333     44,66,55,340
C       55,66,555        58,75,560

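$ gawk '
function sort (a,    b, c, i, count)    # extra parameters act as locals
{   c=""
    split(a,b,",")
    count=asort(b)                      # sorts the values of this one column
    for (i=1;i<=count;i++) c=c "," b[i]
    gsub(/^,/,"",c)
    return c
}
# Each of fields 2..NF is sorted on its own, so a start value is not
# kept beside its original end value.
NR>1{for (j=2;j<=NF;j++) $j=sort($j)}1
' urfile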

No       Start End
A 22,22,33,44,222,1233,3232 55,66,222,333,555,1235,3235
B 22,33,44,333 44,55,66,340
C 55,66,555 58,75,560

