sort on second column only based on first column


 
Posted 03-15-2010
Hi.

I like the concise awk code of radoulov.

I sometimes prefer to think in terms of large tasks before I code up a solution in awk, perl, C, etc. (if needed for performance). For example, this problem could be considered as one of an alternate collating sequence: that of the first field, taken in the order in which its values appear in the input. It could also be considered as a grouping problem.
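To make the collating-sequence view concrete, here is a minimal sketch using only awk, sort, and cut. It is not one of the codes used in this post; the function name group_sort_dsu is made up for illustration, and it assumes TAB-separated input with exactly the layout of the sample data. Each line is prefixed with the rank of its first field (order of first appearance), sorted on that rank and then on the original second field, and the prefix is removed again:

```shell
#!/usr/bin/env bash

# Hypothetical helper: decorate, sort, undecorate.
# Assumes TAB-separated fields; reads the named files or standard input.
group_sort_dsu() {
  awk -F'\t' -v OFS='\t' '
    !($1 in rank) { rank[$1] = ++n }   # first time this key is seen: give it the next rank
    { print rank[$1], $0 }             # decorate: prefix the line with the rank of field 1
  ' "$@" |
  LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k3,3 |   # by rank, then by the original field 2
  cut -f2-                                           # undecorate: drop the rank column
}

# Small demonstration with made-up data.
printf 'b\ty\nb\tx\na\tz\na\tw\n' | group_sort_dsu
```

One difference from the grouping view: if the same first-field value shows up again later in the file, this sketch merges it into the earlier group rather than treating it as a separate block.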

Because the input is already in groups of a specific, desired order, the grouping view lets me think about what I need to do to each group. Namely, I need to sort by the second field. I cannot normally do that to just a part of a file. However, if I could identify each section, that would be a step in the right direction.

There are no standard commands to do that, but you can find some interesting scripts on the net that will. Here's how this can be done using some of them. The names of the scripts should suggest what they do:
Code:
#!/usr/bin/env bash

# @(#) s2	Demonstrate group sort, missing textutils.

# Infrastructure details, environment, commands for forum posts. 
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo ; echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
c=$( ps | grep $$ | awk '{print $NF}' )
version >/dev/null 2>&1 && s=$(_eat $0 $1) || s=""
[ "$c" = "$s" ] && p="$s" || p="$c"
version >/dev/null 2>&1 && version "=o" $p blockwise sort
set -o nounset
echo

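FILE=${1-data1}

# If "specimen" does not exist, replace with "cat".
specimen "$FILE"

echo " Preliminary conditions:"
t1=$( diff "$FILE" expected-output.txt | wc -l )
echo " About $t1 lines differ."

echo
echo " Results:"
# Separate blocks where column 1 changes, sort each block on
# field 2, then drop the separators. Intermediate results are
# kept in files t1 and t2.
split_at_colchange 1 "$FILE" |
tee t1 |
blockwise "sort -k2,2" |
tee t2 |
remove_blank_lines > tf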

if ! cmp expected-output.txt tf
then
  sdiff -w78 expected-output.txt tf
else 
  echo
  echo " Pass - generated output and expected-output.txt are identical."
  echo
  specimen tf
fi

exit 0

producing:
Code:
% ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
blockwise - ( ~/bin/blockwise Sep 29 12:53 )
sort (GNU coreutils) 6.10

Edges: 10 of 23 lines in data1
     1	AAAlkalines	Energizer
     2	AAAlkalines	Energizer
     3	AAAlkalines	Energizer
     4	AAAlkalines	Sunlight
     5	AAAlkalines	Sunlight
   ...
    19	RechargableAAA	Duracell
    20	EmergencyLight	AlFaris
    21	EmergencyLight	AlFaris
    22	EmergencyLight	Geepas
    23	EmergencyLight	Geepas

 Preliminary conditions:
 About 18 lines differ.

 Results:

 Pass - generated output and expected-output.txt are identical.

Edges: 10 of 23 lines in tf
     1	AAAlkalines	Energizer
     2	AAAlkalines	Energizer
     3	AAAlkalines	Energizer
     4	AAAlkalines	Energizer
     5	AAAlkalines	Energizer
   ...
    19	RechargableAAA	Energizer
    20	EmergencyLight	AlFaris
    21	EmergencyLight	AlFaris
    22	EmergencyLight	Geepas
    23	EmergencyLight	Geepas

I canonicalized the data and reference output file so that the separators were TABs.

The steps are:
1. Separate the blocks.
2. For each block, sort on the second field.
3. Remove the separator between blocks.
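
If the Missing Textutils codes are not at hand, the same idea can be sketched with awk and sort alone. This is only an approximation of the pipeline above, not a description of how split_at_colchange or blockwise work internally. The function name blockwise_sort_sketch is made up, and TAB-separated input is assumed. Because each block is fed to its own sort process, no separator line is ever inserted, so step 3 has nothing to do:

```shell
#!/usr/bin/env bash

# Hypothetical helper: sort each run of lines sharing field 1 on field 2.
# Assumes TAB-separated fields; reads the named files or standard input.
blockwise_sort_sketch() {
  # awk turns the \t in the -v assignment into a real TAB for sort -t.
  awk -F'\t' -v cmd='LC_ALL=C sort -t "\t" -k2,2' '
    NR > 1 && $1 != prev { close(cmd) }  # step 1: field 1 changed, so the block ends;
                                         # step 2: closing the pipe makes sort emit it
    { print | cmd; prev = $1 }           # feed the line to the sort for the current block
    END { close(cmd) }                   # flush the last block
  ' "$@"
}

# Small demonstration with made-up data; the key "b" appears in two blocks.
printf 'b\ty\nb\tx\na\tz\na\tw\nb\tc\n' | blockwise_sort_sketch
```

Unlike a sort over the whole file, a first-field value that reappears later is kept as a separate block, which matches the grouping view described earlier.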

The temporary files from the tee commands (t1 and t2) can be examined to see the results of the intermediate steps.

The collection of perl scripts can be found at The Missing Textutils.

Best wishes ... cheers, drl