sort on second column only based on first column

03-15-2010

Hi.

I like the concise awk code of radoulov.

I sometimes prefer to think in terms of larger tasks before I code up a solution in awk, perl, C, etc. (if needed for performance). For example, this problem could be viewed as one of an alternate collating sequence: the order in which the values of the first field appear. It could also be viewed as a grouping problem.
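As a rough sketch of the collating-sequence view, using only standard tools: number each first-field value in order of first appearance, sort on that number and then on the second field, and strip the number again. The function name rank_sort is made up for illustration, and the sketch assumes TAB-separated input. Unlike a strict grouping approach, it would also merge a key that reappears later in the file.

```shell
# rank_sort (hypothetical name): decorate, sort, undecorate.
# Assumes TAB-separated fields on standard input.
# Sort keys: the rank numerically, then field 3, which is the original field 2.
rank_sort() {
  awk -F'\t' -v OFS='\t' '
    !($1 in rank) { rank[$1] = ++n }   # new first-field value: give it the next rank
                  { print rank[$1], $0 } # decorate each line with its rank
  ' |
  LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k3,3 |
  cut -f2-
}

# Small demonstration with invented data.
printf 'B\tz\nB\ty\nA\tq\nA\tp\n' | rank_sort
```

With the four invented lines, group B stays ahead of group A, and within each group the second field comes out in ascending order.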

Because the input is already in groups of a specific, desired order, the grouping view lets me think about what I need to do to each group: sort it by the second field. sort normally works on a whole file, not on part of one. However, if I could mark where each group begins and ends, I'd be a step in the right direction.

There are no standard commands to do that, but you can find some interesting codes on the net that will. Here's how this can be done using some of these codes. Their names should suggest what they do:

Code:

#!/usr/bin/env bash

# @(#) s2	Demonstrate group sort, missing textutils.

# Infrastructure details, environment, commands for forum posts. 
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo ; echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
c=$( ps | grep $$ | awk '{print $NF}' )
version >/dev/null 2>&1 && s=$(_eat $0 $1) || s=""
[ "$c" = "$s" ] && p="$s" || p="$c"
version >/dev/null 2>&1 && version "=o" $p blockwise sort
set -o nounset
echo

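FILE=${1-data1}

# If "specimen" does not exist, replace with "cat".
specimen "$FILE"

echo " Preliminary conditions:"
t1=$( diff "$FILE" expected-output.txt | wc -l )
echo " About $t1 lines differ."

echo
echo " Results:"
# Separate the groups, sort each group on field 2, then remove the separators.
# The tee files t1 and t2 keep the intermediate results for inspection.
split_at_colchange 1 "$FILE" |
tee t1 |
blockwise "sort -k2,2" |
tee t2 |
remove_blank_lines > tf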

if ! cmp expected-output.txt tf
then
  sdiff -w78 expected-output.txt tf
else 
  echo
  echo " Pass - generated output and expected-output.txt are identical."
  echo
  specimen tf
fi

exit 0

producing:

Code:

% ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
blockwise - ( ~/bin/blockwise Sep 29 12:53 )
sort (GNU coreutils) 6.10

Edges: 10 of 23 lines in data1
     1	AAAlkalines	Energizer
     2	AAAlkalines	Energizer
     3	AAAlkalines	Energizer
     4	AAAlkalines	Sunlight
     5	AAAlkalines	Sunlight
   ...
    19	RechargableAAA	Duracell
    20	EmergencyLight	AlFaris
    21	EmergencyLight	AlFaris
    22	EmergencyLight	Geepas
    23	EmergencyLight	Geepas

 Preliminary conditions:
 About 18 lines differ.

 Results:

 Pass - generated output and expected-output.txt are identical.

Edges: 10 of 23 lines in tf
     1	AAAlkalines	Energizer
     2	AAAlkalines	Energizer
     3	AAAlkalines	Energizer
     4	AAAlkalines	Energizer
     5	AAAlkalines	Energizer
   ...
    19	RechargableAAA	Energizer
    20	EmergencyLight	AlFaris
    21	EmergencyLight	AlFaris
    22	EmergencyLight	Geepas
    23	EmergencyLight	Geepas

I canonicalized the data and reference output file so that the separators were TABs.
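One possible way to do such a canonicalization, offered as an assumption rather than a record of what was actually run, is to let awk re-join the fields with TABs. The name to_tabs is invented here, and the sketch assumes that no field contains embedded blanks.

```shell
# to_tabs (hypothetical name): turn each run of blanks between fields into one TAB.
# Assigning $1 to itself makes awk rebuild the line using OFS.
to_tabs() {
  awk -v OFS='\t' '{ $1 = $1; print }'
}

# Small demonstration with mixed spaces and TABs.
printf 'AAAlkalines   Energizer\nEmergencyLight \t Geepas\n' | to_tabs
```

The same filter would need to be applied to both the data file and the reference output so that cmp compares like with like.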

The steps are:
1. Separate the blocks.
2. For each block, sort on the second field.
3. Remove the separator between blocks.
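For readers without the perl utilities, the three steps can be approximated with awk, sort, and sed. The functions below are hypothetical stand-ins written for this illustration, not the Missing Textutils codes, and they assume TAB-separated input with a blank line as the block separator.

```shell
# Step 1: print a blank line whenever field 1 changes.
split_blocks() {
  awk -F'\t' 'NR > 1 && $1 != prev { print "" } { prev = $1; print }'
}

# Step 2: feed each blank-line-delimited block to its own sort.
# close() waits for the running sort to finish writing its output;
# fflush() then pushes the separator out before the next sort starts.
sort_blocks() {
  awk -v cmd='LC_ALL=C sort -k2,2' '
    /^$/ { close(cmd); print; fflush(); next }
         { print | cmd }
    END  { close(cmd) }
  '
}

# Step 3: delete the blank separator lines.
drop_blank() {
  sed '/^$/d'
}

# Small demonstration with invented data.
printf 'B\tz\nB\ty\nA\tq\nA\tp\n' | split_blocks | sort_blocks | drop_blank
```

Running one sort per block keeps the block order untouched, at the cost of starting a new process for every group.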

The temporary files written by the tee commands (t1 and t2) can be examined to see the results of the intermediate steps.

The collection of perl codes can be found at The Missing Textutils.

Best wishes ... cheers, drl
