Sort based on certain value in a column

05-02-2014

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Maybe I didn't understand what you're trying to do. I thought the Col_n was supposed to cause the following lines up to the next Col_n line to be sorted in increasing alphanumeric order based on the nth character in the line.

Unfortunately, with the sample data given, there is no way to tell if you're trying to sort on the whole line or on the nth character since the results would be the same. If you are trying to sort on the nth character, rdrtx1's script won't do that. Assuming that there aren't any spaces or tabs in your input file (at least not before the character position that is to be sorted), the following might work:

Code:

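awk '
# Close the pipe to the current sort command (if any) so its output is
# written before anything else is printed.
function finish() {
	if(sc != "") close(sc)
}

/^Col_/ {
	# Separator line: flush the previous group, then print the separator.
	finish()
	print
	# The text after "Col_" is the character position to sort on.
	col = substr($0, 5)
	sc = sprintf("sort -k1.%d,1.%d", col, col)
	next
}
{	# Data line: feed it to the sort command for the current group.
	print | sc
}
END {	finish()
}' File1.txt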
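To see how a separator line turns into a sort key, the two assignments in the script can be tried on their own. This is only a small sketch; the function name sort_cmd_for_header is made up for illustration and is not part of the script above.

```shell
# sort_cmd_for_header HEADER (hypothetical helper): print the sort command
# that would be built for a separator line such as "Col_10".
sort_cmd_for_header() {
	printf '%s\n' "$1" | awk '
	/^Col_/ {
		# Everything after the 4-character prefix "Col_" is the position.
		col = substr($0, 5)
		# %d converts the string to a number, e.g. "10" -> 10.
		printf("sort -k1.%d,1.%d\n", col, col)
	}'
}
```

Calling `sort_cmd_for_header Col_10` should print `sort -k1.10,1.10`, i.e., a key that starts and ends at character 10 of field 1.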

If your input file contains:

Code:

Col_1
SW_MH2_ST
ST_F72_9S
SW_MH3_S6
Col_10
SW_MH3_AS7
ST_S15_9CH
SW_MH3_AS8
SW_MH3_ST
Col_5
ST_M93_SZ
ST_C16_TC
Col_4
Abc4123
Cde3234
Bcd2345
Def1234

rdrtx1's script will produce:

Code:

Col_1
ST_F72_9S
SW_MH2_ST
SW_MH3_S6
Col_10
ST_S15_9CH
SW_MH3_AS7
SW_MH3_AS8
SW_MH3_ST
Col_5
ST_C16_TC
ST_M93_SZ
Col_4
Abc4123
Bcd2345
Cde3234
Def1234

while the script above will produce:

Code:

Col_1
ST_F72_9S
SW_MH2_ST
SW_MH3_S6
Col_10
SW_MH3_ST
SW_MH3_AS7
SW_MH3_AS8
ST_S15_9CH
Col_5
ST_C16_TC
ST_M93_SZ
Col_4
Def1234
Bcd2345
Cde3234
Abc4123

If there are spaces in your input file, you need to specify a field separator in the sort command naming a character that can never appear in your input file. If there are tabs in the input file and you want to sort based on output line positions (rather than input character counts), you would have to expand input tabs to a variable number of spaces depending on where in the input line the tab(s) appear. And if you want to sort on output print positions and there are backspace characters in the input, you will need to give a much clearer explanation of what is supposed to happen.
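As a rough illustration of the field separator advice, the sort command could be wrapped like this. It is a sketch under one loud assumption: the byte \001 (SOH) never appears in the input. The function name sort_on_char is invented for the example, and LC_ALL=C is added only to make the ordering predictable.

```shell
# sort_on_char N (hypothetical helper): sort stdin on the single character
# at position N of each line.
# Assumption: the byte \001 (SOH) never appears in the input, so the whole
# line is one field even when it contains spaces or tabs.
sort_on_char() {
	sep=$(printf '\001')
	LC_ALL=C sort -t "$sep" -k "1.$1,1.$1"
}
```

For example, `printf 'ab cdZ\nab cdA\n' | sort_on_char 6` sorts on the sixth character of the line, counting the space as character 3. In the awk script, the same -t option would have to be added to the command string stored in sc.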

If you want to try the above awk script on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

Don Cragun

05-02-2014

Registered User

149, 1

Join Date: Dec 2010

Last Activity: 9 June 2015, 10:16 AM EDT

Posts: 149

Thanks Given: 100

Thanked 1 Time in 1 Post

Hi Don Cragun,

Thanks so much for the kind explanation. I really appreciate it.
For my files right now, the Col_n lines are already in alphanumeric order and I just need to sort the members of each Col_n group in ascending order, which makes rdrtx1's script work perfectly. I tried to understand your code, as it would be a big help if I need to sort on the nth character in the future. I ran it, but I am wondering why "Col_5" comes before "Col_4" and why the Col_4 members are sorted in descending order. I hope you can help explain. Thanks.

redse171

05-02-2014

Your description of what to do was vague. You said:

Quote:

the output that i want is to sort based on "Col_X" (X is the number)

I thought that meant you were trying to sort the lines following each line of the form Col_n in increasing alphanumeric order based on the nth column (i.e., input character position) in those lines. So, the output produced by my script is sorted on the character position noted in the comment on each Col_n line:

Code:

Col_1   # following lines are sorted on character 1
ST_F72_9S
SW_MH2_ST
SW_MH3_S6
Col_10  # following lines are sorted on character 10
SW_MH3_ST
SW_MH3_AS7
SW_MH3_AS8
ST_S15_9CH
Col_5   # following lines are sorted on character 5
ST_C16_TC
ST_M93_SZ
Col_4   # following lines are sorted on character 4
Def1234
Bcd2345
Cde3234
Abc4123
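One way to check that a group really is in order on a given character position is sort's -c option, which only tests the ordering and reports it through the exit status. This is a sketch; the function name is_sorted_on_char is made up, and LC_ALL=C is an assumption added to keep the comparison byte-based.

```shell
# is_sorted_on_char N (hypothetical helper): succeed if the lines on stdin
# are already in increasing order on the character at position N.
is_sorted_on_char() {
	# -c only checks the order; the diagnostic on stderr is discarded.
	LC_ALL=C sort -c -k "1.$1,1.$1" 2>/dev/null
}
```

Feeding it the four lines shown under Col_4 with N set to 4 should succeed (the keys are 1, 2, 3, 4), while the same lines with N set to 1 should fail, which is why that group looks like it is in descending order when read as whole lines.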

Don Cragun

05-02-2014

Hi,

OK, now I get what you meant. Sorry for the confusion. Actually, "Col_X" here is just meant to represent groups. I just want to sort the members of each group in ascending order. Thanks.

redse171

05-02-2014

OK. You asked how rdrtx1's script works. Basically, it reads the data from your input file into two arrays: one containing the separator lines and the other containing the data for each group. After it has read all of the data, it prints each group name line and uses the sort utility to sort the data it accumulated for that group. A slightly simpler awk script with comments is:

Code:

awk '
/^Col_/ {
	# Group separator found.
	# Finish sorting the previous group, if there was a previous group.
	if(NR != 1) close("sort")
	# Print separator for next group.
	print
	# Skip to next line of input.
	next
}
{	# Send data for current group to sort command...
	print | "sort"
}
END {	# Finish sorting the last group.
	close("sort")
}' File1.txt

This will not work if the 1st line in your input file does not start with Col_ and will probably produce an error message if your input file is an empty file, but I assume neither of these is a problem. In practice, the close() call in the END clause (highlighted in orange in the original post) could be left off, since awk closes any open pipes when it exits, but it is good practice to explicitly close() any pipelines you open.
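For comparison, the two-array approach described above might look roughly like the following. This is a reconstruction from that description only, not rdrtx1's actual script; the function name sort_groups_two_pass and the array names name and data are invented for the sketch.

```shell
# sort_groups_two_pass [FILE] (hypothetical reconstruction): collect all
# groups first, then print each separator followed by its sorted members.
sort_groups_two_pass() {
	awk '
	/^Col_/ {
		# Remember each separator line in the order it was read.
		name[++n] = $0
		next
	}
	{	# Append each data line to the text saved for the current group.
		# Lines before the first separator land in data[0] and are ignored.
		data[n] = data[n] $0 "\n"
	}
	END {
		for (i = 1; i <= n; i++) {
			print name[i]
			# Flush our own output so the header is written before sort runs.
			fflush()
			printf "%s", data[i] | "sort"
			close("sort")
		}
	}' "$@"
}
```

It would be run as `sort_groups_two_pass File1.txt`. The streaming script above produces the same result for this kind of input without holding the whole file in memory.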

Don Cragun

05-02-2014

Hi Don Cragun,

Many thanks!!! Really appreciate that.

redse171
