Sort file by field 1 that has text as well as a number


 
# 1  
Old 10-03-2015

I am using the awk command below, which produces the output shown after it:

Code:
awk '{k=$1 OFS $2; s[k]+=$4; c[k]++} END{for(i in s) print i, s[i]/c[i]}' input.txt > output.txt

output.txt
Code:
chr20:43625799-43625957 STK4:exon.6;STK4:exon.7 310.703
chr20:36770455-36770611 TGM2:exon.6;TGM2:exon.7 614.756
chr20:19945585-19945678 RIN2:exon.6;RIN2:exon.7 175.258
chr20:10632768-10632908 JAG1:exon.5;JAG1:exon.7 319.586
chr20:8630010-8630106 PLCB1:exon.1;PLCB1:exon.7 183.188
chr19:17952438-17952581 JAK3:exon.2;JAK3:exon.6;JAK3:exon.7 306.566
chr19:13051547-13051711 CALR:exon.3;CALR:exon.6;CALR:exon.7 337.811
chr19:13006795-13006945 GCDH:exon.5;GCDH:exon.6;GCDH:exon.7 628.62
chr19:11491549-11491657 EPOR:exon.1;EPOR:exon.6;EPOR:exon.7 301.87
chr18:3456341-3456588 TGIF1:exon.1;TGIF1:exon.2;TGIF1:exon.3 430.332
chr15:90630333-90630505 IDH2:exon.5;IDH2:exon.7 516.128

I cannot work out how to pipe this into a sort on the first column so that the file is reordered in ascending order. I think the "chr" text is interfering with the sort, but I'm not sure. Thank you.

Desired output

Code:
chr1
chr1
chr2
chr2
chr3
....
....

EDIT: I thought the command below would sort on the first column numerically, starting at its fourth character, but it is not working.

Code:
awk '{k=$1 OFS $2; s[k]+=$4; c[k]++} END{for(i in s) print i, s[i]/c[i]}' input > output.txt | sort -k1.4 -n output.txt


Last edited by cmccabe; 10-03-2015 at 12:18 PM.. Reason: added edit
# 2  
Old 10-03-2015
Code:
awk ' .... ' | awk -F: '{print substr($1,4), $0}' OFS=':' | sort -t : -k1,1n | cut -d : -f2-
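
The decorate, sort, undecorate idea in that pipeline can be sketched as below. This is only a minimal sketch, assuming every first field starts with the literal prefix "chr" followed by digits; the function name sort_by_chr and the three sample lines are made up for illustration. The digits after "chr" are prepended as a temporary key, the lines are sorted numerically on that key, and the key is cut off again.

```shell
# Hypothetical helper: sort lines by the number that follows "chr" in field 1.
# Steps: prepend the digits after "chr" as a temporary ':'-separated key,
# sort numerically on that key only, then cut the key off again.
sort_by_chr() {
    awk -F: -v OFS=: '{ print substr($1, 4), $0 }' |
        sort -t: -k1,1n |
        cut -d: -f2-
}

# Made-up sample in which plain text order would put chr10 before chr2.
sample='chr10:100-200 A 1.5
chr2:300-400 B 2.5
chr1:500-600 C 3.5'

sorted=$(printf '%s\n' "$sample" | sort_by_chr)
printf '%s\n' "$sorted"
```

A key that is purely numeric matters here: a numeric sort on a field that still begins with "chr" treats every key as 0, and the lines then fall back to plain text order.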

This User Gave Thanks to vgersh99 For This Post:
# 3  
Old 10-03-2015
Quote:
Originally Posted by cmccabe
[...]

Code:
awk '{k=$1 OFS $2; s[k]+=$4; c[k]++} END{for(i in s) print i, s[i]/c[i]}' input > output.txt | sort -k1.4 -n output.txt

Code:
awk '{k=$1 OFS $2; s[k]+=$4; c[k]++} END{for(i in s) print i, s[i]/c[i]}' input | sort -k1.4 -n > output.txt
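
The difference between the two commands can be sketched like this. In the first, awk's output is redirected to the file, so nothing travels down the pipe; sort then reads output.txt by name, possibly before awk has finished writing it, and prints to the terminal without ever changing the file. The sketch below is an illustration only: produce is a made-up stand-in for the awk step and the three lines are invented.

```shell
# Made-up stand-in for the awk step: it just prints three unsorted lines.
produce() {
    printf '%s\n' 'chr3:1-2' 'chr1:1-2' 'chr2:1-2'
}

tmp=$(mktemp)

# Broken shape: stdout goes to the file, so the pipe carries nothing.
piped=$(produce > "$tmp" | cat)

# Working shape: sort reads the pipe and only its result is written out.
produce | sort -k1.4 -n > "$tmp"
fixed=$(cat "$tmp")
rm -f "$tmp"

printf 'through the pipe: [%s]\n' "$piped"
printf '%s\n' "$fixed"
```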

This User Gave Thanks to Aia For This Post:
# 4  
Old 10-03-2015
Try:
Code:
sort -nt: -k1.4,1 -k2,2

to sort the ranges as well
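
As a rough check of that command, here it is run on a small sample. The lines are shortened from the data in the first post, plus one made-up chr2 line added to show that chr2 sorts before chr19. With ":" as the separator, the first key is the part of field 1 from its fourth character on, and the second key is the field that begins with the range start.

```shell
# Shortened sample lines; the chr2 line is invented for illustration.
sample='chr20:43625799-43625957 STK4 310.703
chr20:8630010-8630106 PLCB1 183.188
chr2:500-600 X 1.0
chr19:13051547-13051711 CALR 337.811'

# -n applies to both keys: chromosome number first, then range start.
sorted=$(printf '%s\n' "$sample" | sort -nt: -k1.4,1 -k2,2)
printf '%s\n' "$sorted"
```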
This User Gave Thanks to Scrutinizer For This Post:
# 5  
Old 10-04-2015
Hi.

The msort utility allows a field to be described as hybrid, that is, a mix of characters and numbers:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate sort of mixed field, "hybrid", with msort.
# If msort is not in repository:
# http://freecode.com/projects/msort

LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f "$C" ] && "$C" msort

# Input file: first argument, or data1 if none is given.
FILE=${1-data1}

pl " Input data file $FILE:"
cat "$FILE"

pl " Results of msort:"
msort -l -q -j -d: -n 1 -c hybrid "$FILE"

exit 0

producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
msort 8.44

-----
 Input data file data1:
chr20:43625799-43625957 STK4:exon.6;STK4:exon.7 310.703
chr20:36770455-36770611 TGM2:exon.6;TGM2:exon.7 614.756
chr20:19945585-19945678 RIN2:exon.6;RIN2:exon.7 175.258
chr20:10632768-10632908 JAG1:exon.5;JAG1:exon.7 319.586
chr20:8630010-8630106 PLCB1:exon.1;PLCB1:exon.7 183.188
chr19:17952438-17952581 JAK3:exon.2;JAK3:exon.6;JAK3:exon.7 306.566
chr19:13051547-13051711 CALR:exon.3;CALR:exon.6;CALR:exon.7 337.811
chr19:13006795-13006945 GCDH:exon.5;GCDH:exon.6;GCDH:exon.7 628.62
chr19:11491549-11491657 EPOR:exon.1;EPOR:exon.6;EPOR:exon.7 301.87
chr18:3456341-3456588 TGIF1:exon.1;TGIF1:exon.2;TGIF1:exon.3 430.332
chr15:90630333-90630505 IDH2:exon.5;IDH2:exon.7 516.128

-----
 Results of msort:
chr15:90630333-90630505 IDH2:exon.5;IDH2:exon.7 516.128
chr18:3456341-3456588 TGIF1:exon.1;TGIF1:exon.2;TGIF1:exon.3 430.332
chr19:17952438-17952581 JAK3:exon.2;JAK3:exon.6;JAK3:exon.7 306.566
chr19:13051547-13051711 CALR:exon.3;CALR:exon.6;CALR:exon.7 337.811
chr19:13006795-13006945 GCDH:exon.5;GCDH:exon.6;GCDH:exon.7 628.62
chr19:11491549-11491657 EPOR:exon.1;EPOR:exon.6;EPOR:exon.7 301.87
chr20:43625799-43625957 STK4:exon.6;STK4:exon.7 310.703
chr20:10632768-10632908 JAG1:exon.5;JAG1:exon.7 319.586
chr20:8630010-8630106 PLCB1:exon.1;PLCB1:exon.7 183.188
chr20:36770455-36770611 TGM2:exon.6;TGM2:exon.7 614.756
chr20:19945585-19945678 RIN2:exon.6;RIN2:exon.7 175.258

See the link listed in the script if msort is not in your repository ... cheers, drl
This User Gave Thanks to drl For This Post:
# 6  
Old 10-05-2015
Thank you all.