Sort by values in the 1st row, leaving first four coulumns untouched


Login or Register for Dates, Times and to Reply

 
Thread Tools Search this Thread
Top Forums UNIX for Beginners Questions & Answers Sort by values in the 1st row, leaving first four coulumns untouched
# 1  
Sort by values in the 1st row, leaving first four coulumns untouched

Dear all, will be thankful if you can help on sort command.

My data looks like (tab separated; number of columns 2317; N of rows ~200000):

Code:
a    b    c    d    V10    V2    V8    V4    V7 
xx    z    y    1000    1    2    0    2    0
tr    v    m    1001    0    0    1    2    2
rg    s    n    1003    1    1    2    0    0

I need to sort my data so, that first four columns remain untouched. Rest of the columns are sorted by values in the first row. Result will look like:
Code:
a    b    c    d    V2    V4    V7     V8    V10
xx    z    y    1000    2    2    0    0    1
tr    v    m    1001    0    2    2    1    0
rg    s    n    1003    1    0    0    2    1

Thank you a lot for your help!

Last edited by jim mcnamara; 10-31-2017 at 05:15 PM..
# 2  
Looks to me like you mean 'sort by vertical column' You move each column based on the content of the firs row - columns 5 - 9 (V2, V4,. ....)

Correct?
# 3  
Are we correct in assuming that each heading on the 1st line for the last 2313 fields are the single letter V followed by unique non-negative integers?

What output do you get from running the following three commands:
Code:
uname -a

getconf LINE_MAX

awk -F'\t' '
lNF != NF	{print "NF=" (lNF = NF), "NR=" NR}
length() > lm	{lm = length()}
END		{print "Max length=" lm}
' file

where file is the name of the file that contains your data.

Note that, by definition, a text file contains no lines that contain more than LINE_MAX (which is 2048 on most systems) bytes in a line (including the <newline> terminator) and most of the UNIX text processing utilities (like awk, sed, and sort) are only defined to work on text files. If the file containing your data has 2317 fields and LINE_MAX is 2048 on your system, the file containing your data is not a text file. Some versions of these utilities work even if the input files have line lengths longer than those required by the standards; other versions of these utilities will give you an error if they encounter long lines; and other versions will silently ignore some data if they encounter long lines. Hopefully, the awk script above will give us an indication of how your implementation of awk will behave. (We hope that it will just print two lines of output on standard output and not print any diagnostics.)
# 4  
Dear all,
thank you for quick attempts to solve the sorting!

jim mcnamara: yes, this sorting literally means moving columns based on values in the first row.

Don Cragun: - yes, values in heading line (based on which i have to sort) contain a letter V followed by non-negative unique number.
- and your code gives me:
Code:
getconf LINE_MAX 
2048


NF=2317 NR=1
NF=2313 NR=2
NF=362 NR=16134
Max length=16236

Looks like it not a trivial thing. Maybe I have to try to do it in R.

But thank you once more!

Last edited by Scott; 11-01-2017 at 09:03 AM.. Reason: Code tags
# 5  
Mmmm, your awk clearly is able to process longer lines than 2048, since max length is 16236.

It seems to me the difference between line 1 and line 2 is perhaps explained by the first four fields in the header? That the first field in line 2 corresponds to the 5th field in the header line?

What is strange is the sudden drop in nr of fields to 362 from line 16134 onwards.

It seems to me not all of the lines contain the same number of TAB separated fields ?
What is happening on line 16134?
# 6  
Apart from solving above line length problems, here's something to start with if the problem doesn't hit system limits:
Code:
awk -F"\t" '
NR == 1         {printf "%s", substr ($0, 1, index($0, $5)-1)
                 for (i=5; i<=NF; i++) ORG[$i] = i
                 OFS = "\n"
                 sub ($1 FS $2 FS $3 FS $4 FS, "")
                 $1 = $1
                 CNT = 5
                 OFS = FS
                 while (1 == ("echo \"" $0 "\" | sort -k1.2n") | getline X)     {COL[CNT++] = X
                                                                                 HD = HD DL X
                                                                                 DL = FS
                                                                                }
                 print HD
                 next
                }
                {for (i=1; i<= 4; i++) printf "%s%c", $i, FS
                 for (i=5; i<=NF; i++) printf "%s%c", $(ORG[COL[i]]), i==NF?ORS:OFS
                }
' file
a	b	c	d	V2	V4	V7 	V8	V10
xx	z	y	1000	2	2	0	0	1
tr	v	m	1001	0	2	2	1	0
rg	s	n	1003	1	0	0	2	1

Most of the processing for the first line is for sorting the columns; my awk doesn't have a sorting algortihm, unfortunately.

Last edited by RudiC; 11-01-2017 at 01:19 PM..
This User Gave Thanks to RudiC For This Post:
# 7  
Hi.

This demonstration code:
Code:
#!/usr/bin/env bash

# @(#) s1       Demonstrate separate, transpose, sort, transpose, combine file matrix.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C cut transpose.pl sort paste pass-fail

FILE=${1-data1}
N=${FILE//[A-Za-z]/}
E=expected-output$N

# Utility functions: print-as-echo, print-line-with-visual-space.
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
pl " Input data file $FILE:"
head $FILE

pl " Expected output:"
head $E

pl " Prepare input, split and save first 4 columns, remainder:"
cut -f1-4 $FILE > first-four
cut -f5- $FILE |
tee remainder

pl " Results, transpose, sort:"
transpose.pl remainder |
tee t2 |
sort -k1.2,1n |
tee t3

pl " Results, re-transpose, paste:"
transpose.pl t3 > sorted-remainder
paste first-four sorted-remainder | tee f1

pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f $C ] && $C f1 "$E" || ( pe; pe " Results cannot be verified." ) >&2

exit

produces:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.9 (jessie) 
bash GNU bash 4.3.30
cut (GNU coreutils) 8.23
transpose.pl - ( local: RepRev 1.1, ~/bin/transpose.pl, 2017-01-29 )
sort (GNU coreutils) 8.23
paste (GNU coreutils) 8.23
pass-fail (local) 1.10

-----
 Input data file data1:
a       b       c       d       V10     V2      V8      V4      V7
xx      z       y       1000    1       2       0       2       0
tr      v       m       1001    0       0       1       2       2
rg      s       n       1003    1       1       2       0       0

-----
 Expected output:
a       b       c       d       V2      V4      V7      V8      V10
xx      z       y       1000    2       2       0       0       1
tr      v       m       1001    0       2       2       1       0
rg      s       n       1003    1       0       0       2       1

-----
 Prepare input, split and save first 4 columns, remainder:
V10     V2      V8      V4      V7
1       2       0       2       0
0       0       1       2       2
1       1       2       0       0

-----
 Results, transpose, sort:
V2      2       0       1
V4      2       2       0
V7      0       2       0
V8      0       1       2
V10     1       0       1

-----
 Results, re-transpose, paste:
a       b       c       d       V2      V4      V7      V8      V10
xx      z       y       1000    2       2       0       0       1
tr      v       m       1001    0       2       2       1       0
rg      s       n       1003    1       0       0       2       1

-----
 Verify results if possible:

-----
 Comparison of 4 created lines with 4 lines of desired results:
 Succeeded -- files (computed) f1 and (standard) expected-output1 have same content.

For some other codes that can transpose, see:
Code:
Transpose
        1) rs, reshape a data arrays

        2) transpose.pl
           http://www1.cuni.cz/~obo/textutils/

        3) transpose, sourceforge c
           https://sourceforge.net/projects/transpose/

        4) pspp
           'FLIP' transposes rows and columns in the active dataset.

        5) datamash
           transpose   transpose rows, columns of the input file

        *) awk, perl, python, c, R, sc, and so on:
           http://stackoverflow.com/questions/1729824/an-efficient-way-to-transpose-a-file-in-bash
           http://stackoverflow.com/questions/25331830/how-do-i-efficiently-transpose-a-matrix-in-r
           https://www.unix.com/unix-for-beginners-questions-and-answers/270683-transpose-large-data-unix.html et al

I also tried the solution with item 3 above, and it worked.

I tried an alternate file with fewer columns, and it seemed to work.

Some details on transpose.pl and transpose:
Code:
transpose.pl    Swap rows and columns in the given tab-delimited table (MR). (what)
Path    : ~/bin/transpose.pl
Version : - ( local: RepRev 1.1, ~/bin/transpose.pl, 2017-01-29 )
Length  : 28 lines
Type    : Perl script, ASCII text executable
Shebang : #!/usr/bin/perl
Home    : http://www1.cuni.cz/~obo/textutils/ (doc)

transpose       Reshapes delimited text data (help)
Path    : ~/executable/transpose
Version : - ( local: ~/executable/transpose, 2017-01-29 )
Type    : ELF64-bitLSBexecutable,x86-64,version1(SYSV ...)
Home    : https://sourceforge.net/projects/transpose/ (doc)

Best wishes ... cheers, drl
This User Gave Thanks to drl For This Post:
Login or Register for Dates, Times and to Reply

Previous Thread | Next Thread
Thread Tools Search this Thread
Search this Thread:
Advanced Search

Test Your Knowledge in Computers #916
Difficulty: Easy
In Javascript, the console is an object which provides access to the browser debugging console.
True or False?

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk to remove mutiple values from specific pattern, leaving a single value

In the awk below I am trying to remove all instances after a ; (semi-colon) or , (comma) in the ANN= pattern. I am using gsub to substitute an empty string in these, so that ANN= is a single value (with only one value in it the one right after the ANN=). Thank you :). I have comented my awk and... (11 Replies)
Discussion started by: cmccabe
11 Replies

2. Shell Programming and Scripting

Sort by first row - awk

how can i sort the table based on first row? thanks in advance input name d b c a l l1 l2 l3 l4 l1 1 2 3 4 l2 2 2 2 1 l3 1 1 2 2ouput name a b c d l1 l4 ... (4 Replies)
Discussion started by: quincyjones
4 Replies

3. Shell Programming and Scripting

awk transpose column to row and sort

I am trying to awk the output from below output for each port: i need separate line with comma source file Output required (3 Replies)
Discussion started by: ranjancom2000
3 Replies

4. Shell Programming and Scripting

Sort each row (horizontally) in AWK or any

Hello, How to sort each row in a document with numerical values and with more than one row. Example Input data (file1.txt): 4 6 8 1 7 2 12 9 6 10 6 1 14 5 7 and I want the the output to look like this(file2.txt): 1 4 6 7 8 2 6 9 10 12 1 5 6 7 14 I've tried sort -n file1.txt >... (12 Replies)
Discussion started by: joseamck
12 Replies

5. Shell Programming and Scripting

Keep 3 values in each row

Hi, I have n number of values like 1 2 3 4 I want the output like 1 2 3 4 5 6 - - - - - - Please help me on this:wall: (4 Replies)
Discussion started by: cns1710
4 Replies

6. Shell Programming and Scripting

Sort data from column to row

Hi, I need somebody's help with sorting data with awk. I've got a file: 10 aaa 4584 12 bbb 6138 20 ccc 4417 21 ddd 7796 10 eee 7484 12 fff ... (5 Replies)
Discussion started by: killerbee
5 Replies

7. Shell Programming and Scripting

Sort a file from specific row onwards

Hello All: I've file in below format. File name is "FIRSTN.TBL": AAAAAA N BBBBBBBBBBBBBBBBBBBBBBB N . . . . ZZZZZZZZZZZZZZZZZZZZZZZZZZ N My file row length is 40 characters and my second column will start from 25th column and it is only... (3 Replies)
Discussion started by: nvkuriseti
3 Replies

8. Shell Programming and Scripting

How to insert data befor some field in a row of data depending up on values in row

Hi I need to do some thing like "find and insert before that " in a file which contains many records. This will be clear with the following example. The original data record should be some thing like this 60119827 RTMS_LOCATION_CDR INSTANT_POSITION_QUERY 1236574686123083rtmssrv7 ... (8 Replies)
Discussion started by: aemunathan
8 Replies

9. Shell Programming and Scripting

sort and semi-duplicate row - keep latest only

I have a pipe delimited file. Key is field 2, date is field 5 (as example, my real file is more complicated of course, but the KEY and DATE are accurate) There can be duplicate rows for a key with different dates. I need to keep only rows with latest date in this case. Example data: ... (4 Replies)
Discussion started by: LisaS
4 Replies

10. Programming

copying or concatinating string from 1st bit, leaving 0th bit

Hello, If i have 2 strings str1 and str2, i would like to copy/concatenate str2 to str1, from 1st bit leaving the 0th bit. How do i do it? (2 Replies)
Discussion started by: jazz
2 Replies

Featured Tech Videos