Are we correct in assuming that each heading on the 1st line for the last 2313 fields are the single letter V followed by unique non-negative integers?
What output do you get from running the following three commands:
where file is the name of the file that contains your data.
Note that, by definition, a text file contains no lines that contain more than LINE_MAX (which is 2048 on most systems) bytes in a line (including the <newline> terminator) and most of the UNIX text processing utilities (like awk, sed, and sort) are only defined to work on text files. If the file containing your data has 2317 fields and LINE_MAX is 2048 on your system, the file containing your data is not a text file. Some versions of these utilities work even if the input files have line lengths longer than those required by the standards; other versions of these utilities will give you an error if they encounter long lines; and other versions will silently ignore some data if they encounter long lines. Hopefully, the awk script above will give us an indication of how your implementation of awk will behave. (We hope that it will just print two lines of output on standard output and not print any diagnostics.)
Dear all,
thank you for quick attempts to solve the sorting!
jim mcnamara: yes, this sorting literally means moving columns based on values in the first row.
Don Cragun: - yes, values in heading line (based on which i have to sort) contain a letter V followed by non-negative unique number.
- and your code gives me:
Mmmm, your awk clearly is able to process longer lines than 2048, since max length is 16236.
It seems to me the difference between line 1 and line 2 is perhaps explained by the first four fields in the header? That the first field in line 2 corresponds to the 5th field in the header line?
What is strange is the sudden drop in nr of fields to 362 from line 16134 onwards.
It seems to me not all of the lines contain the same number of TAB separated fields ?
What is happening on line 16134?
Location: Saint Paul, MN USA / BSD, CentOS, Debian, OS X, Solaris
Posts: 2,288
Thanks Given: 430
Thanked 480 Times in 395 Posts
Hi.
This demonstration code:
Code:
#!/usr/bin/env bash
# @(#) s1 Demonstrate separate, transpose, sort, transpose, combine file matrix.
# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C cut transpose.pl sort paste pass-fail
FILE=${1-data1}
N=${FILE//[A-Za-z]/}
E=expected-output$N
# Utility functions: print-as-echo, print-line-with-visual-space.
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
pl " Input data file $FILE:"
head $FILE
pl " Expected output:"
head $E
pl " Prepare input, split and save first 4 columns, remainder:"
cut -f1-4 $FILE > first-four
cut -f5- $FILE |
tee remainder
pl " Results, transpose, sort:"
transpose.pl remainder |
tee t2 |
sort -k1.2,1n |
tee t3
pl " Results, re-transpose, paste:"
transpose.pl t3 > sorted-remainder
paste first-four sorted-remainder | tee f1
pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f $C ] && $C f1 "$E" || ( pe; pe " Results cannot be verified." ) >&2
exit
produces:
Code:
$ ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution : Debian 8.9 (jessie)
bash GNU bash 4.3.30
cut (GNU coreutils) 8.23
transpose.pl - ( local: RepRev 1.1, ~/bin/transpose.pl, 2017-01-29 )
sort (GNU coreutils) 8.23
paste (GNU coreutils) 8.23
pass-fail (local) 1.10
-----
Input data file data1:
a b c d V10 V2 V8 V4 V7
xx z y 1000 1 2 0 2 0
tr v m 1001 0 0 1 2 2
rg s n 1003 1 1 2 0 0
-----
Expected output:
a b c d V2 V4 V7 V8 V10
xx z y 1000 2 2 0 0 1
tr v m 1001 0 2 2 1 0
rg s n 1003 1 0 0 2 1
-----
Prepare input, split and save first 4 columns, remainder:
V10 V2 V8 V4 V7
1 2 0 2 0
0 0 1 2 2
1 1 2 0 0
-----
Results, transpose, sort:
V2 2 0 1
V4 2 2 0
V7 0 2 0
V8 0 1 2
V10 1 0 1
-----
Results, re-transpose, paste:
a b c d V2 V4 V7 V8 V10
xx z y 1000 2 2 0 0 1
tr v m 1001 0 2 2 1 0
rg s n 1003 1 0 0 2 1
-----
Verify results if possible:
-----
Comparison of 4 created lines with 4 lines of desired results:
Succeeded -- files (computed) f1 and (standard) expected-output1 have same content.
For some other codes that can transpose, see:
Code:
Transpose
1) rs, reshape a data arrays
2) transpose.pl
http://www1.cuni.cz/~obo/textutils/
3) transpose, sourceforge c
https://sourceforge.net/projects/transpose/
4) pspp
'FLIP' transposes rows and columns in the active dataset.
5) datamash
transpose transpose rows, columns of the input file
*) awk, perl, python, c, R, sc, and so on:
http://stackoverflow.com/questions/1729824/an-efficient-way-to-transpose-a-file-in-bash
http://stackoverflow.com/questions/25331830/how-do-i-efficiently-transpose-a-matrix-in-r
https://www.unix.com/unix-for-beginners-questions-and-answers/270683-transpose-large-data-unix.html et al
I also tried the solution with item 3 above, and it worked.
I tried an alternate file with fewer columns, and it seemed to work.
Some details on transpose.pl and transpose:
Code:
transpose.pl Swap rows and columns in the given tab-delimited table (MR). (what)
Path : ~/bin/transpose.pl
Version : - ( local: RepRev 1.1, ~/bin/transpose.pl, 2017-01-29 )
Length : 28 lines
Type : Perl script, ASCII text executable
Shebang : #!/usr/bin/perl
Home : http://www1.cuni.cz/~obo/textutils/ (doc)
transpose Reshapes delimited text data (help)
Path : ~/executable/transpose
Version : - ( local: ~/executable/transpose, 2017-01-29 )
Type : ELF64-bitLSBexecutable,x86-64,version1(SYSV ...)
Home : https://sourceforge.net/projects/transpose/ (doc)
In the awk below I am trying to remove all instances after a ; (semi-colon) or , (comma) in the ANN= pattern. I am using gsub
to substitute an empty string in these, so that ANN= is a single value (with only one value in it the one right after the ANN=). Thank you :).
I have comented my awk and... (11 Replies)
how can i sort the table based on first row? thanks in advance
input
name d b c a
l l1 l2 l3 l4
l1 1 2 3 4
l2 2 2 2 1
l3 1 1 2 2ouput
name a b c d
l1 l4 ... (4 Replies)
Hello,
How to sort each row in a document with numerical values and with more than one row. Example
Input data (file1.txt):
4 6 8 1 7
2 12 9 6 10
6 1 14 5 7
and I want the the output to look like this(file2.txt):
1 4 6 7 8
2 6 9 10 12
1 5 6 7 14
I've tried
sort -n file1.txt >... (12 Replies)
Hi, I need somebody's help with sorting data with awk.
I've got a file:
10 aaa 4584
12 bbb 6138
20 ccc 4417
21 ddd 7796
10 eee 7484
12 fff ... (5 Replies)
Hello All:
I've file in below format. File name is "FIRSTN.TBL":
AAAAAA N
BBBBBBBBBBBBBBBBBBBBBBB N
.
.
.
.
ZZZZZZZZZZZZZZZZZZZZZZZZZZ N
My file row length is 40 characters and my second column will start from 25th column and it is only... (3 Replies)
Hi
I need to do some thing like "find and insert before that " in a file which contains many records. This will be clear with the following example.
The original data record should be some thing like this
60119827 RTMS_LOCATION_CDR INSTANT_POSITION_QUERY 1236574686123083rtmssrv7 ... (8 Replies)
I have a pipe delimited file. Key is field 2, date is field 5 (as example, my real file is more complicated of course, but the KEY and DATE are accurate)
There can be duplicate rows for a key with different dates.
I need to keep only rows with latest date in this case.
Example data: ... (4 Replies)