Sort by values in the 1st row, leaving first four columns untouched
Dear all, I would be thankful if you could help with the sort command.
My data looks like this (tab-separated; 2317 columns; ~200000 rows):
I need to sort my data so that the first four columns remain untouched. The rest of the columns should be sorted by the values in the first row. The result would look like:
Thank you a lot for your help!
Last edited by jim mcnamara; 10-31-2017 at 04:15 PM..
Are we correct in assuming that each heading on the 1st line for the last 2313 fields is the single letter V followed by a unique non-negative integer?
What output do you get from running the following three commands:
where file is the name of the file that contains your data.
Note that, by definition, a text file contains no lines longer than LINE_MAX bytes (which is 2048 on most systems), including the <newline> terminator, and most of the UNIX text processing utilities (like awk, sed, and sort) are only defined to work on text files. If the file containing your data has 2317 fields and LINE_MAX is 2048 on your system, the file containing your data is not a text file: the 2316 TAB separators plus the <newline> already add up to 2317 bytes per line, before counting any data.
Some versions of these utilities work even if the input files have lines longer than the standards require; other versions will give you an error if they encounter long lines; and other versions will silently ignore some data if they encounter long lines. Hopefully, the awk script above will give us an indication of how your implementation of awk will behave. (We hope that it will just print two lines of output on standard output and not print any diagnostics.)
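For anyone wanting to run a similar check on their own data, here is a minimal sketch, assuming a TAB-separated file. `line_stats` is a hypothetical helper name, not a standard utility, and it is not necessarily the same as the commands referred to above. It reports every line at which the number of fields changes, plus the longest line it saw.

```shell
# Hypothetical helper: summarise field counts and line lengths of a TAB-separated file.
# Compare the reported length with the output of: getconf LINE_MAX
# Note: awk's length() counts characters, which equals bytes only for single-byte data,
# and it does not include the trailing <newline>.
line_stats() {
    awk -F '\t' '
        # remember the longest line seen so far
        length($0) > max { max = length($0); maxline = NR }
        # report each line where the field count differs from the previous line
        NF != prev       { printf "line %d: %d fields\n", NR, NF; prev = NF }
        END              { printf "longest line: %d (%d characters)\n", maxline, max }
    ' "$1"
}
```

Called as `line_stats file`, this prints a single "fields" line if every row has the same number of fields; more than one such line points at rows that are worth inspecting by hand.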
Dear all,
thank you for quick attempts to solve the sorting!
jim mcnamara: yes, this sorting literally means moving columns based on values in the first row.
Don Cragun: - yes, the values in the heading line (by which I have to sort) consist of the letter V followed by a unique non-negative number.
- and your code gives me:
Looks like it is not a trivial thing. Maybe I have to try to do it in R.
But thank you once more!
Last edited by Scott; 11-01-2017 at 08:03 AM..
Reason: Code tags
Mmmm, your awk clearly is able to process lines longer than 2048 bytes, since the max length is 16236.
It seems to me the difference between line 1 and line 2 is perhaps explained by the first four fields in the header? That the first field in line 2 corresponds to the 5th field in the header line?
What is strange is the sudden drop in the number of fields to 362 from line 16134 onwards.
It seems to me that not all of the lines contain the same number of TAB-separated fields?
What is happening on line 16134?
Apart from solving the line length problems above, here's something to start with if the problem doesn't hit system limits:
Most of the processing for the first line is for sorting the columns; my awk doesn't have a built-in sorting function, unfortunately.
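As an illustration of that approach, here is a minimal sketch and not necessarily the code posted above. It assumes TAB-separated input, that the first four columns stay in place, and that every other heading is the letter V followed by an integer, as confirmed earlier in the thread. `sort_columns` and the `keep` variable are hypothetical names. Since plain awk has no sort function, the column order is worked out once on the header line with a hand-written insertion sort and then reused for every row.

```shell
# Hypothetical helper: reorder columns keep+1..NF by the integer after "V" in the header row.
sort_columns() {
    awk -F '\t' -v OFS='\t' -v keep=4 '
        NR == 1 {
            n = NF
            for (i = 1; i <= n; i++) order[i] = i
            # numeric sort key per column: the header text without its leading "V"
            for (i = keep + 1; i <= n; i++) key[i] = substr($i, 2) + 0
            # insertion sort of the column positions keep+1..n by key
            for (i = keep + 2; i <= n; i++) {
                c = order[i]
                for (j = i - 1; j > keep && key[order[j]] > key[c]; j--)
                    order[j + 1] = order[j]
                order[j + 1] = c
            }
        }
        {
            # print every line, header included, in the order computed from line 1
            line = $(order[1])
            for (i = 2; i <= n; i++) line = line OFS $(order[i])
            print line
        }
    ' "$1"
}
```

Usage would be `sort_columns file > sorted_file`. Rows with fewer fields than the header come out padded with empty fields, so the field-count oddities discussed above should be sorted out first.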