If you're willing to accept output with the values in each column presented in the same order as they were encountered in the input, you could try something like:
which, with the sample data you provided, produces the output:
If you need the last column to be in a different order, you need to more clearly explain your requirements. Hopefully, you can use the above code as a base to get something that will do what you're trying to do.
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
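(The code block and its sample output from this post did not survive the thread export. Based on the arrays described in the follow-up post, count[], FieldValue[], and data[], a sketch of the kind of script meant here, using hypothetical sample input, might be:)

```shell
# Sketch only; the original script was lost. Array names follow the
# follow-up post in this thread: count[], FieldValue[], and data[].
printf 'a b\nc b\na d\n' > file    # hypothetical sample input
awk '
{   if (NF > cols) cols = NF
    for (c = 1; c <= NF; c++)
        if (!((c, $c) in data)) {   # 1st time this value appears in column c?
            data[c, $c]             # mark it seen (no value needed)
            FieldValue[c, ++count[c]] = $c
            if (count[c] > rows) rows = count[c]
        }
}
END {   for (r = 1; r <= rows; r++)
            for (c = 1; c <= cols; c++)
                printf "%s%s", FieldValue[c, r], (c < cols ? OFS : ORS)
}' file
# With the hypothetical input above this prints:
#   a b
#   c d
```

Note that the membership test `(c, $c) in data` never stores a value in the array; creating the element is enough.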
Jumping in as Don Cragun seems to be logged out at the moment:
Hi RudiC,
Thank you for filling in while I was sleeping. I do have a couple of comments that may help explain what I was thinking when I wrote this code...
First, I'm not sure that I would call the code quoted above complex, but it certainly is dense. I just started by figuring out what I want to print after I had accumulated all of the data. Although the sample data provided happened to have the same number of unique values in each input column, there doesn't seem to be any reason to assume that will be true with real input data. So, I need:
a count of the number of unique values that have been found in each column (which is handled by the array ++count[column], with the ++ incrementing the number of unique values seen in this column),
the unique values to be displayed in each output row and column (which is handled by the array FieldValue[column, output_row] = field_value),
and a quick way to determine whether or not we have seen a given value in a given field before (which is handled by the array data[column, field_value], and as RudiC said we don't need any value to be assigned to elements of this array; we just need to know whether a given column and field_value pair have been entered into this array).
And, second, I wouldn't say that the output is in random order. The 1st row of the output will contain the 1st unique value found in each input column. The 2nd row of the output will contain the 2nd unique value found in each input column. Etc.
Note that if an output row does not have a value for a given column, that field will be an empty string (printed as just a field separator). This works because referencing an array element that has not been assigned a value will return an empty string as its value.
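(That behavior is easy to demonstrate in isolation; this one-liner is an illustration of mine, not from the thread:)

```shell
# Referencing an array element that was never assigned yields "":
awk 'BEGIN { printf "[%s]\n", FieldValue[3, 7] }'
# prints: []
```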
Note also that since I used the awk variable OFS (instead of an explicit <space> as a field separator), you can change the character that appears in the output to separate fields by setting a different value for OFS before naming the file to be processed on the last line of the script.
Note also that there is no requirement that every input row contain the same number of fields, but the output will have the same number of fields in every row.
For example, if we had a file named numbers containing:
and we wanted the output field separator to be a <comma> instead of a <space>, we could change the last line of the script from:
to
and get the output:
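(Both code fragments and the numbers file were lost in the export; the original contents of numbers are unknown, so the following reconstruction of the idea uses made-up data:)

```shell
# Hypothetical input; the original "numbers" file was not preserved.
printf '1 2\n3 2\n1 4\n' > numbers

# If the script's last line named the file like this:
#     }' numbers
# it would become:
#     }' OFS=, numbers
# A minimal demonstration of that command-line OFS assignment:
awk '{ print $1, $2 }' OFS=, numbers
# prints:
#   1,2
#   3,2
#   1,4
```

An assignment operand such as `OFS=,` takes effect before awk reads the file named after it, so no change to the script body is needed.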
Hi.
I like modular code as well as generalized code, so as an experiment I wrote a Perl script that extracts the fields simultaneously to (effectively) separate files. Here is a simple example, and then the solution with the user data:
producing:
There may be a small amount of parallelism, because the separated fields are actually written to pipes running the command supplied by the --command (-c) option, resulting in child processes.
The resulting per-field files may then be combined to produce output similar to what the OP asked for.
Of course, this can also be done with separate awk processes as well.
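(A minimal awk version of that pipe-per-field idea, a sketch of my own rather than drl's script, with made-up filenames and sample data, could look like:)

```shell
# Sketch only: pipe each column to its own child process (here "sort -u"),
# so the per-column work happens in separate processes, as described above.
printf 'a 1\nb 2\na 3\n' > infile    # hypothetical sample input
awk '{ for (c = 1; c <= NF; c++) print $c | ("sort -u > col" c) }' infile
# col1 now holds the unique values of column 1, col2 those of column 2, ...
```

awk keeps one pipe open per distinct command string and closes them all at exit, which is when the col* files are complete.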
The documentation for the experimental code:
Best wishes ... cheers, drl