Sponsored Content
Top Forums Shell Programming and Scripting HELP - uniq values per column Post 302984724 by drl on Sunday 30th of October 2016 03:31:10 PM
Old 10-30-2016
Hi.

I like modular codes as well as generalized codes, so as an experiment I wrote a perl script that extracts the fields simultaneously to (effectively) files. Here is a simple example, and then the solution with the user data:
Code:
#!/usr/bin/env bash

# @(#) s3       Demonstrate display unique values from fields.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C cutofi

pl " Sample easy dataset data2:"
cat data2

pl " Results, field 2 extracted to file f2:"
cutofi data2
cat f2

pl " Results, field 2 piped into \"sort|uniq\" to file f2:"
cutofi -c='sort|uniq' data2
cat f2

pe
pe " Results, all files f? pasted together:"
paste f?

# Solve the user problem.
FILE=${1-data1}
E=expected-output.txt

pl " Input data file $FILE:"
cat $FILE

pl " Expected output:"
cat $E

pl " Results:"
cutofi -c='sort -u' -p=xx $FILE
paste xx? |
tee f1

pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f $C ] && $C || ( pe; pe " Results cannot be verified." ) >&2

exit 0

producing:
Code:
$ ./s3

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.6 (jessie) 
bash GNU bash 4.3.30
cutofi (local) 1.4

-----
 Sample easy dataset data2:
1 1 1 
1 1 1 
1 1 1 
2 3 4

-----
 Results, field 2 extracted to file f2:
1
1
1
3

-----
 Results, field 2 piped into "sort|uniq" to file f2:
1
3

 Results, all files f? pasted together:
1       1       1
2       3       4

-----
 Input data file data1:
joe allen ibm
joe smith ibm
joe allen google
joe smith google
rachel allen google

-----
 Expected output:
joe allen google
rachel smith ibm

-----
 Results:
joe     allen   google
rachel  smith   ibm

-----
 Verify results if possible:

-----
 Comparison of 2 created lines with 2 lines of desired results:
f1 expected-output.txt differ: char 4, line 1
 Failed -- files f1 and expected-output.txt not identical -- detailed comparison follows.
 Succeeded by ignoring whitespace differences.

There may be a small amount of parallelism seen because the separated fields are really written to pipes, which is supplied by the --command or -c option, resulting in child processes.

The final files may be then combined to produce the output similar to that desired by the OP.

Of course, this can also be done with separate awk processes as well.

The documentation for the experimental code:
Code:
$ ./cutofi -h

SYNOPSIS
cutofi - cut (split) fields from text file to separate filters, files.

PURPOSE
Separate fields in text files by streaming them to a pipeline. This
allows some parallel processing, albeit at the slight risk of having
too many processes active.

This allows operations on the field as a separate dataset so
that, for example, each field could be sorted without regard to
the other fields.

        Maximum number of processes for a user in a specific distribution:
        OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
        Distribution        : Debian 8.6 (jessie) 

        $ ulimit -u
        78528

See also the Missing Text Routine:
  colwise            Cut a column, run command, paste results.
  http://www1.cuni.cz/~obo/textutils/ (2016.08)

USAGE
cutofi [options] -- [files]

options (read "|" as "or"):

-h | --help
  print this message and quit.

-d | --debug
  turn on debugging output.  Default is off.

-p | --prefix=str
  set the output file base string as "str".  Default is "f".

-f | --fields=list
  set specific fields to be split in comma-separated list: f1,f2, ... fn.
  Default is all strings.

-s | --separator=string
  set the separator to "string". Default is \s+, whitespace.

-c | --command=string
  set the output filter as "string", e.g. "sort|uniq".  Default
  is (essentially) "cat > prefix<i>" where i is the field number.

Best wishes ... cheers, drl

Last edited by drl; 10-31-2016 at 08:50 AM.. Reason: Correct minor typos.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Column sum group by uniq records

Dear All, I want to get help for below case. I have a file like this. saman 1 gihan 2 saman 4 ravi 1 ravi 2 so i want to get the result, saman 5 gihan 2 ravi 3 like this. Pls help me. (17 Replies)
Discussion started by: Nayanajith
17 Replies

2. Shell Programming and Scripting

print unique values of a column and sum up the corresponding values in next column

Hi All, I have a file which is having 3 columns as (string string integer) a b 1 x y 2 p k 5 y y 4 ..... ..... Question: I want get the unique value of column 2 in a sorted way(on column 2) and the sum of the 3rd column of the corresponding rows. e.g the above file should return the... (6 Replies)
Discussion started by: amigarus
6 Replies

3. UNIX for Dummies Questions & Answers

Re: How To Use UNIQ UNIX Command On single Column

Hi , Can You Please let Know How use unix uniq command on a single column for deleting records from file with Below Structure.Pipe Delimter File . Source Name | Account_Id A | 101 B... (2 Replies)
Discussion started by: anudeepkumar123
2 Replies

4. UNIX for Dummies Questions & Answers

shift values in one column as header for values in another column

Hi Gurus, I have a tab separated text file with two columns. I would like to make the first column values as headings for the second column values. Ex. >value1 subjects >value2 priorities >value3 requirements ...etc and I want to have a file >value1 subjects >value2 priorities... (4 Replies)
Discussion started by: Unilearn
4 Replies

5. Shell Programming and Scripting

for uniq entries add values in corresponding columns

Hi, I have a file as listed below.. What I want to get is for each unique value in column 1 the corresponding values in the rest of the columns should be summed up.. AAK1 0 1 0 11 AAK1 0 0 1 1 AAK1 0 0 1 2... (2 Replies)
Discussion started by: Diya123
2 Replies

6. Shell Programming and Scripting

awk uniq and longest string of a column as index

I met a challenge to filter ~70 millions of sequence rows and I want using awk with conditions: 1) longest string of each pattern in column 2, ignore any sub-string, as the index; 2) all the unique patterns after 1); 3) print the whole row; input: 1 ABCDEFGHI longest_sequence1 2 ABCDEFGH... (12 Replies)
Discussion started by: yifangt
12 Replies

7. Shell Programming and Scripting

Filter uniq field values (non-substring)

Hello, I want to filter column based on string value. All substring matches are filtered out and only unique master strings are picked up. infile: 1 abcd 2 abc 3 abcd 4 cdef 5 efgh 6 efgh 7 efx 8 fgh Outfile: 1 abcd 4 cdef 5 efgh 7 efxI have tried awk '!a++; match(a, $2)>0'... (32 Replies)
Discussion started by: yifangt
32 Replies

8. Shell Programming and Scripting

Uniq count second column

Hello How can I get a number of occurrence count for this file; ERR315389.1000156 CTTGAAGAAGAATTGAAAACTGTGACGAACAACTTGAAGTCACTGGAGGCTCAGGCTGAGAAGTACTCGCAGAAGGAAGACAGATATGAGGAAGAG ERR315389.1000281 ... (3 Replies)
Discussion started by: Wan Fahmi
3 Replies

9. Shell Programming and Scripting

Bring values in the second column into single line (comma sep) for uniq value in the first column

I want to bring values in the second column into single line for uniq value in the first column. My input jvm01, Web 2.0 Feature Pack Library jvm01, IBM WebSphere JAX-RS jvm01, Custom01 Shared Library jvm02, Web 2.0 Feature Pack Library jvm02, IBM WebSphere JAX-RS jvm03, Web 2.0 Feature... (10 Replies)
Discussion started by: kchinnam
10 Replies

10. UNIX for Beginners Questions & Answers

Get first column value uniq

Hi All, I have a directory and sub-directory that having ‘n' number of .log file in nearly 1GB. The file is comma separated file. I need to recursively grep and uniq first column values only. I did in perl. But i wish to know more command line utilities to calculate the time for grep and... (4 Replies)
Discussion started by: k_manimuthu
4 Replies
All times are GMT -4. The time now is 11:52 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy