Print lines based upon unique values in Nth field Post: 303010212

Post 303010212 by jvoot on Thursday 28th of December 2017 12:20:52 PM

Print lines based upon unique values in Nth field

For some reason I am having difficulty with what should be a fairly easy task. I would like to print only those lines of a file whose first field is unique, that is, lines whose first-field value occurs exactly once in the file. For example, I have a large data set that includes the following excerpt:

Code:

PS003,001 MZMWR/ L-DWD// *
PS003,001 B-!!BRX[/+W M(N-PN(H/J >BCLWM// BN/+W *
PS004,001 L-(H-1M]]NYX[/ B-NGJN(H/WT MZMWR/ L-DWD// *
PS005,001 L-(H-1M]]NYX[/ >L H-NXJLWT/(W(T MZMWR/ L-DWD// *
PS006,001 L-(H-1M]]NYX[/ B-NGJN(H/WT <L H-CMJNJ/T MZMWR/ L-DWD// *
PS007,001 CGJWN/ L-DWD// >CR C(JR[ L-JHWH// *
PS007,001 <L DBR/J KWC=// BN/ JMJNJ/ *
PS008,001 L-(H-1M]]NYX[/ <L H-GTJT/ MZMWR/ L-DWD// *
PS009,001 L-(H-1M]]NYX[/ <LMWT/ L-(H-BN/ MZMWR/ L-DWD// *
PS011,001 L-(H-1M]]NYX[/ L-DWD// B-JHWH// XS)HJ[TJ >JK !T!>MR[W L-NPC/+J *
PS011,001 !!NWD[)JW HR/+KM YPWR/ *

The output I desire is this:

Code:

PS004,001 L-(H-1M]]NYX[/ B-NGJN(H/WT MZMWR/ L-DWD// *
PS005,001 L-(H-1M]]NYX[/ >L H-NXJLWT/(W(T MZMWR/ L-DWD// *
PS006,001 L-(H-1M]]NYX[/ B-NGJN(H/WT <L H-CMJNJ/T MZMWR/ L-DWD// *
PS008,001 L-(H-1M]]NYX[/ <L H-GTJT/ MZMWR/ L-DWD// *
PS009,001 L-(H-1M]]NYX[/ <LMWT/ L-(H-BN/ MZMWR/ L-DWD// *

I have tried 'sort' with flags that I thought should work, but I cannot get the output I want. For example:

Code:

sort -u -k1,1

I have also tried an 'awk' solution:

Code:

awk '!a[$1]++'

Both of these attempts keep the first line of each repeated value in $1 instead of dropping every line that shares it, such as:

Code:

PS003,001 MZMWR/ L-DWD// *
PS004,001 L-(H-1M]]NYX[/ B-NGJN(H/WT MZMWR/ L-DWD// *
PS005,001 L-(H-1M]]NYX[/ >L H-NXJLWT/(W(T MZMWR/ L-DWD// *
PS006,001 L-(H-1M]]NYX[/ B-NGJN(H/WT <L H-CMJNJ/T MZMWR/ L-DWD// *
PS007,001 CGJWN/ L-DWD// >CR C(JR[ L-JHWH// *
PS008,001 L-(H-1M]]NYX[/ <L H-GTJT/ MZMWR/ L-DWD// *
PS009,001 L-(H-1M]]NYX[/ <LMWT/ L-(H-BN/ MZMWR/ L-DWD// *
PS011,001 L-(H-1M]]NYX[/ L-DWD// B-JHWH// XS)HJ[TJ >JK !T!>MR[W L-NPC/+J *

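However, this is not what I want: lines whose first field is repeated (PS003,001, PS007,001 and PS011,001) should not be printed at all. Any help would be greatly appreciated.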

jvoot
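
Both attempts above deduplicate, which keeps one line per key. What the question asks for is different: drop every line whose first field occurs more than once. A minimal sketch of one common approach, assuming the input is a regular file that can be read twice (the function name unique_by_first_field and the sample data are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: print the lines of file $1 whose first field occurs
# exactly once in that file.
# Pass 1 (NR == FNR is true only while reading the first copy): count keys.
# Pass 2: print a line only if its key was counted exactly once.
unique_by_first_field() {
    awk 'NR == FNR { count[$1]++; next } count[$1] == 1' "$1" "$1"
}

# Small demonstration on made-up data (not the poster's file).
sample=$(mktemp)
printf '%s\n' 'PS003,001 a' 'PS003,001 b' 'PS004,001 c' \
              'PS007,001 d' 'PS007,001 e' 'PS008,001 f' > "$sample"
unique_by_first_field "$sample"   # prints the PS004,001 and PS008,001 lines
rm -f "$sample"
```

Lines are printed in their original order. Because the file is named twice on the awk command line, this sketch does not work on piped input.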

uniq

UNIQ(1) 							   User Commands							   UNIQ(1)

NAME

       uniq - report or omit repeated lines

SYNOPSIS

       uniq [OPTION]... [INPUT [OUTPUT]]

DESCRIPTION

       Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).

       Mandatory arguments to long options are mandatory for short options too.

       -c, --count
	      prefix lines by the number of occurrences

       -d, --repeated
	      only print duplicate lines

       -D, --all-repeated[=delimit-method]
	      print all duplicate lines; delimit-method={none(default),prepend,separate}.  Delimiting is done with blank lines.

       -f, --skip-fields=N
	      avoid comparing the first N fields

       -i, --ignore-case
	      ignore differences in case when comparing

       -s, --skip-chars=N
	      avoid comparing the first N characters

       -u, --unique
	      only print unique lines

       -z, --zero-terminated
	      end lines with 0 byte, not newline

       -w, --check-chars=N
	      compare no more than N characters in lines

       --help display this help and exit

       --version
	      output version information and exit

       A field is a run of blanks (usually spaces and/or TABs), then non-blank characters.  Fields are skipped before chars.

       Note: 'uniq' does not detect repeated lines unless they are adjacent.  You may want to sort the input first, or use 'sort -u' without
       'uniq'.

AUTHOR

       Written by Richard M. Stallman and David MacKenzie.

REPORTING BUGS

       Report uniq bugs to bug-coreutils@gnu.org
       GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
       General help using GNU software: <http://www.gnu.org/gethelp/>

COPYRIGHT

       Copyright (C) 2009 Free Software Foundation, Inc.  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
       This is free software: you are free to change and redistribute it.  There is NO WARRANTY, to the extent permitted by law.

SEE ALSO

       The full documentation for uniq is maintained as a Texinfo manual.  If the info and uniq programs are properly installed at your site,  the
       command

	      info coreutils 'uniq invocation'

       should give you access to the complete manual.

GNU coreutils 7.1						     July 2010								   UNIQ(1)
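
As a hedged sketch of how the -u and -w options described above could be applied to data like the excerpt in the question: this assumes GNU uniq (-w is not specified by POSIX) and that every key has the same width (9 characters for values such as PS003,001). The function name and the sample data are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: $1 = file, $2 = width of the leading key in characters.
# sort groups equal keys together, since uniq only compares adjacent lines;
# -s keeps the original order within each group.
# uniq -w N compares only the first N characters; -u drops every repeated group.
unique_by_fixed_width_key() {
    LC_ALL=C sort -s -k1,1 "$1" | uniq -u -w "$2"
}

# Small demonstration on made-up data (not the poster's file).
sample=$(mktemp)
printf '%s\n' 'PS003,001 a' 'PS003,001 b' 'PS004,001 c' \
              'PS007,001 d' 'PS007,001 e' 'PS008,001 f' > "$sample"
unique_by_fixed_width_key "$sample" 9   # prints the PS004,001 and PS008,001 lines
rm -f "$sample"
```

If the keys vary in width, comparing a fixed number of characters is not reliable, and counting on the first field with awk is likely the safer choice.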
