Print lines based upon unique values in Nth field

UNIX for Beginners Questions & Answers


Tags
awk, sort, uniq

#1  jvoot (Registered User), 3 weeks ago

For some reason I am having difficulty performing what should be a fairly easy task. I would like to print only those lines of a file whose value in the first field occurs exactly once. For example, I have a large data set with the following excerpt:



Code:
PS003,001 MZMWR/ L-DWD// *
PS003,001 B-!!BRX[/+W M(N-PN(H/J >BCLWM// BN/+W *
PS004,001 L-(H-1M]]NYX[/ B-NGJN(H/WT MZMWR/ L-DWD// *
PS005,001 L-(H-1M]]NYX[/ >L H-NXJLWT/(W(T MZMWR/ L-DWD// *
PS006,001 L-(H-1M]]NYX[/ B-NGJN(H/WT <L H-CMJNJ/T MZMWR/ L-DWD// *
PS007,001 CGJWN/ L-DWD// >CR C(JR[ L-JHWH// *
PS007,001 <L DBR/J KWC=// BN/ JMJNJ/ *
PS008,001 L-(H-1M]]NYX[/ <L H-GTJT/ MZMWR/ L-DWD// *
PS009,001 L-(H-1M]]NYX[/ <LMWT/ L-(H-BN/ MZMWR/ L-DWD// *
PS011,001 L-(H-1M]]NYX[/ L-DWD// B-JHWH// XS)HJ[TJ >JK !T!>MR[W L-NPC/+J *
PS011,001 !!NWD[)JW HR/+KM YPWR/ *

The output I desire is this:


Code:
PS004,001 L-(H-1M]]NYX[/ B-NGJN(H/WT MZMWR/ L-DWD// *
PS005,001 L-(H-1M]]NYX[/ >L H-NXJLWT/(W(T MZMWR/ L-DWD// *
PS006,001 L-(H-1M]]NYX[/ B-NGJN(H/WT <L H-CMJNJ/T MZMWR/ L-DWD// *
PS008,001 L-(H-1M]]NYX[/ <L H-GTJT/ MZMWR/ L-DWD// *
PS009,001 L-(H-1M]]NYX[/ <LMWT/ L-(H-BN/ MZMWR/ L-DWD// *

I have tried 'sort' with what I thought were the appropriate flags, but I cannot get it to produce this output. For example:



Code:
sort -u -k1,1

I have also tried an 'awk' solution:



Code:
awk '!a[$1]++'

Both of these keep the first line of each group of repeated values in $1 instead of dropping the whole group, such as:



Code:
PS003,001 MZMWR/ L-DWD// *
PS004,001 L-(H-1M]]NYX[/ B-NGJN(H/WT MZMWR/ L-DWD// *
PS005,001 L-(H-1M]]NYX[/ >L H-NXJLWT/(W(T MZMWR/ L-DWD// *
PS006,001 L-(H-1M]]NYX[/ B-NGJN(H/WT <L H-CMJNJ/T MZMWR/ L-DWD// *
PS007,001 CGJWN/ L-DWD// >CR C(JR[ L-JHWH// *
PS008,001 L-(H-1M]]NYX[/ <L H-GTJT/ MZMWR/ L-DWD// *
PS009,001 L-(H-1M]]NYX[/ <LMWT/ L-(H-BN/ MZMWR/ L-DWD// *
PS011,001 L-(H-1M]]NYX[/ L-DWD// B-JHWH// XS)HJ[TJ >JK !T!>MR[W L-NPC/+J *

However, this is not correct. Any help would be greatly appreciated.

#2  Scott (Administrator, Forum Staff), 3 weeks ago
With a little more work you can do this in awk alone, without the sort, but this is the short version:


Code:
$ awk '{A[$1]++; L[$1]=$0} END { for( a in A ) if( A[a] == 1 ) print L[a] }' file | sort
PS004,001 L-(H-1M]]NYX[/ B-NGJN(H/WT MZMWR/ L-DWD// *
PS005,001 L-(H-1M]]NYX[/ >L H-NXJLWT/(W(T MZMWR/ L-DWD// *
PS006,001 L-(H-1M]]NYX[/ B-NGJN(H/WT <L H-CMJNJ/T MZMWR/ L-DWD// *
PS008,001 L-(H-1M]]NYX[/ <L H-GTJT/ MZMWR/ L-DWD// *
PS009,001 L-(H-1M]]NYX[/ <LMWT/ L-(H-BN/ MZMWR/ L-DWD// *

(Note that $1 is PS004,001, not just PS004, because the comma is not a field separator.)
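
The "without the sort" variant mentioned above could be sketched by reading the file twice: the first pass counts each key, the second prints the lines whose key was counted exactly once, so the original line order is preserved. This is only one possible approach and not necessarily the one Scott had in mind; the function name `uniq_by_field`, the variable `n`, and the temporary sample file are made up for illustration.

```shell
# Print lines whose Nth whitespace-separated field occurs exactly once.
# Usage: uniq_by_field N FILE   (FILE is read twice, so it cannot be a pipe)
uniq_by_field() {
  awk -v n="$1" '
    NR == FNR { count[$n]++; next }   # pass 1: count every key
    count[$n] == 1                    # pass 2: print lines with a key seen once
  ' "$2" "$2"
}

# A few lines copied from the excerpt in post #1, stored in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
PS003,001 MZMWR/ L-DWD// *
PS003,001 B-!!BRX[/+W M(N-PN(H/J >BCLWM// BN/+W *
PS004,001 L-(H-1M]]NYX[/ B-NGJN(H/WT MZMWR/ L-DWD// *
PS007,001 CGJWN/ L-DWD// >CR C(JR[ L-JHWH// *
PS007,001 <L DBR/J KWC=// BN/ JMJNJ/ *
PS008,001 L-(H-1M]]NYX[/ <L H-GTJT/ MZMWR/ L-DWD// *
EOF

uniq_by_field 1 "$sample"
```

Because the field number is passed in as `n`, the same function should work for any Nth field, at the cost of reading the file twice and holding one counter per distinct key in memory.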

#3  jvoot (Registered User), 3 weeks ago
Thanks Scott! Worked like a charm!

#4  RudiC (Moderator, Forum Staff), 3 weeks ago
Depending on your uniq version (-w is a GNU extension), this might also work, as long as lines with the same key are adjacent in the file:


Code:
uniq -uw5 file
PS004,001 L-(H-1M]]NYX[/ B-NGJN(H/WT MZMWR/ L-DWD// *
PS005,001 L-(H-1M]]NYX[/ >L H-NXJLWT/(W(T MZMWR/ L-DWD// *
PS006,001 L-(H-1M]]NYX[/ B-NGJN(H/WT <L H-CMJNJ/T MZMWR/ L-DWD// *
PS008,001 L-(H-1M]]NYX[/ <L H-GTJT/ MZMWR/ L-DWD// *
PS009,001 L-(H-1M]]NYX[/ <LMWT/ L-(H-BN/ MZMWR/ L-DWD// *
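
One caveat that may be worth checking: `-w5` compares only the first five characters (`PS003`), which is enough for this excerpt because every key shown ends in `,001`. If the full data set also contains keys such as `PS003,002` (an assumption; the thread only shows `,001` lines), a width that covers the whole nine-character key would be safer. The sketch below assumes GNU uniq and uses made-up sample lines.

```shell
# Hypothetical sample: two different keys that share their first five characters.
sample=$(mktemp)
cat > "$sample" <<'EOF'
PS003,001 first verse
PS003,002 second verse
PS004,001 only verse
EOF

# -w5 compares only "PS003", so both PS003 lines are treated as duplicates
# and dropped; only the PS004 line is printed.
uniq -u -w5 "$sample"

# -w9 compares the whole key, so all three lines count as unique.
uniq -u -w9 "$sample"
```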

#5  MadeInGermany (Moderator, Forum Staff), 2 weeks ago
Provided the input file is sorted on column 1, the following awk works as well. It only remembers the previous line, so it does not consume much memory (which matters with a big input file):


Code:
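awk '
{
  if ($1!=p1) {          # key changed: the previous group is complete
    if (c==1) print p0   # print it only if it held a single line
    c=0                  # reset the counter for the new group
  }
  c++                    # count lines in the current group
  p1=$1; p0=$0           # remember the current key and line
}
END {
  if (c==1) print p0     # flush the last group
}
' file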
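
If the input is not already grouped by the first field, one option is to sort on that field first and pipe the result into the same logic. The following is a minimal sketch, assuming a POSIX `sort`; the function name `print_singletons` and the deliberately unsorted sample lines are made up for illustration.

```shell
# The streaming filter from above, wrapped in a function (hypothetical name).
# It reads standard input and expects equal keys to be adjacent.
print_singletons() {
  awk '
    $1 != p1 { if (c == 1) print p0; c = 0 }  # key changed: flush previous group
    { c++; p1 = $1; p0 = $0 }                 # remember current key and line
    END { if (c == 1) print p0 }              # flush the last group
  '
}

# Made-up sample in which repeated keys are not adjacent.
sample=$(mktemp)
cat > "$sample" <<'EOF'
PS007,001 a
PS003,001 b
PS007,001 c
PS004,001 d
PS003,001 e
EOF

# Sort on field 1 so that equal keys become adjacent, then filter.
sort -k1,1 "$sample" | print_singletons
```

The trade-off is that the output comes back in sorted order rather than in the original file order.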
