UNIX for Beginners Questions & Answers: post 303010212 by jvoot on Thursday 28th of December 2017 12:20:52 PM
Print lines based upon unique values in Nth field

For some reason I am having difficulty performing what should be a fairly easy task. I would like to print only those lines of a file whose first-field value occurs exactly once in the file. For example, I have a large data set with the following excerpt:

Code:
PS003,001 MZMWR/ L-DWD// *
PS003,001 B-!!BRX[/+W M(N-PN(H/J >BCLWM// BN/+W *
PS004,001 L-(H-1M]]NYX[/ B-NGJN(H/WT MZMWR/ L-DWD// *
PS005,001 L-(H-1M]]NYX[/ >L H-NXJLWT/(W(T MZMWR/ L-DWD// *
PS006,001 L-(H-1M]]NYX[/ B-NGJN(H/WT <L H-CMJNJ/T MZMWR/ L-DWD// *
PS007,001 CGJWN/ L-DWD// >CR C(JR[ L-JHWH// *
PS007,001 <L DBR/J KWC=// BN/ JMJNJ/ *
PS008,001 L-(H-1M]]NYX[/ <L H-GTJT/ MZMWR/ L-DWD// *
PS009,001 L-(H-1M]]NYX[/ <LMWT/ L-(H-BN/ MZMWR/ L-DWD// *
PS011,001 L-(H-1M]]NYX[/ L-DWD// B-JHWH// XS)HJ[TJ >JK !T!>MR[W L-NPC/+J *
PS011,001 !!NWD[)JW HR/+KM YPWR/ *

The output I desire is this:
Code:
PS004,001 L-(H-1M]]NYX[/ B-NGJN(H/WT MZMWR/ L-DWD// *
PS005,001 L-(H-1M]]NYX[/ >L H-NXJLWT/(W(T MZMWR/ L-DWD// *
PS006,001 L-(H-1M]]NYX[/ B-NGJN(H/WT <L H-CMJNJ/T MZMWR/ L-DWD// *
PS008,001 L-(H-1M]]NYX[/ <L H-GTJT/ MZMWR/ L-DWD// *
PS009,001 L-(H-1M]]NYX[/ <LMWT/ L-(H-BN/ MZMWR/ L-DWD// *

I have tried 'sort' with flags that I expected to work, but I cannot get the result I want. For example:

Code:
sort -u -k1,1

I have also tried an 'awk' solution:

Code:
awk '!a[$1]++'

Both of these keep the first line of each group of repeated values in $1 instead of dropping the whole group, such as:

Code:
PS003,001 MZMWR/ L-DWD// *
PS004,001 L-(H-1M]]NYX[/ B-NGJN(H/WT MZMWR/ L-DWD// *
PS005,001 L-(H-1M]]NYX[/ >L H-NXJLWT/(W(T MZMWR/ L-DWD// *
PS006,001 L-(H-1M]]NYX[/ B-NGJN(H/WT <L H-CMJNJ/T MZMWR/ L-DWD// *
PS007,001 CGJWN/ L-DWD// >CR C(JR[ L-JHWH// *
PS008,001 L-(H-1M]]NYX[/ <L H-GTJT/ MZMWR/ L-DWD// *
PS009,001 L-(H-1M]]NYX[/ <LMWT/ L-(H-BN/ MZMWR/ L-DWD// *
PS011,001 L-(H-1M]]NYX[/ L-DWD// B-JHWH// XS)HJ[TJ >JK !T!>MR[W L-NPC/+J *

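However, this is not what I want: lines whose first field occurs more than once (PS003,001, PS007,001 and PS011,001 in the excerpt) should be omitted entirely. Any help would be greatly appreciated.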
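
One possible approach, offered as a sketch rather than a definitive answer: read the file twice with awk, counting each first-field value on the first pass and printing only the lines whose count is exactly 1 on the second. It assumes whitespace-separated fields and a regular file that can be read twice (not a pipe). The function name unique_first_field and the sample data are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines whose first field occurs
# exactly once in the file given as $1. Input order is preserved.
unique_first_field() {
    # Pass 1 (NR == FNR): count occurrences of each first field.
    # Pass 2: print lines whose first field was seen exactly once.
    awk 'NR == FNR { count[$1]++; next } count[$1] == 1' "$1" "$1"
}

# Small made-up sample: keys A,001 and C,001 repeat; B,001 and D,001 do not.
sample=$(mktemp)
printf '%s\n' \
    'A,001 x *' \
    'A,001 y *' \
    'B,001 z *' \
    'C,001 w *' \
    'C,001 v *' \
    'D,001 u *' > "$sample"

unique_first_field "$sample"   # prints the B,001 and D,001 lines only
rm -f "$sample"
```

Unlike sort -u -k1,1 or awk '!a[$1]++', which keep one representative of every key, this drops every line whose key is repeated. If the data arrives on a pipe, the lines would need to be stored in memory or written to a temporary file first.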
 
