Post 302706275 by pathunkathunk on Wednesday 26th of September 2012 11:40:15 AM
[SOLVED] remove lines that have duplicate values in column two

Hi, I've got a file from which I'd like to remove lines that have duplicate values in column 2, keeping only the first line for each value (values in column 2 begin with "comp").

I tried
Code:
sort -t -nuk2,3 file.txt

But got:
sort: multi-character tab `-nuk2,3'

"man sort" did not help me out.

Any pointers?

Input:
Quote:
gi|328725975|ref|XP_003248692.1| comp47911_c0_seq1 82.02 367 66 0 1 367 78 1178 0 603
gi|328718720|ref|XP_001946259.2| comp110820_c0_seq1 46.85 111 59 0 422 532 2 334 1.00E-31 120
gi|193617875|ref|XP_001945312.1| comp110820_c0_seq1 45.13 113 62 0 535 647 2 340 7.00E-31 119
gi|328698003|ref|XP_001947254.2| comp1227639_c0_seq1 89.36 141 15 0 3 143 3 425 5.00E-82 247
gi|328725151|ref|XP_001951585.2| comp142443_c0_seq1 53.33 75 32 2 49 122 240 22 1.00E-16 73.2
gi|328725427|ref|XP_001948141.2| comp143768_c0_seq1 89.49 257 25 1 1 257 147 911 3.00E-171 483
gi|328717989|ref|XP_003246356.1| comp143768_c0_seq1 91.42 303 26 0 132 434 3 911 0 587
gi|328712467|ref|XP_001948906.2| comp143768_c0_seq1 69.81 308 87 3 69 375 3 911 1.00E-153 443
gi|328698003|ref|XP_001947254.2| comp143768_c0_seq1 94.12 102 6 0 147 248 3 308 1.00E-62 203
Desired output:
Quote:
gi|328725975|ref|XP_003248692.1| comp47911_c0_seq1 82.02 367 66 0 1 367 78 1178 0 603
gi|328718720|ref|XP_001946259.2| comp110820_c0_seq1 46.85 111 59 0 422 532 2 334 1.00E-31 120
gi|328698003|ref|XP_001947254.2| comp1227639_c0_seq1 89.36 141 15 0 3 143 3 425 5.00E-82 247
gi|328725151|ref|XP_001951585.2| comp142443_c0_seq1 53.33 75 32 2 49 122 240 22 1.00E-16 73.2
gi|328725427|ref|XP_001948141.2| comp143768_c0_seq1 89.49 257 25 1 1 257 147 911 3.00E-171 483
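
A possible approach, offered as a sketch rather than the thread's accepted answer: the error comes from `-t`, which takes the next argument as the field separator, so `-nuk2,3` was read as a (multi-character) separator instead of as options. Assuming the columns are separated by whitespace, as they appear to be in the sample, no `-t` is needed at all. Something like `sort -u -k2,2 file.txt` should keep one line per column 2 value, but the result comes out sorted by column 2 rather than in the original order. To keep the first occurrence of each value and preserve the input order, as in the desired output, awk is a common choice. The function name `dedupe_col2` below is made up for illustration.

```shell
# Print only the first line seen for each distinct value of column 2,
# preserving the original line order.
# Assumption: columns are separated by spaces or tabs (awk's default).
dedupe_col2() {
    # seen[$2]++ is 0 (false) the first time a value appears, so
    # !seen[$2]++ is true only for the first line with that value.
    awk '!seen[$2]++' "$1"
}
```

Usage: `dedupe_col2 file.txt > unique.txt`. Since awk only remembers the distinct column 2 values, the input does not need to be sorted first.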
 
