11-23-2009
Duplicate columns and lines
Hi all,
I have a tab-delimited file and want to remove identical lines, i.e. all of line 1,2,4 because the columns are the same as the columns in other lines. Any input is appreciated.
abc gi4597 9997 cgcgtgcg $%^&*()()*
abc gi4597 9997 cgcgtgcg $%^&*()()*
ttt gi9865 8879 tgcgtgtt *(())^#@!!
abc gi4597 9997 cgcgtgcg $%^&*()()*
fgy gi9876 0975 cgaggcgc @#$%^*&*((
abc gi4597 9997 ttgttgttc $%^&*()()*
---------- Post updated at 09:29 AM ---------- Previous update was at 09:19 AM ----------
It just clicked:
awk 'x[$1,$2,$3,$4,$5,$6]++' filename
Any other methods would be helpful
![Smilie Smilie](https://www.unix.com/images/smilies/smile.gif)
10 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
Hi,
How to identify duplicate columns in a row?
Input data: may have 30 columns
9211480750 LK 120070417 920091030
9211480893 AZ 120070607
9205323621 O7 120090914 120090914 1420090914 2020090914 2020090914
9211479568 AZ 120070327 320090730
9211479571 MM 120070326
9211480892 MM 120070324... (3 Replies)
Discussion started by: suresh3566
3 Replies
2. Shell Programming and Scripting
hello,
I have an input file which looks like this:
2 C:G 17 -0.14 8.75 33.35
3 G:C 16 -2.28 0.98 28.22
4 C:G 15 0.39 11.06 29.31
5 G:C 14 2.64 5.17 36.07
6 G:C 13 -0.65 2.05 21.94
7 C:G 11 138.96 21.64 14.40
9 C:G 27 -2.40 6.95 27.98
10 C:G 26 2.89 15.60 34.33
11 G:C... (7 Replies)
Discussion started by: linux_usr
7 Replies
3. UNIX for Dummies Questions & Answers
Hi friends,
I have a xlsheet like below first column having id ABCfollowed by 7digit numbers and the next column have title against the ids. Titles are unique and duplicateboth, but ids are unique even for duplicate title.Now I need to identify those duplicate title having the highest id for... (9 Replies)
Discussion started by: umapearl
9 Replies
4. UNIX for Advanced & Expert Users
Hi All,
I have a very huge file (4GB) which has duplicate lines. I want to delete duplicate lines leaving unique lines. Sort, uniq, awk '!x++' are not working as its running out of buffer space.
I dont know if this works : I want to read each line of the File in a For Loop, and want to... (16 Replies)
Discussion started by: krishnix
16 Replies
5. UNIX for Dummies Questions & Answers
hello all,
I have an input file with four columns like this with a lot of lines
and for example, line 1 and line 5 match because the first 4 characters match and the fourth column matches too. I want to keep the line that has the lowest number in the third column. So I discard line 5.... (5 Replies)
Discussion started by: TheTransporter
5 Replies
6. Shell Programming and Scripting
hi friends,
my input
chr1 exon 35204 35266 gene_id "GOLGB1"; transcript_id "GOLGB1";
chr1 exon 42357 42473 gene_id "GOLGB1"; transcript_id "GOLGB1";
chr1 exon 45261 45404 gene_id "GOLGB1"; transcript_id "GOLGB1";
chr1 exon 50701 50778 gene_id "GOLGB1"; transcript_id "GOLGB1";... (2 Replies)
Discussion started by: jacobs.smith
2 Replies
7. Shell Programming and Scripting
I've a text file with below values viz. multiple rows with same values in column 3, 4 and 5, which need to be considered as duplicates. For all such cases, the rows from second occurrence onwards should be modified in a way that their values in first two columns are replaced with values as in first... (4 Replies)
Discussion started by: asyed
4 Replies
8. Shell Programming and Scripting
I have this structure:
col1 col2 col3 col4 col5
27 xxx 38 aaa ttt
2 xxx 38 aaa yyy
1 xxx 38 aaa yyy
I need to collapse duplicate lines ignoring column 1 and add values of duplicate lines (col1) so it will look like this:
col1 col2 col3 col4 col5
27 xxx 38 aaa ttt ... (3 Replies)
Discussion started by: coppuca
3 Replies
9. Shell Programming and Scripting
I have a 13gb file. It has the following columns:
The 3rd column is basically correlation values. I want to delete those rows which are repeated between the columns:
A B 0.04
B C 0.56
B B 1
A A 1
C D 1
C C 1
Desired Output: (preferably in a .csv format
A,B,0.04
B,C,0.56
C,D,1... (3 Replies)
Discussion started by: Sanchari
3 Replies
10. UNIX for Dummies Questions & Answers
Hi There,
I have an I/P which looks like --
1 2 3 4 5
1 2 3 4 6
4 7 8 9 9
5 6 7 8 9
I would like O/P to be ---
1 2 3 4 5
1 2 3 4 6
So, printing only the consecutive lines where $1,$2,$3,$4 are matching.
Is there any command to do this or small awk script?
Thanks, (12 Replies)
Discussion started by: Indra2011
12 Replies
LEARN ABOUT DEBIAN
getcol
getcol(1) General Commands Manual getcol(1)
Name
getcol - Extract specified columns from an ASCII table file
Synopsis
getcol [-amv][-n num][-r lines][-s num] filename [column number range]
Description
Extract specified columns from an ASCII table file
Options
filename
Name of a ASCII table file. At least one of these must be present for any values to be printed. If it is stdin or STDIN, an ASCII
table is expected as standard input. If there is no input file, standard input is assumed.
@filename
Name of a file containing a list of ASCII table files. If this is present, any other file names on the command line will be
ignored.
field range
Print value of these columns for the number of lines of the table specified by the -n argument after the skippiing the number of
lines specified by the -s argument. A value of 0 causes the entire input line to be printed.
-a Sum all numeric columns selected, printing the sum on the line following the result. Columns with no sum are filled with ___.
(Added in version 2.6.9)
-b Input is bar-separate table file
-c Add count of number of lines in each column at end
-d <number>
Number of decimal places in f.p. output
-e Compute medians of selected columns
-f Print range of values in selected columns
-h Print Starbase tab table header
-i Input is tab-separate table file
-k Print number of columns on first line
-l <number>
Number of lines to add to each line
-m Compute the means of all numeric columns selected, printing the mean on the line following the result (or the line following the sum
if -a is used). Columns with no mean are filled with ___. (Added in version 2.6.9)
-n num Print selected columns for this many lines. If not specified, all lines will be read after the number of lines specified by -s have
been skipped.
-o OR conditions insted of ANDing them
-p Print only sum, mmean, sigma, median, or range, not entries
-r @listfile
-r line range Print columns from the lines specified as either the first nonzero number on each line of the file listfile or the
comma- and hyphen- delimitied range; i.e. 1-5,10-12 will print values from lines 1, 2, 3, 4, 5, 10, 11, and 12. (added in version
2.6.12)
-s num Skip this many line before starting to print values. If not specified, no lines will be skipped.
-t Starbase (tab-separated) table output
-v Print more information about process.
Web Page
http://tdc-www.harvard.edu/software/wcstools/getcol.html
Author
Doug Mink, SAO (dmink@cfa.harvard.edu)
8 November 2001 WCSTools getcol(1)