Sponsored Content
Top Forums Shell Programming and Scripting bash keep only duplicate lines in file Post 302653577 by vlm on Saturday 9th of June 2012 02:02:53 PM
Old 06-09-2012
Quote:
Originally Posted by agama
To do this a bit more information is needed:

1) is the file sorted, or are the lines you wish to 'keep' adjacent to each other in the file?

2) is the order of the output important? Do the lines 'kept' need to be in the same order that they appeared in the input?

3) do some lines appear more than twice, and should those be kept as well, or do you want to keep the lines that appear exactly twice?

4) how big is the file in terms of number of lines?
1)yes the files are sorted

2)no the output order is not at all important

3)lines appear either once or twice

4)that's not known..some files have 500 lines or more and some 4...

I found this:

Code:
comm -3 file1 file2 > file3

but it stores to file3 the lines that appear only in the one file and not in the other,the exact opposite from what I wantSmilie

Actually

Code:
comm -12 file1 file2 > file3

is what I want but it makes the execution much slower

Last edited by vlm; 06-09-2012 at 03:12 PM..
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Remove Duplicate Lines in File

I am doing KSH script to remove duplicate lines in a file. Let say the file has format below. FileA 1253-6856 3101-4011 1827-1356 1822-1157 1822-1157 1000-1410 1000-1410 1822-1231 1822-1231 3101-4011 1822-1157 1822-1231 and I want to simply it with no duplicate line as file... (5 Replies)
Discussion started by: Teh Tiack Ein
5 Replies

2. UNIX for Advanced & Expert Users

Duplicate lines in the file

Hi, I have a file with duplicate lines in it. I want to keep only the duplicate lines and delete the non duplicates. Can some one please help me? Regards Narayana Gupta (3 Replies)
Discussion started by: guptan
3 Replies

3. UNIX for Dummies Questions & Answers

removing duplicate lines from a file

Hi, I am trying to remove duplicate lines from a file. For example the contents of example.txt is: this is a test 2342 this is a test 34343 this is a test 43434 and i want to remove the "this is a test" lines only and end up with the numbers in the file, that is, end up with: 2342... (4 Replies)
Discussion started by: ocelot
4 Replies

4. UNIX for Dummies Questions & Answers

How to redirect duplicate lines from a file????

Hi, I am having a file which contains many duplicate lines. I wanted to redirect these duplicate lines into another file. Suppose I have a file called file_dup.txt which contains some line as file_dup.txt A100-R1 ACCOUNTING-CONTROL ACTONA-ACTASTOR ADMIN-AUTH-STATS ACTONA-ACTASTOR... (3 Replies)
Discussion started by: zing_foru
3 Replies

5. UNIX for Dummies Questions & Answers

Remove Duplicate lines from File

I have a log file "logreport" that contains several lines as seen below: 04:20:00 /usr/lib/snmp/snmpdx: Agent snmpd appeared dead but responded to ping 06:38:08 /usr/lib/snmp/snmpdx: Agent snmpd appeared dead but responded to ping 07:11:05 /usr/lib/snmp/snmpdx: Agent snmpd appeared dead but... (18 Replies)
Discussion started by: Nysif Steve
18 Replies

6. Shell Programming and Scripting

Duplicate lines in a file

Hi All, I am trying to remove the duplicate entries in a file and print them just once. For example, if my input file has: 00:44,37,67,56,15,12 00:44,34,67,56,15,12 00:44,58,67,56,15,12 00:44,35,67,56,15,12 00:59,37,67,56,15,12 00:59,34,67,56,15,12 00:59,35,67,56,15,12... (7 Replies)
Discussion started by: faiz1985
7 Replies

7. UNIX for Advanced & Expert Users

In a huge file, Delete duplicate lines leaving unique lines

Hi All, I have a very huge file (4GB) which has duplicate lines. I want to delete duplicate lines leaving unique lines. Sort, uniq, awk '!x++' are not working as its running out of buffer space. I dont know if this works : I want to read each line of the File in a For Loop, and want to... (16 Replies)
Discussion started by: krishnix
16 Replies

8. Shell Programming and Scripting

How do I remove the duplicate lines in this file?

Hey guys, need some help to fix this script. I am trying to remove all the duplicate lines in this file. I wrote the following script, but does not work. What is the problem? The output file should only contain five lines: Later! (5 Replies)
Discussion started by: Ernst
5 Replies

9. UNIX for Dummies Questions & Answers

Duplicate lines in a file

I have a file with following data A B C I would like to print like this n times(For eg:5 times) A B C A B C A B C A B C A (7 Replies)
Discussion started by: nsuresh316
7 Replies

10. Shell Programming and Scripting

Remove duplicate lines from a file

Hi, I have a csv file which contains some millions of lines in it. The first line(Header) repeats at every 50000th line. I want to remove all the duplicate headers from the second occurance(should not remove the first line). I don't want to use any pattern from the Header as I have some... (7 Replies)
Discussion started by: sudhakar T
7 Replies
COMM(1) 						    BSD General Commands Manual 						   COMM(1)

NAME
comm -- select or reject lines common to two files SYNOPSIS
comm [-123i] file1 file2 DESCRIPTION
The comm utility reads file1 and file2, which should be sorted lexically, and produces three text columns as output: lines only in file1; lines only in file2; and lines in both files. The filename ``-'' means the standard input. The following options are available: -1 Suppress printing of column 1. -2 Suppress printing of column 2. -3 Suppress printing of column 3. -i Case insensitive comparison of lines. Each column will have a number of tab characters prepended to it equal to the number of lower numbered columns that are being printed. For example, if column number two is being suppressed, lines printed in column number one will not have any tabs preceding them, and lines printed in column number three will have one. The comm utility assumes that the files are lexically sorted; all characters participate in line comparisons. ENVIRONMENT
The LANG, LC_ALL, LC_COLLATE, and LC_CTYPE environment variables affect the execution of comm as described in environ(7). EXIT STATUS
The comm utility exits 0 on success, and >0 if an error occurs. SEE ALSO
cmp(1), diff(1), sort(1), uniq(1) STANDARDS
The comm utility conforms to IEEE Std 1003.2-1992 (``POSIX.2''). The -i option is an extension to the POSIX standard. HISTORY
A comm command appeared in Version 4 AT&T UNIX. BUGS
Input lines are limited to LINE_MAX (2048) characters in length. BSD
January 26, 2005 BSD
All times are GMT -4. The time now is 06:11 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy