bash keep only duplicate lines in file


# 1  

hello all

In my bash script I have a file, and I only want to keep the lines that appear twice in the file. Is there a way to do this?
thanks in advance!
# 2  
To do this a bit more information is needed:

1) is the file sorted, or are the lines you wish to 'keep' adjacent to each other in the file?

2) is the order of the output important? Do the lines 'kept' need to be in the same order that they appeared in the input?

3) do some lines appear more than twice, and should those be kept as well, or do you want to keep the lines that appear exactly twice?

4) how big is the file in terms of number of lines?

Last edited by agama; 06-09-2012 at 03:57 PM. Reason: typo
# 3  
Quote:
Originally Posted by agama
To do this a bit more information is needed:

1) is the file sorted, or are the lines you wish to 'keep' adjacent to each other in the file?

2) is the order of the output important? Do the lines 'kept' need to be in the same order that they appeared in the input?

3) do some lines appear more than twice, and should those be kept as well, or do you want to keep the lines that appear exactly twice?

4) how big is the file in terms of number of lines?
1) yes, the files are sorted

2) no, the output order is not important at all

3) lines appear either once or twice

4) that's not known; some files have 500 lines or more and some only 4

I found this:

Code:
comm -3 file1 file2 > file3

but it stores in file3 the lines that appear in only one of the files and not in the other, the exact opposite of what I want

Actually

Code:
comm -12 file1 file2 > file3

is what I want, but it makes the execution much slower
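For reference, a minimal sketch of what comm -12 does, using two small made-up sorted files (the file names and contents here are only illustrative):

```shell
# Two sorted files with some lines in common (illustrative data).
printf 'a\nb\nc\n' > file1
printf 'b\nc\nd\n' > file2

# comm compares two sorted files; -12 suppresses columns 1 and 2
# (lines unique to each file), leaving only lines common to both.
comm -12 file1 file2 > file3
cat file3
```

Here file3 ends up containing only b and c, the lines present in both inputs.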

Last edited by vlm; 06-09-2012 at 04:12 PM.
# 4  
Quote:
Originally Posted by vlm
I found this:

Code:
comm -3 file1 file2 > file3

but it stores in file3 the lines that appear in only one of the files and not in the other, the exact opposite of what I want
comm can output lines that are common to both files, but your initial post suggested that you only have one file to work with, and comm won't help with that.

Given that your file is already sorted this is the easy case and something like this will probably do what you need:
Code:
awk ' p == $0; { p = $0 }' input-file >output-file

It will write each duplicated line once to standard output (assuming, as you say, that no line appears more than twice).
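A quick demonstration of that one-liner on a small made-up sorted file (the file names and data are only illustrative):

```shell
# A sorted file where only "b" is duplicated (illustrative data).
printf 'a\nb\nb\nc\n' > input-file

# p holds the previous line; when the current line equals it, the
# pattern is true and the default action (print) fires, so each
# adjacent duplicate is written once. The second rule then saves
# the current line as the new "previous" line.
awk 'p == $0; { p = $0 }' input-file > output-file
cat output-file
```

With this input, output-file contains just the single line b.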

---------- Post updated at 14:15 ---------- Previous update was at 14:13 ----------

OK, it looks like our posts crossed, and you have two files, not one as suggested in your initial question.

Using comm -12 is the easiest, and probably most efficient method.
# 5  
uniq -d is meant for displaying duplicate lines in sorted input.
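For example (with made-up data; the sort step matters because uniq only compares adjacent lines):

```shell
# uniq compares adjacent lines only, so sort the input first;
# -d then prints one copy of each line that occurs more than once.
printf 'c\na\nb\na\nc\n' | sort | uniq -d
```

This prints a and c, the two lines that appear twice.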