duplicate line in a text file


 
# 1  
Old 04-25-2008

I would like to scan a file for duplicate lines and print the duplicates to another file.
The comparison has to be case-insensitive.

Example:

line1
line2
line2
line3
line4
line4

outputfile:
line2
line4

Any ideas?
# 2  
Old 04-25-2008
Code:
perl -ne '$l = lc(); print if $m{$l}++; ' file

lc returns its argument in lowercase; called with no argument, it defaults to $_, the current input line. $m{$l} counts how many times we have already seen $l; if that count is nonzero, the line is a duplicate, so we print it. The line is printed as it appears in the file, not in lowercase.

Last edited by era; 04-25-2008 at 02:50 PM.. Reason: Add explanation
# 3  
Old 04-25-2008
Thanks, it worked for one file, but I forgot to add that I need to run it on a directory, and it should process only the files in it, not the subdirectories.

What modifications would you make?
# 4  
Old 04-25-2008
Loop over files in a directory, and remove duplicates? Do you want to replace the files?

Code:
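for f in directory/*; do
  test -d "$f" && continue   # skip if it's a subdirectory
  # As written, each file ends up holding only its repeated lines;
  # use "print unless" instead of "print if" to keep one copy of every line.
  perl -ne '$l = lc(); print if $m{$l}++; ' "$f" >"$f.tmp" &&
    mv "$f.tmp" "$f"         # replace the file only if perl succeeded
done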

# 5  
Old 04-25-2008
Thank you Era, it worked.
# 6  
Old 04-29-2008
Code:
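awk '{
  a[tolower($0)]++     # count each line, ignoring case
}
END {
  for (i in a)         # note: iteration order is unspecified
    if (a[i] > 1)      # seen more than once
      print i          # printed in lowercase, since that is the array key
}' filename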
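
A variant that keeps the input order and the original spelling is sketched below. It prints a line the second time its lowercase form is seen, so each repeated line appears once. The scratch directory and file names are assumptions for the demonstration.

```shell
# Sample input with a line repeated three times in mixed case (hypothetical names).
work=$(mktemp -d)
printf '%s\n' line1 line2 LINE2 line2 line3 > "$work/in"

# The pattern is true only when the counter was exactly 1 before incrementing,
# that is, on the second occurrence; awk then prints the line unchanged.
awk '{ key = tolower($0) } seen[key]++ == 1' "$work/in" > "$work/out"

cat "$work/out"
```

Unlike the END loop above, nothing is held back until the end of input, so the duplicates come out in the order they are found.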
