Shell Programming and Scripting: perl/shell need help to remove duplicate lines from files
Post 302482838 by DGPickett on Wednesday 22nd of December 2010, 04:58:42 PM
If we are talking about exactly duplicate lines, doing it all in memory so you can preserve order can get VM-intensive. Are the duplicate lines always in the same file? Here is a robust duplicate finder using sort:
Code:
for file in *.txt
do
  # List each line that occurs more than once; uniq -d prints one copy per duplicate.
  sort "$file" | uniq -d > "$file.dups"
  # Remove the report if the file turned out to have no duplicates.
  if [ ! -s "$file.dups" ]
  then
    rm -f "$file.dups"
  fi
done

You only get one copy of each dup.
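If you need to remove the duplicates rather than just report them, and each file fits in awk's memory, a sketch along the same lines that preserves the original line order:

```shell
# Keep only the first occurrence of each line, preserving order.
# awk holds one hash entry per distinct line, so memory scales with
# the number of distinct lines, not with file size.
for file in *.txt
do
  awk '!seen[$0]++' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done
```

Unlike the sort-based report above, this never reorders the file.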
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Remove Duplicate Lines in File

I am writing a KSH script to remove duplicate lines in a file. Let's say the file has the format below. FileA 1253-6856 3101-4011 1827-1356 1822-1157 1822-1157 1000-1410 1000-1410 1822-1231 1822-1231 3101-4011 1822-1157 1822-1231 and I want to simplify it with no duplicate line as file... (5 Replies)
Discussion started by: Teh Tiack Ein
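If the order of FileA's lines does not matter, one minimal answer is plain sort -u (note the output comes back sorted, not in the original order):

```shell
# Print each distinct line exactly once; the result is sorted,
# so the file's original ordering is lost.
sort -u FileA
```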

2. Shell Programming and Scripting

how to remove duplicate lines

I have the following file content (3 fields each line): 23 888 10.0.0.1 dfh 787 10.0.0.2 dssf dgfas 10.0.0.3 dsgas dg 10.0.0.4 df dasa 10.0.0.5 df dag 10.0.0.5 dfd dfdas 10.0.0.5 dfd dfd 10.0.0.6 daf nfd 10.0.0.6 ... as can be seen, the third field is an IP address and is sorted, but... (3 Replies)
Discussion started by: fredao
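Assuming the goal here is to keep just one line per IP address (the third field), a possible sketch (input.txt is a placeholder name):

```shell
# Print only the first line seen for each distinct value of field 3.
awk '!seen[$3]++' input.txt
```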

3. Shell Programming and Scripting

remove all duplicate lines from all files in one folder

Hi, is it possible to remove all duplicate lines from all txt files in a specific folder? This is too hard for me maybe someone could help. lets say we have an amount of textfiles 1 or 2 or 3 or... maximum 50 each textfile has lines with text. I want all lines of all textfiles... (8 Replies)
Discussion started by: lowmaster

4. Shell Programming and Scripting

Command to remove duplicate lines with perl,sed,awk

Input: hello hello hello hello monkey donkey hello hello drink dance drink Output should be: hello hello monkey donkey drink dance (9 Replies)
Discussion started by: cola

5. Shell Programming and Scripting

remove duplicate lines using awk

Hi, I came to know that using awk '!x++' removes the duplicate lines. Can anyone please explain the above syntax? I want to understand how the above awk syntax removes the duplicates. Thanks in advance, sudvishw (7 Replies)
Discussion started by: sudvishw
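The idiom in question is usually written `awk '!x[$0]++'`. Here x is an associative array keyed on the whole input line; the post-increment returns the previous count, so the pattern is true (and awk's default action, printing the line, fires) only the first time a line appears:

```shell
# First 'a': x["a"] is 0, !0 is true, print, then x["a"] becomes 1.
# Later 'a': x["a"] is nonzero, !1 is false, line suppressed.
printf 'a\na\nb\na\n' | awk '!x[$0]++'
# prints:
# a
# b
```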

6. Shell Programming and Scripting

Remove duplicate lines

Hi, I have a huge file which is about 50GB. There are many lines. The file format likes 21 rs885550 0 9887804 C C T C C C C C C C 21 rs210498 0 9928860 0 0 C C 0 0 0 0 0 0 21 rs303304 0 9941889 A A A A A A A A A A 22 rs303304 0 9941890 0 A A A A A A A A A The question is that there are a few... (4 Replies)
Discussion started by: zhshqzyc
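For a file that size, sort is usually the safer tool than awk, because sort merges through temporary files instead of holding everything in memory. A sketch, with /bigtmp and huge.txt as placeholder names:

```shell
# sort spills to temporary files, so a 50GB input does not need 50GB
# of RAM; -T points the scratch area at a filesystem with enough space.
sort -u -T /bigtmp huge.txt > huge.dedup.txt
```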

7. Shell Programming and Scripting

[uniq + awk?] How to remove duplicate blocks of lines in files?

Hello again, I am wanting to remove all duplicate blocks of XML code in a file. This is an example: input: <string-array name="threeItems"> <item>item1</item> <item>item2</item> <item>item3</item> </string-array> <string-array name="twoItems"> <item>item1</item> <item>item2</item>... (19 Replies)
Discussion started by: raidzero
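One possible awk sketch for this case, assuming every block starts with a line containing &lt;string-array and ends with &lt;/string-array&gt; on its own line. Note this version emits only the blocks themselves and drops any text between them:

```shell
# Accumulate each <string-array>...</string-array> block into blk,
# then print the block only if this exact block text is new.
awk '/<string-array/ { blk = ""; inblk = 1 }
     inblk           { blk = blk $0 ORS }
     /<\/string-array>/ { inblk = 0; if (!seen[blk]++) printf "%s", blk }' input.xml
```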

8. UNIX for Dummies Questions & Answers

Remove Duplicate Lines

Hi I need this output. Thanks. Input: TAZ YET FOO FOO VAK TAZ BAR Output: YET VAK BAR (10 Replies)
Discussion started by: tara123
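This one is different from the threads above: every line that appears more than once must vanish entirely, with order preserved. A two-pass awk sketch (input.txt is a placeholder name):

```shell
# Pass 1 (NR==FNR): count every line. Pass 2: print only lines whose
# total count is exactly 1, in their original order.
awk 'NR==FNR { c[$0]++; next } c[$0] == 1' input.txt input.txt
```

`sort input.txt | uniq -u` gives the same set of lines but sorted, which would not match the requested output order.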

9. Windows & DOS: Issues & Discussions

Remove duplicate lines from text files.

So, I have text files, one "fail.txt" And one "color.txt" I now want to use a command line (DOS) to remove ANY line that is PRESENT IN BOTH from each text file. Afterwards there shall be no duplicate lines. (1 Reply)
Discussion started by: pasc
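On a Unix shell (the poster asked about DOS, so this is only the Unix equivalent of the idea), grep with -F (fixed strings), -x (whole-line match), -v (invert) and -f (patterns from file) can strip the common lines in both directions:

```shell
# Keep only lines of fail.txt that do not appear in color.txt, and vice versa.
grep -Fxv -f color.txt fail.txt  > fail.clean.txt
grep -Fxv -f fail.txt  color.txt > color.clean.txt
```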

10. Shell Programming and Scripting

How to remove duplicate lines?

Hi All, I am storing the result in the variable result_text using the below code. result_text=$(printf "$result_text\t\n$name") The result_text is having the below text. Which is having duplicate lines. file and time for the interval 03:30 - 03:45 file and time for the interval 03:30 - 03:45 ... (4 Replies)
Discussion started by: nalu
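One way to de-duplicate the variable itself is to pass its lines through the same awk idiom (the sample text below is taken from the post):

```shell
result_text="file and time for the interval 03:30 - 03:45
file and time for the interval 03:30 - 03:45"
# Filter the variable's lines, keeping only first occurrences.
result_text=$(printf '%s\n' "$result_text" | awk '!seen[$0]++')
```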
dup(2)								System Calls Manual							    dup(2)

Name
       dup, dup2 - duplicate an open file descriptor

Syntax
       newd = dup(oldd)
       int newd, oldd;

       dup2(oldd, newd)
       int oldd, newd;

Description
       The dup system call duplicates an existing object descriptor.  The argument oldd is a small non-negative integer index in the per-process
       descriptor table.  The value must be less than the size of the table, which is returned by getdtablesize(2).  The new descriptor, newd,
       returned by the call is the lowest numbered descriptor that is not currently in use by the process.

       The object referenced by the descriptor does not distinguish between references using oldd and newd in any way.  Thus, if newd and oldd are
       duplicate references to an open file, read(2), write(2), and lseek(2) calls all move a single pointer into the file.  If a separate pointer
       into the file is desired, a different object reference to the file must be obtained by issuing an additional open(2) call.

       In the second form of the call, the value of newd desired is specified.  If this descriptor is already in use, the descriptor is first
       deallocated as if a close(2) call had been done.
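As an aside to this manual text: shell redirections are implemented with this second form. For example, `2>&1` asks the shell to call dup2(1, 2) so that stderr becomes a duplicate of stdout:

```shell
# After the shell's dup2(1, 2), descriptors 1 and 2 share one open file
# description and one file offset, so both lines land in both.log in order.
{ echo out; echo err 1>&2; } > both.log 2>&1
```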

Return Values
       The value -1 is returned if an error occurs in either call.  The external variable errno indicates the cause of the error.

Diagnostics
       The dup and dup2 system calls fail under the following conditions:

       [EBADF]	      The oldd or newd is not a valid active descriptor.

       [EMFILE]       Too many descriptors are active.

       [EINTR]	      The dup or dup2 function was terminated prematurely by a signal.

See Also
       accept(2), close(2), getdtablesize(2), lseek(2), open(2), pipe(2), read(2), socket(2), socketpair(2), write(2)

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.