Sponsored Content
Top Forums Shell Programming and Scripting Removing duplicates depending on file size Post 302830169 by gokcell on Monday 8th of July 2013 08:23:41 AM
Old 07-08-2013
Duplication Analyses

Hi,

I think below duplication analyses can help you.
Code:
[goksel@gokcell 2july]$ cat file1 
Goksel Yangin
Deneme Test
Goksel Yangin
Deneme Test
Ali Veli
Hasan Huseyin
Test 12345
Unix Linux
Linux Unix
Goksel Yangin
[goksel@gokcell 2july]$ cat file1  | sort | uniq -c
      1 Ali Veli
      2 Deneme Test
      3 Goksel Yangin
      1 Hasan Huseyin
      1 Linux Unix
      1 Test 12345
      1 Unix Linux
[goksel@gokcell 2july]$ cat file1  | sort | uniq -c | awk '{print $3"\t"$2"\t""DUP"$1}'
Veli    Ali     DUP1
Test    Deneme  DUP2
Yangin  Goksel  DUP3
Huseyin Hasan   DUP1
Unix    Linux   DUP1
12345   Test    DUP1
Linux   Unix    DUP1
[goksel@gokcell 2july]$ cat file1  | sort | uniq -c | awk '{print $3"\t"$2"\t""DUP"$1}' | grep "DUP1"
Veli    Ali     DUP1
Huseyin Hasan   DUP1
Unix    Linux   DUP1
12345   Test    DUP1
Linux   Unix    DUP1
[goksel@gokcell 2july]$ cat file1  | sort | uniq -c | awk '{print $3"\t"$2"\t""DUP"$1}' | grep "DUP2"
Test    Deneme  DUP2
[goksel@gokcell 2july]$ cat file1  | sort | uniq -c | awk '{print $3"\t"$2"\t""DUP"$1}' | grep "DUP3"
Yangin  Goksel  DUP3

Regards,
Goksel Yangin
Computer Engineer

Last edited by Franklin52; 07-08-2013 at 10:53 AM.. Reason: Please use code tags, thank you
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

removing duplicates from a file

i have a file with some 1000 entries it will contain entries like 1000,ram 2000,pankaj 1001,rahim 1000,ram 2532,govind 2000,pankaj 3000,venkat 2532,govind what i want is i want to extract only the distinct rows from this file so my output should contain only 1000,ram... (2 Replies)
Discussion started by: trichyselva
2 Replies

2. Shell Programming and Scripting

Removing duplicates in a sorted file by field.

I have data like this: It's sorted by the 2nd field (TID). envoy,90000000000000634600010001,04/11/2008,23:19:27,RB00266,0015,DETAIL,ERROR, envoy,90000000000000634600010001,04/12/2008,04:23:45,RB00266,0015,DETAIL,ERROR,... (1 Reply)
Discussion started by: kinksville
1 Replies

3. UNIX for Dummies Questions & Answers

removing duplicates of a pattern from a file

hey all, I need some help. I have a text file with names in it. My target is that if a particular pattern exists in that file more than once..then i want to rename all the occurences of that pattern by alternate patterns.. for e.g if i have PATTERN occuring 5 times then i want to... (3 Replies)
Discussion started by: ashisharora
3 Replies

4. Shell Programming and Scripting

Removing duplicates from log file?

I have a log file with posts looking like this: -- Messages can be delivered by different systems at different times. The id number is used to sort out duplicate messages. What I need is to strip the arrival time from each post, sort posts by id number, and reattach arrival time to respective... (2 Replies)
Discussion started by: Ilja
2 Replies

5. Shell Programming and Scripting

Removing Duplicates from file

Hi Experts, Please check the following new requirement. I got data like the following in a file. FILE_HEADER 01cbbfde7898410| 3477945| home| 1 01cbc275d2c122| 3478234| WORK| 1 01cbbe4362743da| 3496386| Rich Spare| 1 01cbc275d2c122| 3478234| WORK| 1 This is pipe separated file with... (3 Replies)
Discussion started by: tinufarid
3 Replies

6. Shell Programming and Scripting

formatting a file and removing duplicates

Hi, I have a file that I want to change the format of. It is a large file in rows but I want it to be comma separated (comma then a space). The current file looks like this: HI, Joe, Bob, Jack, Jack After I would want to remove any duplicates so it would look like this: HI, Joe,... (2 Replies)
Discussion started by: kylle345
2 Replies

7. UNIX for Dummies Questions & Answers

Removing duplicates from a file

Hi All, I am merging files coming from 2 different systems ,while doing that I am getting duplicates entries in the merged file I,01,000131,764,2,4.00 I,01,000131,765,2,4.00 I,01,000131,772,2,4.00 I,01,000131,773,2,4.00 I,01,000168,762,2,2.00 I,01,000168,763,2,2.00... (5 Replies)
Discussion started by: Sri3001
5 Replies

8. UNIX for Dummies Questions & Answers

Grep from pattern file without removing duplicates?

I have been using grep to output whole lines using a pattern file with identifiers (fileA): fig|562.2322.peg.1 fig|562.2322.peg.3 fig|562.2322.peg.3 fig|562.2322.peg.3 fig|562.2322.peg.7 From fileB with corresponding identifiers in the second column: NODE_0 fig|562.2322.peg.1 peg ... (2 Replies)
Discussion started by: Mauve
2 Replies

9. Shell Programming and Scripting

Removing duplicates from new file

i hav two files like i want to remove/delete all the duplicate lines in file2 which are viz unix,unix2,unix3 (2 Replies)
Discussion started by: sagar_1986
2 Replies

10. Shell Programming and Scripting

Removing duplicates from new file

i hav two files like i want to remove/delete all the duplicate lines in file2 which are viz unix,unix2,unix3.I have tried previous post also,but in that complete line must be similar.In this case i have to verify first column only regardless what is the content in succeeding columns. (3 Replies)
Discussion started by: sagar_1986
3 Replies
CAT(1)							    BSD General Commands Manual 						    CAT(1)

NAME
cat -- concatenate and print files SYNOPSIS
cat [-benstuv] [file ...] DESCRIPTION
The cat utility reads files sequentially, writing them to the standard output. The file operands are processed in command-line order. If file is a single dash ('-') or absent, cat reads from the standard input. If file is a UNIX domain socket, cat connects to it and then reads it until EOF. This complements the UNIX domain binding capability available in inetd(8). The options are as follows: -b Number the non-blank output lines, starting at 1. -e Display non-printing characters (see the -v option), and display a dollar sign ('$') at the end of each line. -n Number the output lines, starting at 1. -s Squeeze multiple adjacent empty lines, causing the output to be single spaced. -t Display non-printing characters (see the -v option), and display tab characters as '^I'. -u Disable output buffering. -v Display non-printing characters so they are visible. Control characters print as '^X' for control-X; the delete character (octal 0177) prints as '^?'. Non-ASCII characters (with the high bit set) are printed as 'M-' (for meta) followed by the character for the low 7 bits. EXIT STATUS
The cat utility exits 0 on success, and >0 if an error occurs. EXAMPLES
The command: cat file1 will print the contents of file1 to the standard output. The command: cat file1 file2 > file3 will sequentially print the contents of file1 and file2 to the file file3, truncating file3 if it already exists. See the manual page for your shell (i.e., sh(1)) for more information on redirection. The command: cat file1 - file2 - file3 will print the contents of file1, print data it receives from the standard input until it receives an EOF ('^D') character, print the con- tents of file2, read and output contents of the standard input again, then finally output the contents of file3. Note that if the standard input referred to a file, the second dash on the command-line would have no effect, since the entire contents of the file would have already been read and printed by cat when it encountered the first '-' operand. SEE ALSO
head(1), more(1), pr(1), sh(1), tail(1), vis(1), zcat(1), setbuf(3) Rob Pike, "UNIX Style, or cat -v Considered Harmful", USENIX Summer Conference Proceedings, 1983. STANDARDS
The cat utility is compliant with the IEEE Std 1003.2-1992 (``POSIX.2'') specification. The flags [-benstv] are extensions to the specification. HISTORY
A cat utility appeared in Version 1 AT&T UNIX. Dennis Ritchie designed and wrote the first man page. It appears to have been cat(1). BUGS
Because of the shell language mechanism used to perform output redirection, the command ``cat file1 file2 > file1'' will cause the original data in file1 to be destroyed! The cat utility does not recognize multibyte characters when the -t or -v option is in effect. BSD
March 21, 2004 BSD
All times are GMT -4. The time now is 01:50 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy