UNIX for Advanced & Expert Users
In a huge file, Delete duplicate lines leaving unique lines
Post 302543939 by alister, Tuesday 2nd of August 2011, 12:52:36 PM
Quote:
Originally Posted by yazu
You can split the file (with the "split" command), then "sort -u" the chunks separately, and then merge them with "sort -m". (Of course, whether you need this depends on how much memory your system has.)
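Spelled out, that manual approach might look something like the following (an untested sketch; the chunk size and the file names are arbitrary):

    # Split the big file into chunks of, say, 10 million lines each.
    split -l 10000000 hugefile chunk.

    # Sort each chunk on its own, dropping duplicates within the chunk.
    for f in chunk.??; do
        sort -u "$f" -o "$f.sorted"
    done

    # Merge the already-sorted chunks; -u also drops duplicates that
    # span chunk boundaries.
    sort -m -u chunk.??.sorted > hugefile.sorted

    # Clean up the intermediate files.
    rm -f chunk.?? chunk.??.sorted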
You probably won't have to split anything manually. Many (if not most) sort implementations (GNU, *BSD, Solaris, and HP-UX, to name a few) will do this for you automatically. They compare the size of the file to be sorted against the system's available memory, make a conservative guess at how much to sort in memory at a time, and write intermediate sorted runs to files in $TMPDIR, merging them at the end.

As vgersh99 pointed out, there is often a -T option for naming the temporary directory explicitly; if that option is missing, you can simply override the environment variable when invoking sort (TMPDIR=/lots/of/space sort ...).
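For example (a sketch only; /lots/of/space and the file names are placeholders):

    # Let sort handle the chunking and merging itself; just point its
    # temporary files at a filesystem with plenty of free space.
    sort -u -T /lots/of/space hugefile > hugefile.sorted

    # Equivalent when -T is not available: override TMPDIR for this
    # one invocation.
    TMPDIR=/lots/of/space sort -u hugefile > hugefile.sorted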

Regards,
Alister
 

uniq(1) 							   User Commands							   uniq(1)

NAME
     uniq - report or filter out repeated lines in a file

SYNOPSIS
  /usr/bin/uniq
     /usr/bin/uniq [-c | -d | -u] [-f fields] [-s char] [input_file [output_file]]

     /usr/bin/uniq [-c | -d | -u] [-n] [+m] [input_file [output_file]]

  ksh93
     uniq [-cdiu] [-D[delimit]] [-f fields] [-s chars] [-w chars] [input_file [output_file]]

     uniq [-cdiu] [-D[delimit]] [-n] [+m] [-w chars] [input_file [output_file]]

DESCRIPTION
  /usr/bin/uniq
     The uniq utility reads an input file comparing adjacent lines and writes one copy of each input line on the output.
     The second and succeeding copies of repeated adjacent input lines are not written. Repeated lines in the input are
     not detected if they are not adjacent.

  ksh93
     The uniq built-in in ksh93 is associated with the /bin or /usr/bin path. It is invoked when uniq is executed without
     a pathname prefix and the pathname search finds a /bin/uniq or /usr/bin/uniq executable.

     uniq reads an input, comparing adjacent lines, and writing one copy of each input line on the output. The second and
     succeeding copies of the repeated adjacent lines are not written.

     If output_file is not specified, uniq writes to standard output. If input_file is not specified, or if input_file
     is -, uniq reads from standard input, and the start of the file is defined as the current offset.

OPTIONS
  /usr/bin/uniq
     The following options are supported by /usr/bin/uniq:

     -c          Precedes each output line with a count of the number of times the line occurred in the input.

     -d          Suppresses the writing of lines that are not repeated in the input.

     -f fields   Ignores the first fields fields on each input line when doing comparisons, where fields is a positive
                 decimal integer. A field is the maximal string matched by the basic regular expression:

                     [[:blank:]]*[^[:blank:]]*

                 If fields specifies more fields than appear on an input line, a null string is used for comparison.

     +m          Equivalent to -s chars with chars set to m.

     -n          Equivalent to -f fields with fields set to n.

     -s chars    Ignores the first chars characters when doing comparisons, where chars is a positive decimal integer.
                 If specified in conjunction with the -f option, the first chars characters after the first fields fields
                 are ignored. If chars specifies more characters than remain on an input line, a null string is used for
                 comparison.

     -u          Suppresses the writing of lines that are repeated in the input.

  ksh93
     The following options are supported by the uniq built-in command in ksh93:

     -c, --count
                 Outputs the number of times each line occurred along with the line.

     -d, --repeated | --duplicates
                 Outputs only duplicate lines.

     -D, --all-repeated[=delimit]
                 Outputs all duplicate lines as a group with an empty line delimiter specified by delimit. Specify
                 delimit as one of the following:

                     none       Do not delimit duplicate groups.
                     prepend    Prepend an empty line before each group.
                     separate   Separate each group with an empty line.

                 The value for delimit can be omitted. The default value is none.

     -f fields, --skip-fields=fields
                 Skips over fields number of fields before checking for uniqueness. A field is the minimal string
                 matching the BRE [[:blank:]]*[^[:blank:]]*.

     -i, --ignore-case
                 Ignore case in comparisons.

     +m          Equivalent to the -s chars option, with chars set to m.

     -n          Equivalent to the -f fields option, with fields set to n.

     -s chars, --skip-chars=chars
                 Skips over chars number of characters before checking for uniqueness. If specified with the -f option,
                 the first chars after the first fields are ignored. If chars specifies more characters than are on the
                 line, an empty string is used for comparison.

     -u, --uniq
                 Outputs unique lines.

     -w chars, --check-chars=chars
                 Skips over any specified fields and characters, then compares chars number of characters.

OPERANDS
     The following operands are supported:

     input_file     A path name of the input file. If input_file is not specified, or if input_file is -, the standard
                    input is used.

     output_file    A path name of the output file. If output_file is not specified, the standard output is used. The
                    results are unspecified if the file named by output_file is the file named by input_file.

EXAMPLES
     Example 1: Using the uniq Command

     The following example lists the contents of the uniq.test file and outputs a copy of the repeated lines.

         example% cat uniq.test
         This is a test.
         This is a test.
         TEST.
         Computer.
         TEST.
         TEST.
         Software.
         example% uniq -d uniq.test
         This is a test.
         TEST.
         example%

     The next example outputs just those lines that are not repeated in the uniq.test file.

         example% uniq -u uniq.test
         TEST.
         Computer.
         Software.
         example%

     The last example outputs a report with each line preceded by a count of the number of times each line occurred in
     the file:

         example% uniq -c uniq.test
               2 This is a test.
               1 TEST.
               1 Computer.
               2 TEST.
               1 Software.
         example%
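     The examples above cover -d, -u, and -c but not the field-skipping options. As a purely illustrative sketch (the
     logins file and its contents are hypothetical, not part of the manual page), -f 1 makes uniq ignore the first field,
     so lines that differ only in that field compare as duplicates:

         example% cat logins
         alice pts/0
         bob pts/0
         carol pts/1
         example% uniq -f 1 logins
         alice pts/0
         carol pts/1
         example%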
ENVIRONMENT VARIABLES
     See environ(5) for descriptions of the following environment variables that affect the execution of uniq: LANG,
     LC_ALL, LC_CTYPE, LC_MESSAGES, and NLSPATH.

EXIT STATUS
     The following exit values are returned:

     0     Successful completion.

     >0    An error occurred.

ATTRIBUTES
     See attributes(5) for descriptions of the following attributes:

  /usr/bin/uniq
     +-----------------------------+-----------------------------+
     |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
     +-----------------------------+-----------------------------+
     | Availability                | SUNWesu                     |
     | CSI                         | Enabled                     |
     | Interface Stability         | Committed                   |
     | Standard                    | See standards(5).           |
     +-----------------------------+-----------------------------+

  ksh93
     +-----------------------------+-----------------------------+
     |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
     +-----------------------------+-----------------------------+
     | Availability                | SUNWcsu                     |
     | Interface Stability         | See below.                  |
     +-----------------------------+-----------------------------+

     The ksh93 built-in binding to /bin and /usr/bin is Volatile. The built-in interfaces are Uncommitted.

SEE ALSO
     comm(1), ksh93(1), pcat(1), sort(1), uncompress(1), attributes(5), environ(5), standards(5)

SunOS 5.11                                   13 Mar 2008                                      uniq(1)
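Tying the manual page back to the original question: for a huge, unsorted file, uniq alone won't help because the duplicates aren't adjacent, but it combines naturally with sort. A rough sketch (the file names and the spare-space path are placeholders):

    # Keep one copy of every line (all duplicates collapsed):
    sort -u -T /lots/of/space hugefile > hugefile.dedup

    # Keep only the lines that occur exactly once, discarding every line
    # that has a duplicate (uniq requires sorted, adjacent input):
    sort -T /lots/of/space hugefile | uniq -u > hugefile.unique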