Shell Programming and Scripting: Bash - remove duplicates without sort
Post 302849313 by Don Cragun on Saturday, 31 August 2013, 08:18:05 PM
Obviously, the simple way to do this is:
Code:
sort -u file

but you tell us we can't do that without saying why. Is there a requirement to output lines in the same order they appeared in the input file? If so, is it important to keep a particular one of the duplicated lines (the first or the last) in the output? Or do you want every line that had one or more duplicates removed from the output entirely?

In the 2.5 years you've been a member of this forum, there have been dozens of examples using awk to do this where the 1st occurrence of a duplicated line is kept, or the last occurrence is kept, or all duplicated lines are removed. If keeping the input order is important, it is more difficult to keep the last duplicate than it is to keep the 1st one.
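For reference, here is one way to write the three awk variants described above. This is a sketch, not code from this thread, assuming a POSIX sh and awk and an input that is a regular file: the keep-last and drop-all versions read the file twice, so they won't work on a pipe. The function names keep_first, keep_last and drop_all_duplicated are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Order-preserving duplicate removal with awk (no sort).
# Each (hypothetical) function takes one file name. The two-pass
# variants name the same file twice, so it must be a regular file.

# Keep the 1st occurrence of each line, in the original order.
# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
# is true exactly once per distinct line.
keep_first() {
    awk '!seen[$0]++' "$1"
}

# Keep the last occurrence of each line, in the order those last
# occurrences appear. Pass 1 (NR == FNR) records the line number of
# each line's final occurrence; pass 2 prints a line only at that position.
keep_last() {
    awk 'NR == FNR { last[$0] = FNR; next }
         last[$0] == FNR' "$1" "$1"
}

# Drop every line that occurs more than once; print the rest in order.
# Pass 1 counts occurrences; pass 2 prints lines whose count is 1.
drop_all_duplicated() {
    awk 'NR == FNR { count[$0]++; next }
         count[$0] == 1' "$1" "$1"
}
```

For example, with a file containing the lines b, a, b, c, a (one per line), keep_first prints b, a, c; keep_last prints b, c, a; and drop_all_duplicated prints only c. Keeping the 1st copy needs a single pass because awk can decide the moment it first sees a line. Keeping the last copy in input order requires knowing that a line will not appear again, which is why this sketch takes a second pass (the alternative is holding the whole file in memory).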

So, what are the real requirements?
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Removing duplicates [sort, uniq]

Hey Guys, I have a file which looks like this: Contig201#numbPA Contig1452#nmdynD6PA dm022p15.r#CG6461PA dm005e16.f#SpatPA IGU001_0015_A06.f#CG17593PA I need to remove duplicates based on the characters up to '#'. For example, if we consider this.. Contig201#numbPA... (4 Replies)
Discussion started by: sharatz83

2. Shell Programming and Scripting

Sort, Uniq, Duplicates

Input File is : ------------- 25060008,0040,03, 25136437,0030,03, 25069457,0040,02, 80303438,0014,03,1st 80321837,0009,03,1st 80321977,0009,03,1st 80341345,0007,03,1st 84176527,0047,03,1st 84176527,0047,03, 20000735,0018,03,1st 25060008,0040,03, I am using the following in the script... (5 Replies)
Discussion started by: Amruta Pitkar

3. UNIX for Dummies Questions & Answers

removing duplicates and sort -k

Hello experts, I am trying to remove all lines in a csv file where the 2nd column is a duplicate. I am trying to use sort with the key parameter sort -u -k 2,2 File.csv > Output.csv File.csv File Name|Document Name|Document Title|Organization Word Doc 1.doc|Word Document|Sample... (3 Replies)
Discussion started by: orahi001

4. Shell Programming and Scripting

bash ps; remove the header, sort and reinsert

Hi, I'm ssh'ing into a server using ruby and sending a one-liner to retrieve the output of the 'ps aux' command. So far, this is what I have: ps aux | sort -r -n -k3 | sed -e '1s/^/this is first\n/' | head -n10 With this I can insert a line at position 1, but I would rather extract the... (3 Replies)
Discussion started by: gekeha

5. Shell Programming and Scripting

remove duplicates and sort

Hi, I'm using the below command to sort and remove duplicates in a file. But I need the result to be written back to the same file instead of redirecting it to another one. Thanks (6 Replies)
Discussion started by: dvah

6. Shell Programming and Scripting

bash - remove duplicates

I need to use a bash script to remove duplicate files from a download list, but I cannot use uniq because the urls are different. I need to go from this: http://***/fae78fe/file1.wmv http://***/39du7si/file1.wmv http://***/d8el2hd/file2.wmv http://***/h893js3/file2.wmv to this: ... (2 Replies)
Discussion started by: locoroco

7. Shell Programming and Scripting

Sort data by date first and then remove duplicates

Hi, I have the data below in a file named ref.psv. I want to create a shell script which will do the following 2 things: (1) sort the file content by the latest date, which is the last column in the file (in the actual file it's the 175th column); (2) after sorting the file by latest date... (3 Replies)
Discussion started by: samrat dutta

8. Shell Programming and Scripting

Sort and Remove duplicates

Here is my task: I need to sort two input files and remove duplicates in the output files: sort by 13 characters from 97, ascending; sort by 1 character from 96, ascending. If duplicates are found, retain the first value in the file. The input files are variable length, convert... (4 Replies)
Discussion started by: ysvsr1

9. UNIX for Beginners Questions & Answers

Sort and remove duplicates in directory based on first 5 columns:

I have /tmp dir with filename as: 010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker 010020001_S-FOR-Sort-SYEXC_20160229_2212102.marker 010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker 010020001-S-XOR-Sort-SYEXC_20160229_2212105.marker 010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker... (4 Replies)
Discussion started by: gnnsprapa

10. Shell Programming and Scripting

Concatenate and sort to remove duplicates

Following is the input. The 1st and 3rd blocks are the same (a block starts with '*' and ends before a blank line), and the 2nd and 4th blocks are also the same: cat <file> * Wed Feb 24 2016 Tariq Saeed <tariq.x.saeed@mail.com> 2.0.7-1.0.7 - add vmcore dump support for ocfs2 * Mon Jun 8 2015 Brian Maly... (4 Replies)
Discussion started by: Paras Pandey
xgettext(1)							   User Commands						       xgettext(1)

NAME
xgettext - extract gettext call strings from C programs

SYNOPSIS
xgettext [-ns] [-a [-x exclude-file]] [-c comment-tag] [-d default-domain] [-j] [-m prefix] [-M suffix] [-p pathname] - | filename...

xgettext -h

DESCRIPTION
The xgettext utility is used to automate the creation of portable message files (.po). A .po file contains copies of "C" strings that are found in ANSI C source code in filename, or in the standard input if `-' is specified on the command line. The .po file can be used as input to the msgfmt(1) utility, which produces a binary form of the message file that can be used by an application at run time.

xgettext writes msgid strings from gettext(3C) calls in filename to the default output file messages.po. The default output file name can be changed with the -d option. msgid strings in dgettext() calls are written to the output file domainname.po, where domainname is the first parameter to the dgettext() call.

By default, xgettext creates a .po file in the current working directory, and each entry is in the same order in which the strings are extracted from filenames. When the -p option is specified, the .po file is created in the pathname directory. An existing .po file is overwritten. Duplicate msgids are written to the .po file as comment lines. When the -s option is specified, the .po file is sorted by the msgid string, and all duplicated msgids are removed. All msgstr directives in the .po file are empty unless the -m option is used.

OPTIONS
The following options are supported:

-n
    Add comment lines to the output file indicating the file name and line number in the source file where each extracted string is encountered. These lines appear before each msgid in the following format:

        # # File: filename, line: line-number

-s
    Generate output sorted by msgids with all duplicate msgids removed.

-a
    Extract all strings, not just those found in gettext(3C) and dgettext() calls. Only one .po file is created.

-c comment-tag
    The comment block beginning with comment-tag as the first token of the comment block is added to the output .po file as #-delimited comments. For multiple domains, xgettext directs comments and messages to the prevailing text domain.

-d default-domain
    Rename the default output file from messages.po to default-domain.po.

-j
    Join messages with existing message files. If a .po file does not exist, it is created. If a .po file does exist, new messages are appended. Any duplicate msgids are commented out in the resulting .po file. Domain directives in the existing .po file are ignored. Results are not guaranteed if the existing message file has been edited.

-m prefix
    Fill in the msgstr with prefix. This is useful for debugging purposes. To make msgstr identical to msgid, use an empty string ("") for prefix.

-M suffix
    Fill in the msgstr with suffix. This is useful for debugging purposes.

-p pathname
    Specify the directory where the output files will be placed. This option overrides the current working directory.

-x exclude-file
    Specify a .po file that contains a list of msgids that are not to be extracted from the input files. The format of exclude-file is identical to that of a .po file. However, only the msgid directive lines in exclude-file are used; all other lines are ignored. The -x option can only be used with the -a option.

-h
    Print a help message on the standard output.

ATTRIBUTES
See attributes(5) for descriptions of the following attributes:

+-----------------------------+-----------------------------+
|       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
+-----------------------------+-----------------------------+
| Availability                | SUNWloc                     |
+-----------------------------+-----------------------------+

SEE ALSO
msgfmt(1), gettext(3C), attributes(5)

NOTES
xgettext is not able to extract cast strings, for example ANSI C casts of literal strings to (const char *). This is unnecessary anyway, since the prototypes in <libintl.h> already specify this type.

In messages and translation notes, lines longer than 2048 characters are truncated to 2048 characters and a warning message is printed to stderr.

SunOS 5.11                          23 Mar 1999                          xgettext(1)
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.