Remove lines containing 2 or more duplicate strings
Post 302964682 by RavinderSingh13 on Monday 18th of January 2016 05:49:09 AM
Hello martinsmith,

Could you please try this and let me know if it helps? I am ignoring case here, so the same word counts as a duplicate whether it is written in capital or small letters.
Let's say the following is the Input_file:
Code:
One and a Two
Unix.com is the Best
This as a Line Line
Example duplicate sentence with the word DUPLICATE
UNIX is very good GOOD

The following is the code for it.
Code:
awk '{
  split("", A)                          # clear the word table for every new line
  for (i = 1; i <= NF; i++)
    if (A[tolower($i)]++) next          # word already seen on this line (any case): skip the line
  print                                 # no repeated word: keep the line
}' Input_file

The output will be as follows.
Code:
One and a Two
Unix.com is the Best
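
To check the result end to end without creating a file, a small sketch like the one below may help. It is only an illustration: the function name drop_dup_word_lines is hypothetical and not part of the thread, and it assumes case is folded with tolower() so that it does not depend on any gawk-only setting. The sample lines are the ones from the Input_file above.

```shell
#!/bin/sh
# Sketch only: drop_dup_word_lines is a hypothetical helper name (an assumption).
# It prints every input line in which no whitespace-separated word repeats,
# comparing words case-insensitively.
drop_dup_word_lines() {
    awk '{
        split("", seen)                 # empty the per-line table (portable idiom)
        for (i = 1; i <= NF; i++) {
            word = tolower($i)          # fold case so Line and LINE count as one word
            if (seen[word]++) next      # second sighting on this line: skip the line
        }
        print                           # no word repeated: keep the line
    }' "$@"
}

# Feed the sample input from the post through the helper via a here-document.
drop_dup_word_lines <<'EOF'
One and a Two
Unix.com is the Best
This as a Line Line
Example duplicate sentence with the word DUPLICATE
UNIX is very good GOOD
EOF
```

Run as is, this should print only the first two sample lines. File names can also be passed as arguments, for example drop_dup_word_lines Input_file. Note that punctuation is part of a word here, so "Line" and "Line." would not be treated as duplicates.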

Thanks,
R. Singh

Last edited by RavinderSingh13; 01-18-2016 at 06:51 AM.. Reason: Added a comment for more clarification about solution now.