Deleting lines containing duplicated strings

Dear all,

I always appreciate your help.

I would like to delete every line whose second column duplicates the second column of an earlier line, keeping only the first occurrence.

test.txt
Code:
658	invert_d2e_q_reg_0_/Qalu_ecl_zlow_e	0.825692
659	invert_d2e_q_reg_0_/Qalu_byp_rd_data_e[31]	0.825692
660	invert_d2e_q_reg_0_/Qalu_byp_rd_data_e[63]	0.825692
661	invert_d2e_q_reg_0_/Qalu_ecl_zhigh_e	0.825692
665	invert_d2e_q_reg_0_/Qalu_byp_rd_data_e[62]	0.825692
666	invert_d2e_q_reg_0_/Qalu_ecl_zlow_e	0.825692
668	invert_d2e_q_reg_0_/Qalu_ecl_zhigh_e	0.825692
670	invert_d2e_q_reg_0_/Qalu_ecl_zhigh_e	0.825692
673	invert_d2e_q_reg_0_/Qalu_ecl_zlow_e	0.825692
675	invert_d2e_q_reg_0_/Qalu_ecl_zhigh_e	0.825692
677	invert_d2e_q_reg_0_/Qalu_ecl_zhigh_e	0.825692
678	invert_d2e_q_reg_0_/Qalu_byp_rd_data_e[27]	0.825692
679	invert_d2e_q_reg_0_/Qalu_byp_rd_data_e[27]	0.8120
.
.
.

output.txt
Code:
658	invert_d2e_q_reg_0_/Qalu_ecl_zlow_e	0.825692
659	invert_d2e_q_reg_0_/Qalu_byp_rd_data_e[31]	0.825692
660	invert_d2e_q_reg_0_/Qalu_byp_rd_data_e[63]	0.825692
661	invert_d2e_q_reg_0_/Qalu_ecl_zhigh_e	0.825692
665	invert_d2e_q_reg_0_/Qalu_byp_rd_data_e[62]	0.825692
678	invert_d2e_q_reg_0_/Qalu_byp_rd_data_e[27]	0.825692
.
.
.

I know sed can delete lines that match predefined strings, but in my case I cannot know in advance which strings will be duplicated. Also, there will be more than 1000 duplicated strings.


I tried "uniq" for this job, but it does not work:
Code:
uniq -u -f 4 test.txt

(My intent was that -u prints only lines that are not repeated, and -f 4 skips the first 4 characters so that the comparison starts at the second column.)
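
If I read the uniq man page correctly, -f skips whole whitespace-separated fields rather than characters, and uniq only compares adjacent lines, so the file would have to be sorted on the second column first. But even then the comparison still includes the third column, for example:

Code:
sort -k2,2 test.txt | uniq -f 1

This collapses lines whose second and third columns are both identical, but lines 678 and 679 both survive because their third columns differ, and the original line order is lost as well.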

Is there any way to do this with sed/awk/perl? Or, if it can be done with uniq, please correct my usage.
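
For what it is worth, the closest I could sketch myself is an awk one-liner that remembers every second-column value in an array and prints a line only the first time that value is seen (assuming the columns are whitespace- or tab-separated):

Code:
awk '!seen[$2]++' test.txt > output.txt

Here !seen[$2]++ is true only on the first occurrence of each $2, so later duplicates are dropped and the original order is preserved. A perl equivalent would be perl -ane 'print unless $seen{$F[1]}++' test.txt. Would something along these lines be the recommended approach?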

Best,

Jaeyoung
 
