Delete duplicate lines... with a twist!


 
# 8  11-23-2011
Sorry, the mention of '*' in my last post is irrelevant to the problem/solution.

Again, I believe your code is the way to go; its only problem is that when it searches for duplicates it uses only a-z as its criterion instead of a-z and 0-9.

I've been trying to improve your code on my own and I came up with:

Code:
awk '{s=tolower($0);gsub("[^[:alnum:]]","",s);x[s]=$0} END {for(i in x) print x[i]}' file

which reduces my question file from 55983 lines to 40907 (it doesn't delete Algebra: and similar entries from the file), and I'm quite happy with it.
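A small variation on the one-liner above keeps the first occurrence instead of the last and preserves the input order (the `for (i in x)` loop in the original prints in arbitrary order). This is just a sketch; the sample file contents are hypothetical:

```shell
# Same normalization as before: lowercase, strip everything that is not a-z or 0-9.
# !seen[s]++ prints a line only the first time its normalized form appears,
# so output order matches input order.
printf 'Algebra:\nalgebra\nFoo Bar\nfoo-bar\nBaz\n' > /tmp/questions.txt
awk '{s=tolower($0); gsub(/[^[:alnum:]]/,"",s)} !seen[s]++' /tmp/questions.txt
# prints: Algebra:  Foo Bar  Baz (one per line)
```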



edit: tyler's perl code just eats too many lines, I have no idea why...

Last edited by shadowww; 11-23-2011 at 03:52 PM..

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Delete duplicate like pattern lines

Hi I need to delete duplicate like pattern lines from a text file containing 2 duplicates only (one being subset of the other) using sed or awk preferably. Input: FM:Chicago:Development FM:Chicago:Development:Score SR:Cary:Testing:Testcases PM:Newyork:Scripting PM:Newyork:Scripting:Audit... (6 Replies)
Discussion started by: tech_frk

2. Shell Programming and Scripting

Find duplicate values in specific column and delete all the duplicate values

Dear folks I have a map file of around 54K lines and some of the values in the second column have the same value and I want to find them and delete all of the same values. I looked over duplicate commands but my case is not to keep one of the duplicate values. I want to remove all of the same... (4 Replies)
Discussion started by: sajmar

3. Shell Programming and Scripting

Delete duplicate rows

Hi, This is a followup to my earlier post him mno klm 20 76 . + . klm_mango unix_00000001; alp fdc klm 123 456 . + . klm_mango unix_0000103; her tkr klm 415 439 . + . klm_mango unix_00001043; abc tvr klm 20 76 . + . klm_mango unix_00000001; abc def klm 83 84 . + . klm_mango... (5 Replies)
Discussion started by: jacobs.smith

4. Shell Programming and Scripting

Delete lines in file containing duplicate strings, keeping longer strings

The question is not as simple as the title... I have a file, it looks like this <string name="string1">RZ-LED</string> <string name="string2">2.0</string> <string name="string2">Version 2.0</string> <string name="string3">BP</string> I would like to check for duplicate entries of... (11 Replies)
Discussion started by: raidzero

5. UNIX for Advanced & Expert Users

In a huge file, Delete duplicate lines leaving unique lines

Hi All, I have a very huge file (4GB) which has duplicate lines. I want to delete duplicate lines leaving unique lines. Sort, uniq, awk '!x++' are not working as its running out of buffer space. I dont know if this works : I want to read each line of the File in a For Loop, and want to... (16 Replies)
Discussion started by: krishnix

6. UNIX for Dummies Questions & Answers

How to delete partial duplicate lines unix

hi :) I need to delete partial duplicate lines I have this in a file sihp8027,/opt/cf20,1980182 sihp8027,/opt/oracle/10gRelIIcd,155200016 sihp8027,/opt/oracle/10gRelIIcd,155200176 sihp8027,/var/opt/ERP,10376312 and need to leave it like this: sihp8027,/opt/cf20,1980182... (2 Replies)
Discussion started by: C|KiLLeR|S

7. UNIX for Dummies Questions & Answers

Delete lines with duplicate strings based on date

Hey all, a relative bash/script newbie trying solve a problem. I've got a text file with lots of lines that I've been able to clean up and format with awk/sed/cut, but now I'd like to remove the lines with duplicate usernames based on time stamp. Here's what the data looks like 2007-11-03... (3 Replies)
Discussion started by: mattv

8. UNIX for Dummies Questions & Answers

How to delete or remove duplicate lines in a file

Hi please help me how to remove duplicate lines in any file. I have a file having huge number of lines. i want to remove selected lines in it. And also if there exists duplicate lines, I want to delete the rest & just keep one of them. Please help me with any unix commands or even fortran... (7 Replies)
Discussion started by: reva

9. UNIX for Dummies Questions & Answers

Delete duplicate lines and print to file

OK, I have read several things on how to do this, but can't make it work. I am writing this to a vi file then calling it as an awk script. So I need to search a file for duplicate lines, delete duplicate lines, then write the result to another file, say /home/accountant/files/docs/nodup ... (2 Replies)
Discussion started by: bfurlong

10. Shell Programming and Scripting

delete semi-duplicate lines from file?

Ok here's what I'm trying to do. I need to get a listing of all the mountpoints on a system into a file, which is easy enough, just using something like "mount | awk '{print $1}'" However, on a couple of systems, they have some mount points looking like this: /stage /stand /usr /MFPIS... (2 Replies)
Discussion started by: paqman
CQTEST(8C)                                                          CQTEST(8C)

NAME
       cqtest - HylaFAX copy quality checking test program

SYNOPSIS
       /usr/sbin/cqtest [ options ] input.tif

DESCRIPTION
       cqtest is a program for testing the copy quality checking support in
       the HylaFAX software (specifically, in the faxgetty(8C) program).
       cqtest takes a TIFF/F (TIFF Class F) file and generates a new TIFF/F
       file that is a copy of the input file, but with any erroneous
       scanlines replaced/regenerated. In addition, cqtest prints diagnostic
       messages describing its actions and indicates whether the input data
       has acceptable copy quality according to the copy quality checking
       threshold parameters. Options are provided for specifying copy
       quality checking threshold parameters.

OPTIONS
       -m badlines
              Set the maximum number of consecutive bad lines of data that
              may appear in each acceptable page of input data. This is
              equivalent to the MaxConsecutiveBadLines configuration
              parameter; c.f. hylafax-config(5F). By default cqtest accepts
              no more than 5 consecutive bad lines in a page.

       -o file
              Write output to file. By default output is written to the
              file cq.tif.

       -p %goodlines
              Set the minimum percentage of ``good lines'' of data that may
              appear in each acceptable page of input data. A line is good
              if it decodes without error to a row of pixels that is the
              expected width. This is equivalent to the PercentGoodLines
              configuration parameter; c.f. hylafax-config(5F). By default
              cqtest requires that 95% of the rows of each page be good.

EXAMPLES
       The following shows a multi-page, high-resolution document with a
       single error on each page. Each page has acceptable copy quality
       using the default threshold parameters.

       hyla% /usr/sbin/cqtest ~/tiff/pics/faxix.tif
       1728 x 297, 7.7 line/mm, 1-D MH, lsb-to-msb
       RECV/CQ: Bad 1D pixel count, row 245, got 1616, expected 1728
       RECV: 2234 total lines, 1 bad lines, 1 consecutive bad lines
       1728 x 297, 7.7 line/mm, 1-D MH, lsb-to-msb
       RECV/CQ: Bad 1D pixel count, row 148, got 3023, expected 1728
       RECV: 2234 total lines, 1 bad lines, 1 consecutive bad lines
       1728 x 297, 7.7 line/mm, 1-D MH, lsb-to-msb
       RECV/CQ: Bad 1D pixel count, row 151, got 1722, expected 1728
       RECV: 2234 total lines, 1 bad lines, 1 consecutive bad lines
       1728 x 297, 7.7 line/mm, 1-D MH, lsb-to-msb
       RECV/CQ: Bad 1D pixel count, row 148, got 1776, expected 1728
       RECV: 2234 total lines, 1 bad lines, 1 consecutive bad lines

SEE ALSO
       faxgetty(8C), hylafax-config(5F)

October 3, 1995                                                     CQTEST(8C)