Remove Doubles Without Sort?


 
# 1  
Old 12-11-2012
Hi!
I have concatenated two wordlist files (one word per line). The combined file contains some doubles, but I cannot use sort and uniq: I need to keep the order the file is already in, which is not alphabetical, and uniq only compares adjacent lines, while my doubles are not on adjacent lines. Is there another simple way to remove the doubles without altering the order? Unfortunately, there is no common pattern I can use to pick them out.
# 2  
Old 12-11-2012
Code:
awk '!arr[$0]++' wordlist_file

# 3  
Old 12-11-2012
Quote:
Originally Posted by bipinajith
Code:
awk '!arr[$0]++' wordlist_file

Hey bipinajith, thanks for your reply! Would you mind explaining how that pattern works? I thought I knew a little about regexes, but I've never seen anything like that.
# 4  
Old 12-11-2012
This User Gave Thanks to rdcwayx For This Post:
# 5  
Old 12-11-2012
Quote:
Originally Posted by sudon't
I thought I knew a little about regexes, but I've never seen anything like that.
I'd be more worried if you had, as it's not a regex. It's more like C than anything.

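It's an associative array with a string as the index, here the whole line ($0). The expression checks whether the count stored for that line is zero, then adds one to it. awk prints a line whenever the pattern is true and no action is given, so the first time a line appears it is printed; every later copy finds a non-zero count and is skipped.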
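
To make that concrete, here is a rough longhand sketch of what the one-liner does. The sample words and the function name dedupe_longhand are made up for illustration; only the awk '!arr[$0]++' form comes from this thread.

```shell
# Hypothetical sample wordlist with non-adjacent doubles.
printf '%s\n' pear apple pear fig apple > wordlist_file

# Longhand equivalent of: awk '!arr[$0]++' wordlist_file
# (dedupe_longhand is an illustrative name, not a standard tool)
dedupe_longhand() {
    awk '{
        if (arr[$0] == 0)   # a line not seen before has an empty (zero) count
            print           # first occurrence: print the whole line
        arr[$0]++           # bump the count so later copies are skipped
    }' "$1"
}

dedupe_longhand wordlist_file
```

Because lines are printed as they are read, the original order is kept, and only the first copy of each line survives. The array holds every distinct line in memory, which should be fine for a wordlist but is worth keeping in mind for very large files.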
# 6  
Old 12-11-2012
Quote:
Originally Posted by rdcwayx
Whew! I kinda think I get it. At least, until I try to type out my own explanation. You know, I think I'm going to read something about awk and come back tomorrow.
# 7  
Old 12-12-2012
This User Gave Thanks to jim mcnamara For This Post:
 