Finding unique entries without sorting


 
# 1  
Old 02-02-2010

Hi Guys,

I have two files that I am using:

File1 is as follows:

Code:
 
wwe
khfgv
jfo
jhgfd
hoaha
hao
lkahe

This is the master file; its entries are in the order I want.

File 2 looks like this:

Code:
wwe
khfgv
jfo
wwe
jhgfd
wwe
wwe
hoaha
hao
lkahe
hoaha
wwe
hao

I want to parse the second file and count the occurrences of each entry. Then I want a third file that lists each entry, in the order of File1, followed by the number of times that entry occurs in File 2:

Code:
wwe   5
khfgv 1
jfo   1
jhgfd 1
hoaha 2
hao   2
lkahe 1

I tried sort | uniq -c, but that reorders the entries. Is there an easier way to find unique entries without sorting the file?

Thanks in advance.
# 2  
Old 02-02-2010
Code:
awk 'NR==FNR{a[$0]++;next}a[$0]{print $0, a[$0]}' file2 file1

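For readers who want to try it, here is the same one-liner spread over several lines with comments. This is only a sketch: the sample data is copied from post #1, the scratch directory and the output name file3 are assumptions, and the array is renamed from a to count purely for readability.

```shell
# Recreate the sample data from post #1 in a scratch directory (assumed location).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' wwe khfgv jfo jhgfd hoaha hao lkahe > file1
printf '%s\n' wwe khfgv jfo wwe jhgfd wwe wwe hoaha hao lkahe hoaha wwe hao > file2

# Same logic as the one-liner, with one rule per line.
awk '
    # NR is the overall line number, FNR the line number within the current file.
    # They are equal only while the first file argument (file2) is being read.
    NR == FNR { count[$0]++; next }   # pass 1: tally each entry, skip the rule below

    # Pass 2 (file1): print entries in master order, but only those seen in file2.
    count[$0] { print $0, count[$0] }
' file2 file1 > file3

cat file3
```

Note that file2 is given first on the command line so that it is counted before file1 is read; file1 then only dictates the output order.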
# 3  
Old 02-02-2010
Quote:
Originally Posted by Franklin52
Code:
awk 'NR==FNR{a[$0]++;next}a[$0]{print $0, a[$0]}' file2 file1

Hi Franklin,

Can you please explain briefly how this part of your code works?
Code:
awk 'NR==FNR{a[$0]++;next}.....

thanks in advance
# 4  
Old 02-02-2010
Or in a perhaps more familiar way, you could try something like this:


Code:
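> file3.txt

while IFS= read -r line
do
        # -x matches whole lines only, -F treats the entry as a fixed string
        # (otherwise an entry would also be counted inside longer entries)
        occurrences=$(grep -cxF -- "$line" file2.txt)
        echo "$line $occurrences" >> file3.txt
done < file1.txt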

Obviously not as concise as the awk version but maybe a little easier to understand if you're a beginner.
# 5  
Old 02-02-2010
Quote:
Originally Posted by EAGL€
Hi Franklin,

Can you please explain briefly how this part of your code works?
Code:
awk 'NR==FNR{a[$0]++;next}.....

thanks in advance
Code:
awk 'NR==FNR{a[$0]++;next}

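NR==FNR is true only while awk is reading the first file on the command line (file2). For each of its lines, a[$0] is incremented, and next skips the rest of the script. This is how the counts build up: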

Code:
line 1: a[wwe] becomes 1
line 2: a[khfgv] becomes 1
line 3: a[jfo] becomes 1
line 4: a[wwe] becomes 2
line 5: a[jhgfd] becomes 1
line 6: a[wwe] becomes 3
line 7: a[wwe] becomes 4
.
.
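
If you want to see the counters change for yourself, a small trace along these lines should do it. It is only a sketch: the data is file2 from post #1, while the scratch file and the printf wording are my own assumptions, not part of the original command.

```shell
# file2 contents are taken from post #1; the scratch file path is an assumption.
file2=$(mktemp)
printf '%s\n' wwe khfgv jfo wwe jhgfd wwe wwe hoaha hao lkahe hoaha wwe hao > "$file2"

# Increment the counter for each line and report its value after the increment.
trace=$(awk '{ a[$0]++; printf "line %d: a[%s] is now %d\n", NR, $0, a[$0] }' "$file2")
printf '%s\n' "$trace"

rm -f "$file2"
```

Only one file is read here, so the NR==FNR test is not needed; in the full command it is what restricts the counting to file2.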
