match similar rows. uniq?


 
# 1  
Old 03-18-2008

Hi,

I have data in two columns (such as below). I need to compare rows against each other: if two rows match in the first column (apart from case) and their values in the second column are different, then one of the rows should be printed (either is fine).

Here is an example:

Code:
BaLL   1
bAt     2
foo     1
FOO    2
dog     1
DoG    2

The output for this would be:
Code:
foo
dog

I'm quite sure that uniq is what I should be using, and I have already used it once to remove any exact duplicates, i.e.:
Code:
ball 1
ball 1
ball 2

becomes:
Code:
ball 1
ball 2

However, from here I'm not sure where to go.
# 2  
Old 03-18-2008
Code:
awk '!y[tolower($1),$2]++&&x[tolower($1)]++==1' file

Use nawk or /usr/xpg4/bin/awk on Solaris.
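
For anyone who finds the one-liner hard to read, here is the same logic spelled out step by step. This is only a sketch: the array names seen_pair and seen_name and the sample variable are made up for illustration, and it prints the lowercased name, whereas the one-liner prints the whole matching line.

```shell
# Sample data from the first post, held in a variable for illustration.
sample='BaLL   1
bAt     2
foo     1
FOO    2
dog     1
DoG    2'

result=$(printf '%s\n' "$sample" | awk '
{
    key = tolower($1)            # compare names without regard to case
    if (seen_pair[key, $2]++)    # this name/value pair was already seen: skip it
        next
    if (seen_name[key]++ == 1)   # second distinct value for this name
        print key
}')
printf '%s\n' "$result"
```

Each name is printed at most once, at the moment its second distinct value shows up.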
# 3  
Old 03-18-2008
Thanks for the response, but that is WAY over my head and I would prefer to do this in a way that I would actually understand. Is it possible to do this with uniq?
# 4  
Old 03-18-2008
Hi.

The GNU/Linux version of uniq -- uniq (coreutils) 5.2.1 -- has these options (among others):
Code:
       -f, --skip-fields=N
              avoid comparing the first N fields

       -i, --ignore-case
              ignore differences in case when comparing

       -t, --separator=SEP
              use SEParator to delimit fields

       -W, --check-fields=N
              compare no more than N fields in lines

However, if you have a fewer-featured uniq, then the solution from radoulov would be useful. I'm sure he'd be willing to explain it if you asked politely ... cheers, drl
# 5  
Old 03-18-2008
If your uniq supports these options:
Code:
       -i, --ignore-case
              ignore differences in case when comparing
       -w, --check-chars=N
              compare no more than N characters in lines

Code:
uniq -i -w 6 filename

The limitation is that only the first 6 characters of each line are compared, so this won't work if the words are more than 6 characters long; adjust the value given to -w accordingly.
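
To list only the rows that have a case-insensitive match, -d can be combined with the two options above. A minimal sketch, assuming GNU uniq, the sample data from the first post (the sample variable is just for illustration), and that exact duplicate rows have already been removed. Like any use of uniq, it only compares adjacent lines, so sort the input first if matching rows are not next to each other.

```shell
# Sample data from the first post, held in a variable for illustration.
sample='BaLL   1
bAt     2
foo     1
FOO    2
dog     1
DoG    2'

# -i: ignore case
# -w 6: compare only the first 6 characters of each line
# -d: print one line for each group of matching adjacent lines
matches=$(printf '%s\n' "$sample" | uniq -i -d -w 6)
printf '%s\n' "$matches"
```

With this data, the first row of each matching pair (the foo row and the dog row) is what gets printed.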
# 6  
Old 03-18-2008
How about:

Code:
sort -f file | uniq -id

The "i" makes the comparison case-insensitive, and the "d" means it will only return records which were duplicates.

ShawnMilo
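
One thing to keep in mind with uniq -id is that uniq compares whole lines, so rows such as "foo 1" and "FOO 2" differ in the second column and are not reported as duplicates. A possible way around this without awk is sketched below. It assumes the columns are separated by spaces (not tabs) and that lowercasing the second column does no harm; the sample and dups variables are only for illustration.

```shell
# Sample data from the first post, held in a variable for illustration.
sample='BaLL   1
bAt     2
foo     1
FOO    2
dog     1
DoG    2'

dups=$(printf '%s\n' "$sample" |
    tr '[:upper:]' '[:lower:]' |   # fold case on the whole line
    tr -s ' ' |                    # squeeze runs of spaces to a single space
    LC_ALL=C sort -u |             # drop exact duplicate rows and group names together
    cut -d ' ' -f 1 |              # keep the name column only
    uniq -d)                       # names listed more than once had differing values
printf '%s\n' "$dups"
```

The names come out in sorted order rather than input order.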
 