Matching and reporting near-similar lines in a file


 
# 1  
Old 11-21-2011

Hi,

I have a file with lines like the following:
Code:
C_10_A05_T7
C_10_A06_SP6
C_10_B05_SP6
C_10_B05_T7
C_10_B01_SP6
C_10_B01_T7
C_12_G07_SP6
C_12_G11_SP6
C_12_G11_T7
C_2_H18_T7
C_2_I02_SP6
C_2_I02_T7
C_2_I13_SP6
C_2_I17_SP6

The four segments of each line are joined by '_' characters. I want to count and print those pairs of lines that are identical in the first three segments and differ only in the last segment, i.e. 'T7' versus 'SP6'. For example, for the data above the output would be:

4 pairs.
Code:
C_10_B05_SP6
C_10_B05_T7
C_10_B01_SP6
C_10_B01_T7
C_12_G11_SP6
C_12_G11_T7
C_2_I02_SP6
C_2_I02_T7

Thanks for your help.



Last edited by Franklin52; 11-24-2011 at 03:41 AM.. Reason: Please use code tags for code and data samples, thank you
# 2  
Old 11-21-2011
Code:
 
$ nawk -F_ '{if(a==$3){print b;print $0}{a=$3;b=$0;next}}' inputfile
C_10_B05_SP6
C_10_B05_T7
C_10_B01_SP6
C_10_B01_T7
C_12_G11_SP6
C_12_G11_T7
C_2_I02_SP6
C_2_I02_T7

---------- Post updated at 12:00 PM ---------- Previous update was at 11:55 AM ----------

This will compare the first three fields:
Code:
awk -F_ '{if(a==$1"_"$2"_"$3){print b;print $0}{a=$1"_"$2"_"$3;b=$0;next}}' inputfile
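
Neither command above prints the pair count that the first post asked for. Here is a minimal sketch of one way to add it, assuming (as in the sample data) that matching lines are adjacent and that no three-segment prefix occurs more than twice. The function name report_adjacent_pairs is made up for illustration.

```shell
# Hypothetical helper: report adjacent lines whose first three '_' fields match.
# Assumes group members are adjacent and each group has at most two lines.
report_adjacent_pairs() {
  awk -F_ '
    {
      key = $1 FS $2 FS $3            # first three segments, rejoined with "_"
      if (key == prev_key) {          # same prefix as the previous line
        out = out prev_line "\n" $0 "\n"
        pairs++
      }
      prev_key = key
      prev_line = $0
    }
    END {
      printf "%d pairs.\n", pairs     # count first, as in the requested output
      printf "%s", out                # then the buffered pairs
    }
  ' "$@"
}
```

Called as report_adjacent_pairs inputfile on the sample data from the first post, this prints "4 pairs." followed by the eight matching lines. The pairs are buffered in a variable so that the count can be printed before them.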

# 3  
Old 11-21-2011
Thanks for your reply. However, it seems I don't have 'nawk' on my Mac OS X 10.6.8. Any idea how to run it, or is there an 'awk' equivalent? Thanks.
# 4  
Old 11-21-2011
Try awk instead of nawk.
# 5  
Old 11-21-2011
Working fine. Thanks a lot!
# 6  
Old 11-21-2011
Actually, it does not work fine. It only works when each group of similar lines has exactly two members; with three or more, the inner lines are printed twice. Look here:
Code:
$ cat input
C_10_B05_SP6
C_10_B05_T7
C_10_B05_last
C_10_B01_SP6
C_10_B01_T7
C_12_G11_SP6
C_12_G11_T7
C_2_I02_SP6
C_2_I02_T7
$ awk -F_ '{if(a==$1"_"$2"_"$3){print b;print $0}{a=$1"_"$2"_"$3;b=$0;next}}' input
C_10_B05_SP6
C_10_B05_T7
C_10_B05_T7
C_10_B05_last
C_10_B01_SP6
C_10_B01_T7
C_12_G11_SP6
C_12_G11_T7
C_2_I02_SP6
C_2_I02_T7

Try this instead:
Code:
$ awk -F_ '!cnt[$1 $2 $3]{if(a)print a; a=$0};cnt[$1 $2 $3]++{a=a "\n" $0}END{  if(cnt[$1 $2 $3]>1) {print a} }' input
C_10_B05_SP6
C_10_B05_T7
C_10_B05_last
C_10_B01_SP6
C_10_B01_T7
C_12_G11_SP6
C_12_G11_T7
C_2_I02_SP6
C_2_I02_T7

To explain:
Code:
$ awk -F_ '
!cnt[$1 $2 $3]{  #key not seen before: a new group starts
  if(a)  #do not print an empty line at the beginning
    print a;  #print the lines stored for the previous group
  a=$0  #remember the first line of the new group
}
cnt[$1 $2 $3]++{  #true if already encountered; increments cnt either way
  a=a "\n" $0  #append the line to a
}
END{
  if(cnt[$1 $2 $3]>1)
    print a   #print the last stored group, if its cnt > 1
}' input
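
A variant sketch, not mirni's code: the script above prints each stored group as soon as the next group starts, without checking how many lines it holds, so a line with no partner is printed as well. Assuming the lines of a group are adjacent, the following counts the lines per group and prints only groups of two or more. The names print_adjacent_groups and emit_group are made up for illustration.

```shell
# Hypothetical helper: print only groups of two or more adjacent lines
# that share the first three '_' fields.
print_adjacent_groups() {
  awk -F_ '
    function emit_group() {           # print the buffered group if it has 2+ lines
      if (n > 1) printf "%s", buf
    }
    {
      key = $1 FS $2 FS $3            # keep the separators so keys cannot collide
      if (key != prev_key) {          # a new group starts here
        emit_group()
        buf = ""
        n = 0
        prev_key = key
      }
      buf = buf $0 "\n"               # store the line and count it
      n++
    }
    END { emit_group() }              # the last group is still buffered
  ' "$@"
}
```

Joining the key with FS instead of writing $1 $2 $3 is a deliberate choice: without separators, prefixes such as C_1_23 and C_12_3 would collapse into the same key.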


Last edited by mirni; 11-21-2011 at 04:56 AM.. Reason: comments
# 7  
Old 11-23-2011
Thanks for your reply, mirni. Just two issues:
1) If I am not mistaken, your code seems to produce output identical to the input.
2) In addition to the criteria I described at the beginning of the thread, what happens if the lines are not ordered? For example:
Code:
C_10_B05_SP6
C_10_B01_T7
C_10_B05_last
C_10_B01_SP6
C_10_B05_T7
C_12_G11_SP6
C_20_Z1_SP6
C_18_Y12_SP6
C_12_I02_T7
C_2_I02_SP6
C_2_G11_T7


Last edited by Franklin52; 11-24-2011 at 03:42 AM.. Reason: Code tags
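
For unordered input like the sample above, one possible approach is to stop comparing neighbouring lines and instead record, for each three-segment prefix, which last segments were seen. This is a minimal sketch under the following assumptions: only the suffixes 'SP6' and 'T7' count, a repeated identical line counts once, and pairs are reported in order of first appearance. The function name report_pairs_any_order is made up for illustration.

```shell
# Hypothetical helper: pair SP6/T7 lines regardless of line order.
report_pairs_any_order() {
  awk -F_ '
    NF == 4 && ($4 == "SP6" || $4 == "T7") {
      key = $1 FS $2 FS $3
      if (!(key in first_seen)) {     # remember the order of first appearance
        first_seen[key] = ++nkeys
        order[nkeys] = key
      }
      seen[key, $4] = 1               # record which suffix occurred for this prefix
    }
    END {
      for (i = 1; i <= nkeys; i++) {
        key = order[i]
        if (((key, "SP6") in seen) && ((key, "T7") in seen)) {
          out = out key FS "SP6\n" key FS "T7\n"
          pairs++
        }
      }
      printf "%d pairs.\n", pairs     # count first, then the pairs
      printf "%s", out
    }
  ' "$@"
}
```

On the unordered sample above this reports 2 pairs (C_10_B05 and C_10_B01). C_10_B05_last is ignored because its last segment is neither SP6 nor T7.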
 