Getting similar lines in two files

# 1  
Old 07-06-2017

Hi,

I need to compare the /etc/passwd files from two servers and extract the users that appear in both. I sorted the two files by user ID (UID, the 3rd column of /etc/passwd). I first tried sorting them by username (1st column), but when I then ran comm on the files, it produced no output.

This is the first file:

Code:
# cat sortvs1.csv
mastersam:0:MasterSAM user ID, 03072017
root:0:Super-User
daemon:1:
bin:2:
sys:3:
adm:4:Admin
uucp:5:uucp Admin
nuucp:9:uucp Admin
smmsp:25:SendMail Message Submission Program
listen:37:Network Admin
gdm:50:GDM Reserved UID
lp:71:Line Printer Admin
webservd:80:WebServer Reserved UID
postgres:90:PostgreSQL Reserved UID
svctag:95:Service Tag UID
unknown:96:Unknown Remote UID
fdsuser:1001:FDS system user
oracle:40098:
vs:40099:FTP user
Umvsftp:40100:FTP user
crmftp:40101:CRM ftp user
skmmftp:40103:SKMM ftp user
etiger:40104:
fouser:40107:
pmuser:40118:Preventive Maintenance Team
erohsik:40119:Rohan Sikka, Ericsson GNOC
IAN:40124:Ian Chew Yue Yen, umobile
emansab:40134:Manpreet Singh Sabharwal, Ericsson GNOC
ebfhiil:40136:Gaurav Mehra, Ericsson GNOC
nasha:40137:Nasha Baharom, UMobile Developer, 1072015
egi34597:40143:ANAND KUMAR, MO 2nd LA, 31052016
egi35624:40144:Rinki Saha, MO 2nd LA, 31052016
egi37391:40145:Richa Sharma, MO 2nd LA, 31052016
egi40026:40146:AMARNATH BHUNIA, MO 2nd LA, 31052016
eshisea:40147:Seah Shiao Yin, BSCS SME, 31052016
eqsvvvx:40148:Vishal Gupta, Ericsson DRP Project, 13062016
emoaigin:40158:Aigini Navaneethan, Ericsson Sysadmin, 25082016
weiping:40159:Wan Wei Ping, weiping.wan@u.com.my, 28112016
emoazizu:40162:
nobody:60001:NFS Anonymous Access User
noaccess:60002:No Access User
nobody4:65534:SunOS 4.x NFS Anonymous Access User
#

This is the second file:

Code:
# cat sortvs2.csv
mastersam:0:MasterSAM User ID, 03072017
root:0:Super-User
daemon:1:
bin:2:
sys:3:
adm:4:Admin
uucp:5:uucp Admin
nuucp:9:uucp Admin
smmsp:25:SendMail Message Submission Program
listen:37:Network Admin
gdm:50:GDM Reserved UID
lp:71:Line Printer Admin
webservd:80:WebServer Reserved UID
postgres:90:PostgreSQL Reserved UID
svctag:95:Service Tag UID
unknown:96:Unknown Remote UID
fdsuser:1001:FDS system user
oracle:40098:
vs:40099:FTP user
Umvsftp:40100:FTP user
crmftp:40101:CRM ftp user
skmmftp:40102:SKMM ftp user
etiger:40103:
fouser:40107:
emorajen:40116:Rajendra Nagireddy, Ericsson GNOC
pmuser:40118:Preventive Maintenance Team
erohsik:40119:Rohan Sikka, Ericsson GNOC
IAN:40125:Ian Chew Yue Yen, umobile
emansab:40134:Manpreet Singh Sabharwal, Ericsson GNOC
ebfhiil:40136:Gaurav Mehra, Ericsson GNOC
nasha:40137:Nasha Baharom, UMobile Developer, 1072015
egi34597:40143:ANAND KUMAR, MO 2nd LA, 31052016
egi35624:40144:Rinki Saha, MO 2nd LA, 31052016
egi37391:40145:Richa Sharma, MO 2nd LA, 31052016
egi40026:40146:AMARNATH BHUNIA, MO 2nd LA, 31052016
eshisea:40147:Seah Shiao Yin, BSCS SME, 31052016
eqsvvvx:40148:Vishal Gupta, Ericsson DRP Project, 13062016
weiping:40158:Wan Wei Ping, weiping.wan@u.com.my, 28112016
emoaigin:40161:
emoazizu:40162:
nobody:60001:NFS Anonymous Access User
noaccess:60002:No Access User
nobody4:65534:SunOS 4.x NFS Anonymous Access User
#

This is the output after comparing the two files with comm:

Code:
# comm -12 sortvs1.csv sortvs2.csv
root:0:Super-User
daemon:1:
bin:2:
sys:3:
adm:4:Admin
uucp:5:uucp Admin
nuucp:9:uucp Admin
smmsp:25:SendMail Message Submission Program
listen:37:Network Admin
gdm:50:GDM Reserved UID
lp:71:Line Printer Admin
webservd:80:WebServer Reserved UID
postgres:90:PostgreSQL Reserved UID
svctag:95:Service Tag UID
unknown:96:Unknown Remote UID
fdsuser:1001:FDS system user
oracle:40098:
vs:40099:FTP user
Umvsftp:40100:FTP user
crmftp:40101:CRM ftp user
#

As you can see, the output only lists users up to UID 40101. But both files contain further matching entries, such as UID 40162. Why is that entry not listed?

Also, is it possible to get the common usernames if I sort both files by username (first column)? Is there another way to do this? I ask because some usernames have different UIDs on each server.

Another question: which command can I use to list the user IDs that are common to multiple files (10 to 15)?

Last edited by anaigini45; 07-06-2017 at 01:09 AM..
# 2  
Old 07-06-2017
I'm a bit surprised that you get any output at all, as I get
Code:
comm -12 file1 file2
comm: file 1 is not in sorted order
comm: file 2 is not in sorted order

with your two files.
The reason for emoazizu not being found is the DOS line terminator <CR> (^M, \r, 0x0D) that is present in one file but missing in the other:
Code:
grep emoazizu file[12] | hd
00000000  66 69 6c 65 31 3a 65 6d  6f 61 7a 69 7a 75 3a 34  |file1:emoazizu:4|
00000010  30 31 36 32 3a 0a 66 69  6c 65 32 3a 65 6d 6f 61  |0162:.file2:emoa|
00000020  7a 69 7a 75 3a 34 30 31  36 32 3a 0d 0a           |zizu:40162:..|
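
A minimal sketch of one way to work around the carriage returns, assuming the data can be copied to scratch files: strip the CR with tr, re-sort both files as whole lines in the C locale (comm needs its inputs sorted that way, which a numeric ordering on the UID column does not give), then run comm -12. The sample lines and the temporary directory below are hypothetical stand-ins, not the real extracts.

```shell
# Hypothetical sample data standing in for the two server extracts;
# file2 carries DOS line endings (CR LF), file1 does not.
workdir=$(mktemp -d)
printf 'root:0:Super-User\nemoazizu:40162:\nadm:4:Admin\n' > "$workdir/file1"
printf 'root:0:Super-User\r\nemoazizu:40162:\r\nbin:2:\r\n' > "$workdir/file2"

# Strip the CR (octal 015) and sort whole lines in the C locale,
# so both inputs use the collating sequence comm expects.
for f in file1 file2; do
    tr -d '\015' < "$workdir/$f" | LC_ALL=C sort > "$workdir/$f.clean"
done

# -12 suppresses the lines unique to either file, leaving only the common ones.
common=$(LC_ALL=C comm -12 "$workdir/file1.clean" "$workdir/file2.clean")
printf '%s\n' "$common"

rm -rf "$workdir"
```

With these sample lines, the common entries are emoazizu:40162: and root:0:Super-User. Note that comm still compares entire lines, so an entry only counts as common when username, UID and comment are all identical.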

Mayhap diff (if the options shown are offered by your version) could help you?
Code:
diff -y -b --suppress-common-lines file1 file2
mastersam:0:MasterSAM user ID, 03072017                  |    mastersam:0:MasterSAM User ID, 03072017
skmmftp:40103:SKMM ftp user                      |    skmmftp:40102:SKMM ftp user
etiger:40104:                              |    etiger:40103:
                                  >    emorajen:40116:Rajendra Nagireddy, Ericsson GNOC
IAN:40124:Ian Chew Yue Yen, umobile                  |    IAN:40125:Ian Chew Yue Yen, umobile
emoaigin:40158:Aigini Navaneethan, Ericsson Sysadmin, 2508201 |    weiping:40158:Wan Wei Ping, weiping.wan@u.com.my, 28112016
weiping:40159:Wan Wei Ping, weiping.wan@u.com.my, 28112016    |    emoaigin:40161:
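
If the goal is to match on the username alone, so that accounts like skmmftp or IAN still pair up even though their UIDs differ, join may be a better fit than comm. The following is only a sketch under assumptions: the sample lines are taken from the listings above, the file layout is username:UID:comment as in those listings, and the scratch file names are made up for illustration.

```shell
# Hypothetical scratch copies holding a few lines from the two listings.
workdir=$(mktemp -d)
printf 'skmmftp:40103:SKMM ftp user\nfouser:40107:\nIAN:40124:Ian Chew Yue Yen, umobile\n' > "$workdir/file1"
printf 'skmmftp:40102:SKMM ftp user\nfouser:40107:\nemorajen:40116:Rajendra Nagireddy, Ericsson GNOC\nIAN:40125:Ian Chew Yue Yen, umobile\n' > "$workdir/file2"

# Keep only username and UID, drop any CR, and sort on the username
# (field 1) in the C locale, which is the order join relies on.
for f in file1 file2; do
    tr -d '\015' < "$workdir/$f" | cut -d: -f1,2 |
        LC_ALL=C sort -t: -k1,1 > "$workdir/$f.key"
done

# join pairs lines whose first field matches.
# Output format: username:UID_in_file1:UID_in_file2
joined=$(LC_ALL=C join -t: "$workdir/file1.key" "$workdir/file2.key")
printf '%s\n' "$joined"

# Usernames present on both servers but with different UIDs.
mismatched=$(printf '%s\n' "$joined" | awk -F: '$2 != $3')
printf '%s\n' "$mismatched"

rm -rf "$workdir"
```

Usernames that exist in only one file (emorajen here) are dropped by join, which matches the "present on both servers" requirement.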

And, what are "similar" names? In a (digital!) IS environment, you have TRUE or FALSE, EQUAL or NOT EQUAL, YES or NO, and that's it. Some commands offer some fuzziness, like "ignore case" or "ignore differing amounts of whitespace", but finding "similarities" is quite a long way from there. You first have to define what "similar" means (do you need to respect locale features? do you want to accept typographical errors?), and then you need to code a program implementing these definitions and rules, as I'm afraid there aren't many such tools available out of the box...
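
For the multi-file question, if "similar" is taken to mean "the exact same UID appears in every file", one possible approach is to count in how many files each UID occurs. This is a sketch under that assumption; the server*.csv names and their contents are hypothetical, and the UID is taken from field 2 as in the listings above (it would be field 3 in a raw /etc/passwd). On older Solaris releases, nawk or /usr/xpg4/bin/awk may be needed for the -v option.

```shell
# Hypothetical extracts from three servers; server3 has one DOS line ending.
workdir=$(mktemp -d)
printf 'root:0:Super-User\nvs:40099:FTP user\netiger:40104:\n' > "$workdir/server1.csv"
printf 'root:0:Super-User\nvs:40099:FTP user\netiger:40103:\n' > "$workdir/server2.csv"
printf 'root:0:Super-User\nvs:40099:FTP user\r\n' > "$workdir/server3.csv"

# Put the file list into the positional parameters so $# is the file count.
set -- "$workdir"/server*.csv
nfiles=$#

# For each file: drop CRs, take the UID column, and list each UID once.
# Then count how often each UID shows up overall; a UID seen nfiles times
# is present in every file.
in_all=$(
    for f in "$@"; do
        tr -d '\015' < "$f" | cut -d: -f2 | LC_ALL=C sort -u
    done | LC_ALL=C sort | uniq -c | awk -v n="$nfiles" '$1 == n { print $2 }'
)
printf '%s\n' "$in_all"

rm -rf "$workdir"
```

Changing the test in the awk program to `$1 >= 2` would instead list UIDs shared by at least two of the files.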