Is there a UNIX command that can compare fields of files with differing number of fields?


 
# 1  
Old 06-02-2019

Hi,

Below are the sample files. x.txt comes from an Excel file listing Windows users, and y.txt is a list of database accounts.

Code:
$ head -500 x.txt y.txt
==> x.txt <==
TEST01  APP_USER_PROFILE
USER03  APP_USER_PROFILE
TEST02  APP_USER_EXP_PROFILE
TEST04  APP_USER_PROFILE
USER01  APP_USER_PROFILE
USER02  APP_USER_EXP_PROFILE
USER04  APP_USER_PROFILE
MINNIE  APP_USER_EXP_PROFILE
XXXX02  APP_USER_PROFILE
XXXX04  APP_USER_PROFILE
ABCD01  APP_USER_PROFILE
ABCD02  APP_USER_PROFILE
TEST03  APP_USER_PROFILE
ABCD03  APP_USER_PROFILE
ZZZZ03  APP_USER_PROFILE
PLUTO   APP_USER_PROFILE
ABCD04  APP_USER_PROFILE

==> y.txt <==
TEST01  APP_USER_PROFILE        LOCKED  2016-08-05=14:00
USER03  APP_USER_PROFILE        OPEN
TEST02  APP_USER_EXP_PROFILE    EXPIRED 2017-01-01=13:00
TEST04  APP_USER_PROFILE        LOCKED  2017-12-25=15:00
USER01  APP_USER_PROFILE        EXPIRED 2017-02-14=14:00
USER02  APP_USER_EXP_PROFILE    LOCKED  2018-12-25=11:00
USER04  APP_USER_PROFILE        OPEN
XXXX01  APP_USER_EXP_PROFILE    LOCKED  2016-04-01=12:00
XXXX02  APP_USER_PROFILE        LOCKED  2019-01-01=01:00
XXXX04  APP_USER_PROFILE        OPEN
ABCD01  APP_USER_PROFILE        OPEN
ABCD02  APP_USER_PROFILE        LOCKED  2019-01-23=15:00
TEST03  APP_USER_PROFILE        OPEN
ABCD03  APP_USER_PROFILE        OPEN
ZZZZ03  APP_USER_PROFILE        OPEN
ABCD04  APP_USER_PROFILE        OPEN
MICKEY  APP_USER_PROFILE        LOCKED  2018-04-01=16:00
DONALD  APP_USER_PROFILE        OPEN

At the moment I am using the script below to check whether each line of x.txt exists in y.txt, that is, whether a Windows account in x.txt exists as a database account in y.txt.

Code:
$ cat x.ksh
#!/bin/ksh
#

search_list="x.txt"
search_from="y.txt"

while read list
do
   if [[ ! -z `grep "^$list" ${search_from}` ]] ; then
      echo "- FOUND ==> `grep "^$list" ${search_from}`"
   else
      echo "- NOT FOUND !!! ${list}"
   fi
done < ${search_list}

Sample run of the script below:

Code:
$ ./x.ksh
- FOUND ==> TEST01      APP_USER_PROFILE        LOCKED  2016-08-05=14:00
- FOUND ==> USER03      APP_USER_PROFILE        OPEN
- FOUND ==> TEST02      APP_USER_EXP_PROFILE    EXPIRED 2017-01-01=13:00
- FOUND ==> TEST04      APP_USER_PROFILE        LOCKED  2017-12-25=15:00
- FOUND ==> USER01      APP_USER_PROFILE        EXPIRED 2017-02-14=14:00
- FOUND ==> USER02      APP_USER_EXP_PROFILE    LOCKED  2018-12-25=11:00
- FOUND ==> USER04      APP_USER_PROFILE        OPEN
- NOT FOUND !!! MINNIE  APP_USER_EXP_PROFILE
- FOUND ==> XXXX02      APP_USER_PROFILE        LOCKED  2019-01-01=01:00
- FOUND ==> XXXX04      APP_USER_PROFILE        OPEN
- FOUND ==> ABCD01      APP_USER_PROFILE        OPEN
- FOUND ==> ABCD02      APP_USER_PROFILE        LOCKED  2019-01-23=15:00
- FOUND ==> TEST03      APP_USER_PROFILE        OPEN
- FOUND ==> ABCD03      APP_USER_PROFILE        OPEN
- FOUND ==> ZZZZ03      APP_USER_PROFILE        OPEN
- NOT FOUND !!! PLUTO   APP_USER_PROFILE
- FOUND ==> ABCD04      APP_USER_PROFILE        OPEN

I then converted x.txt and y.txt to pipe-delimited files and ran them through sort | uniq, so that they now look as below.

Code:
x.txt:
ABCD01|APP_USER_PROFILE
ABCD02|APP_USER_PROFILE
ABCD03|APP_USER_PROFILE
ABCD04|APP_USER_PROFILE
MINNIE|APP_USER_EXP_PROFILE
PLUTO|APP_USER_PROFILE
TEST01|APP_USER_PROFILE
TEST02|APP_USER_EXP_PROFILE
TEST03|APP_USER_PROFILE
TEST04|APP_USER_PROFILE
USER01|APP_USER_PROFILE
USER02|APP_USER_EXP_PROFILE
USER03|APP_USER_PROFILE
USER04|APP_USER_PROFILE
XXXX02|APP_USER_PROFILE
XXXX04|APP_USER_PROFILE
ZZZZ03|APP_USER_PROFILE

y.txt:
ABCD01|APP_USER_PROFILE|OPEN|
ABCD02|APP_USER_PROFILE|LOCKED|2019-01-23=15:00
ABCD03|APP_USER_PROFILE|OPEN|
ABCD04|APP_USER_PROFILE|OPEN|
DONALD|APP_USER_PROFILE|OPEN|
MICKEY|APP_USER_PROFILE|LOCKED|2018-04-01=16:00
TEST01|APP_USER_PROFILE|LOCKED|2016-08-05=14:00
TEST02|APP_USER_EXP_PROFILE|EXPIRED|2017-01-01=13:00
TEST03|APP_USER_PROFILE|OPEN|
TEST04|APP_USER_PROFILE|LOCKED|2017-12-25=15:00
USER01|APP_USER_PROFILE|EXPIRED|2017-02-14=14:00
USER02|APP_USER_EXP_PROFILE|LOCKED|2018-12-25=11:00
USER03|APP_USER_PROFILE|OPEN|
USER04|APP_USER_PROFILE|OPEN|
XXXX01|APP_USER_EXP_PROFILE|LOCKED|2016-04-01=12:00
XXXX02|APP_USER_PROFILE|LOCKED|2019-01-01=01:00
XXXX04|APP_USER_PROFILE|OPEN|
ZZZZ03|APP_USER_PROFILE|OPEN|
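
For reference, a minimal sketch of how such a conversion might be done, assuming the input is whitespace-separated. The function name to_pipe and its field-count argument are made up for illustration; this is not necessarily how the files above were produced.

```shell
# to_pipe FILE NFIELDS (hypothetical helper, not from the original post)
# Rewrites whitespace-separated lines as pipe-delimited lines with exactly
# NFIELDS fields (missing trailing fields come out empty), then sorts the
# result and drops duplicate lines.
to_pipe() {
    awk -v n="$2" '{
        out = $1
        for (i = 2; i <= n; i++) out = out "|" $i   # $i is empty when i > NF
        print out
    }' "$1" | LC_ALL=C sort -u
}
```

Called as `to_pipe x.txt 2` and `to_pipe y.txt 4`, this should give two-field and four-field records respectively, with OPEN rows ending in an empty fourth field as in the listing above. LC_ALL=C is set only to keep the sort order predictable.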

Is there a UNIX command that does what the script does, that is, simply displays the matching and non-matching first and second fields between the two files? I can't use diff because the two files are completely different: they don't have the same number of fields and never will.

The script works the way I want; it just takes a while when the number of lines is in the thousands instead of the handful in the sample files.

I thought I saw a similar question in the forum but can't find it now.
# 2  
Old 06-02-2019
Try
Code:
grep -vffile1 file2
XXXX01  APP_USER_EXP_PROFILE    LOCKED  2016-04-01=12:00
MICKEY  APP_USER_PROFILE        LOCKED  2018-04-01=16:00
DONALD  APP_USER_PROFILE        OPEN

and
Code:
grep -vf<(cut -d" " -f1-3 file2) file1
MINNIE  APP_USER_EXP_PROFILE
PLUTO   APP_USER_PROFILE

# 3  
Old 06-02-2019
Hi.

Using RudiC's technique for cut, also consider diff and comm:
Code:
#!/usr/bin/env bash

# @(#) s1       Demonstrate comparison of selected fields, diff, comm.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C diff comm

pl " Input data files data[12], pasted for brevity:"
paste data[12] | expand -30

pl " Results, diff:"
diff data1 <(cut -d "|" -f1,2 data2)

pl " Results, comm:"
comm -3 data1 <(cut -d "|" -f1,2 data2)

exit 0

producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-7-amd64, x86_64
Distribution        : Debian 8.11 (jessie) 
bash GNU bash 4.3.30
diff (GNU diffutils) 3.3
comm (GNU coreutils) 8.23

-----
 Input data files data[12], pasted for brevity:
ABCD01|APP_USER_PROFILE       ABCD01|APP_USER_PROFILE|OPEN|
ABCD02|APP_USER_PROFILE       ABCD02|APP_USER_PROFILE|LOCKED|2019-01-23=15:00
ABCD03|APP_USER_PROFILE       ABCD03|APP_USER_PROFILE|OPEN|
ABCD04|APP_USER_PROFILE       ABCD04|APP_USER_PROFILE|OPEN|
MINNIE|APP_USER_EXP_PROFILE   DONALD|APP_USER_PROFILE|OPEN|
PLUTO|APP_USER_PROFILE        MICKEY|APP_USER_PROFILE|LOCKED|2018-04-01=16:00
TEST01|APP_USER_PROFILE       TEST01|APP_USER_PROFILE|LOCKED|2016-08-05=14:00
TEST02|APP_USER_EXP_PROFILE   TEST02|APP_USER_EXP_PROFILE|EXPIRED|2017-01-01=13:00
TEST03|APP_USER_PROFILE       TEST03|APP_USER_PROFILE|OPEN|
TEST04|APP_USER_PROFILE       TEST04|APP_USER_PROFILE|LOCKED|2017-12-25=15:00
USER01|APP_USER_PROFILE       USER01|APP_USER_PROFILE|EXPIRED|2017-02-14=14:00
USER02|APP_USER_EXP_PROFILE   USER02|APP_USER_EXP_PROFILE|LOCKED|2018-12-25=11:00
USER03|APP_USER_PROFILE       USER03|APP_USER_PROFILE|OPEN|
USER04|APP_USER_PROFILE       USER04|APP_USER_PROFILE|OPEN|
XXXX02|APP_USER_PROFILE       XXXX01|APP_USER_EXP_PROFILE|LOCKED|2016-04-01=12:00
XXXX04|APP_USER_PROFILE       XXXX02|APP_USER_PROFILE|LOCKED|2019-01-01=01:00
ZZZZ03|APP_USER_PROFILE       XXXX04|APP_USER_PROFILE|OPEN|
                              ZZZZ03|APP_USER_PROFILE|OPEN|

-----
 Results, diff:
5,6c5,6
< MINNIE|APP_USER_EXP_PROFILE
< PLUTO|APP_USER_PROFILE
---
> DONALD|APP_USER_PROFILE
> MICKEY|APP_USER_PROFILE
14a15
> XXXX01|APP_USER_EXP_PROFILE

-----
 Results, comm:
        DONALD|APP_USER_PROFILE
        MICKEY|APP_USER_PROFILE
MINNIE|APP_USER_EXP_PROFILE
PLUTO|APP_USER_PROFILE
        XXXX01|APP_USER_EXP_PROFILE

Best wishes ... cheers, drl
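
The comm output above tells the two sides apart only by indentation. As a rough sketch, assuming pipe-delimited input and the availability of mktemp, the two comm columns could be printed separately with a text label each. The function name label_diffs and the label wording are invented for this example.

```shell
# label_diffs FILE1 FILE2 (hypothetical helper, not part of the script above)
# Compares fields 1 and 2 of two pipe-delimited files and prints each key
# that appears in only one of them, prefixed with the side it came from.
label_diffs() {
    left=$(mktemp) || return 1
    right=$(mktemp) || return 1
    # comm needs sorted input; sort in the C locale so both sides agree.
    cut -d '|' -f 1,2 "$1" | LC_ALL=C sort -u > "$left"
    cut -d '|' -f 1,2 "$2" | LC_ALL=C sort -u > "$right"
    LC_ALL=C comm -23 "$left" "$right" | sed 's/^/first only: /'    # lines unique to FILE1
    LC_ALL=C comm -13 "$left" "$right" | sed 's/^/second only: /'   # lines unique to FILE2
    rm -f "$left" "$right"
}
```

With the sample data this would be called as `label_diffs data1 data2`. Temporary files are used instead of process substitution so the sketch does not depend on bash.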
# 4  
Old 06-03-2019
Quote:
Originally Posted by RudiC
Try
Code:
grep -vffile1 file2
XXXX01  APP_USER_EXP_PROFILE    LOCKED  2016-04-01=12:00
MICKEY  APP_USER_PROFILE        LOCKED  2018-04-01=16:00
DONALD  APP_USER_PROFILE        OPEN

and
Code:
grep -vf<(cut -d" " -f1-3 file2) file1
MINNIE  APP_USER_EXP_PROFILE
PLUTO   APP_USER_PROFILE

Note that with this approach the spacing needs to be exactly the same in both files. Also note that, had PLUTO been present in y.txt, the result would still be unreliable, because cut treats every single space as a separator: with different spacing PLUTO would be listed as absent, and with identical spacing it would be listed as present on the basis of column 1 and the exactly two spaces after it, without column 2 being compared.

IMO, a more robust way is to combine the whitespace tolerance of, for example, awk with exact string matching:
Code:
awk 'NR==FNR{A[$1,$2]; next} !(($1,$2) in A)' y.txt x.txt

or the other way around:
Code:
awk 'NR==FNR{A[$1,$2]; next} !(($1,$2) in A)' x.txt y.txt
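
If the FOUND / NOT FOUND report of the original x.ksh is wanted, the same idea can probably be stretched to print both cases in one pass. This is only a sketch: the function name check_accounts is made up, and unlike the grep loop it compares fields 1 and 2 exactly rather than matching the whole x.txt line as a pattern.

```shell
# check_accounts LIST REFERENCE (hypothetical helper name)
# For every line of LIST, report whether its first two fields also occur
# as the first two fields of some line in REFERENCE.
check_accounts() {
    awk '
        # First file on the command line (REFERENCE): remember the full line per key.
        NR == FNR { seen[$1, $2] = $0; next }
        # Second file (LIST): key known, so print the matching REFERENCE line.
        ($1, $2) in seen { print "- FOUND ==> " seen[$1, $2]; next }
        # Key unknown: print the LIST line itself.
        { print "- NOT FOUND !!! " $0 }
    ' "$2" "$1"
}
```

It would be run as `check_accounts x.txt y.txt`. Each file is read once, so it should scale much better than calling grep twice per line. Two caveats: if y.txt holds several lines with the same first two fields only the last one is shown, and the NR==FNR test assumes y.txt is not empty.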

This User Gave Thanks to Scrutinizer For This Post: