Comparing two one-line files and selecting what does not match


 
# 1  
Old 08-15-2018

I have two files. One consists of a single line of numbers separated by spaces, with each number appearing only once.
The other consists of a single column spread over multiple lines, and some numbers can appear more than once.
They look something like this:

file1:
Code:
20 700 15 30

file2:
Code:
10
10 
200
200
700
700
700
20
30
30
50

(The files are the result of other processing and scripts, so there may be some extra spaces or tabs that I cannot easily influence or remove.)

I would like to print the lines from file2 that do not have a match in file1. It is very important that when every number in file2 is already in file1 (i.e. no line of file2 is unmatched), I get a completely empty file, with no spaces or any other characters.

I have found some ways to do it when both files are columns, but not when one of them is a single line. When I tried transforming the one-line file into a one-column file, I got some unwanted spaces in the output.

Thank you!


Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by RudiC; 08-15-2018 at 07:17 PM.. Reason: Added CODE tags.
# 2  
Old 08-15-2018
Welcome to the forum.


Please show the attempts you made and where you got stuck.
# 3  
Old 08-15-2018
I tried turning file1 into a column file with:


Code:
tr ' ' '\n' < file1 | awk '{print $0}' > file1_new

and then solving it by working with columns:


Code:
awk '{k = $1} NR==FNR{a[k]; next} !(k in a)' file1_new  file2

However, when both files contained the same numbers (as described at the end of my original post), I got an empty line as the output instead of the empty file I wanted. I would like to solve it without modifying file1, but I don't know how to approach it or where to start.
# 4  
Old 08-15-2018
comm -23 will work for you
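A minimal sketch of how this suggestion could be applied, assuming standard tr, grep, sort and comm. comm expects both inputs to be sorted the same way with one item per line, so the sketch first normalizes copies of the files and leaves the originals alone. The work-file names (file1.sorted, file2.sorted, unmatched) are made up for illustration. Because of sort -u, each unmatched number is printed once, not once per occurrence in file2.

```shell
# Recreate the sample data from the first post (run in a scratch directory).
printf '20 700 15 30\n' > file1
printf '10\n10 \n200\n200\n700\n700\n700\n20\n30\n30\n50\n' > file2

# One number per line, no blanks, no empty lines, identical sort order.
# tr -s squeezes runs of separators so they do not turn into empty lines;
# grep . drops any empty line that is still left.
tr -s ' \t' '\n\n' < file1 | grep . | LC_ALL=C sort -u > file1.sorted
tr -d ' \t' < file2 | grep . | LC_ALL=C sort -u > file2.sorted

# comm prints three columns: only in 1st file, only in 2nd file, in both.
# -2 and -3 suppress the last two, leaving what is only in file2.sorted.
LC_ALL=C comm -23 file2.sorted file1.sorted > unmatched
```

Note that file2 has to be the first argument here, since -23 keeps the lines unique to the first file. If the inputs are not sorted, or still contain stray blanks, comm can report lines as unmatched that look identical.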
# 5  
Old 08-15-2018
Thanks, I've tried that now, but I still have the same problem as with the code from my second post.
With the files containing different numbers as in my first post, I get empty lines as the first and last lines.

Since my data in file1 is on one line rather than in columns, and the files are part of a csh script and are produced as results in a loop, it would be difficult to be sure that they never contain any extra characters. I would rather keep file1 as one line instead of converting it to a column.

Is there a way in awk to index the items within a line, as can be done with columns?
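In awk, the items on a line are the fields $1 to $NF, so a loop over the fields can index them directly. A minimal sketch under that idea, assuming the numbers are separated only by spaces or tabs; the array name seen and the output file name unmatched are illustrative, not taken from the thread:

```shell
# Recreate the sample data from the first post (run in a scratch directory).
printf '20 700 15 30\n' > file1
printf '10\n10 \n200\n200\n700\n700\n700\n20\n30\n30\n50\n' > file2

# While reading the first file (NR == FNR), remember every field of every
# line; default field splitting ignores extra spaces and tabs.
# For the second file, skip blank lines (NF is 0) and print the first field
# only if it was never seen. Printing $1 instead of $0 drops stray blanks.
awk 'NR == FNR { for (i = 1; i <= NF; i++) seen[$i]; next }
     NF && !($1 in seen) { print $1 }' file1 file2 > unmatched
```

With the sample data, unmatched should contain 10, 10, 200, 200 and 50, one per line. When every number in file2 is also in file1, awk prints nothing and the output file is zero bytes. One caveat: if file1 is completely empty, NR == FNR stays true while file2 is read, so that case would need separate handling.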
# 6  
Old 08-15-2018
You don't need to change the original files, but you can do whatever is needed to your own work files.

You were getting close by breaking the many numbers on one line into one per line. From there, sort copies of both files the same way (file2 may need a unique sort) and then run them through diff. This process won't work if diff outputs "c" (change) lines with both "<" and ">" entries, but if it doesn't, you can take out the lines containing a or d, then strip the first two characters from all the other lines. For example:
Code:
diff file1 file2 | grep -v d | sed 's/..//' > outputfile

# 7  
Old 08-15-2018
Sorry, I don't understand what you mean by c lines and lines containing a or d.

Also, this code gave me output whose second line has two numbers separated by a comma that are in neither of the files. Is that some counter of data entries?

Last edited by maya3; 08-15-2018 at 06:35 PM..
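For reference, the letters come from diff's default ("normal") output format. Each block of differences starts with a change command such as 0a1,2 or 3d2 or 4c4: line numbers in the first file, a letter (a for added, d for deleted, c for changed), then line numbers in the second file. The comma-separated numbers are line ranges, not data. Lines taken from the first file are prefixed with "< " and lines from the second file with "> ". A hedged sketch that keeps only the "> " lines, so it does not depend on which letters appear; the work-file names are made up for illustration:

```shell
# Recreate the sample data from the first post (run in a scratch directory).
printf '20 700 15 30\n' > file1
printf '10\n10 \n200\n200\n700\n700\n700\n20\n30\n30\n50\n' > file2

# Work copies: one number per line, no blanks, unique, same sort order.
tr -s ' \t' '\n\n' < file1 | grep . | LC_ALL=C sort -u > file1.sorted
tr -d ' \t' < file2 | grep . | LC_ALL=C sort -u > file2.sorted

# sed -n prints nothing by default; the p flag prints only lines where the
# "> " prefix was found and removed, i.e. lines present only in file2.sorted.
# Change commands, "< " lines and "---" separators are dropped.
diff file1.sorted file2.sorted | sed -n 's/^> //p' > unmatched
```

When the two work copies are identical, diff prints nothing and unmatched is an empty file. As with any approach based on sort -u, each unmatched number appears once rather than once per occurrence in file2.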