How to check if file1 is a subset of file2?


 
# 1  
Old 11-17-2016

I need to know if file1 is a subset of file2, i.e. whether all the contents of file1 are present in file2.

Here is how I would do it.

Read file1 line by line and grep for each line in file2 in a loop. Any failing grep would mean that file1 is not a subset.
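
For reference, the loop described above might look like the sketch below. The function name is_subset_loop is made up for illustration, and the -F, -x and -q flags are assumed to be supported by the local grep (on Solaris this may mean /usr/xpg4/bin/grep).

```shell
#!/bin/sh
# Sketch only: is_subset_loop is a hypothetical helper, not from the thread.
# Usage: is_subset_loop file1 file2
# Returns 0 if every line of file1 occurs as a whole line in file2.
is_subset_loop() {
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # -F: fixed string, -x: whole line, -q: no output,
        # -e: protects lines that start with a dash.
        grep -Fxq -e "$line" "$2" || return 1
    done < "$1"
}

# Example call (file1 and file2 are the files from the question):
#   is_subset_loop file1 file2 && echo "file1 is a subset of file2"
```

This starts one grep per line of file1, so it gets slow as file1 grows.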

Is there a quicker or easier way to check that?

Code:
uname -a
SunOS mymac 5.11 11.2 sun4u sparc SUNW,SPARC-Enterprise

# 2  
Old 11-17-2016
You can also use the -vf flags of grep, something like this:
Code:
grep -vf file2 file1

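This will display the lines from file1 that match none of the lines in file2. No output implies that all file1 lines are found in file2. Be aware, however, that each line of file2 is used as a regular expression and may match only part of a line, and that no account is taken of the order of the lines, duplicate lines, etc.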


Would this do what you need?

I haven't got a SunOS server available to test on, so you might need to adjust the command, perhaps by using fgrep in place of grep (the -f flag is still needed to read the patterns from file2).
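
A stricter variant of the grep check, as a sketch: adding -F (fixed strings) and -x (whole-line match) stops lines of file2 from acting as regular expressions or matching substrings. The helper names missing_lines and is_subset are made up for illustration, and the flags are assumed to be available in the local grep (on Solaris possibly only in /usr/xpg4/bin/grep).

```shell
#!/bin/sh
# Sketch only: missing_lines and is_subset are hypothetical helper names.

# Print the lines of file1 ($1) that do not occur as whole lines in file2 ($2).
missing_lines() {
    # -F fixed strings, -x whole line, -v invert, -f read patterns from a file
    grep -Fxv -f "$2" "$1"
}

# Return 0 if file1 ($1) is a subset of file2 ($2).
is_subset() {
    grep -Fxvq -f "$2" "$1"
    # With -v, exit status 1 means no line was selected, i.e. nothing is missing.
    [ $? -eq 1 ]
}

# Example call:
#   is_subset file1 file2 && echo "file1 is a subset of file2"
```

Checking for exit status 1 explicitly, rather than negating the result, keeps a grep error (status 2, for example an unreadable file) from being reported as "subset".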



Robin

Last edited by rbatte1; 11-17-2016 at 09:15 AM.. Reason: Note about 'grep -vf' maybe needing to become 'fgrep -v'
# 3  
Old 11-17-2016
Works, but how can I ignore newlines / whitespace-only lines? The files show as different just because of a few extra blank lines.
# 4  
Old 11-17-2016
rbatte1's approach is a good one.

Consider using tr or grep -v to remove the blank lines in both files.
By blank I assume a line that contains only '\n'.
Code:
grep -v '^$' file1 > tmpfile1

Use tmpfile1 instead of file1
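
If the "blank" lines may also contain spaces or tabs, a slightly wider pattern can be used, and the same filter can be applied to file2 as well. This is only a sketch: strip_blank and tmpfile2 are made-up names, and the [[:space:]] character class is assumed to be supported by the local grep.

```shell
#!/bin/sh
# Sketch only: strip_blank is a hypothetical helper name.
# Print the file given as $1 without empty or whitespace-only lines.
strip_blank() {
    grep -v '^[[:space:]]*$' "$1"
}

# Example calls (tmpfile2 is an assumed name, following tmpfile1 above):
#   strip_blank file1 > tmpfile1
#   strip_blank file2 > tmpfile2
#   grep -vf tmpfile2 tmpfile1
```

Cleaning file2 matters too: an empty line in the pattern file is an empty pattern, which matches every line, so grep -vf would then print nothing at all.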

But if you have files with thousands of lines, the run time for this approach will be large. And if you have to do this for a large number of files, you could think of this project as a full-time hobby for the next few weeks.

Check back after you try it. If it works well for your dataset, wonderful. If not, well, we may be able to help you set up some kind of parallelism to reduce run-times by a large factor.
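
If run time does become a problem, one commonly used alternative (not suggested in the thread, so treat it as an assumption) is to sort both files and compare them with comm. A minimal sketch, where missing_sorted is a made-up helper name and mktemp is assumed to be installed:

```shell
#!/bin/sh
# Sketch only: missing_sorted is a hypothetical helper name.
# Print the distinct lines of file1 ($1) that are absent from file2 ($2).
missing_sorted() {
    t1=$(mktemp) && t2=$(mktemp) || return 2
    # comm needs sorted input; -u also removes duplicate lines.
    sort -u "$1" > "$t1"
    sort -u "$2" > "$t2"
    # -2 hides lines only in file2, -3 hides lines common to both.
    comm -23 "$t1" "$t2"
    rm -f "$t1" "$t2"
}

# Example call: no output means file1 is a subset of file2.
#   missing_sorted file1 file2
```

Like the grep approach, this ignores line order and duplicates.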