Generate separate files with similar and dissimilar contents


 
# 1  
07-22-2016

Hello experts,

I have two files, 1.txt (10,000 lines of text) and 2.txt (7,500 lines of text).
Both files have similar as well as dissimilar entries.
Is there a way to perform the following operations:

1. Generate a file containing all similar lines (lines present in both files).
2. Generate a file containing all dissimilar lines (lines present in only one of the files).
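As an alternative to the grep-based answers in this thread, both operations can also be sketched with the standard sort and comm utilities. This is a swapped-in technique, not the method used by any poster here; it compares whole lines and produces sorted output rather than the original line order. The scratch directory and the output names similar.txt and dissimilar.txt are illustrative only.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)" || exit 1

# Sample data from the question.
printf '%s\n' 1 2 4 6 8 3 g f > 1.txt
printf '%s\n' 1 x z 3 m 0 8 > 2.txt

# comm requires both inputs sorted in the same collation order.
sort 1.txt > 1.sorted
sort 2.txt > 2.sorted

# 1. Lines present in both files: suppress column 1 (only in 1.txt)
#    and column 2 (only in 2.txt), leaving column 3 (common lines).
comm -12 1.sorted 2.sorted > similar.txt

# 2. Lines present in only one file: column 1 followed by column 2.
{ comm -23 1.sorted 2.sorted; comm -13 1.sorted 2.sorted; } > dissimilar.txt
```

Running comm twice for the dissimilar lines avoids the tab that comm -3 would put in front of every line from the second file.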

So far, I have used the following command to generate a file containing all dissimilar lines:

Code:
grep -Fvf 1.txt 2.txt > 3.txt


Example of file 1.txt

Code:
1
2
4
6
8
3
g
f


Example of file 2.txt

Code:
1
x
z
3
m
0
8




Could you please help with both these queries.

Thank you.

Regards,
Haider

Last edited by Scrutinizer; 07-22-2016 at 05:48 AM.. Reason: adding example; [mod] icode tags changed to code tags. Added code tags for data samples
# 2  
07-22-2016
First of all, I would recommend reading the grep man page to understand how the -v switch works. You'll find your answer there :-)
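For reference, a tiny illustration of what -v does, using a throwaway file lines.txt (hypothetical) and the patterns 1 and 3 given with -e:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 x z 3 > lines.txt

# -F: patterns are fixed strings, -x: a pattern must match the whole line.
grep -Fx    -e 1 -e 3 lines.txt > matched.txt     # 1, 3
# -v inverts the selection: print the lines that match NO pattern.
grep -Fx -v -e 1 -e 3 lines.txt > unmatched.txt   # x, z
```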
# 3  
07-22-2016
Note that grep -Fvf 1.txt 2.txt will just give you a list of lines in 2.txt that are not in 1.txt. To also get the list of lines in 1.txt that are not in 2.txt, you'll need a second grep.
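To see the one-direction behavior on the sample data from the question, a minimal sketch (the output names only_in_1.txt and only_in_2.txt are illustrative, and -x is added so that only whole lines match):

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 4 6 8 3 g f > 1.txt
printf '%s\n' 1 x z 3 m 0 8 > 2.txt

# The -f file supplies the patterns; only lines of the file named last
# are ever printed, so each call covers exactly one direction.
grep -Fxvf 1.txt 2.txt > only_in_2.txt   # x z m 0
grep -Fxvf 2.txt 1.txt > only_in_1.txt   # 2 4 6 g f
```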
# 4  
07-25-2016
Thanks for the idea. I tried the following code to get the combined list of dissimilar elements from both files:

Code:
grep -Fvf 2.txt 1.txt && grep -Fvf 1.txt 2.txt

2
4
6
g
f
x
z
m
0

For the similar elements, I tried the following code:

Code:
grep -Fxf 2.txt 1.txt
1
8
3

grep -Fxf 1.txt 2.txt
1
3
8

The result is the same in both cases.
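A quick way to confirm that the two commands select the same set of lines, only in a different order: sort both outputs and compare them with cmp. This sketch assumes neither file contains repeated lines (grep prints a repeated line once per occurrence, which could make the counts differ); a.txt and b.txt are illustrative names.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 4 6 8 3 g f > 1.txt
printf '%s\n' 1 x z 3 m 0 8 > 2.txt

grep -Fxf 2.txt 1.txt > a.txt   # in 1.txt order: 1 8 3
grep -Fxf 1.txt 2.txt > b.txt   # in 2.txt order: 1 3 8

# Same lines, different order: compare after sorting.
sort a.txt > a.sorted
sort b.txt > b.sorted
if cmp -s a.sorted b.sorted; then
    echo "same set of common lines"
else
    echo "different"
fi
```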
Moderator's Comment: Please use CODE tags (not ICODE tags) for full-line and multi-line sample input, output, and code segments.

Last edited by Don Cragun; 07-25-2016 at 04:15 AM.. Reason: Change ICODE tags to CODE tags.
# 5  
07-25-2016
Hello H squared,

Could you please try the following and let me know if it helps.
Code:
awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print $1 >> "similar_ones.txt";delete A[$1];next} !($1 in A){print $1 >> "dissimilar_ones.txt"} END{for(i in A){print A[i] >> "dissimilar_ones.txt"}}'  Input_file1   Input_file2

Above will create 2 files named similar_ones.txt and dissimilar_ones.txt, which will be as follows.
Code:
cat similar_ones.txt
1
3
8

cat dissimilar_ones.txt
x
z
m
0
4
6
f
2
g

EDIT: Adding a multi-line form of the solution.
Code:
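awk 'FNR==NR{                  # first file: store each key (field 1)
        A[$1]=$1
        next
     }
     ($1 in A){                # key also seen in the first file: similar
        print $1 >> "similar_ones.txt"
        delete A[$1]           # so the END loop skips it
        next
     }
     !($1 in A){               # key only in the second file: dissimilar
        print $1 >> "dissimilar_ones.txt"
     }
     END{                      # keys still in A occur only in the first file
        for(i in A){
           print A[i] >> "dissimilar_ones.txt"
        }
     }
    '  Input_file1   Input_file2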

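NOTE: dissimilar_ones.txt will contain the difference of the two files in both directions: lines that are in Input_file1 and NOT in Input_file2, plus lines that are in Input_file2 and NOT in Input_file1. Only the first field ($1) of each line is compared, and because the output is appended with >>, both output files should be removed before running the command again.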
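A variant of the awk answer above, sketched under two assumptions that differ from the original: whole lines ($0) are compared instead of the first field, and > is used instead of >>. Within a single awk run, > truncates a file only when it is first opened and later prints append to it, so re-running the command does not pile up results from earlier runs. As in the original, a line repeated in Input_file2 counts as similar only once (later copies go to dissimilar_ones.txt), the order of the END loop is unspecified, and an empty Input_file1 would break the FNR==NR test.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 4 6 8 3 g f > Input_file1
printf '%s\n' 1 x z 3 m 0 8 > Input_file2

awk '
    FNR == NR { A[$0]; next }          # first file: remember each whole line
    ($0 in A) {                        # line also occurs in the first file
        print > "similar_ones.txt"
        delete A[$0]
        next
    }
    { print > "dissimilar_ones.txt" }  # line occurs only in the second file
    END {                              # leftovers occur only in the first file
        for (k in A) print k > "dissimilar_ones.txt"
    }
' Input_file1 Input_file2
```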

Thanks,
R. Singh

Last edited by RavinderSingh13; 07-25-2016 at 04:40 AM.. Reason: Added a non-one liner form of solution now.
# 6  
07-25-2016
Note that:
Code:
grep -Fvf 2.txt 1.txt && grep -Fvf 1.txt 2.txt

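won't look for lines in 2.txt that are not in 1.txt if there aren't any lines in 1.txt that are not in 2.txt: grep exits with status 1 when it selects no lines, and && only runs the second command if the first one succeeds. It would seem that: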
Code:
grep -Fvf 2.txt 1.txt ; grep -Fvf 1.txt 2.txt

would be more likely to give you what you want.
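A minimal reproduction of the && pitfall, using hypothetical small files in which every line of 1.txt also appears in 2.txt, so the first grep selects nothing:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 3 > 1.txt
printf '%s\n' 1 3 x > 2.txt

# The first grep prints nothing and exits with status 1,
# so with && the second grep is never run: with_and.txt stays empty.
{ grep -Fxvf 2.txt 1.txt && grep -Fxvf 1.txt 2.txt; } > with_and.txt

# With ; the second grep runs regardless: with_semicolon.txt holds "x".
{ grep -Fxvf 2.txt 1.txt; grep -Fxvf 1.txt 2.txt; } > with_semicolon.txt
```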

Note also that if you are only looking for complete line matches when looking for matching lines, you probably also want complete line matches when looking for non-matching lines. That would be:
Code:
grep -Fxvf 2.txt 1.txt; grep -Fxvf 1.txt 2.txt
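A small illustration of why -x matters for the non-matching case, using hypothetical files in which 1.txt contains only the line 1 and 2.txt contains 1 and 10:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 > 1.txt
printf '%s\n' 1 10 > 2.txt

# Without -x, the pattern "1" matches any line containing "1",
# so "10" is wrongly treated as present in 1.txt: output is empty.
grep -Fvf 1.txt 2.txt > substring.txt
# With -x, a pattern must match the whole line: output is "10".
grep -Fxvf 1.txt 2.txt > whole_line.txt
```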

Is there some reason why you would expect that:
Code:
grep -Fxf 2.txt 1.txt

and:
Code:
grep -Fxf 1.txt 2.txt

would produce different output (other than the order of the matching lines)? Both commands print the lines that are present in both files. Why would the lines found in 1.txt and in 2.txt differ from the lines found in 2.txt and in 1.txt?
# 7  
07-25-2016
The output of scenario 1 (dissimilar elements) can be redirected to a file as follows:

Code:
{ grep -Fxvf 2.txt 1.txt; grep -Fxvf 1.txt 2.txt; } > 3.txt

Scenario 2 (similar elements) is easier, because the output is the same set of lines whichever file is passed to -f:

Code:
grep -Fxf 1.txt 2.txt > 3.txt
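Note that both commands in this post redirect to 3.txt, so running one after the other leaves only the similar lines in 3.txt. A sketch that combines the advice from post #6 (-x for whole-line matches, ; instead of &&) and writes the two results to separate files; the names dissimilar.txt and similar.txt are only illustrative:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 4 6 8 3 g f > 1.txt
printf '%s\n' 1 x z 3 m 0 8 > 2.txt

# Dissimilar lines: lines only in 1.txt, then lines only in 2.txt.
# ";" guarantees both directions are searched.
{ grep -Fxvf 2.txt 1.txt; grep -Fxvf 1.txt 2.txt; } > dissimilar.txt

# Similar lines: either argument order yields the same set of lines.
grep -Fxf 1.txt 2.txt > similar.txt
```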

Moderator's Comment: Please use CODE tags (not ICODE tags) when displaying full-line and multi-line sample input, sample output, and code segments.

Last edited by Don Cragun; 07-25-2016 at 09:23 PM.. Reason: Change ICODE tags to CODE tags and get rid of COLOR and FONT tags.


10 More Discussions You Might Find Interesting

1. Solaris

Getting similar lines in two files

Hi, I need to compare the /etc/passwd files from 2 servers, and extract the users that are similar in these two files. I sorted the 2 files based on the user IDs (UID) (3rd column). I first sorted the files using the username (1st column), however when I use comm to compare the files there is no... (1 Reply)
Discussion started by: anaigini45
1 Replies

2. UNIX for Dummies Questions & Answers

How to generate one long column by merging two separate two columns in a single file?

Dear all, I have a simple question. I have a file like below (separated by tab): col1 col2 col3 col4 col5 col6 col7 21 66745 rs1234 21 rs5678 23334 0.89 21 66745 rs2334 21 rs9978 23334 0.89 21 66745 ... (4 Replies)
Discussion started by: forevertl
4 Replies

3. Shell Programming and Scripting

Merge files and generate a resume in two files

Dear Gents, Please I need your help... I need small script :) to do the following. I have a thousand of files in a folder produced daily. I need first to merge all files called. txt (0009.txt, 0010.txt, 0011.txt) and and to output a resume of all information on 2 separate files in csv... (14 Replies)
Discussion started by: jiam912
14 Replies

4. UNIX for Dummies Questions & Answers

Finding similar strings between two files

Hi, I have a file1 like this: ABAT ABCA1 ABCC1 ABCC5 ABCC8 ABCE1 ABHD2 ABL1 CAMTA1 ACBD3 ACCN1 And I have a second file like this: chr19 46118590 46119564 MACS_peak_1499 3100.00 chr19 46122009 46148405 CYP2B7P1 -2445 chr1 7430312 7430990... (7 Replies)
Discussion started by: a_bahreini
7 Replies

5. Shell Programming and Scripting

Looking to find files that are similar.

Hello all, I have a server that is running AIX, running a tool that converts various printstreams (AFP/Metadata) to PDF. This is done using a rexx script and an off the shelf utility. Each report (there's around 125) uses a certain script file, it's basically a text file. I am trying... (5 Replies)
Discussion started by: jeffs42885
5 Replies

6. Shell Programming and Scripting

Using bash to separate files files based on parts of a filename

Hey guys, Sorry for the basic question but I have a lot of files that I want to separate into groups based on filenames which I can then cat together. Eg I have: (a_b_c.txt) WB34_2_SLA8.txt WB34_1_SLA8.txt WB34_1_DB10.txt WB34_2_DB10.txt WB34_1_SLA8.txt WB34_2_SLA8.txt 77_1_SLA8.txt... (1 Reply)
Discussion started by: Breentax
1 Replies

7. Shell Programming and Scripting

appending data from similar files

I am familiar with scripting, but I am trying to see if there is an easy way to append files from similar files into one file. For example, if there is file1_20121201, file1_20121202, file1_20121203, file2_20121201, file2_20121202, file2_20121203 I want to be able to combine all the data from... (3 Replies)
Discussion started by: mrbean1975
3 Replies

8. Shell Programming and Scripting

Read file contents and separate the lines when completes with =

Hi, I have a file like this cpsSystemNotifyTrap='2010/12/14 11:05:31 CST' Manufacturer=IBM ReportingMTMS=n/a ProbNm=26 LparName=n/a FailingEnclosureMTMS=7946-IQL*99G4874 SRC=B3031107 EventText=Problem reported by customer. CallHome=true Calendar I want to have a output like this... (6 Replies)
Discussion started by: dbashyam
6 Replies

9. Shell Programming and Scripting

compare the similar files

I got many pair files, which only have small difference, such as more space, or more empty line, and some unreadable characters. If list by commend "diff", I can see many many difference. So I'd like to write a script to compare the pair files, if 95% contents are same, I will think they are... (2 Replies)
Discussion started by: rdcwayx
2 Replies

10. Shell Programming and Scripting

How to print Dissimilar keys and their values?

Hi guyz I have been using this script to find similar keys in 2 files and merge the keys along with their values. Therefore it prints similar keys by leaving dissimilar. Any one knows how to print only Dissimilar leaving Similar. Help would be appreciated. The script I'm using for similar... (4 Replies)
Discussion started by: repinementer
4 Replies