compare the similar files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting compare the similar files
# 1  
Old 06-24-2010
compare the similar files

I got many pair files, which only have small difference, such as more space, or more empty line, and some unreadable characters.

If list by commend "diff", I can see many many difference.

So I'd like to write a script to compare the pair files, if 95% contents are same, I will think they are similar.

Any suggestion for it?
# 2  
Old 06-24-2010
you can use
Code:
diff -w file1 file2

to ignore all the whitespace differences.
That will not take care of the unprintable characters though.
# 3  
Old 06-24-2010
Hi.

I would normalize the files first: remove empty lines, squeeze multiple spaces to one, remove the unprintable characters, etc., and then compare the files.

You may also be able to get some guidelines from The software and text similarity tester SIM

Good luck ... cheers, drl
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Bash selection of files with similar name

Hi all, This is my first day on Linux shell!!! So, I am trying to write a script that that will pick up pairs of files with the same name (not the same content) but that are different in one character (one is *R1 the other is *R2)... Something like: look ate the files, whenever they are the... (3 Replies)
Discussion started by: ALou
3 Replies

2. UNIX for Beginners Questions & Answers

How to compare two files in UNIX using similar to vlookup?

Hi, I want to compare same column in two files, if values match then display the column or display "NA". Ex : File 1 : 123 abc xyz pqr File 2: 122 aab fdf pqr fff qqq rrr (1 Reply)
Discussion started by: hkoshekay
1 Replies

3. Solaris

Getting similar lines in two files

Hi, I need to compare the /etc/passwd files from 2 servers, and extract the users that are similar in these two files. I sorted the 2 files based on the user IDs (UID) (3rd column). I first sorted the files using the username (1st column), however when I use comm to compare the files there is no... (1 Reply)
Discussion started by: anaigini45
1 Replies

4. Shell Programming and Scripting

Editing files with sed or something similar

{ "AFafa": "FAFA","AFafa": "FAFA" "baseball":"soccer","wrestling":"dancing" "rhinos":"crocodiles","roles":"foodchain" } I need to insert a new line before the closing brackets "}" so that the final output looks like this: { "AFafa": "FAFA","AFafa": "FAFA"... (6 Replies)
Discussion started by: SkySmart
6 Replies

5. UNIX for Dummies Questions & Answers

Finding similar strings between two files

Hi, I have a file1 like this: ABAT ABCA1 ABCC1 ABCC5 ABCC8 ABCE1 ABHD2 ABL1 CAMTA1 ACBD3 ACCN1 And I have a second file like this: chr19 46118590 46119564 MACS_peak_1499 3100.00 chr19 46122009 46148405 CYP2B7P1 -2445 chr1 7430312 7430990... (7 Replies)
Discussion started by: a_bahreini
7 Replies

6. Shell Programming and Scripting

Looking to find files that are similar.

Hello all, I have a server that is running AIX, running a tool that converts various printstreams (AFP/Metadata) to PDF. This is done using a rexx script and an off the shelf utility. Each report (there's around 125) uses a certain script file, it's basically a text file. I am trying... (5 Replies)
Discussion started by: jeffs42885
5 Replies

7. Shell Programming and Scripting

concatenating similar files in a directory

Hi, I am new in unix. I have below requirement: I have two files at the same directory location File1.txt and File2.txt (just an example, real scenario we might have File2 and File3 OR File6 and File7....) File1.txt has : header1 record1 trailer1 File2.txt has: header2 record2... (4 Replies)
Discussion started by: Deepak62828r
4 Replies

8. Shell Programming and Scripting

Require compare command to compare 4 files

I have four files, I need to compare these files together. As such i know "sdiff and comm" commands but these commands compare 2 files together. If I use sdiff command then i have to compare each file with other which will increase the codes. Please suggest if you know some commands whcih can... (6 Replies)
Discussion started by: nehashine
6 Replies

9. Shell Programming and Scripting

Comparing similar columns in two different files

Hi, I have two text files.The first and the 2nd file have data in the same format For e.g. The first file has table_name1 column1 sum(column1) max(column1) min(column1) table_name1 column2 sum(column2) max(column2) min(column2) table_name1 coulmn3 sum(column3) max(column3) min(column3) ... (13 Replies)
Discussion started by: ragavhere
13 Replies

10. UNIX for Dummies Questions & Answers

Compare directories then move similar ones

I would like to know how to compare a listing of directories that begin with the same four numbers ie. /1234cat /1234tree /1234fish and move all these directories into one directory Thanks in advance (2 Replies)
Discussion started by: tgibson2
2 Replies
Login or Register to Ask a Question