Find text that is different in two files


 
# 1  
Old 05-27-2015

In the attached files, I am trying to use import.txt to find which lines of all.txt are missing from the import and print those lines to missing.txt. I used SQL to import a list into a database, got errors, and need to figure out what didn't import correctly. The script below is close, I think, but doesn't produce the desired output (all the lines of all.txt that have no match in import.txt).

So if the text in import.txt appears in a line of all.txt, that line is not printed; if a line of all.txt contains no text from import.txt, the entire line is printed to missing.txt. Thank you.


Code:
 awk '{ h[$0] = ! h[$0] } END { for (k in h) if (h[k]) print k }' import.txt all.txt > missing.txt

Also tried:

Code:
 diff import.txt all.txt | perl -lne 'if(/^[<>]/){s/^..//g;print}' > missing.txt


Last edited by cmccabe; 05-27-2015 at 01:26 PM..
# 2  
Old 05-27-2015
cmccabe,
I'm having trouble making out the lines in the smaller file you attached (the other one is 87.7 MB). Can you attach smaller example files? Also, can you show your expected results and the undesired results you are getting?
If you are comparing entire lines (i.e. not matching on specific keys) and are only looking for lines in import.txt that are not in all.txt, another option, provided both files are sorted (or you pre-sort them), is:
Code:
comm -23 import.txt all.txt > missing.txt
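
If the files are not already sorted, a minimal sketch of the pre-sort step might look like the following. The `*_demo` file names and their contents are made up for illustration, and the comparison is on whole lines only, so it would not help when one file holds just a key and the other holds full records.

```shell
# Hypothetical sample data, only for illustration.
printf '%s\n' beta alpha gamma > import_demo.txt
printf '%s\n' gamma alpha delta > all_demo.txt

# comm needs both inputs sorted in the same collation order.
LC_ALL=C sort import_demo.txt > import_demo.sorted
LC_ALL=C sort all_demo.txt > all_demo.sorted

# -2 drops lines unique to the second file, -3 drops lines common to both,
# leaving only the lines that appear in import_demo.txt alone.
LC_ALL=C comm -23 import_demo.sorted all_demo.sorted > missing_demo.txt
cat missing_demo.txt
```

Fixing the locale with `LC_ALL=C` for both `sort` and `comm` keeps the two tools agreeing on what "sorted" means.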

Regarding the SQL import and figuring out what didn't load correctly: are you not capturing the records that failed to load at that point? For example, with Oracle SQL*Loader you can use a .bad file to store the records that fail during the insert.
# 3  
Old 05-27-2015
I have attached smaller versions of each file. Basically, the desired output.txt would be all the lines of all.txt that do not match import.txt (should be 6 out of the 10). None of the PXL- lines match, so they are written to output.txt. Thank you.

import.txt
Code:
ADAMTS10E10for
ADAMTS10E10rev
ADAMTS10E11for
ADAMTS10E11rev
ADAMTS10E20for
ADAMTS10E20rev

all.txt
Code:
NULL	NULL	NULL	NULL	20152005	630	admin	Imported	PXL-A0285435ref	26950850	NULL	Y	NULL	37	NULL	NULL	NULL	pxlence	SeqRxn4
NULL	NULL	NULL	NULL	20152005	630	admin	Imported	PXL-A0285435antiref	26951039	NULL	Y	NULL	37	NULL	NULL	NULL	pxlence	SeqRxn4
NULL	NULL	NULL	NULL	20152005	630	admin	Imported	PXL-A0285441ref	26980056	NULL	Y	NULL	37	NULL	NULL	NULL	pxlence	SeqRxn4
NULL	NULL	NULL	NULL	20152005	630	admin	Imported	PXL-A0285441antiref	26980301	NULL	Y	NULL	37	NULL	NULL	NULL	pxlence	SeqRxn4
NULL	NULL	NULL	NULL	20152005	630	admin	Imported	PXL-A0285472ref	27190068	NULL	Y	NULL	37	NULL	NULL	NULL	pxlence	SeqRxn4
NULL	NULL	NULL	NULL	20152005	630	admin	Imported	PXL-A0285472antiref	27190236	NULL	Y	NULL	37	NULL	NULL	NULL	pxlence	SeqRxn4
NULL	NULL	NULL	NULL	20141009	630	admin	Imported	ADAMTS10E10for	8661383	8661400	19	CGCCTATGAAGGCAGTGG	37	20130823	20130903	20160901	GC rich region - no M13 primers	SeqRxn2
NULL	NULL	NULL	NULL	20141009	630	admin	Imported	ADAMTS10E10rev	8661119	8661101	19	AATCTGGGGAAAGGGGTGT	37	20130823	20130903	20160901	GC rich region - no M13 primers	SeqRxn2
NULL	NULL	NULL	NULL	20141009	630	admin	Imported	ADAMTS10E11for	8661258	8661276	19	ATGTGTGAGCGCGAGAGAA	37	20131007	20131007	20161001	GC rich region - no M13 primers	SeqRxn2
NULL	NULL	NULL	NULL	20141009	630	admin	Imported	ADAMTS10E11rev	8660932	8660914	19	ATGAGTGTGACCCGCTCTG	37	20131007	20131007	20161001	GC rich region - no M13 primers	SeqRxn2

The undesired output is also attached from:

Code:
 awk '{ h[$0] = ! h[$0] } END { for (k in h) if (h[k]) print k }' import.txt all.txt > missing.txt
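
For context, a small sketch of why this one-liner prints everything: it toggles on the whole line (`$0`), and a bare primer name from import.txt is never identical to a full record from all.txt, so nothing ever cancels out. The `*_demo` files below are hypothetical, shortened stand-ins for the real ones.

```shell
# Hypothetical, shortened stand-ins for import.txt and all.txt.
printf '%s\n' ADAMTS10E10for > import_demo.txt
printf 'Imported\tADAMTS10E10for\t8661383\n' > all_demo.txt
printf 'Imported\tPXL-A0285435ref\t26950850\n' >> all_demo.txt

# Same toggle logic as above: a line is printed if it was seen an odd number of times.
awk '{ h[$0] = !h[$0] } END { for (k in h) if (h[k]) print k }' \
    import_demo.txt all_demo.txt > toggle_demo.txt

# Every input line is unique as a whole line, so all of them are printed
# (in no particular order, since "for (k in h)" does not guarantee one).
cat toggle_demo.txt
```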

# 4  
Old 05-27-2015
Try something like this:

Code:
$ nawk 'FNR==NR{map[$1]=$0;next} !map[$18]' import.txt all.txt
1       1       20152005        630     admin   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    20152005        630     admin   Imported        PXL-A0285435ref 26950850        NULL    Y       NULL    37      NULL    NULL    NULL    pxlence SeqRxn4
1       1       20152005        630     admin   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    20152005        630     admin   Imported        PXL-A0285435antiref     26951039        NULL    Y       NULL    37      NULL    NULL    NULL    pxlence SeqRxn4
1       1       20152005        630     admin   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    20152005        630     admin   Imported        PXL-A0285441ref 26980056        NULL    Y       NULL    37      NULL    NULL    NULL    pxlence SeqRxn4
1       1       20152005        630     admin   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    20152005        630     admin   Imported        PXL-A0285441antiref     26980301        NULL    Y       NULL    37      NULL    NULL    NULL    pxlence SeqRxn4
1       1       20152005        630     admin   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    20152005        630     admin   Imported        PXL-A0285472ref 27190068        NULL    Y       NULL    37      NULL    NULL    NULL    pxlence SeqRxn4
1       1       20152005        630     admin   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    20152005        630     admin   Imported        PXL-A0285472antiref     27190236        NULL    Y       NULL    37      NULL    NULL    NULL    pxlence SeqRxn4
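
The same idea spelled out as a commented sketch, not the exact command above. It assumes tab-separated input and follows the layout of the shortened inline sample, where the primer name is field 9 rather than field 18 (the attached files evidently carry extra leading columns, judging by the output above). The `*_demo` file names and the `key` variable are made up for the example, and the rows are cut down to ten fields.

```shell
# Rows taken from the inline sample, shortened to ten tab-separated fields.
printf '%s\n' ADAMTS10E10for ADAMTS10E10rev > import_demo.txt
printf 'NULL\tNULL\tNULL\tNULL\t20152005\t630\tadmin\tImported\tPXL-A0285435ref\t26950850\n' > all_demo.txt
printf 'NULL\tNULL\tNULL\tNULL\t20141009\t630\tadmin\tImported\tADAMTS10E10for\t8661383\n' >> all_demo.txt
printf 'NULL\tNULL\tNULL\tNULL\t20141009\t630\tadmin\tImported\tADAMTS10E10rev\t8661119\n' >> all_demo.txt

# key = number of the field in the second file that holds the primer name
# (9 for this shortened sample; adjust it to match the real file layout).
awk -F '\t' -v key=9 '
    FNR == NR { seen[$1]; next }   # first file: remember every primer name
    !($key in seen)                # second file: print rows whose name was never seen
' import_demo.txt all_demo.txt > missing_demo.txt
cat missing_demo.txt
```

Testing with `in` instead of `!map[$18]` avoids creating an empty array entry for every key that is looked up, which can matter on a file as large as the 87.7 MB one mentioned earlier.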

This User Gave Thanks to mjf For This Post:
# 5  
Old 05-27-2015
Works perfectly, thank you.
# 6  
Old 05-27-2015
Good, you're welcome. But have you looked at identifying these records (or the next ones you might load) that fail at insert time? Or was this a one-time load and not something you will repeat?
# 7  
Old 05-27-2015
This was a one-time load that, hopefully, will not be repeated. I think the import timed out and only completed half of the files, but I had no idea which ones until now. Thanks again.