Text match in two files


 
# 1  
Old 02-07-2015
I am trying to match the text in file1 (flugent.txt) against file2 (IDT.txt), printing what matches to a new file (match.txt) and what does not match to another (missing.txt).

Code:
 awk -F'|' 'NR==FNR{c[$1$2]++;next};c[$1$2] > 0' flugent.txt IDT.txt > match.txt

Thank you.
# 2  
Old 02-07-2015
Something like the following might do for an overview:
Code:
sed 's/,[ ]*/\n/g' flugent.txt |comm - IDT.txt

Or
Code:
sed 's/,[ ]*/\n/g' flugent.txt |comm -12 - IDT.txt

for the matching list.
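
A note on the comm approach: comm expects both inputs to be sorted. With no options it prints three columns (lines only in the first input, lines only in the second, lines in both), so -12 leaves the common lines and -13 would leave the lines found only in IDT.txt. Also, \n in a sed replacement is not portable to every sed. A minimal sketch using tr instead, on made-up sample data (the real flugent.txt and IDT.txt are not shown in this thread, and the .sorted file names are only for illustration):

```shell
# Hypothetical sample data standing in for the real files.
LC_ALL=C; export LC_ALL            # same collation for sort and comm
tmp=$(mktemp -d)
printf 'A2M,A4GALT, AAAS\n' > "$tmp/flugent.txt"
printf 'A2M\nAAAS\nABL1\n'  > "$tmp/IDT.txt"

# Turn the comma-separated list into one name per line, drop blanks, sort.
tr ',' '\n' < "$tmp/flugent.txt" | tr -d ' ' | sort > "$tmp/flugent.sorted"
sort "$tmp/IDT.txt" > "$tmp/IDT.sorted"

# -12 keeps lines present in both inputs; -13 keeps lines only in the second.
comm -12 "$tmp/flugent.sorted" "$tmp/IDT.sorted" > "$tmp/match.txt"
comm -13 "$tmp/flugent.sorted" "$tmp/IDT.sorted" > "$tmp/missing.txt"
```

Note that comm writes its result in sorted order, not in the original order of IDT.txt.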
This User Gave Thanks to Walter Misar For This Post:
# 3  
Old 02-07-2015
Why "|"? Your first input file is separated only by commas.
With awk it is best to use RS (the record separator) to split it into lines.
Code:
awk 'NR==FNR{c[$0]; next} ($0 in c)' RS="," flugent.txt RS="\n" IDT.txt > match.txt
awk 'NR==FNR{c[$0]; next} !($0 in c)' RS="," flugent.txt RS="\n" IDT.txt > missing.txt
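
One caveat with RS=",": if flugent.txt ends with a newline, the last record keeps that newline and so will not match the same name in IDT.txt. The output posted further down in this thread lists ZPBP in missing.txt although data1 ends in ZPBP, which would be consistent with this. A sketch that strips the trailing newline with sub(), run on made-up sample data (the real files are not shown here):

```shell
# Hypothetical sample data; note the trailing newline after ZPBP.
tmp=$(mktemp -d)
printf 'A2M,A4GALT,ZPBP\n' > "$tmp/flugent.txt"
printf 'A2M\nABL1\nZPBP\n' > "$tmp/IDT.txt"

# First file: remove a trailing newline from the record, then remember it.
# Second file: print the lines that were (or were not) remembered.
awk 'NR==FNR{sub(/\n$/, ""); c[$0]; next} ($0 in c)' \
    RS="," "$tmp/flugent.txt" RS="\n" "$tmp/IDT.txt" > "$tmp/match.txt"
awk 'NR==FNR{sub(/\n$/, ""); c[$0]; next} !($0 in c)' \
    RS="," "$tmp/flugent.txt" RS="\n" "$tmp/IDT.txt" > "$tmp/missing.txt"
```

Without the sub() call, ZPBP would end up in missing.txt for this sample.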

This User Gave Thanks to MadeInGermany For This Post:
# 4  
Old 02-07-2015
Thank you.
# 5  
Old 02-07-2015
Building on MadeInGermany's proposal, try:
Code:
awk 'NR==FNR{c[$0]; next} ($0 in c) {print > "match.txt"; next} {print > "missing.txt"}' RS="," flugent.txt RS="\n" IDT.txt

This User Gave Thanks to RudiC For This Post:
# 6  
Old 02-07-2015
I get the below error:

Code:
 awk 'NR==FNR{c[$0]; next} ($0 in c) {print >"match.txt"; next}    {print > "missing.txt"' RS="," flugent.txt RS="\n" IDT.txt
awk: cmd. line:1: NR==FNR{c[$0]; next} ($0 in c) {print >"match.txt"; next}    {print > "missing.txt"
awk: cmd. line:1:                                                                                    ^ unexpected newline or end of string

Thank you.

---------- Post updated at 01:15 PM ---------- Previous update was at 01:14 PM ----------

Never mind, I forgot that I had removed some files (IDT.txt and flugent.txt).
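
For anyone hitting the same message: "unexpected newline or end of string" with the caret at the very end of the program is what gawk reports when a block is left open, and the command as pasted in the error output above ends in {print > "missing.txt" without its closing }. A sketch of the same one-liner with the brace closed, run on made-up sample data (the sample flugent.txt deliberately has no trailing newline so that the last name matches):

```shell
# Hypothetical sample data standing in for the removed files.
tmp=$(mktemp -d)
printf 'A2M,A4GALT,ZPBP'   > "$tmp/flugent.txt"
printf 'A2M\nABL1\nZPBP\n' > "$tmp/IDT.txt"

# Run inside the temp directory because awk writes match.txt and
# missing.txt relative to the current directory.
(
  cd "$tmp" || exit 1
  awk 'NR==FNR{c[$0]; next} ($0 in c) {print > "match.txt"; next} {print > "missing.txt"}' \
      RS="," flugent.txt RS="\n" IDT.txt
)
```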
# 7  
Old 02-07-2015
Hi.

A minor variation that reads the files just once:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate special case of matching to separate files, awk.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C pll specimen awk

pl " Data files data[12]:"
pll data1
pe
specimen 3 data2

pl " Results:"
awk 'NR==FNR{c[$0]; next} { if ($0 in c) {print > "match.txt"} else {print} } ' RS="," data1 RS="\n" data2 > missing.txt

wc match.txt missing.txt
pe
specimen 3 match.txt missing.txt

exit 0

producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
pll (local) 1.24
specimen (local) 1.17
awk GNU Awk 3.1.5

-----
 Data files data[12]:
 (Longest line: 28129; fit into lines of length 78)
         1         2         3     ...9        2810        2811        2812
12345678901234567890123456789012345...3456789012345678901234567890123456789
A2M,A4GALT,A4GNT,AAAS,AADAC,AADACL2...F750,ZNF75D,ZNF804A,ZNF81,ZNHIT6,ZPBP

Edges: 3:0:3 of 4631 lines in file "data2"
A2M
A4GALT
A4GNT
   ---
ZNF81
ZPBP
ZPBP2

-----
 Results:
 3863  3863 23201 match.txt
  768   768  4837 missing.txt
 4631  4631 28038 total

Edges: 3:0:3 of 3863 lines in file "match.txt"
A2M
A4GALT
A4GNT
   ---
ZNF750
ZNF75D
ZNF81

Edges: 3:0:3 of 768 lines in file "missing.txt"
AAGAB
ABL1
ACAD10
   ---
ZNF80
ZPBP
ZPBP2
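
The script above relies on local helpers (context, pll, specimen) that are not generally available. A stripped-down sketch of the same awk call with a plain wc check, on made-up sample data: the line counts of match.txt and missing.txt should add up to the line count of data2, just as 3863 + 768 = 4631 in the output above.

```shell
# Hypothetical sample data; data1 has no trailing newline here.
tmp=$(mktemp -d)
printf 'A2M,A4GALT,ZPBP'   > "$tmp/data1"
printf 'A2M\nABL1\nZPBP\n' > "$tmp/data2"

(
  cd "$tmp" || exit 1
  # Matches go to match.txt from inside awk; everything else goes to
  # stdout, which the shell redirects to missing.txt.
  awk 'NR==FNR{c[$0]; next} { if ($0 in c) {print > "match.txt"} else {print} }' \
      RS="," data1 RS="\n" data2 > missing.txt
  # Every line of data2 should land in exactly one of the two outputs.
  wc -l match.txt missing.txt data2
)
```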

Best wishes ... cheers, drl
This User Gave Thanks to drl For This Post: