Using AWK to match CSV files with duplicate patterns
Post 302597415 by isuewing on Friday 10th of February 2012 08:49:50 AM
Thanks @ahamed101.

I did try grep -f, and there are two problems. First, a pattern file with duplicate entries returns only unique matches, which destroys the order of File1.csv that I am trying to preserve. Second, grep is notoriously inefficient for this task. For 173k patterns I would need to split File1.csv into chunks in a loop and search File2.csv with each chunk, and even then grep begins to take several seconds once a chunk holds more than 10k patterns. Other posts have profiled similar performance. This second consideration is not a total deal-breaker, but (a) I am going to have to perform a large number of these kinds of matches and (b) some will use bigger pattern files. awk is *fast*, so it would be great if I could find a more efficient solution.
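
For illustration, here is a minimal sketch of the kind of two-pass awk lookup that keeps the order and the duplicates of File1.csv. It is not a tested answer to the exact data in this thread: it assumes the key is the first comma-separated field in both files, and the sample rows and the `tmpdir`, `row` and `result` names are made up for the example.

```shell
#!/bin/sh
# Hypothetical sample data; the real File1.csv/File2.csv layouts are not shown
# in the post, so the key is ASSUMED to be field 1 of both files.
tmpdir=$(mktemp -d)
cat > "$tmpdir/File1.csv" <<'EOF'
b
a
b
c
EOF
cat > "$tmpdir/File2.csv" <<'EOF'
a,1
b,2
d,4
EOF

# Pass 1 (NR == FNR is true only while reading the first file argument):
#   index every File2.csv line by its key.
# Pass 2: stream File1.csv top to bottom and print the stored File2.csv line
#   once per occurrence of the key, so order and duplicates are preserved.
result=$(awk -F, '
    NR == FNR { row[$1] = $0; next }
    $1 in row { print row[$1] }
' "$tmpdir/File2.csv" "$tmpdir/File1.csv")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Each file is read once and every lookup is a hash access, which is why this pattern tends to scale better than handing a large pattern list to grep -f. Two caveats: if a key occurs more than once in File2.csv only its last line is kept, and the NR == FNR test misbehaves when File2.csv is empty.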

-i
 
