Sponsored Content
Top Forums Shell Programming and Scripting Using AWK to match CSV files with duplicate patterns Post 302597415 by isuewing on Friday 10th of February 2012 08:49:50 AM
Old 02-10-2012
Thanks @ahamed101.

I did try grep -f, and there are two problems. I found that a pattern file with duplicate entries found unique matches, thus destroying the order of File1.csv, which I am trying to preserve. The other issue is that grep is notoriously inefficient for this task. For 173k patterns I would need to split File1.csv into chunks in a loop, and use each chunk to search against File2.csv. Even in this case, using grep to search >10k patterns begins to take several seconds. Other posts have profiled similar performance. While this second consideration is not a total deal-breaker, (a) I am going to have to perform a large number of these kinds of matches, (b) with bigger pattern files, awk is *fast*, so it would be great if I could find a more efficient solution.

-i
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

sed/awk help to match list of patterns and remove from org file

Hi, From the pattern mentioned below remove lines based on pattern range. Conditions 1 Look For all lines starting with ALTER TABLE and Ending with ; and contains the word MOVE.I wanto to remove these lines from the file sample below. Note : The above pattern list could be found in... (1 Reply)
Discussion started by: rajan_san
1 Replies

2. Shell Programming and Scripting

script to match patterns in 2 different files.

I am new to shell scripting and need some help. I googled, but couldn't find a similar scenario. Basically, I need to rename a datafile. This is the scenario - I have a file, readonly.txt that has 2 columns - file# and name. I have another file,missing_files.txt that has id and name. Both the... (3 Replies)
Discussion started by: mathews
3 Replies

3. Shell Programming and Scripting

Find files that do not match specific patterns

Hi all, I have been searching online to find the answer for getting a list of files that do not match certain criteria but have been unsuccessful. I have a directory that has many jpg files. What I need to do is get a list of the files that do not match both of the following patterns (I have... (21 Replies)
Discussion started by: nikos-koutax
21 Replies

4. Shell Programming and Scripting

Match paragraph between two patterns, delete the duplicate paragraphs

Hello all I have a file my DNS server where there are duplicate paragrapsh like below. How can I remove the duplicate paragraph so that only one paragraph remains. BEGIN; replace into domains (name,type) values ('225.168.192.in-addr.arpa','MASTER'); replace into records (domain_id,... (2 Replies)
Discussion started by: sb245
2 Replies

5. Shell Programming and Scripting

Match columns from two csv files and update field in one of the csv file

Hi, I have a file of csv data, which looks like this: file1: 1AA,LGV_PONCEY_LES_ATHEE,1,\N,1,00020460E1,0,\N,\N,\N,\N,2,00.22335321,0.00466628 2BB,LES_POUGES_ASF,\N,200,200,00006298G1,0,\N,\N,\N,\N,1,00.30887539,0.00050312... (10 Replies)
Discussion started by: djoseph
10 Replies

6. Shell Programming and Scripting

Match multiple patterns sequentially in order - grep or awk

Hello. grep v2.21 Debian 8 I wish to search for and output these patterns in order; "From " "To: " "Subject: " "Message-Id: " "Date: " "To: " grep works, but not in strict order... $ grep -a -E "^From |^Subject:|^From: |^Message-Id: |^Date: |^To: " InboxResult; From - Wed Feb 18... (10 Replies)
Discussion started by: DSommers
10 Replies

7. UNIX for Beginners Questions & Answers

Match duplicate ids in two files

I have two text files. File 1 has 150 ids but all the ids exists in duplicates so it has 300 ids in total. File 2 has 1500 ids but all exists in duplicates so file 2 has 300 ids in total. i want to match the first occurance of every id in file 1 with first occurance of thet id in file 2 and 2nd... (2 Replies)
Discussion started by: limd
2 Replies

8. Shell Programming and Scripting

awk pattern match by looping through search patterns

Hi I am using Solaris 5.10 & ksh Wanted to loop through a pattern file by reading it and passing it to the awk to match that value present in column 1 of rawdata.txt , if so print column 1 & 2 in to Avlblpatterns.txt. Using the following code but it seems some mistakes and it is running for... (2 Replies)
Discussion started by: ananan
2 Replies

9. Shell Programming and Scripting

awk to print match or non-match and select fields/patterns for non-matches

In the awk below I am trying to output those lines that Match between file1 and file2, those Missing in file1, and those missing in file2. Using each $1,$2,$4,$5 value as a key to match on, that is if those 4 fields are found in both files the match, but if those 4 fields are not found then missing... (0 Replies)
Discussion started by: cmccabe
0 Replies

10. UNIX for Beginners Questions & Answers

Match patterns between two files and extract certain range of strings

Hi, I need help to match patterns from between two different files and extract region of strings. inputfile1.fa >l-WR24-1:1 GCCGGCGTCGCGGTTGCTCGCGCTCTGGGCGCTGGCGGCTGTGGCTCTACCCGGCTCCGG GGCGGAGGGCGACGGCGGGTGGTGAGCGGCCCGGGAGGGGCCGGGCGGTGGGGTCACGTG... (4 Replies)
Discussion started by: bunny_merah19
4 Replies
fgrep(1)							   User Commands							  fgrep(1)

NAME
fgrep - search a file for a fixed-character string SYNOPSIS
/usr/bin/fgrep [-bchilnsvx] -e pattern_list [file...] /usr/bin/fgrep [-bchilnsvx] -f file [file...] /usr/bin/fgrep [-bchilnsvx] pattern [file...] /usr/xpg4/bin/fgrep [-bchilnqsvx] -e pattern_list [-f file] [file...] /usr/xpg4/bin/fgrep [-bchilnqsvx] [-e pattern_list] -f file [file...] /usr/xpg4/bin/fgrep [-bchilnqsvx] pattern [file...] DESCRIPTION
The fgrep (fast grep) utility searches files for a character string and prints all lines that contain that string. fgrep is different from grep(1) and from egrep(1) because it searches for a string, instead of searching for a pattern that matches an expression. fgrep uses a fast and compact algorithm. The characters $, *, [, ^, |, (, ), and are interpreted literally by fgrep, that is, fgrep does not recognize full regular expressions as does egrep. These characters have special meaning to the shell. Therefore, to be safe, enclose the entire string within single quotes (a'). If no files are specified, fgrep assumes standard input. Normally, each line that is found is copied to the standard output. The file name is printed before each line that is found if there is more than one input file. OPTIONS
The following options are supported for both /usr/bin/fgrep and /usr/xpg4/bin/fgrep: -b Precedes each line by the block number on which the line was found. This can be useful in locating block numbers by con- text. The first block is 0. -c Prints only a count of the lines that contain the pattern. -e pattern_list Searches for a string in pattern-list. This is useful when the string begins with a -. -f pattern-file Takes the list of patterns from pattern-file. -h Suppresses printing of files when searching multiple files. -i Ignores upper/lower case distinction during comparisons. -l Prints the names of files with matching lines once, separated by new-lines. Does not repeat the names of files when the pattern is found more than once. -n Precedes each line by its line number in the file. The first line is 1. -s Works silently, that is, displays nothing except error messages. This is useful for checking the error status. -v Prints all lines except those that contain the pattern. -x Prints only lines that are matched entirely. /usr/xpg4/bin/fgrep The following options are supported for /usr/xpg4/bin/fgrep only: -q Quiet. Does not write anything to the standard output, regardless of matching lines. Exits with zero status if an input line is selected. OPERANDS
The following operands are supported: file Specifies a path name of a file to be searched for the patterns. If no file operands are specified, the standard input will be used. /usr/bin/fgrep pattern Specifies a pattern to be used during the search for input. /usr/xpg4/bin/fgrep pattern Specifies one or more patterns to be used during the search for input. This operand is treated as if it were specified as -e pattern_list. USAGE
See largefile(5) for the description of the behavior of fgrep when encountering files greater than or equal to 2 Gbyte ( 2^31 bytes). ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of fgrep: LC_COLLATE, LC_CTYPE, LC_MES- SAGES, and NLSPATH. EXIT STATUS
The following exit values are returned: 0 If any matches are found 1 If no matches are found 2 For syntax errors or inaccessible files, even if matches were found. ATTRIBUTES
See attributes(5) for descriptions of the following attributes: /usr/bin/fgrep +-----------------------------+-----------------------------+ | ATTRIBUTE TYPE | ATTRIBUTE VALUE | +-----------------------------+-----------------------------+ |Availability |SUNWcsu | +-----------------------------+-----------------------------+ /usr/xpg4/bin/fgrep +-----------------------------+-----------------------------+ | ATTRIBUTE TYPE | ATTRIBUTE VALUE | +-----------------------------+-----------------------------+ |Availability |SUNWxcu4 | +-----------------------------+-----------------------------+ |CSI |Enabled | +-----------------------------+-----------------------------+ SEE ALSO
ed(1), egrep(1), grep(1), sed(1), sh(1), attributes(5), environ(5), largefile(5), XPG4(5) NOTES
Ideally, there should be only one grep command, but there is not a single algorithm that spans a wide enough range of space-time tradeoffs. Lines are limited only by the size of the available virtual memory. /usr/xpg4/bin/fgrep The /usr/xpg4/bin/fgrep utility is identical to /usr/xpg4/bin/grep -F (see grep(1)). Portable applications should use /usr/xpg4/bin/grep -F. SunOS 5.11 24 Mar 2006 fgrep(1)
All times are GMT -4. The time now is 07:33 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy