Sponsored Content
Top Forums Shell Programming and Scripting Using AWK to match CSV files with duplicate patterns Post 302597415 by isuewing on Friday 10th of February 2012 08:49:50 AM
Old 02-10-2012
Thanks @ahamed101.

I did try grep -f, and there are two problems. I found that a pattern file with duplicate entries found unique matches, thus destroying the order of File1.csv, which I am trying to preserve. The other issue is that grep is notoriously inefficient for this task. For 173k patterns I would need to split File1.csv into chunks in a loop, and use each chunk to search against File2.csv. Even in this case, using grep to search >10k patterns begins to take several seconds. Other posts have profiled similar performance. While this second consideration is not a total deal-breaker, (a) I am going to have to perform a large number of these kinds of matches, (b) with bigger pattern files, awk is *fast*, so it would be great if I could find a more efficient solution.

-i
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

sed/awk help to match list of patterns and remove from org file

Hi, From the pattern mentioned below remove lines based on pattern range. Conditions 1 Look For all lines starting with ALTER TABLE and Ending with ; and contains the word MOVE.I wanto to remove these lines from the file sample below. Note : The above pattern list could be found in... (1 Reply)
Discussion started by: rajan_san
1 Replies

2. Shell Programming and Scripting

script to match patterns in 2 different files.

I am new to shell scripting and need some help. I googled, but couldn't find a similar scenario. Basically, I need to rename a datafile. This is the scenario - I have a file, readonly.txt that has 2 columns - file# and name. I have another file,missing_files.txt that has id and name. Both the... (3 Replies)
Discussion started by: mathews
3 Replies

3. Shell Programming and Scripting

Find files that do not match specific patterns

Hi all, I have been searching online to find the answer for getting a list of files that do not match certain criteria but have been unsuccessful. I have a directory that has many jpg files. What I need to do is get a list of the files that do not match both of the following patterns (I have... (21 Replies)
Discussion started by: nikos-koutax
21 Replies

4. Shell Programming and Scripting

Match paragraph between two patterns, delete the duplicate paragraphs

Hello all I have a file my DNS server where there are duplicate paragrapsh like below. How can I remove the duplicate paragraph so that only one paragraph remains. BEGIN; replace into domains (name,type) values ('225.168.192.in-addr.arpa','MASTER'); replace into records (domain_id,... (2 Replies)
Discussion started by: sb245
2 Replies

5. Shell Programming and Scripting

Match columns from two csv files and update field in one of the csv file

Hi, I have a file of csv data, which looks like this: file1: 1AA,LGV_PONCEY_LES_ATHEE,1,\N,1,00020460E1,0,\N,\N,\N,\N,2,00.22335321,0.00466628 2BB,LES_POUGES_ASF,\N,200,200,00006298G1,0,\N,\N,\N,\N,1,00.30887539,0.00050312... (10 Replies)
Discussion started by: djoseph
10 Replies

6. Shell Programming and Scripting

Match multiple patterns sequentially in order - grep or awk

Hello. grep v2.21 Debian 8 I wish to search for and output these patterns in order; "From " "To: " "Subject: " "Message-Id: " "Date: " "To: " grep works, but not in strict order... $ grep -a -E "^From |^Subject:|^From: |^Message-Id: |^Date: |^To: " InboxResult; From - Wed Feb 18... (10 Replies)
Discussion started by: DSommers
10 Replies

7. UNIX for Beginners Questions & Answers

Match duplicate ids in two files

I have two text files. File 1 has 150 ids but all the ids exists in duplicates so it has 300 ids in total. File 2 has 1500 ids but all exists in duplicates so file 2 has 300 ids in total. i want to match the first occurance of every id in file 1 with first occurance of thet id in file 2 and 2nd... (2 Replies)
Discussion started by: limd
2 Replies

8. Shell Programming and Scripting

awk pattern match by looping through search patterns

Hi I am using Solaris 5.10 & ksh Wanted to loop through a pattern file by reading it and passing it to the awk to match that value present in column 1 of rawdata.txt , if so print column 1 & 2 in to Avlblpatterns.txt. Using the following code but it seems some mistakes and it is running for... (2 Replies)
Discussion started by: ananan
2 Replies

9. Shell Programming and Scripting

awk to print match or non-match and select fields/patterns for non-matches

In the awk below I am trying to output those lines that Match between file1 and file2, those Missing in file1, and those missing in file2. Using each $1,$2,$4,$5 value as a key to match on, that is if those 4 fields are found in both files the match, but if those 4 fields are not found then missing... (0 Replies)
Discussion started by: cmccabe
0 Replies

10. UNIX for Beginners Questions & Answers

Match patterns between two files and extract certain range of strings

Hi, I need help to match patterns from between two different files and extract region of strings. inputfile1.fa >l-WR24-1:1 GCCGGCGTCGCGGTTGCTCGCGCTCTGGGCGCTGGCGGCTGTGGCTCTACCCGGCTCCGG GGCGGAGGGCGACGGCGGGTGGTGAGCGGCCCGGGAGGGGCCGGGCGGTGGGGTCACGTG... (4 Replies)
Discussion started by: bunny_merah19
4 Replies
CSV(3pm)						User Contributed Perl Documentation						  CSV(3pm)

NAME
Class::CSV - Class based CSV parser/writer SYNOPSIS
use Class::CSV; my $csv = Class::CSV->parse( filename => 'test.csv', fields => [qw/item qty sub_total/] ); foreach my $line (@{$csv->lines()}) { $line->sub_total('$'. sprintf("%0.2f", $line->sub_total())); print 'Item: '. $line->item(). " ". 'Qty: '. $line->qty(). " ". 'SubTotal: '. $line->sub_total(). " "; } my $cvs_as_string = $csv->string(); $csv->print(); my $csv = Class::CSV->new( fields => [qw/userid username/], line_separator => " "; ); $csv->add_line([2063, 'testuser']); $csv->add_line({ userid => 2064, username => 'testuser2' }); DESCRIPTION
This module can be used to create objects from CSV files, or to create CSV files from objects. Text::CSV_XS is used for parsing and creating CSV file lines, so any limitations in Text::CSV_XS will of course be inherant in this module. EXPORT None by default. METHOD
CONSTRUCTOR parse the parse constructor takes a hash as its paramater, the various options that can be in this hash are detailed below. Required Options o fields - an array ref containing the list of field names to use for each row. there are some reserved words that cannot be used as field names, there is no checking done for this at the moment but it is something to be aware of. the reserved field names are as follows: "string", "set", "get". also field names cannot contain whitespace or any characters that would not be allowed in a method name. Source Options (only one of these is needed) o filename - the path of the CSV file to be opened and parsed. o filehandle - the file handle of the CSV file to be parsed. o objects - an array ref of objects (e.g. Class::DBI objects). for this to work properly the field names provided in fields needs to correspond to the field names of the objects in the array ref. o classdbi_objects - depreciated use objects instead - using classdbi_objects will still work but its advisable to update your code. Optional Options o line_separator - the line seperator to be included at the end of every line. defaulting to " " (unix carriage return). new the new constructor takes a hash as its paramater, the same options detailed in parse apply to new however no Source Options can be used. this constructor creates a blank CSV object of which lines can be added via add_line. ACCESSING lines returns an array ref containing objects of each CSV line (made via Class::Accessor). the field names given upon construction are available as accessors and can be set or get. for more information please see the notes below or the perldoc for Class::Accessor. the lines accessor is also able to be updated/retrieved in the same way as individual lines fields (examples below). Example retrieving the lines: my @lines = @{$csv->lines()}; removing the first line: pop @lines; $csv->lines(@lines); sorting the lines: @lines = sort { $a->userid() <=> $b->userid() } @lines: $csv->lines(@lines); sorting the lines (all-in-one way): $csv->lines([ sort { $a->userid() <=> $b->userid() } @{$csv->lines()} ]); Retrieving a fields value there is two ways to retrieve a fields value (as documented in Class::Accessor). firstly you can call the field name on the object and secondly you can call "get" on the object with the field name as the argument (multiple field names can be specified to retrieve an array of values). examples are below. my $value = $line->test(); OR my $value = $line->get('test'); OR my @values = $line->get(qw/test test2 test3/); Setting a fields value setting a fields value is simmilar to getting a fields value. there are two ways to set a fields value (as documented in Class::Accessor). firstly you can simply call the field name on the object with the value as the argument or secondly you can call "set" on the object with a hash of fields and their values to set (this isn't standard in Class::Accessor, i have overloaded the "set" method to allow this). examples are below. $line->test('123'); OR $line->set( test => '123' ); OR $line->set( test => '123', test2 => '456' ); Retrieving a line as a string to retrieve a line as a string simply call "string" on the object. my $string = $line->string(); new_line returns a new line object, this can be useful for to "splice" a line into lines (see example below). you can pass the values of the line as an ARRAY ref or a HASH ref. Example my $line = $csv->new_line({ userid => 123, domainname => 'splicey.com' }); my @lines = $csv->lines(); splice(@lines, 1, 0, $line); OR splice(@{$csv->lines()}, 1, 0, $csv->new_line({ userid => 123, domainname => 'splicey.com' })); add_line adds a line to the lines stack. this is mainly useful when the new constructor is used but can of course be used with any constructor. it will add a new line to the end of the lines stack. you can pass the values of the line as an ARRAY ref or a HASH ref. examples of how to use this are below. Example $csv->add_line(['house', 100000, 4]); $csv->add_line({ item => 'house', cost => 100000, bedrooms => 4 }); OUTPUT string returns the object as a string (CSV file format). print calls "print" on string (prints the CSV to STDOUT). SEE ALSO
Text::CSV_XS, Class::Accessor AUTHOR
David Radunz, <david@boxen.net> COPYRIGHT AND LICENSE
Copyright 2004 by David Radunz This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. perl v5.10.0 2007-02-08 CSV(3pm)
All times are GMT -4. The time now is 04:00 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy