Post 302597294 by isuewing on Thursday 9th of February 2012 11:54:00 PM
Using AWK to match CSV files with duplicate patterns

Dear awk users,

I am trying to use awk to match records across two moderately large CSV files. File1 is a pattern file with 173,200 lines, many of which are repeated. The order in which these lines are displayed is important, and I would like to preserve it. File2 is a data file with 456,000 unique lines.

File1.csv:
Code:
_Year01
23_01_192001
23_02_192001
23_01_192001
23_04_192001
23_05_192001
23_03_192001
23_02_192001
23_03_192001
23_05_192001
23_04_192001
23_05_192001
23_05_192001
23_06_192001
_192001
25_01_192001
25_02_192001
...

File2.csv:
Code:
23,01,192001,0.09,23.40,-0.79,0.,1252.,23_01_192001
23,03,192001,0.79,28.30,-0.63,0.,1110.,23_03_192001
23,04,192002,0.15,37.40,-0.98,0.,748.,23_04_192002
23,06,192002,1.42,38.70,2.78,0.,720.,23_06_192002
23,03,192002,0.54,34.30,-1.05,0.,832.,23_03_192002
23,02,192002,0.54,31.50,-1.04,0.,918.,23_02_192002
23,01,192002,0.77,30.60,-0.82,0.,935.,23_01_192002
23,05,192002,0.65,36.30,-1.00,0.,784.,23_05_192002
23,04,192003,5.18,45.10,0.58,6.,595.,23_04_192003
23,02,192003,5.24,42.30,0.94,0.,682.,23_02_192003

I want to extract the lines in File2 for which column 9 matches the alphanumeric key in File1, preserving the key order of File1. After reading several posts on this and other forums, I tried:
Code:
awk -F\, 'FNR==NR{a[$1]=$9;next}{print $0, a[$1]}' File1.csv File2.csv

which just prints out File2.csv, and

Code:
awk -F\, 'FNR==NR{a[$1]=$9;next}$9 in a' File1.csv File2.csv


which generates results in the wrong order.
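
For reference, here is a minimal sketch of the lookup described above. It is one possible approach, not a tested solution for the full files. Both attempts read File1.csv first, where `$9` is empty because each line has a single field, and then print while walking File2.csv, so the output can only follow File2's order. The sketch reverses the roles: File2.csv is read first and indexed by column 9, then File1.csv is walked in its original order. The trimmed sample files, the `workdir` and `result` variables, and the `data` array name are hypothetical, and it assumes that a key repeated in File1 should print its line each time and that keys with no match in File2 are skipped.

```shell
#!/bin/sh
# Hypothetical sample files, trimmed from the data shown in this post.
workdir=$(mktemp -d)
cat > "$workdir/File1.csv" <<'EOF'
_Year01
23_01_192001
23_02_192001
23_01_192001
23_03_192001
EOF
cat > "$workdir/File2.csv" <<'EOF'
23,01,192001,0.09,23.40,-0.79,0.,1252.,23_01_192001
23,03,192001,0.79,28.30,-0.63,0.,1110.,23_03_192001
23,04,192002,0.15,37.40,-0.98,0.,748.,23_04_192002
EOF

# First file on the command line (File2.csv): store each data line
# under its column-9 key.
# Second file (File1.csv): walk the keys in their original order and
# print the stored line for every key that has one. Repeated keys
# print repeatedly; keys without a match print nothing.
result=$(awk -F, 'FNR==NR { data[$9] = $0; next }
                  $1 in data { print data[$1] }' \
             "$workdir/File2.csv" "$workdir/File1.csv")

printf '%s\n' "$result"
rm -rf "$workdir"
```

With the sample above, the 23_01_192001 record is printed twice, because that key appears twice in File1, followed by the 23_03_192001 record. This keeps all of File2 in memory, which should be manageable for 456,000 lines, though I have not measured it.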

I realize this type of question is simple and well documented. I thought I had correctly grasped the logic of the program, but apparently I have not. I would be very grateful for pointers as to where I am going wrong.

-i


Last edited by Franklin52; 02-10-2012 at 04:03 AM. Reason: Please use code tags for code and data samples, thank you
 
