Can awk do lookups to other files and process results?
I know that brute-force scripting could accomplish this with lots of cat/echo/cut/grep and more. But because my real file has 800k records, and the matching files have 10-20k records, that approach would be neither fast enough nor efficient.
I have an input file:
Code:
> cat file_in
1234567890123456789012345678901234567890
Joe 123456 30 Main St 1234 F
Jim 101362 1492 Hugh 0101 P
Kerry 040419 6091 Lost St 0101 F
Linda 123456 50 High Way 1235
Matt 242424 48 Speedway Dr4343 F
Kerrin180118 99 Skaters Way2012 P *
(You can ignore the first line. It is just a column ruler, since this is a fixed-width record file.)
(tail +2 file_in skips over this line during testing.)
Begin by reviewing only the records where position 40 is blank, meaning they still need to be processed.
I want to see those records that cannot be processed because (a) the data in columns 7-12 does not exist in the following file:
which would create an output file containing only the records I want to consider, i.e. those not yet marked as processed. So yes, I intend to start with 6 records and make a file of 5 records. I now need to add those two codes at position 39 when appropriate.
Edit: actually, you should jump to the second example. The first assumes that position 39 is always empty.
The code below sets 1 for Linda, because she's not present in the example file_cd1:
Code:
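awk 'NR == FNR { cd1[$1]; next }   # first file: collect the keys of file_cd1
f { cd2[$1] = $2; next }           # second file (f=1): map key to its attribute
!f && / $/ {                       # file_in: only records with position 40 blank
  if (!(substr($0, 7, 6) in cd1)) sub(/. $/, "1 ")
  if ((substr($0, 29, 4) in cd2) && cd2[substr($0, 29, 4)] != "abc")
    sub(/  $/, "2 ")               # two blanks: only when position 39 is still empty
}1' file_cd1 f=1 file_cd2 f=0 file_in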
An example:
Code:
% awk 'NR == FNR { cd1[$1]; next }
f { cd2[$1] = $2; next }
!f && / $/ {
  if (!(substr($0, 7, 6) in cd1)) sub(/. $/, "1 ")
  if ((substr($0, 29, 4) in cd2) && cd2[substr($0, 29, 4)] != "abc")
    sub(/  $/, "2 ")
}1' file_cd1 f=1 file_cd2 f=0 file_in
1234567890123456789012345678901234567890
Joe 123456 30 Main St 1234 F 1
Jim 101362 1492 Hugh 0101 P
Kerry 040419 6091 Lost St 0101 F
Linda 123456 50 High Way 1235 1
Matt 242424 48 Speedway Dr4343 F 2
Kerrin180118 99 Skaters Way2012 P *
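For anyone unfamiliar with the idiom, here is a stripped-down sketch of how the three input files are told apart. The sample data and the temporary file names are made up, and it uses whitespace-separated fields instead of substr() to stay short, so treat it as an illustration rather than a replacement for the script above.

```shell
# Hypothetical sample data, invented for this sketch.
tmp=$(mktemp -d)
printf '%s\n' 101362 040419 > "$tmp/cd1"            # known keys
printf '%s\n' '0101 abc' '4343 xyz' > "$tmp/cd2"    # key plus attribute
printf '%s\n' 'Jim 101362 0101' 'Matt 242424 4343' > "$tmp/in"

result=$(awk '
  NR == FNR { cd1[$1]; next }        # first file: NR equals FNR only here
  f         { cd2[$1] = $2; next }   # second file: f=1 was assigned before it was opened
  {                                  # third file: f=0 was assigned before it was opened
    a = ($2 in cd1) ? "known" : "unknown"
    b = ($3 in cd2) ? cd2[$3] : "none"
    print $1, a, b
  }' "$tmp/cd1" f=1 "$tmp/cd2" f=0 "$tmp/in")

printf '%s\n' "$result"
rm -rf "$tmp"
```

An operand of the form var=value on the awk command line is an assignment that takes effect when awk reaches it, just before the next file is opened, which is what lets f act as a per-file flag. One caveat: if the first file is empty, NR == FNR also holds while the second file is read.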
If you want the second test to have precedence:
Code:
awk 'NR == FNR { cd1[$1]; next }
f { cd2[$1] = $2; next }
!f && / $/ {
  if (!(substr($0, 7, 6) in cd1)) sub(/. $/, "1 ")
  if ((substr($0, 29, 4) in cd2) && cd2[substr($0, 29, 4)] != "abc")
    sub(/. $/, "2 ")                 # any character: overwrites a 1 set by the first test
}1' file_cd1 f=1 file_cd2 f=0 file_in
For example:
Code:
% awk 'NR == FNR { cd1[$1]; next }
f { cd2[$1] = $2; next }
!f && / $/ {
  if (!(substr($0, 7, 6) in cd1)) sub(/. $/, "1 ")
  if ((substr($0, 29, 4) in cd2) && cd2[substr($0, 29, 4)] != "abc")
    sub(/. $/, "2 ")
}1' file_cd1 f=1 file_cd2 f=0 file_in
1234567890123456789012345678901234567890
Joe 123456 30 Main St 1234 F 1
Jim 101362 1492 Hugh 0101 P
Kerry 040419 6091 Lost St 0101 F
Linda 123456 50 High Way 1235 2
Matt 242424 48 Speedway Dr4343 F 2
Kerrin180118 99 Skaters Way2012 P *
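The two variants differ only in the pattern given to the second sub(). A small sketch of that difference on a single record, assuming the first variant's pattern is two blanks (positions 39 and 40) and using a made-up 40-character record whose position 39 already holds a 1:

```shell
# Hypothetical record: 38 padded characters, then "1" at position 39 and a blank at 40.
rec=$(printf '%-38s1 ' 'Linda 123456')

# Two-blank pattern: no match, because position 39 is already occupied.
keep=$(printf '%s\n' "$rec" | awk '{ sub(/  $/, "2 "); print substr($0, 39, 1) }')

# Any-character pattern: matches "1 " at the end and overwrites it.
over=$(printf '%s\n' "$rec" | awk '{ sub(/. $/, "2 "); print substr($0, 39, 1) }')

echo "two-blank pattern leaves position 39 as: $keep"
echo "any-char pattern sets position 39 to:    $over"
```

In both cases the replacement is exactly two characters long, so the record stays 40 characters wide.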
Last edited by radoulov; 10-24-2008 at 06:20 AM..
Reason: correction
Okay. Use associative arrays. This gives you three files, one.txt (with a "1"), two.txt and three.txt, which are intermediate, and then bad.txt, which is still just blank in columns 39 and 40.
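As a rough sketch of the splitting idea (not the original script), an associative array can map the code in position 39 to an output file name. The record layout, the code-to-file mapping and the sample records are assumptions made for illustration; only the file names are borrowed from the description above.

```shell
tmp=$(mktemp -d)

# Hypothetical 40-column records: position 39 holds the code, position 40 the processed flag.
printf '%-38s%s\n' 'Joe' '1 ' 'Matt' '2 ' 'Jim' '  ' > "$tmp/coded"

awk -v dir="$tmp" '
  BEGIN {                              # assumed mapping from code to output file
    out["1"] = "one.txt"
    out["2"] = "two.txt"
    out[" "] = "bad.txt"               # position 39 still blank
  }
  {
    code = substr($0, 39, 1)
    if (code in out) print > (dir "/" out[code])
  }' "$tmp/coded"

# Count the records that landed in each file.
summary=$(cd "$tmp" && grep -c '' one.txt two.txt bad.txt)
printf '%s\n' "$summary"
rm -rf "$tmp"
```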
After re-reading your post and Jim's comments I'm not sure if you prefer to generate multiple files (good - bad records) or an output like the one I posted.
I would prefer all data - good and bad records - stored to one file.
While reading through my 'sed & awk' book, the idea of arrays did jump out at me. I am going to have to sit down and read through the examples to understand how they work.
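If it helps while working through the book, here is a minimal sketch of the two array operations the scripts in this thread rely on: indexing by a string, and testing membership with "in". The input letters are arbitrary sample data.

```shell
counts=$(printf '%s\n' a b a c a | awk '
  { seen[$1]++ }                       # string keys; an element is created on first use
  END {
    print "a:", seen["a"]
    # "in" tests membership without creating the key
    print "has z:", (("z" in seen) ? "yes" : "no")
  }')

printf '%s\n' "$counts"
```

Merely referencing an element, as in cd1[$1], is enough to create it, which is why the lookup scripts can load a key list without assigning any value.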