Awk incorrect data.


 
# 1  
Old 04-21-2009

I am using the following command:

Code:
nawk -F"," 'NR==FNR {a[$2$3]=$1;next} a[$2$3] {print a[$2$3],$1,$2,$3}'  file1 file2

This outputs 40 records.
But when I import file1 and file2 into MS Access, I get 140 records,
and I know 140 is the correct count.

I would appreciate help correcting the script above.
# 2  
Old 04-21-2009
It would definitely help if you provided a more detailed description of what you're trying to achieve with the sample data files and the expected output.

My crystal ball is a bit fuzzy, but:
Code:
nawk -F"," 'NR==FNR {a[$2,$3]=$1;next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'  OFS=, file1 file2
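The command above replaces the truthiness test a[$2$3] with a membership test using "in". A minimal sketch of one case where the two differ, assuming hypothetical sample data in which one record has an empty first field (the thread does not say whether the real files contain such records). It uses awk rather than nawk, on the assumption that any POSIX awk behaves the same here:

```shell
#!/bin/sh
# Hypothetical sample data: the first record of file1 has an empty zip field.
tmp=$(mktemp -d)
printf '%s\n' ',abc,def' '22584,dec,dlo' > "$tmp/file1"
printf '%s\n' '111,abc,def' '222,dec,dlo' > "$tmp/file2"

# Truthiness test: a[k] is false when the stored value is the empty string.
truthy=$(awk -F, 'NR==FNR {a[$2,$3]=$1; next} a[$2,$3] {print a[$2,$3],$1,$2,$3}' OFS=, "$tmp/file1" "$tmp/file2")

# Membership test: true whenever the key was stored, whatever its value.
member=$(awk -F, 'NR==FNR {a[$2,$3]=$1; next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}' OFS=, "$tmp/file1" "$tmp/file2")

printf 'truthiness test:\n%s\n' "$truthy"
printf 'membership test:\n%s\n' "$member"
rm -rf "$tmp"
```

With this data the truthiness version prints one line and the membership version prints two, because the record with the empty zip is only matched by "in".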

# 3  
Old 04-21-2009
vgersh99

I have two files
$ head file1
zip,FirstName,Lastname
07777,abc,def
22584,dec,dlo
25487,xyz,jkl
25488,dim,kio

$ head file2
aim server database
SSN,Firstname,LastName
123456789,abc,def
123456789,dec,dlo
123456789,xyz,jkl
123456789,dim,kio
Wanted output:
SSN,zip,FirstName,LastName

Code:
nawk -F"," 'NR==FNR {a[$2,$3]=$1;next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'  OFS=,  " file2 file1
40 Matches

Code:
nawk -F"," 'NR==FNR {a[$2,$3]=$1;next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'  OFS=,  " file1 file2
140 matches

I know 140 matches is correct, but both commands should give 140. I don't know why the results differ.

Can you please explain this part: ($2 SUBSEP $3)
In a[$2,$3], are we using "," because the input file is comma-separated, or is it a general rule?
If I don't use ",", I still get the same result.
# 4  
Old 04-21-2009
The above 2 invocations are exactly the same. I don't understand why you're getting different results.
Also, I don't understand why you have a trailing double quote (just before the file names) in both cases.
Quote:
Originally Posted by zenith
I know 140 matches is correct, but both commands should give 140. I don't know why the results differ.

Can you please explain this part: ($2 SUBSEP $3)
In a[$2,$3], are we using "," because the input file is comma-separated, or is it a general rule?
If I don't use ",", I still get the same result.
No, it's not because your file is comma-separated. You can build your array index just by concatenating the strings ($2$3) or (which is better for further processing) by doing this:
Code:
a[$2,$3]

In the context of building an array index, the "," is replaced by awk's built-in variable SUBSEP. If you later decide to split the index to recover its parts, you can split on SUBSEP. If you simply concatenate the strings, you cannot reconstruct the original parts from the index.
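A minimal sketch of that difference, using made-up strings rather than the poster's data: two distinct pairs collapse into one key when concatenated but stay separate with the comma form, and a comma-built index can be split back on SUBSEP. It assumes any POSIX awk is available as awk:

```shell
#!/bin/sh
out=$(awk 'BEGIN {
    # Two different pairs that concatenate to the same string "abc".
    plain["ab" "c"] = 1
    plain["a" "bc"] = 1
    # The same pairs joined with SUBSEP stay distinct.
    comma["ab", "c"] = 1
    comma["a", "bc"] = 1

    np = 0; for (k in plain) np++
    nc = 0; for (k in comma) nc++
    print "concatenated keys: " np
    print "SUBSEP keys: " nc

    # An index built with "," can be split back into its parts.
    n = split("abc" SUBSEP "def", parts, SUBSEP)
    print n, parts[1], parts[2]
}')
printf '%s\n' "$out"
# prints:
# concatenated keys: 1
# SUBSEP keys: 2
# 2 abc def
```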

The originally posted solution should give you the desired result.
Given file1:
Code:
zip,FirstName,Lastname
07777,abc,def
22584,dec,dlo
25487,xyz,jkl
25488,dim,kio

and file2:
Code:
SSN,Firstname,LastName
123456789,abc,def
123456789,dec,dlo
123456789,xyz,jkl
123456789,dim,kio

running:
Code:
nawk -F, 'NR==FNR {a[$2,$3]=$1;next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'  OFS=, file2 file1

Results in:
Code:
123456789,07777,abc,def
123456789,22584,dec,dlo
123456789,25487,xyz,jkl
123456789,25488,dim,kio

Check your file1 and file2 for any discrepancies and/or embedded spaces.
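One way to run such a check is sketched below. The sample file is hypothetical (line 2 has a trailing space, line 3 ends in a carriage return), and the script only looks for carriage returns and for blanks or tabs at the edges of a field, so it is a starting point rather than a complete validator:

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical sample: line 2 has a trailing space, line 3 ends in a carriage return.
printf '07777,abc,def\n22584,dec,dlo \n25487,xyz,jkl\r\n' > "$tmp/file1"

# Report records containing a CR, and fields with leading or trailing blanks/tabs.
report=$(awk -F, '
    /\r/ { print "line " FNR ": carriage return" }
    {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[ \t]|[ \t]$/)
                print "line " FNR ": field " i " has leading/trailing whitespace"
    }' "$tmp/file1")
printf '%s\n' "$report"
rm -rf "$tmp"
```

For a byte-by-byte view of a suspicious line, od -c on the file shows every character, including spaces and \r.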

Also, this is NOT one of your first forum posts and you've been asked in the past: please use BB Code tags when posting data or code samples.
# 5  
Old 04-21-2009
Code:
nawk -F"," 'NR==FNR {a[$2,$3]=$1;next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'  OFS=, file1 file2

In the above code, if I switch the positions of file1 and file2, I get different results.
I cannot post the files because the data is sensitive.
I visually checked the files and I see no special characters or anything unusual.
Is there a command to verify this?

I appreciate your response.
# 6  
Old 04-22-2009
Patient: Doc, it really hurts when I do that!
Doctor: Then don't do that!

The order of the files on the command line is important for mapping the fields from one file to the other. Look at the fields in your data files, try to see the difference, and refer to your original post for the mapping logic.
Good luck.
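A minimal sketch of why the file order can change the record count. It assumes, hypothetically, that the same name appears once in file1 and several times in file2; the thread never confirms that this is what the real data looks like. The first file is loaded into the array (later duplicates overwrite earlier ones), and one line is printed per matching record of the second file:

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical data: one name in file1, the same name three times in file2.
printf '%s\n' '07777,abc,def' > "$tmp/file1"
printf '%s\n' '111,abc,def' '222,abc,def' '333,abc,def' > "$tmp/file2"

prog='NR==FNR {a[$2,$3]=$1; next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'

# file1 is loaded into the array; one line is printed per matching file2 record.
n12=$(awk -F, "$prog" OFS=, "$tmp/file1" "$tmp/file2" | wc -l | tr -d ' ')
# file2 is loaded into the array; one line is printed per matching file1 record.
n21=$(awk -F, "$prog" OFS=, "$tmp/file2" "$tmp/file1" | wc -l | tr -d ' ')

echo "file1 file2: $n12 lines"
echo "file2 file1: $n21 lines"
rm -rf "$tmp"
# prints:
# file1 file2: 3 lines
# file2 file1: 1 lines
```

The order also decides which value lands in the first output column: with file1 first the line starts with the zip, with file2 first it starts with the SSN.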