Awk incorrect data.

04-21-2009

Registered User

246, 1

Join Date: Apr 2009

Last Activity: 27 March 2020, 1:08 PM EDT

Posts: 246

Thanks Given: 9

Thanked 1 Time in 1 Post

I am using the following command:

Code:

nawk -F"," 'NR==FNR {a[$2$3]=$1;next} a[$2$3] {print a[$2$3],$1,$2,$3}'  file1 file2

I am getting 40 records of output.
But when I import file1 and file2 into MS Access, I get 140 records,
and I know 140 is the correct count.

I would appreciate your help correcting the above script.

pinnacle

04-21-2009

Moderator

8,825, 1,112

Join Date: Feb 2005

Last Activity: 23 August 2021, 11:26 AM EDT

Location: Foxborough, MA

Posts: 8,825

Thanks Given: 579

Thanked 1,112 Times in 1,003 Posts

It would definitely help if you provided a more detailed description of what you're trying to achieve with the sample data files and the expected output.

My crystal ball is a bit fuzzy, but:

Code:

nawk -F"," 'NR==FNR {a[$2,$3]=$1;next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'  OFS=, file1 file2
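The command above swaps the truth test `a[$2$3]` for a membership test with `in`. A minimal sketch of why that can matter, assuming a POSIX-style `awk` stands in for `nawk` and using a made-up record with an empty first field: a truth test looks at the stored value, while `in` looks only at the key, and a plain lookup also creates the element it asks about.

```shell
#!/bin/sh
# Hypothetical record whose first field is empty.
data=',abc,def'

# Truth test on the stored value: an empty string is false, so the key looks absent.
by_value=$(printf '%s\n' "$data" |
    awk -F, '{a[$2,$3]=$1} END {print (a["abc","def"] ? "found" : "missed")}')

# Membership test: reports the key regardless of the value stored under it.
by_key=$(printf '%s\n' "$data" |
    awk -F, '{a[$2,$3]=$1} END {print ((("abc" SUBSEP "def") in a) ? "found" : "missed")}')

# Side effect: merely referencing a missing element creates it.
created=$(awk 'BEGIN {if (a["nokey"]) print "unreachable"; print (("nokey" in a) ? "created" : "absent")}')

echo "value test: $by_value, membership test: $by_key, after lookup: $created"
```

Whether this applies to the real files is unknown, since only the record counts were posted.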

vgersh99

04-21-2009

I have two files:

Code:

$ head file1
zip,FirstName,Lastname
07777,abc,def
22584,dec,dlo
25487,xyz,jkl
25488,dim,kio

Code:

$ head file2
aim server database
SSN,Firstname,LastName
123456789,abc,def
123456789,dec,dlo
123456789,xyz,jkl
123456789,dim,kio

Wanted output:

Code:

SSN,zip,FirstName,LastName

Code:

nawk -F"," 'NR==FNR {a[$2,$3]=$1;next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'  OFS=,  " file2 file1
40 Matches

Code:

nawk -F"," 'NR==FNR {a[$2,$3]=$1;next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'  OFS=,  " file1 file2
140 matches

I know 140 matches is correct, but I expected both commands to give 140, and I don't understand why the results differ.

Can you please explain the ($2 SUBSEP $3) part?
In a[$2,$3], are we using the comma because the input file is comma-separated, or is that a general rule?
If I don't use the comma, I still get the same result.

pinnacle

04-21-2009

Quote:

Originally Posted by pinnacle

Code:

nawk -F"," 'NR==FNR {a[$2,$3]=$1;next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'  OFS=,  " file2 file1
40 Matches

Code:

nawk -F"," 'NR==FNR {a[$2,$3]=$1;next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'  OFS=,  " file1 file2
140 matches

The above 2 invocations are exactly the same. I don't understand why you're getting different results.
Also, I don't understand why you have a trailing double quote (after OFS=,) in both cases.

Regarding the comma in a[$2,$3]: no, it is not there because your file is comma-separated. You can build your array index simply by concatenating the strings ($2$3) or, which is better for further processing, by writing:

Code:

a[$2,$3]

When building an array index, awk replaces the "," with the value of its built-in variable SUBSEP. If you later decide to split the index to recover its parts, you can split on SUBSEP. If you simply concatenate the strings, you cannot reliably reconstruct the original parts from the index.
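The difference can be seen in a small sketch. The sample pairs below are invented for illustration, and `awk` is assumed to behave like the `nawk` used in this thread. Two different name pairs collapse into one key when concatenated but stay distinct with the comma subscript, and a SUBSEP-joined key can be split back into its parts.

```shell
#!/bin/sh
# Hypothetical sample: two (first,last) pairs that look identical once concatenated.
pairs='ab,c
a,bc'

# Plain concatenation: "ab" "c" and "a" "bc" both become the key "abc".
concat_keys=$(printf '%s\n' "$pairs" |
    awk -F, '{a[$1 $2]} END {n = 0; for (k in a) n++; print n}')

# Comma subscript: awk joins the parts with SUBSEP, so the keys stay distinct.
subsep_keys=$(printf '%s\n' "$pairs" |
    awk -F, '{a[$1,$2]} END {n = 0; for (k in a) n++; print n}')

# A SUBSEP-joined key can be split back into its original parts.
parts=$(printf '%s\n' 'abc,def' |
    awk -F, '{a[$1,$2]} END {for (k in a) {split(k, p, SUBSEP); print p[1] "|" p[2]}}')

echo "distinct keys with concatenation: $concat_keys"
echo "distinct keys with SUBSEP:        $subsep_keys"
echo "recovered parts:                  $parts"
```

With real name data such collisions may be rare, which would be consistent with getting the same result either way.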

The originally posted solution should give you the desired result.
Given file1:

Code:

zip,FirstName,Lastname
07777,abc,def
22584,dec,dlo
25487,xyz,jkl
25488,dim,kio

and file2:

Code:

SSN,Firstname,LastName
123456789,abc,def
123456789,dec,dlo
123456789,xyz,jkl
123456789,dim,kio

running:

Code:

nawk -F, 'NR==FNR {a[$2,$3]=$1;next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'  OFS=, file2 file1

Results in:

Code:

123456789,07777,abc,def
123456789,22584,dec,dlo
123456789,25487,xyz,jkl
123456789,25488,dim,kio

Check your file1 and file2 - see if there're any discrepancies and/or embedded spaces.
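One way to look for embedded spaces or other invisible characters is sketched below. The sample file and its contents are made up for illustration; the same two commands could be pointed at file1 and file2. `sed -n l` prints each line with its end marked by `$` and a carriage return shown as `\r`, and the awk filter lists line numbers where any field starts or ends with a blank or ends with a carriage return.

```shell
#!/bin/sh
# Hypothetical sample with a trailing space (line 1) and a DOS carriage return (line 2).
tmp=$(mktemp -d)
printf '07777,abc,def \n22584,dec,dlo\r\n25487,xyz,jkl\n' > "$tmp/sample.csv"

# Make non-printing characters visible.
sed -n l "$tmp/sample.csv"

# Collect the numbers of lines where some field has a leading/trailing blank
# or a trailing carriage return.
suspect=$(awk -F, '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[ \t]|[ \t\r]$/) { print NR; next }
}' "$tmp/sample.csv" | tr '\n' ' ')

echo "suspect lines: $suspect"
rm -rf "$tmp"
```

This only checks for whitespace and carriage returns; differences in letter case or spelling between the two files would need a separate comparison.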

Also, this is NOT one of your first forum posts and you've been asked in the past: please use BB Code tags when posting data or code samples.

vgersh99

04-21-2009

Code:

nawk -F"," 'NR==FNR {a[$2,$3]=$1;next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'  OFS=, file1 file2

In the above code, if I switch the positions of file1 and file2, I get different results.
I cannot post the files because the data is sensitive.
I visually checked the files and see no special characters or anything unusual.
Is there a command I can use to verify this?

I appreciate your response.

pinnacle

04-22-2009

Quote:

Originally Posted by pinnacle

Code:

nawk -F"," 'NR==FNR {a[$2,$3]=$1;next} ($2 SUBSEP $3) in a {print a[$2,$3],$1,$2,$3}'  OFS=, file1 file2

Patient: Doc, it really hurts when I do that!
Doctor: Then don't do that!

The order of the files on the command line is important for mapping the fields from one file to the other. Look at the fields in your data files, try to spot the difference, and refer to your original posting for the mapping logic.
Good luck.
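A small sketch of why the order matters with the `NR==FNR` idiom, using invented files (`zips.csv`, `ssn.csv`) and a hypothetical helper `join_by_name`; `awk` is assumed to behave like `nawk`. The first file is loaded into the array, where a repeated name keeps only the value stored last, and one line is printed per matching line of the second file. Swapping the files therefore changes both the column order and, when names repeat, the number of output lines. Whether repeated names explain the 40 versus 140 counts in this thread is only a guess, since the real data was not posted.

```shell
#!/bin/sh
# Hypothetical data: the name "abc,def" appears twice in zips.csv, once in ssn.csv.
tmp=$(mktemp -d)
printf '%s\n' '07777,abc,def' '07778,abc,def' '22584,dec,dlo' > "$tmp/zips.csv"
printf '%s\n' '111,abc,def' '222,dec,dlo' > "$tmp/ssn.csv"

join_by_name() {
    # $1 is loaded into the array (NR==FNR); $2 is then scanned for matching names.
    awk -F, 'NR==FNR {a[$2,$3]=$1; next}
             ($2 SUBSEP $3) in a {print a[$2,$3], $1, $2, $3}' OFS=, "$1" "$2"
}

# ssn.csv loaded first: one output line per matching line of zips.csv (3 lines).
ssn_first=$(join_by_name "$tmp/ssn.csv" "$tmp/zips.csv")

# zips.csv loaded first: one output line per matching line of ssn.csv (2 lines),
# and the repeated "abc,def" keeps only the zip stored last (07778).
zips_first=$(join_by_name "$tmp/zips.csv" "$tmp/ssn.csv")

printf 'ssn.csv first:\n%s\n\nzips.csv first:\n%s\n' "$ssn_first" "$zips_first"
rm -rf "$tmp"
```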

vgersh99
