Issue with awk when joining two files when field has '-' hyphen


 
Thread Tools Search this Thread
Top Forums UNIX for Beginners Questions & Answers Issue with awk when joining two files when field has '-' hyphen
# 1  
Old 02-06-2019
Issue with awk when joining two files when field has '-' hyphen

Dear Community;

I need to join two files but I am facing issues.

1st file has multiple columns. Primary (1st) columns has unique values. There are other columns out of which some has non-ascii characters as well (other language).
Example File below:

Code:
 
1-1001JRL,BiRecurring
1-1001W5O,QuadRecurring
1-1002KQK,SpclRecurring
1-1002KR9,1-Recurring
1-1002KRN,2-Recurring
1-1002KS1,One-Time
1-1003BAY,Recurring

2nd file has multiple columns as well. Primary (1st) columns has repeated values.
Example File below:

Code:
 
1-1001JRL,abc-1
1-1001JRL,abc-2
1-1001JRL,abc-3
1-1001JRL,abc-4
1-1001JRL,abc-5
1-1001JRL,abc-6
1-1001JRL,abc-7
1-1001JRL,abc-8
1-1001W5O,def-1
1-1001W5O,def-2
1-1001W5O,def-3
1-1001W5O,def-4
1-1001W5O,def-5

I need to join these two files. Expected Output:

Code:
 
1-1001JRL,abc-1,1-1001JRL,BiRecurring
1-1001JRL,abc-2,1-1001JRL,BiRecurring
1-1001JRL,abc-3,1-1001JRL,BiRecurring
1-1001JRL,abc-4,1-1001JRL,BiRecurring
1-1001JRL,abc-5,1-1001JRL,BiRecurring
1-1001JRL,abc-6,1-1001JRL,BiRecurring
1-1001JRL,abc-7,1-1001JRL,BiRecurring
1-1001JRL,abc-8,1-1001JRL,BiRecurring
1-1001W5O,def-1,1-1001W5O,QuadRecurring
1-1001W5O,def-2,1-1001W5O,QuadRecurring
1-1001W5O,def-3,1-1001W5O,QuadRecurring
1-1001W5O,def-4,1-1001W5O,QuadRecurring
1-1001W5O,def-5,1-1001W5O,QuadRecurring
1-1002KQK,SpclRecurring
1-1002KR9,1-Recurring
1-1002KRN,2-Recurring
1-1002KS1,One-Time
1-1003BAY,Recurring

I used below awk commands:

Code:
awk 'NR==FNR {T[$1]=FS $1; next} {print $0 T[$1]}' FS="," File1 File2

Code:
awk -F"," 'FNR==NR{A[$1]=$1;B[$1]=$0;next} ($1 in A){print $0 FS B[$1]} !($1 in A){print $0}' File1 File2

Both the commands are only printing output of 2nd file as the condition for matching the fields are not macthing. I tried using the files after removing all non-ascii columns by just keeping the primary columns as well and it did not work.
I am suspecting that it is because of the presence of spcl character '-' hyphen.

Kindly assist and suggest.

Best Regards//
# 2  
Old 02-06-2019
Code:
awk -F, 'NR==FNR {T[$1]=$2;B[$1]=$2; next} ($1 in T) { $0 = $0 FS $1 FS T[$1] ;delete B[$1]} END { for(i in B) print i FS B[i] } 1' file1 file2

And I also made a mistake in the name of the array at the output. Fixed T on B

Last edited by nezabudka; 02-06-2019 at 06:39 AM..
This User Gave Thanks to nezabudka For This Post:
# 3  
Old 02-06-2019
Similar but slightly simplified approach:
Code:
awk -F, '
NR==FNR         {T[$1] = $0
                 next
                }
($1 in T)       {$0 = $0 FS T[$1]
                 B[$1]
                }
END             {for (t in T) if (!(t in B)) print T[t]
                }
1
' file[12]

Be aware that the order that the residual T elements are printed in is arbitrary by awk implementation. If you need them in the order given in file1, additional measures must be taken.
These 2 Users Gave Thanks to RudiC For This Post:
# 4  
Old 02-06-2019
Dear Both;

Many thanks for the response. Unfortunately - these commands are not working as expected.

Input File 1:
Code:
$ cat test2.txt
1-1001JRL,Recurring
1-1001W5O,One-Time
1-KHC3G1,Recurring
1-JKP5QY,One-Time
1-13M7ORR,One-Time
1-LXZSK3,Recurring
1-80QXOZ,One-Time
3-83ODZZ,Recurring
1-VKWGEP,Recurring
3-2FCQ4Y,Recurring
1-W65E6A,Recurring
1-13M7BLZ,Recurring
3-14UC92,Recurring
1-QRMHWU,One-Time

Input File 2:
Code:
cat test1.txt
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O

Output:
Code:
 awk -F, 'NR==FNR{T[$1] = $0;next}($1 in T){$0 = $0 FS T[$1]B[$1]}END{for (t in T) if (!(t in B)) print T[t]}1' test2.txt test1.txt
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001JRL
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-1001W5O
1-VKWGEP,Recurring
1-W65E6A,Recurring
3-83ODZZ,Recurring
1-QRMHWU,One-Time
1-KHC3G1,Recurring
1-13M7ORR,One-Time
1-JKP5QY,One-Time
1-1001W5O,One-Time  --> This line was not matched.
1-1001JRL,Recurring --> This line was not matched.
3-14UC92,Recurring
3-2FCQ4Y,Recurring
1-13M7BLZ,Recurring
1-LXZSK3,Recurring
1-80QXOZ,One-Time

Expected Output:
Code:
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-VKWGEP,Recurring
1-W65E6A,Recurring
3-83ODZZ,Recurring
1-QRMHWU,One-Time
1-KHC3G1,Recurring
1-13M7ORR,One-Time
1-JKP5QY,One-Time
3-14UC92,Recurring
3-2FCQ4Y,Recurring
1-13M7BLZ,Recurring
1-LXZSK3,Recurring
1-80QXOZ,One-Time

# 5  
Old 02-06-2019
DON'T manipulate scripts without EXACTLY knowing what you are doing, here: condensing a multi line script into a one liner. By sheer accident, your unfortunate manipulation still works for me:


Code:
awk -F, 'NR==FNR{T[$1] = $0;next}($1 in T){$0 = $0 FS T[$1]B[$1]}END{for (t in T) if (!(t in B)) print T[t]}1' file[12]
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001JRL,1-1001JRL,Recurring
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-1001W5O,1-1001W5O,One-Time
1-VKWGEP,Recurring
1-QRMHWU,One-Time
1-13M7BLZ,Recurring
1-JKP5QY,One-Time
1-LXZSK3,Recurring
1-KHC3G1,Recurring
3-14UC92,Recurring
1-80QXOZ,One-Time
1-W65E6A,Recurring
3-83ODZZ,Recurring
1-13M7ORR,One-Time
 3-2FCQ4Y,Recurring


Please show the result of running any script AS GIVEN! on the samples in post #1. If using a different sample set, make sure the structure is identical to the original samples.

Last edited by RudiC; 02-06-2019 at 07:54 AM..
This User Gave Thanks to RudiC For This Post:
# 6  
Old 02-06-2019
I made a mistake in the name of the variable on the output and fixed it, but it is better to correct the code from RudiC
Copy my code else and replace files places
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk joining multiple lines based on field count

Hi Folks, I have a file with fields as follows which has last field in multiple lines. I would like to combine a line which has three fields with single field line for as shown in expected output. Please help. INPUT hname01 windows appnamec1eda_p1, ... (5 Replies)
Discussion started by: shunya
5 Replies

2. Shell Programming and Scripting

Joining files using awk not extracting all columns from File 2

Hello All I'm joining two files using Awk by Left outer join on the file 1 File 1 1 AA 2 BB 3 CC 4 DD File 2 1 IND 100 200 300 2 AUS 400 500 600 5 USA 700 800 900 (18 Replies)
Discussion started by: venkat_reddy
18 Replies

3. Shell Programming and Scripting

awk to print unique text in field before hyphen

Trying to print the unique values in $2 before the -, currently the count is displayed. Hopefully, the below is close. Thank you :). file chr2:46603668-46603902 EPAS1-902|gc=54.3 253.1 chr2:211471445-211471675 CPS1-1205|gc=48.3 264.7 chr19:15291762-15291983 NOTCH3-1003|gc=68.8 195.8... (3 Replies)
Discussion started by: cmccabe
3 Replies

4. Shell Programming and Scripting

UNIX joins : facing issue while joining three files

Hello , I have three files : sampleoutput1.txt has columns (in the following order) : hostname ; available patches , available packages sampleoutput2.txt has columns (in the following order) : hostname ; patchwave ; BSID ; Application sampleoutput3.txt has columns (in the following... (10 Replies)
Discussion started by: rahul2662
10 Replies

5. Shell Programming and Scripting

awk to place value at 24 field in a flat file issue

I am trying to add 0393 value at 24th feild using the below command, but its adding at all the lines including header and trailer Input file: ZHV|2657|D0217001|T|TXU|Z|PAN|20131112000552||||OPER| 754|52479| 492|489|SP40|1014570286334|20131111|20131201|14355334|CHAMELON... (1 Reply)
Discussion started by: Aditya_001
1 Replies

6. Shell Programming and Scripting

awk - replace first hyphen

How do I use awk to replace the first hyphen of a specific record? (1 Reply)
Discussion started by: locoroco
1 Replies

7. Shell Programming and Scripting

AWK: Pattern match between 2 files, then compare a field in file1 as > or < field in file2

First, thanks for the help in previous posts... couldn't have gotten where I am now without it! So here is what I have, I use AWK to match $1 and $2 as 1 string in file1 to $1 and $2 as 1 string in file2. Now I'm wondering if I can extend this AWK command to incorporate the following: If $1... (4 Replies)
Discussion started by: right_coaster
4 Replies

8. Shell Programming and Scripting

1024 field issue : awk

Hi i have a txt file in which i do a awk operation with ":" as field separator A B C D ABC::2386.13:2386.13:3248234281995::+DPY:INT:3:N::::2:200.00:0.00:2010-05-12:CA: ::2:N::N:PH:00010031:0001+DPY:BAL:3:N::::3:1601.01:0.00:2010-05-12:XT::2:N:MR ... (1 Reply)
Discussion started by: mad_man12
1 Replies

9. Shell Programming and Scripting

Joining Two Files Using Awk

Hi All, I am new to awk program. But i have got some assignment on awk. The problem is: i have two files file1 and file2. Both files have same structure. First i have to join both files on filed1,field2 and field3 and then for matching records i want to perform some calculation like:... (1 Reply)
Discussion started by: Jeetuibm
1 Replies

10. Shell Programming and Scripting

Issue with Joining lines from two files

Hi, I have two text files, that need their data joining/concatenation. 'Paste' works for this. But have an issue when there is mismatch in number of rows in each file. E.g. (main file) File1 - has 20 rows File2 - has 30 rows. Command 'paste file1 file2 > file3' joins all lines. I want the... (4 Replies)
Discussion started by: sharath160
4 Replies
Login or Register to Ask a Question