Problem with Join Command


 
# 1  
Old 08-12-2016

I have two files. File 1 is a daily file containing only IDs and a date column. File 2 is a full dump of IDs and their respective costs. I basically want an inner join. When I pick a few rows from these files and join them, it works perfectly fine, but when I join the full files, I get no output. Below are the sample files:

Code:
File 1 :
01223610248 , 04/07/2009
01223612562 , 08/19/2003
01223617098 , 02/01/2005
01223618661 , 12/13/2005
01223619159 , 05/29/2007
01223620423 , 02/06/2007
01256957092 , 04/22/2003
01256959417 , 12/19/2006
01256959597 , 02/08/2005
01256970382 , 10/12/2004
01256970722 , 03/15/2005
01256972774 , 10/11/2005
01256975064 , 05/02/2006
01256976030 , 03/21/2006
933604300140 , 10/27/2008
933736900872 , 05/06/2016
933604300128 , 03/14/2008

Code:
File 2:
933604300140 ,20.64
933736900872 ,18.56
933604300128 ,20.64
67119603398 ,0.64
67261704102 ,0.65
75072313652 ,0.65
02454397033 ,0.70
02454397537 ,0.70
03139824387 ,0.70
03139824388 ,0.70
76218230730 ,0.70
77898802256 ,0.70
88843006240 ,0.70
63410597392 ,0.81
84315600053 ,0.82
63447926391 ,0.97
06004450461 ,0.98
60161702121 ,0.98
79862230382 ,0.98
79862230662 ,0.98



A simple join command works for these samples. But when I join the original File 2, which is around 19 MB on the server, with the original File 1 (96 KB), I get no output at all.

using
Code:
join file 2 file1

works for the above sample files.

I have tried the following commands for the full files:
Code:
join -t, file2 file1

Code:
 join -t, -1 1 -1 2 file2 file1

Nothing seems to be working with the original files.

I also tried the following code lines:

Code:
awk 'NR=FNR{check[$0];next} $2 in check' File2 File1

and


Code:
cat File2 | while read line; do  grep $line File1; done

What am I missing here? Please help!

Thanks
# 2  
Old 08-12-2016
You say nothing seems to be working with the original files. What does that mean? Are the commands completing with no output? Are you killing the commands because no output is produced in an hour (or some other fixed time)?

Your sample data in File 1 (or maybe file1 as it is referenced in your sample code) uses <space><comma><space> as the field separator. Your sample data in File 2 (or maybe file2 or file 2 as it is referenced in your sample code) uses <space><comma> as the field separator. But your code just uses <comma> as the field separator. Are you sure that your real data ALWAYS has exactly one <space> character after the number in the first field in both input files before the <comma> on every line?
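To make the separator question concrete, here is a minimal sketch, assuming comma-separated files shaped like the samples in post #1. The two data rows per file and the `.clean` file names are made up for illustration. It strips the stray spaces around the comma and sorts both files on the ID before joining, since join expects sorted input:

```shell
# Hypothetical sample data mirroring the two separator styles in post #1.
workdir=$(mktemp -d)
printf '%s\n' '933604300140 , 10/27/2008' '01223610248 , 04/07/2009' > "$workdir/file1"
printf '%s\n' '933604300140 ,20.64' '67119603398 ,0.64' > "$workdir/file2"

# Remove the spaces around the comma, then sort on the ID (field 1).
# LC_ALL=C keeps sort and join agreeing on the collation order.
sed 's/ *, */,/' "$workdir/file1" | LC_ALL=C sort -t, -k1,1 > "$workdir/file1.clean"
sed 's/ *, */,/' "$workdir/file2" | LC_ALL=C sort -t, -k1,1 > "$workdir/file2.clean"

# Inner join on field 1 of both cleaned files.
joined=$(LC_ALL=C join -t, "$workdir/file1.clean" "$workdir/file2.clean")
printf '%s\n' "$joined"

rm -rf "$workdir"
```

With consistent separators, a key such as `933604300140` compares equal in both files regardless of how many spaces the original data carried.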
# 3  
Old 08-15-2016
I posted a reply yesterday, but I am not sure why it is not showing up, so here it is again:

The files were originally tab delimited, but I made them comma delimited to help with the join command. I am now trying again with the tab-delimited files. I also tried some additional options in my join command; this is what I ran:

Code:
join -t"  "  -a 2 -a 1 -e 'NULL' -o '0,1.1,1.2,2.1,2.2' File1 File2 | head -100

I do get output from this, but it shows that join is not finding a common key between the files, which is surprising because there ARE common values between them. This is what a sample of the result looks like:

Code:
01635158332	09/09/2016 01635158332	09/09/2016 NULL NULL NULL
01635163349	11/24/2009 01635163349	11/24/2009 NULL NULL NULL
16.11	01635163339 NULL NULL 16.11	01635163339 NULL
16.11	01635163349 NULL NULL 16.11	01635163349 NULL


As you can see above, 01635163349 is a common key between File 1, which has the dates, and File 2, which has the costs. So ideally the result should be:

Code:
01635163349  11/24/2009  16.11

The command
Code:
join -1 1 -2 1 File 1 File 2

produces no output on the console at all.

This is how file 1 looks:

Code:
00033492482     04/11/2006
00033492682     07/14/2009
00033492702     02/09/2010
00076848302     08/10/2010
00881123792     11/07/2000
01130162424     06/12/2007
01130164254     01/29/2008
01130165543     05/16/2011
01130168864     07/14/2009
01635163349     11/24/2009

File 2:

Code:
0.00    03139822826
0.00    49246820001
0.00    7621830148
0.00    822004599003
0.11    73379268872
0.64    67119603398
0.65    67261704102
16.11   01635163349


Can there be any other way to achieve an inner join between these files?
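One alternative that does not depend on sort order is a lookup table in awk. Note that the earlier awk attempt wrote `NR=FNR`, which is an assignment; the comparison is `NR==FNR`. A minimal sketch, assuming tab-delimited files laid out like the samples above (ID then date in File1, cost then ID in File2), with a couple of rows made up from the sample data:

```shell
# Hypothetical tab-delimited samples: File1 is ID<TAB>date, File2 is cost<TAB>ID.
workdir=$(mktemp -d)
printf '%s\t%s\n' 01130168864 07/14/2009 01635163349 11/24/2009 > "$workdir/File1"
printf '%s\t%s\n' 0.64 67119603398 16.11 01635163349 > "$workdir/File2"

# First pass (NR == FNR): remember each File2 cost, keyed by the ID in column 2.
# Second pass: print File1 rows whose ID was seen, with the cost appended.
matched=$(awk -F'\t' -v OFS='\t' '
    NR == FNR  { cost[$2] = $1; next }
    $1 in cost { print $1, $2, cost[$1] }
' "$workdir/File2" "$workdir/File1")
printf '%s\n' "$matched"

rm -rf "$workdir"
```

This holds all of File2 in memory, which should be manageable for a file of around 19 MB. It keeps only the last cost seen for an ID if File2 contains duplicates.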
# 4  
Old 08-15-2016
Shouldn't you use column 2 in file2? Try
Code:
join  -11 -22 file1 file2
01635163349 11/24/2009 16.11

And, moving targets rarely help. Why didn't you post representative samples in the first place?
# 5  
Old 08-15-2016
Sorry, I did

Code:
join -11 -22  File1 File2

It came out as a typo, but this is not working either.
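One possibility the thread does not confirm: join expects both inputs to be sorted on the join field, and the File 2 sample above is ordered by cost rather than by ID. A minimal sketch, assuming tab-delimited files laid out like the samples (the rows below are a small subset, and the `.sorted` names are made up for illustration):

```shell
workdir=$(mktemp -d)
tab=$(printf '\t')

# Hypothetical samples: File1 is ID<TAB>date, File2 is cost<TAB>ID.
printf '%s\t%s\n' 01130168864 07/14/2009 01635163349 11/24/2009 > "$workdir/File1"
# File2 is ordered by cost here, so its ID column is NOT in sorted order.
printf '%s\t%s\n' 0.00 49246820001 0.64 67119603398 16.11 01635163349 > "$workdir/File2"

# Sort each file on its own join field; LC_ALL=C keeps sort and join consistent.
LC_ALL=C sort -t "$tab" -k1,1 "$workdir/File1" > "$workdir/File1.sorted"
LC_ALL=C sort -t "$tab" -k2,2 "$workdir/File2" > "$workdir/File2.sorted"

# Join field 1 of File1 against field 2 of File2.
joined=$(LC_ALL=C join -t "$tab" -1 1 -2 2 "$workdir/File1.sorted" "$workdir/File2.sorted")
printf '%s\n' "$joined"

rm -rf "$workdir"
```

On a small sample the match can succeed by luck, for example when the only common key happens to be the last line of both files, which might explain why the samples join and the full files do not.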
# 6  
Old 08-15-2016
DOS <CR> (0x0D, \r, ^M) line terminators? Where and how did you produce the files?
# 7  
Old 08-15-2016
These files are being sent by the source. There are many other columns in these files. I have manipulated them to remove the unneeded columns and the header using awk and sed. When I view the file in vi, I do not see any ^M characters.
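Not seeing ^M in vi is not conclusive: Vim, for instance, hides the carriage returns when it detects a file as DOS format. A byte-level check is more reliable. A small sketch, assuming a hypothetical File2 with CRLF line endings (the single data row and the `.unix` name are for illustration only):

```shell
workdir=$(mktemp -d)
# Hypothetical File2 with DOS line endings: the line ends in \r\n.
printf '16.11\t01635163349\r\n' > "$workdir/File2"

# Count the lines that contain a carriage return; nonzero means CRs are present.
cr=$(printf '\r')
cr_lines=$(grep -c "$cr" "$workdir/File2" || true)

# Delete every carriage return, leaving plain \n line endings.
tr -d '\r' < "$workdir/File2" > "$workdir/File2.unix"
cr_lines_after=$(grep -c "$cr" "$workdir/File2.unix" || true)

printf 'before: %s, after: %s\n' "$cr_lines" "$cr_lines_after"
rm -rf "$workdir"
```

A trailing carriage return matters most when the join key is the last field on the line, as the ID is in File 2, because the key then becomes `01635163349\r` and no longer equals the key in File 1.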