The UNIX and Linux Forums  

  #1 (permalink)  
Old 11-25-2005
Registered User
 

Join Date: Nov 2005
Posts: 4
Join of files is incomplete?!

Hi folks,

I am using the join command to join two files on a common field as follows:

File1.txt
Adsorption|H01.181.529.047
Adult|M01.060.116
Children|M01.055

File2.txt
5|Adsorption|C0001674
7|Adult|C000001
6|Children|C00002

join -i -t "|" -a 2 -1 1 -2 2 File1.txt File2.txt

This works for some lines but not all. Adult is never matched, whatever I try (e.g. converting everything to lower case). This is the output I get:

Adsorption|H01.181.529.047|5|C0001674
7|Adult|C000001
Children|M01.055|6|C00002
  #2 (permalink)  
Old 11-25-2005
Perderabo
Unix Daemon
 

Join Date: Aug 2001
Location: Washington DC Area
Posts: 8,314
What OS are you using? What does -i do with your version of join? I don't have a "join" that supports -i. Using your data files without -i, though, I get the expected result:
Code:
$ cat File1.txt
Adsorption|H01.181.529.047
Adult|M01.060.116
Children|M01.055
$ cat File2.txt
5|Adsorption|C0001674
7|Adult|C000001
6|Children|C00002
$
$
$ join -t "|" -a 2 -1 1 -2 2 File1.txt File2.txt
Adsorption|H01.181.529.047|5|C0001674
Adult|M01.060.116|7|C000001
Children|M01.055|6|C00002
$
  #3 (permalink)  
Old 11-25-2005
Registered User
 

Join Date: Nov 2005
Posts: 4
Hmmm, thanks for that.

I am using FedoraCore 2 Linux with join (coreutils) 5.2.1, May 2004.

It must be a problem with my version of join, then. What OS are you on?

The -i flag is just for case-insensitive matching.

Cheers
  #4 (permalink)  
Old 11-25-2005
Registered User
 

Join Date: Sep 2005
Posts: 45
There's this from the 'join' manual at www.gnu.org:

'Either file1 or file2 (but not both) can be `-', meaning standard input. file1 and file2 should be already sorted in increasing textual order on the join fields, using the collating sequence specified by the LC_COLLATE locale...'

Another site mentions that:

'However, as a GNU extension, if the input has no unpairable lines the sort order can be any order that considers two fields to be equal if and only if the sort comparison described above considers them to be equal.'

This suggests to me that experimenting with the LC_COLLATE environment variable may get the command to work.
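
To make that experiment concrete, here is a minimal sketch, assuming a POSIX shell with sort, join and mktemp available. It sorts each file on its own join field and then joins, all under LC_ALL=C (which takes precedence over LC_COLLATE), so both tools use the same byte ordering. The scratch directory and the .sorted file names are made up for illustration, and -i is left out so the sketch also runs on joins that lack it. It is not a confirmed fix for the problem in the first post.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample files from the first post.
printf '%s\n' 'Adsorption|H01.181.529.047' 'Adult|M01.060.116' 'Children|M01.055' > File1.txt
printf '%s\n' '5|Adsorption|C0001674' '7|Adult|C000001' '6|Children|C00002' > File2.txt

# Sort each file on its own join field, forcing plain byte order.
LC_ALL=C sort -t '|' -k 1,1 File1.txt > File1.sorted
LC_ALL=C sort -t '|' -k 2,2 File2.txt > File2.sorted

# Join under the same locale so sort and join agree on the ordering.
# -i (ignore case) is deliberately omitted in this sketch.
result=$(LC_ALL=C join -t '|' -a 2 -1 1 -2 2 File1.sorted File2.sorted)
printf '%s\n' "$result"
```

If Adult still goes missing with the inputs sorted this way, the sort order is probably not the cause, and the join version itself becomes the more likely suspect.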
  #5 (permalink)  
Old 11-25-2005
Perderabo
Unix Daemon
 

Join Date: Aug 2001
Location: Washington DC Area
Posts: 8,314
With no -i, it works on HP-UX, Solaris, and even Red Hat 7.2. Red Hat does support the -i option, so I tried that as well. It still works.
  #6 (permalink)  
Old 11-25-2005
RTM
Hog Hunter
 
Join Date: Apr 2002
Location: On my motorcycle
Posts: 3,039
Fedora - Linux localhost.localdomain 2.6.11-1.1369_FC4
Works just fine.
  #7 (permalink)  
Old 06-08-2006
Registered User
 

Join Date: Jun 2006
Posts: 2
System - SunOS 5.9

I am using Unix join to join the following two files.

FileA
_______________
1,-1
3,-1
5,-1
49,-3
51,-1
52,-1
53,-1
54,-1
56,-2
57,-2
61,-1
62,-2
65,-1
66,-2
71,-1
72,-2
81,-3
82,-3
91,-4
99,-1
100,-5


FileB
________
1,2222
3,3222
5,2342
11,2418
15,1890
16,2445
20,2465
21,1889
30,1588
30,1888
31,2887
40,3423
45,4321
49,2345
51,5567
52,5210
53,4444
54,4567
56,1111
57,5678
61,6754
62,6742
65,1231
66,6765
71,1234
71,1991
72,7168
81,7777
82,8765
91,8766
99,9812
99,9998
100,8888
100,8981

First I sort them with:

sort -b -n -t ',' +0 FileA > A_sort
sort -b -n -t ',' +0 FileB > B_sort


Then I join them with:
join -t ',' -j1 1 -j2 1 -o 0 1.2 2.2 A_sort B_sort

and get:
1,2222,-1
3,3222,-1
5,2342,-1
51,5567,-1
52,5210,-1
53,4444,-1
54,4567,-1
56,1111,-2
57,5678,-2
61,6754,-1
62,6742,-2
65,1231,-1
66,6765,-2
71,1234,-1
71,1991,-1
72,7168,-2
81,7777,-3
82,8765,-3
91,8766,-4
99,9812,-1
99,9998,-1

The following lines are missing:
49,2345,-3
100,8888,-5
100,8981,-5

Why is this happening? Are the keys being treated as character strings internally, even though I specify -n in sort? What do I need to do? By the way, both LC_COLLATE and LC_CTYPE are set to "". Should I set them to POSIX or C or something?

Many thanks in advance to all the Unix enthusiasts in this forum
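
One likely explanation, offered as a sketch and not verified on SunOS 5.9: join compares the key fields as text in collating order, so files sorted numerically with sort -n (where 49 follows 5 and 100 follows 99) are not in the order join expects. The example below uses a small subset of the data above, sorts both files as text under LC_ALL=C, joins them, and only then restores numeric order. The scratch directory, the file names and the -o 0,2.2,1.2 list (chosen to match the column order of the output shown above) are assumptions made for illustration.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small subset of FileA and FileB from the post.
printf '%s\n' '1,-1' '5,-1' '49,-3' '99,-1' '100,-5' > FileA
printf '%s\n' '1,2222' '5,2342' '11,2418' '49,2345' \
              '99,9812' '99,9998' '100,8888' '100,8981' > FileB

# Sort on field 1 as TEXT (no -n): this is the order join works with.
LC_ALL=C sort -t ',' -k 1,1 FileA > A_sort
LC_ALL=C sort -t ',' -k 1,1 FileB > B_sort

# Join on field 1 of both files, then put the rows back in numeric order.
joined=$(LC_ALL=C join -t ',' -1 1 -2 1 -o 0,2.2,1.2 A_sort B_sort |
         LC_ALL=C sort -t ',' -k 1,1n)
printf '%s\n' "$joined"
```

With the inputs sorted as text, the rows for keys 49 and 100 should appear in the result alongside the others. Key 11 exists only in FileB, so it is dropped as unpairable.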