Join of files is incomplete?!

# 1  
11-25-2005

Hi folks,

I am using the join command to join two files on a common field as follows:

File1.txt
Adsorption|H01.181.529.047
Adult|M01.060.116
Children|M01.055

File2.txt
5|Adsorption|C0001674
7|Adult|C000001
6|Children|C00002

join -i -t "|" -a 2 -1 1 -2 2 File1.txt File2.txt

This works for some lines but not all: the Adult line is never joined, whatever I try (e.g. converting everything to lower case). The output is:

Adsorption|H01.181.529.047|5|C0001674
7|Adult|C000001
Children|M01.055|6|C00002
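When join drops a line whose key visibly appears in both files, two things are worth ruling out first: inputs that are not in the order join compares in, and invisible characters inside the key. The sketch below assumes GNU coreutils (`sort -c`, `cat -vet`) and recreates the sample files in a temporary directory; it is a diagnostic aid, not a confirmed explanation of the missing Adult line.

```shell
#!/bin/sh
# Diagnostic sketch, assuming GNU coreutils. Works on copies of the sample data.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'Adsorption|H01.181.529.047' 'Adult|M01.060.116' 'Children|M01.055' > File1.txt
printf '%s\n' '5|Adsorption|C0001674' '7|Adult|C000001' '6|Children|C00002' > File2.txt

# 1) Order: join -i expects each file ordered as by sort -f on its join field.
#    sort -c reports the first out-of-order line and exits non-zero if there is one.
LC_ALL=C sort -c -f -t '|' -k1,1 File1.txt && echo 'File1.txt: in order on field 1'
LC_ALL=C sort -c -f -t '|' -k2,2 File2.txt && echo 'File2.txt: in order on field 2'

# 2) Hidden characters: cat -vet ends every line with $, shows tabs as ^I,
#    carriage returns as ^M and non-ASCII bytes as M-..., so a key that only
#    looks like "Adult" stands out.
cat -vet File1.txt File2.txt
```

If either `sort -c` check fails on the real files, the inputs need re-sorting before join; if `cat -vet` shows anything unexpected next to "Adult", the keys differ byte-wise.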
# 2  
11-25-2005
What OS are you using? What does -i do in your version of join? I don't have a join that supports -i. But using your data files:
Code:
$ cat File1.txt
Adsorption|H01.181.529.047
Adult|M01.060.116
Children|M01.055
$ cat File2.txt
5|Adsorption|C0001674
7|Adult|C000001
6|Children|C00002
$
$
$ join -t "|" -a 2 -1 1 -2 2 File1.txt File2.txt
Adsorption|H01.181.529.047|5|C0001674
Adult|M01.060.116|7|C000001
Children|M01.055|6|C00002
$

# 3  
11-25-2005
Hmmm, thanks for that.

I am using Fedora Core 2 Linux with join (coreutils) 5.2.1, May 2004.

It must be a problem with my version of join, then. What OS are you on?

The -i flag is just for case-insensitive matching.

Cheers
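GNU join's documentation attaches one condition to -i: the input lines must be ordered the same case-insensitive way, which `sort -f` produces. A sketch using a hypothetical mixed-case variant of File1.txt (the case changes are invented for illustration), with `LC_ALL=C` as an assumed choice that gives both commands one byte-wise collation:

```shell
#!/bin/sh
# Sketch: case-insensitive join with GNU join -i; File1.txt's casing is hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'adult|M01.060.116' 'Adsorption|H01.181.529.047' 'CHILDREN|M01.055' > File1.txt
printf '%s\n' '5|Adsorption|C0001674' '7|Adult|C000001' '6|Children|C00002' > File2.txt

# Fold case while sorting (-f) so the order matches how join -i compares keys.
LC_ALL=C sort -f -t '|' -k1,1 File1.txt > f1.sorted
LC_ALL=C sort -f -t '|' -k2,2 File2.txt > f2.sorted

# -a 2 still prints any unpaired line from the second file, as in the original command.
LC_ALL=C join -i -t '|' -a 2 -1 1 -2 2 f1.sorted f2.sorted
```

All three keys pair up despite the differing case; an unpaired line would instead appear in File2.txt's original three-field form.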
# 4  
11-25-2005
There's this from the join manual at www.gnu.org:

'Either file1 or file2 (but not both) can be `-', meaning standard input. file1 and file2 should be already sorted in increasing textual order on the join fields, using the collating sequence specified by the LC_COLLATE locale...'

Another site mentions:

'However, as a GNU extension, if the input has no unpairable lines the sort order can be any order that considers two fields to be equal if and only if the sort comparison described above considers them to be equal.'

This suggests to me that experimenting with the LC_COLLATE environment variable might make the command work.
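One way to act on that, sketched for a POSIX shell: sort each file on its own join field and run join under the same `LC_ALL` (which takes precedence over `LC_COLLATE`), so both tools compare keys identically. Choosing the C locale is an assumption; what matters is that sort and join share it. Newer GNU coreutils releases (not the 5.2.1 in the question) also offer `join --check-order`, which makes join exit with an error on unsorted input.

```shell
#!/bin/sh
# Sketch: one collation order (LC_ALL=C, an assumed choice) for both sort and join.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'Adsorption|H01.181.529.047' 'Adult|M01.060.116' 'Children|M01.055' > File1.txt
printf '%s\n' '5|Adsorption|C0001674' '7|Adult|C000001' '6|Children|C00002' > File2.txt

# Sort each file on the field it will be joined on (field 1 and field 2 respectively).
LC_ALL=C sort -t '|' -k1,1 File1.txt > f1.sorted
LC_ALL=C sort -t '|' -k2,2 File2.txt > f2.sorted

# Same locale for the join, so keys are compared byte by byte, exactly as sort ordered them.
LC_ALL=C join -t '|' -a 2 -1 1 -2 2 f1.sorted f2.sorted
```

With the sample data this prints the same three joined lines as the output shown in post #2.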
# 5  
11-25-2005
Without -i, it works on HP-UX, Solaris, and even Red Hat 7.2. Red Hat's join does support the -i option, so I tried that as well. It still works.
# 6  
11-26-2005
Fedora - Linux localhost.localdomain 2.6.11-1.1369_FC4
Works just fine.
# 7  
06-08-2006
System - SunOS 5.9

I am using Unix join to join the following two files.

FileA
_______________
1,-1
3,-1
5,-1
49,-3
51,-1
52,-1
53,-1
54,-1
56,-2
57,-2
61,-1
62,-2
65,-1
66,-2
71,-1
72,-2
81,-3
82,-3
91,-4
99,-1
100,-5


FileB
________
1,2222
3,3222
5,2342
11,2418
15,1890
16,2445
20,2465
21,1889
30,1588
30,1888
31,2887
40,3423
45,4321
49,2345
51,5567
52,5210
53,4444
54,4567
56,1111
57,5678
61,6754
62,6742
65,1231
66,6765
71,1234
71,1991
72,7168
81,7777
82,8765
91,8766
99,9812
99,9998
100,8888
100,8981

First I sort them:

sort -b -n -t ',' +0 FileA > A_sort
sort -b -n -t ',' +0 FileB > B_sort


Then I join them:
join -t ',' -j1 1 -j2 1 -o 0 1.2 2.2 A_sort B_sort

and get:
1,2222,-1
3,3222,-1
5,2342,-1
51,5567,-1
52,5210,-1
53,4444,-1
54,4567,-1
56,1111,-2
57,5678,-2
61,6754,-1
62,6742,-2
65,1231,-1
66,6765,-2
71,1234,-1
71,1991,-1
72,7168,-2
81,7777,-3
82,8765,-3
91,8766,-4
99,9812,-1
99,9998,-1

These lines are missing:
49,2345,-3
100,8888,-5
100,8981,-5

Why is this happening? Are the keys being treated as characters internally even though I specify -n in sort? What do I need to do? By the way, both LC_COLLATE and LC_CTYPE are set to "". Should I set them to POSIX or C or something?

Many thanks in advance to all the Unix enthusiasts in this forum!
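A likely explanation for the question in post #7: join compares keys as text, and POSIX requires both inputs to be ordered as by `sort -b` on the join fields. `sort -n` produces numeric order (5, 49, 100), which is not textual order (100, 49, 5), so join's single pass through both files can go out of step. A sketch of the usual fix, assuming a sort that accepts POSIX `-k` keys and using only a subset of the posted rows: sort on field 1 without -n, join, and re-sort the joined output numerically afterwards if that order is wanted for display.

```shell
#!/bin/sh
# Sketch: join expects plain textual order on the join field, not numeric order.
dir=$(mktemp -d)
cd "$dir" || exit 1
# A subset of the rows posted as FileA and FileB.
printf '%s\n' 1,-1 3,-1 5,-1 49,-3 51,-1 99,-1 100,-5 > FileA
printf '%s\n' 1,2222 3,3222 5,2342 11,2418 45,4321 49,2345 51,5567 \
    99,9812 99,9998 100,8888 100,8981 > FileB

# Sort on field 1 only, as text (no -n): 1 < 100 < 3 < 49 < 5 < 51 < 99.
LC_ALL=C sort -b -t ',' -k1,1 FileA > A_sort
LC_ALL=C sort -b -t ',' -k1,1 FileB > B_sort

# -1 1 -2 1 is the POSIX spelling of the older -j1 1 -j2 1.
LC_ALL=C join -t ',' -1 1 -2 1 -o 0,1.2,2.2 A_sort B_sort > joined

# Optional: put the joined rows back into numeric order of the key.
LC_ALL=C sort -t ',' -k1,1n joined
```

On this subset the rows for keys 49 and 100 that were missing in the question are joined; the output columns follow the -o list (key, FileA value, FileB value).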