The UNIX and Linux Forums  

Posted 12-16-2005
Use non alphanumerics in join

Hi,

I have a problem while joining two sorted files with "join".

file1.txt:
Alnus|123
ALO140102|234
ALO 1401 02|345
ALO-1401-02|456
Alobar Holoprosencephalies|567

file2.txt:
1|Alnus|
1|ALO 1401 02|
1|ALO-1401-02|
1|Alobar Holoprosencephalies|

If I join the files as follows:
join -i -t '|' -1 1 -2 2 file1.txt file2.txt

This doesn't work because join seems to ignore punctuation: it checks ALO140102 against file 2 and, when it doesn't find a match, moves straight on to Alobar Holoprosencephalies, so ALO 1401 02 and ALO-1401-02 are never matched. If ALO140102 is present in file 2, the match works fine. So I need to get join to recognise non-alphanumeric characters.

Any ideas?
Posted 12-16-2005
Long-winded...

I've done it the long-winded way: I replaced the punctuation with alphanumeric tags (e.g. REMOVE1), re-sorted the files and then did the join. This works because the tags are matched exactly, whereas the punctuation was not. However, it seems a ridiculous way to do it. There must be a better one!

I think it may have to do with the way UNIX collates (orders) strings, which I think you can change with the LC_COLLATE variable, but I'm not sure.
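
For what it's worth, here is a minimal sketch of the locale idea, not a confirmed fix from this thread. It assumes the cause is locale-aware collation, and that forcing the plain byte-order "C" locale for both sort and join (LC_ALL overrides LC_COLLATE) makes spaces and hyphens count in the comparison. The temporary directory and the .sorted/joined.txt names are made up for the example. The -i flag is left out to keep the sketch simple; if case-insensitive matching is needed, the GNU documentation says to sort with "sort -f" when using "join -i".

```shell
#!/bin/sh
# Hypothetical scratch directory; the data is copied from the first post.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'Alnus|123' 'ALO140102|234' 'ALO 1401 02|345' \
    'ALO-1401-02|456' 'Alobar Holoprosencephalies|567' > file1.txt
printf '%s\n' '1|Alnus|' '1|ALO 1401 02|' '1|ALO-1401-02|' \
    '1|Alobar Holoprosencephalies|' > file2.txt

# Sort each file on its own join field, in byte order (C locale),
# so that punctuation and spaces take part in the ordering.
LC_ALL=C sort -t '|' -k1,1 file1.txt > file1.sorted
LC_ALL=C sort -t '|' -k2,2 file2.txt > file2.sorted

# Join under the same locale, so join expects the same ordering
# that sort just produced.
LC_ALL=C join -t '|' -1 1 -2 2 file1.sorted file2.sorted > joined.txt
cat joined.txt
```

With the sample data this should give four joined lines, including the ALO 1401 02 and ALO-1401-02 rows. ALO140102 has no partner in file2.txt, so it is dropped. The important part is that sort and join run under the same locale; sorting in one locale and joining in another is what tends to produce the "skipped" lines.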