Use non alphanumerics in join


 
# 1  
Old 12-16-2005

Hi,

I have a problem while joining two sorted files with "join".

file1.txt:
Alnus|123
ALO140102|234
ALO 1401 02|345
ALO-1401-02|456
Alobar Holoprosencephalies|567

file2.txt:
1|Alnus|
1|ALO 1401 02|
1|ALO-1401-02|
1|Alobar Holoprosencephalies|

If I join the files as follows:
join -i -t '|' -1 1 -2 2 file1.txt file2.txt

This doesn't work because the join command seems to ignore punctuation: it checks ALO140102 against file 2 and, when it doesn't find a match, moves straight on to Alobar Holoprosencephalies. If ALO140102 is present in file 2, the match works fine. So I need to get join to recognise non-alphanumeric characters.

Any ideas?!!
# 2  
Old 12-16-2005
Long-winded...

I've done it a long-winded way: I replaced the punctuation with alphanumeric tags (e.g. REMOVE1), re-sorted the files, and then did the join. This works because the tags are matched exactly, whereas the punctuation was not. However, this seems a ridiculous way to do it; there must be a better one!
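For anyone wanting to try the tag workaround, here is a rough sketch of how it might look. Only REMOVE1 comes from the description above; REMOVE2 for the hyphen, the helper names tag and untag, and the output file names are made up for illustration. It assumes the tag strings never occur in the real data, and it leaves out the -i flag because the sample keys already agree in case.

```shell
#!/bin/sh
# Recreate the sample data from the first post (overwrites file1.txt/file2.txt here).
printf '%s\n' 'Alnus|123' 'ALO140102|234' 'ALO 1401 02|345' \
    'ALO-1401-02|456' 'Alobar Holoprosencephalies|567' > file1.txt
printf '%s\n' '1|Alnus|' '1|ALO 1401 02|' '1|ALO-1401-02|' \
    '1|Alobar Holoprosencephalies|' > file2.txt

# Hypothetical helpers: swap space and hyphen for alphanumeric tags and back.
tag()   { sed -e 's/ /REMOVE1/g' -e 's/-/REMOVE2/g'; }
untag() { sed -e 's/REMOVE1/ /g' -e 's/REMOVE2/-/g'; }

# Tag, then re-sort each file on its join field.
tag < file1.txt | sort -t '|' -k1,1 > file1.tagged
tag < file2.txt | sort -t '|' -k2,2 > file2.tagged

# Join on the tagged keys, then restore the original punctuation.
join -t '|' -1 1 -2 2 file1.tagged file2.tagged | untag > joined_tagged.txt
cat joined_tagged.txt
```

With the sample data this should give four joined lines, one for each key that appears in both files; ALO140102 is dropped because it has no partner in file2.txt.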

I think it may be to do with the way UNIX compares strings when sorting and matching, which I believe you can change with the LC_COLLATE variable, but I'm not sure.
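A minimal sketch of the locale idea, not a confirmed fix for the original system: it assumes the sort and join in use honour the locale environment variables (as the POSIX and GNU versions do), and it sets LC_ALL=C, which overrides LC_COLLATE, so that both tools compare keys byte by byte and punctuation counts. The .sorted and joined.txt file names are made up, and -i is left out because byte-order sorting is case-sensitive and the sample keys already agree in case.

```shell
#!/bin/sh
# Recreate the sample data from the first post (overwrites file1.txt/file2.txt here).
printf '%s\n' 'Alnus|123' 'ALO140102|234' 'ALO 1401 02|345' \
    'ALO-1401-02|456' 'Alobar Holoprosencephalies|567' > file1.txt
printf '%s\n' '1|Alnus|' '1|ALO 1401 02|' '1|ALO-1401-02|' \
    '1|Alobar Holoprosencephalies|' > file2.txt

# Sort each file on its join field in plain byte order (C locale),
# so spaces and hyphens take part in the comparison.
LC_ALL=C sort -t '|' -k1,1 file1.txt > file1.sorted
LC_ALL=C sort -t '|' -k2,2 file2.txt > file2.sorted

# Run join under the same locale so it expects the same ordering.
LC_ALL=C join -t '|' -1 1 -2 2 file1.sorted file2.sorted > joined.txt
cat joined.txt
```

With the sample data this should print:

ALO 1401 02|345|1|
ALO-1401-02|456|1|
Alnus|123|1|
Alobar Holoprosencephalies|567|1|

The point is that sort and join have to agree on the ordering; sorting in one locale and joining in another is a likely way to get skipped keys.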
 