need help using "join"


 
# 1  
Old 02-02-2008

Dear experts,

I urgently need to know how to join these two files on the records that match.

I have one file which looks like
Quote:
1-0-0060122450000
1-0-0060122450001
1-0-0060122450002
1-0-0060122450006
1-0-0060122450007
1-0-0060122450014
1-0-0060122450021
1-0-0060122450024
1-0-0060122450028
1-0-0060122450029
and another file that looks like
Quote:
0060122000550
0060122000632
0060122001374
0060122004006
0060122004141
0060122004607
0060122011124
0060122014392
0060122014537
I desperately need to know how to match the two files, using the second "-" as a field separator. I've been trying all day and I cannot figure it out. Please help!

Sara
# 2  
Old 02-02-2008
Two things:

I've never used join, so you're probably going to get a better answer from someone else.

I didn't find the problem statement very clear, and looking at the data didn't help much. It appears that there isn't any commonality between the data. For example, are the first entries from the two files supposed to correlate?
1-0-0060122450000 0060122000550

Are the 0060122 numbers what you're trying to join on? If so, what is the desired output you're after?

Anyway, I experimented with join a bit and found the results could easily be sent through awk to simply print the fields you need. Maybe you should approach it like that.

If you post a more detailed description of what you're after it may help.
# 3  
Old 02-02-2008
Sorry, I was not precise.
The two files are huge, over 2 million records each, but the format is the same. The only difference is that one file has the additional "1-0-", "1-1-" or "1-3-" prefix.

I'm doing it the long way: cutting characters 1-4 off, then using paste and join again later.

I'd appreciate it if you could let me know how to do it using awk. Thanks!

sara
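
As a rough sketch of the key extraction described above (not from the thread; the function name record_key is made up for illustration), splitting on the first two "-" characters handles the "1-0-", "1-1-" and "1-3-" prefixes alike and leaves the unprefixed records of the other file alone:

```python
def record_key(line: str) -> str:
    """Return the part of a record after the second '-'.

    A record without any '-' (as in the second file) is returned unchanged.
    """
    # maxsplit=2 splits on the first two dashes only; [-1] takes whatever
    # follows them, or the whole record when there is no dash at all.
    return line.strip().split("-", 2)[-1]


for sample in ("1-0-0060122450000", "1-3-0060122450001", "0060122000550"):
    print(sample, "->", record_key(sample))
```

This assumes the prefix always has exactly two dashes; it does the same job as cutting characters 1-4, without depending on the prefix width.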
# 4  
Old 02-02-2008
Well my assumptions on what you were trying to do were bad, so my awk solution didn't pan out for me. However I gave it a shot with a really short python script, and I think I may have what you need.

To be honest, you still didn't give me a clear idea of what you wanted your output to look like, so here's what I assumed. If I'm wrong, then sorry, this is my last shot.

Using the first four lines of your data, I think you want this output:
0060122450000 2000550
0060122450001 2000632
0060122450002 2001374
0060122450006 2004006

The first number is from file_a with the first four chars chopped off. The second number is from file_b, keeping only its last seven characters.

If this is correct, then here's a super simple python script to get that for you:

script name: foo.py

# Written for Python 3.
# Open both data files in text mode; the with-block closes them when done.
with open('file_a') as fa, open('file_b') as fb:

    # Go through the files line by line, keeping just the parts
    # that you want. Also strip the newline from the end.
    for line in fa:
        bita = line[4:].rstrip('\n')
        tmpb = fb.readline().rstrip('\n')
        # You could add a check here to ensure tmpb matches bita. You'd have
        # to do some additional chopping though. With millions of records, I'd do it.
        bitb = tmpb[-7:]
        print(bita, bitb)

Run the script like this:
shellPrompt$ python3 foo.py

The script makes the assumption that your data files are matched up correctly, with the entries matching position-wise all the way through. If they're not, then this won't work without modifications. Your data will be wrong if the values are shifted.
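
A minimal sketch of the kind of check mentioned in the script's comment, under stated assumptions: the function name pair_records and the prefix_len parameter are made up for illustration, and comparing the first seven characters is only a guess at what "matching" means for this data. It stops with an error if one file runs out early or if a pair of records does not share the leading digits:

```python
from itertools import zip_longest


def pair_records(lines_a, lines_b, prefix_len=7):
    """Yield (a, b) pairs position by position, raising if the inputs drift apart.

    prefix_len is an assumption: the number of leading characters that
    paired records are expected to have in common.
    """
    for n, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a is None or b is None:
            raise ValueError(f"record {n}: one file ended before the other")
        a = a.rstrip("\n")[4:]  # drop the four-character "1-0-" style prefix
        b = b.rstrip("\n")
        if a[:prefix_len] != b[:prefix_len]:
            raise ValueError(f"record {n}: {a!r} and {b!r} do not line up")
        yield a, b[-7:]  # same output fields as the script above


# Example with the first records of the two files from the thread.
for bita, bitb in pair_records(["1-0-0060122450000\n"], ["0060122000550\n"]):
    print(bita, bitb)
```

Because it accepts any iterables of lines, it can be fed two open file objects directly and never holds more than one pair of records in memory.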

And finally, there are probably more elegant python or shell techniques of doing this, but this works.

Good luck.

Last edited by H2OBoodle; 02-02-2008 at 08:18 AM.. Reason: give more info on how to run the script, warn on data corruption if data is not aligned.
# 5  
Old 02-02-2008
join

Aismann
Will you please make clear what you want in the output file? For join to
work, both files must be sorted on the join field, in the same order.
join compares the fields as text, so sort both files the same way with
plain sort; for fixed-width, zero-padded numbers like yours that is also
numeric order. For (a bit) faster execution keep the join field as the
first field and use
    join -1 1 -2 1 -t<your delimiter, if you have used one> -o 1.1,1.2,1.3 file1 file2 > output_file
and your job is done.
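
To illustrate why the sort order matters, here is a rough Python sketch of the merge-style matching that a join on sorted input performs. It is not how join itself is implemented; the name merge_join is made up, keys are assumed unique within each input, and the overlap between the two sample lists is invented for the example:

```python
def merge_join(keys_a, keys_b):
    """Yield keys present in both inputs.

    Both inputs must already be sorted in the same (text) order;
    keys are assumed to be unique within each input.
    """
    it_a, it_b = iter(keys_a), iter(keys_b)
    a, b = next(it_a, None), next(it_b, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(it_a, None), next(it_b, None)
        elif a < b:
            a = next(it_a, None)  # a is behind: advance the first input
        else:
            b = next(it_b, None)  # b is behind: advance the second input


# Sample keys in the style of the thread's data; the overlap is invented.
file1 = ["0060122000550", "0060122450000", "0060122450002"]
file2 = ["0060122000550", "0060122004006", "0060122450002"]
print(list(merge_join(file1, file2)))
```

Each input is read once, front to back, which is why an unsorted or differently sorted file makes matches go missing rather than raising an error.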

Enjoy.
# 6  
Old 02-06-2008
Thanks, H2OBoodle and mahesh. Both work really well! Thanks again.
 