The UNIX and Linux Forums  

  #1 (permalink)  
Old 02-01-2008
Registered User
 

Join Date: Apr 2005
Posts: 22
need help using "join"

Dear experts,

I urgently need to know how to join these two files on the records that match.

I have one file which looks like
Quote:
1-0-0060122450000
1-0-0060122450001
1-0-0060122450002
1-0-0060122450006
1-0-0060122450007
1-0-0060122450014
1-0-0060122450021
1-0-0060122450024
1-0-0060122450028
1-0-0060122450029
and another file that looks like
Quote:
0060122000550
0060122000632
0060122001374
0060122004006
0060122004141
0060122004607
0060122011124
0060122014392
0060122014537
I desperately need to know how to match the two files using the second "-" as a field separator. I've been trying all day and cannot figure it out. Please help!!!

Sara
  #2 (permalink)  
Old 02-02-2008
Registered User
 

Join Date: Jan 2008
Posts: 18
Two things:

I've never used join, so you're probably going to get a better answer from someone else.

I didn't find the problem statement very clear, and looking at the data didn't help much. It appears that there isn't any commonality between the data. For example, are the first entries from the two files supposed to correlate?
1-0-0060122450000 0060122000550

Are the 0060122 numbers what you're trying to join on? If so, what is the desired output you're after?

Anyway, I experimented with join a bit and found the results could easily be sent through awk to simply print the fields you need. Maybe you should approach it like that.

If you post a more detailed description of what you're after it may help.
  #3 (permalink)  
Old 02-02-2008
Registered User
 

Join Date: Apr 2005
Posts: 22
Sorry, I was not precise.
The two files are huge, over 2 million records each, but the format is the same. The only difference is that one file has the additional "1-0-", "1-1-" or "1-3-" prefix.

I'm doing it the long way: cutting characters 1-4 off, then using paste and join again later.

I'd appreciate it if you could let me know how to do it using awk. Thanks!!

sara
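
For readers following along, the prefix-stripping step described above can be sketched in Python rather than with cut and paste. This is only a sketch under assumptions: the function names strip_prefix and matching_records are made up for illustration, the prefix is taken to be everything up to and including the second "-", and the plain file's numbers are assumed to fit in memory as a set.

```python
def strip_prefix(record):
    """Return the text after the second "-", e.g. drop a leading "1-0-"."""
    # split("-", 2) splits at most twice, so the number itself stays whole.
    return record.strip().split("-", 2)[-1]


def matching_records(prefixed_lines, plain_lines):
    """Yield prefixed records whose stripped number appears in plain_lines."""
    # Assumption: the plain numbers fit in memory as a set.
    wanted = {line.strip() for line in plain_lines}
    for line in prefixed_lines:
        if strip_prefix(line) in wanted:
            yield line.strip()
```

Both arguments can be open file objects, so something like matching_records(open('file_a'), open('file_b')) would stream through the prefixed file once. The file names here are placeholders.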
  #4 (permalink)  
Old 02-02-2008
Registered User
 

Join Date: Jan 2008
Posts: 18
Well my assumptions on what you were trying to do were bad, so my awk solution didn't pan out for me. However I gave it a shot with a really short python script, and I think I may have what you need.

To be honest, you still didn't give me a clear idea of what you wanted your output to look like, so here's what I assumed. If I'm wrong, then sorry, this is my last shot.

Using the first four lines of your data, I think you want this output:
0060122450000 2000550
0060122450001 2000632
0060122450002 2001374
0060122450006 2004006

The first number is from file_a with the first four characters chopped off. The second number is from file_b with everything but its last seven characters stripped off.

If this is correct, then here's a super simple python script to get that for you:

script name: foo.py

# open the data files (text mode, so lines are strings)
fa = open('file_a', 'r')
fb = open('file_b', 'r')

# Go through the files line by line, stripping out just the parts
# that you want to keep. Also strip the newlines from the end.

for line in fa:
    # drop the 4-character prefix ("1-0-") from the file_a record
    bita = line[4:].strip('\n')
    tmpb = fb.readline().strip('\n')
    # You could add a check here to ensure tmpb matches bita. You'd have to
    # do some additional chopping though. With millions of records, I'd do it.
    # keep only the last 7 characters of the file_b record
    bitb = tmpb[-7:]
    print(bita, bitb)

# close the files
fa.close()
fb.close()

Run the script like this:
shellPrompt$ python foo.py

The script makes the assumption that your data files are matched up correctly, with the entries matching position-wise all the way through. If they're not, then this won't work without modifications. Your data will be wrong if the values are shifted.
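
One way to guard against that is to check each pair as it is read. The following is a rough sketch, not a tested production script: the name pair_lines and its parameters are made up for illustration, and "matches" is assumed to mean that the characters dropped from the file_b number equal the leading characters of the file_a number. If that is not the right rule for your data, swap in your own test.

```python
from itertools import zip_longest


def pair_lines(lines_a, lines_b, prefix_len=4, keep=7):
    """Pair records by position and fail loudly if a pair does not line up."""
    pairs = zip_longest(lines_a, lines_b)
    for number, (line_a, line_b) in enumerate(pairs, start=1):
        # zip_longest pads the shorter input with None.
        if line_a is None or line_b is None:
            raise ValueError(f"files differ in length at line {number}")
        bita = line_a[prefix_len:].rstrip("\n")
        tmpb = line_b.rstrip("\n")
        # Assumed rule: the part dropped from tmpb must start bita.
        if not bita.startswith(tmpb[:-keep]):
            raise ValueError(
                f"line {number}: {tmpb!r} does not line up with {bita!r}"
            )
        yield bita, tmpb[-keep:]
```

Passing the two open files, as in pair_lines(fa, fb), and printing each pair would give the same output as the script above when the data is aligned, and stop with an error at the first line where it is not.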

And finally, there are probably more elegant python or shell techniques of doing this, but this works.

Good luck.

Last edited by H2OBoodle; 02-02-2008 at 04:18 AM. Reason: give more info on how to run the script, warn on data corruption if data is not aligned.
  #5 (permalink)  
Old 02-02-2008
Registered User
 

Join Date: Feb 2007
Posts: 51
join

Aismann,

Could you please make clear what you want in the output file? For join to run successfully, the join field should have the same length in both files, and both files must be sorted in the same order on that field. join compares the field as text, so a plain sort is the safe choice; sort -n gives the same order only when the numbers are fixed-width and zero-padded, as yours appear to be. For (slightly) faster execution, keep the join field as the first field and use:

join -1 1 -2 1 -t <the delimiter, if you have used any> -o 1.1 1.2 1.3 file1 file2 > outputfile

And your job is done.

Enjoy.
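
To illustrate why join needs both inputs sorted the same way, here is a minimal Python sketch of the merge step that this kind of matching relies on. It is not how join itself is implemented; the name merge_join and its key_a/key_b parameters are made up for illustration, and keys are assumed to be unique within each input.

```python
def merge_join(sorted_a, sorted_b, key_a=lambda r: r, key_b=lambda r: r):
    """Yield (a, b) pairs with equal keys from two key-sorted iterables.

    Assumes both inputs are sorted by key as plain text and that keys
    are unique within each input.
    """
    iter_a, iter_b = iter(sorted_a), iter(sorted_b)
    a = next(iter_a, None)
    b = next(iter_b, None)
    while a is not None and b is not None:
        ka, kb = key_a(a), key_b(b)
        if ka == kb:
            yield a, b
            a = next(iter_a, None)
            b = next(iter_b, None)
        elif ka < kb:
            # a's key is behind, so only the a side can catch up.
            a = next(iter_a, None)
        else:
            b = next(iter_b, None)
```

Because each side only ever moves forward, a record that is out of order is silently skipped, which is why unsorted input gives missing matches. For the prefixed file, a key function such as lambda r: r.split("-", 2)[-1] could be passed as key_a, provided that file is sorted by the number after the prefix and not by the whole line.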
  #6 (permalink)  
Old 02-05-2008
Registered User
 

Join Date: Apr 2005
Posts: 22
Thanks H2OBoodle and mahesh. Both work really well! Thanks again.