The UNIX and Linux Forums  



UNIX for Dummies Questions & Answers If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome !!



  #1 (permalink)  
Old 02-01-2008
Registered User
 

Join Date: Apr 2005
Posts: 20
need help using "join"

Dear experts,

I urgently need to know how to join these 2 files that match.

I have one file which looks like
Quote:
1-0-0060122450000
1-0-0060122450001
1-0-0060122450002
1-0-0060122450006
1-0-0060122450007
1-0-0060122450014
1-0-0060122450021
1-0-0060122450024
1-0-0060122450028
1-0-0060122450029
and another file that looks like
Quote:
0060122000550
0060122000632
0060122001374
0060122004006
0060122004141
0060122004607
0060122011124
0060122014392
0060122014537
I desperately need to know how to match the two files, using the second "-" in the first file as the field separator. I've been trying all day and cannot figure it out. Please help!

Sara
  #2 (permalink)  
Old 02-02-2008
Registered User
 

Join Date: Jan 2008
Posts: 18
Two things:

I've never used join, so you're probably going to get a better answer from someone else.

I didn't find the problem statement very clear, and looking at the data didn't help much. It appears that there isn't any commonality between the data. For example, are the first entries from the two files supposed to correlate?
1-0-0060122450000 0060122000550

Are the 0060122 numbers what you're trying to join on? If so, what output are you after?

Anyway, I experimented with join a bit and found the results could easily be sent through awk to simply print the fields you need. Maybe you should approach it like that.
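
As a rough illustration of the "print only the fields you need" idea, here is a minimal Python sketch (not the awk approach itself) that treats the second "-" as the separator and keeps whatever follows it. The helper name key_after_prefix is made up for this example, and it assumes every line carries a two-dash prefix like the "1-0-" in the first sample file.

```python
def key_after_prefix(line):
    """Return the text after the second "-" (hypothetical helper)."""
    # Split on at most two dashes:
    # "1-0-0060122450000" -> ["1", "0", "0060122450000"]
    parts = line.strip().split("-", 2)
    if len(parts) != 3:
        raise ValueError("expected two '-' separators in %r" % line)
    return parts[2]


print(key_after_prefix("1-0-0060122450000"))  # prints 0060122450000
```

Lines without the prefix raise an error rather than passing through silently, which seems the safer choice for files of this size.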

If you post a more detailed description of what you're after it may help.
  #3 (permalink)  
Old 02-02-2008
Registered User
 

Join Date: Apr 2005
Posts: 20
Sorry, I was not precise.
The two files are huge, over 2 million records each, but the format is the same. The only difference is that one file has an additional prefix: "1-0-", "1-1-" or "1-3-".

I'm doing it the long way: cutting characters 1-4 off, then using paste and join again later.

I'd appreciate it if you could let me know how to do it using awk. Thanks!

sara
  #4 (permalink)  
Old 02-02-2008
Registered User
 

Join Date: Jan 2008
Posts: 18
Well my assumptions on what you were trying to do were bad, so my awk solution didn't pan out for me. However I gave it a shot with a really short python script, and I think I may have what you need.

To be honest, you still didn't give me a clear idea of what you wanted your output to look like, so here's what I assumed. If I'm wrong, then sorry, this is my last shot.

Using the first four lines of your data, I think you want this output:
0060122450000 2000550
0060122450001 2000632
0060122450002 2001374
0060122450006 2004006

The first number is the line from file_a with its first four characters chopped off. The second number is the last seven characters of the corresponding line from file_b.

If this is correct, then here's a super simple python script to get that for you:

script name: foo.py

# Written for Python 3.

# open the data files
fa = open('file_a')
fb = open('file_b')

# Go through the files line by line, stripping out just the parts
# that you want to keep. Also strip the newlines from the end.

for line in fa:
    bita = line[4:].strip('\n')
    tmpb = fb.readline().strip('\n')
    # You could add a check here to ensure tmpb matches bita. You'd have
    # to do some additional chopping though. With millions of records,
    # I'd do it.
    bitb = tmpb[-7:]
    print(bita, bitb)

# close the files
fa.close()
fb.close()

Run the script like this:
shellPrompt$ python3 foo.py

The script makes the assumption that your data files are matched up correctly, with the entries matching position-wise all the way through. If they're not, then this won't work without modifications. Your data will be wrong if the values are shifted.
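
One way to guard against shifted data is to check each pair of lines as it is read. The sketch below is only a guess at what such a check could look like: the function name paired_fields is made up, and prefix_len=7 assumes that "matching" means both lines start with the same seven characters (the "0060122" seen in the samples). The thread does not confirm that rule, so adjust it to whatever really ties the two files together.

```python
from itertools import zip_longest


def paired_fields(lines_a, lines_b, prefix_len=7):
    """Yield (bita, bitb) pairs, failing loudly if the inputs drift apart.

    prefix_len=7 is an assumption: it treats the leading "0060122" as the
    part both files should share.
    """
    pairs = zip_longest(lines_a, lines_b)
    for lineno, (la, lb) in enumerate(pairs, start=1):
        # zip_longest pads the shorter input with None
        if la is None or lb is None:
            raise ValueError("files differ in length at line %d" % lineno)
        bita = la.strip()[4:]   # drop the "1-0-" style prefix
        tmpb = lb.strip()
        if bita[:prefix_len] != tmpb[:prefix_len]:
            raise ValueError(
                "mismatch at line %d: %r vs %r" % (lineno, bita, tmpb))
        yield bita, tmpb[-7:]


# First two lines of each sample file from the thread
sample_a = ["1-0-0060122450000\n", "1-0-0060122450001\n"]
sample_b = ["0060122000550\n", "0060122000632\n"]
for bita, bitb in paired_fields(sample_a, sample_b):
    print(bita, bitb)
```

Open file objects can be passed in place of the sample lists, so the files are still read one line at a time.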

And finally, there are probably more elegant python or shell techniques of doing this, but this works.

Good luck.

Last edited by H2OBoodle : 02-02-2008 at 04:18 AM. Reason: give more info on how to run the script, warn on data corruption if data is not aligned.
  #5 (permalink)  
Old 02-02-2008
Registered User
 

Join Date: Feb 2007
Posts: 38
join

Aismann
Could you please make clear what you want in the output file? For join to
run successfully, the join field must have the same format in both files,
and both files must be sorted on that field in the same order. join compares
the field as text, so what matters is that both files are sorted the same
way, in plain text order. For fixed-length digit strings like yours,
sort -n gives the same order as a plain sort; for alphabetic keys you can
use sort -d. For (slightly) faster execution, keep the join field as the
first field and use

join -1 1 -2 1 -t (the delimiter, if you have used any) -o 1.1,1.2,1.3 file1 file2 > outputfile

And your job is done.
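
To show why the sort order matters, here is a small Python sketch of the merge-style matching that a tool like join relies on. It is an illustration only, not join's actual implementation: the name merge_join and the sample keys are invented, and it assumes each key appears at most once per input.

```python
def merge_join(keys_a, keys_b):
    """Yield keys found in both inputs.

    Both inputs must be sorted ascending as text, with no duplicate keys
    (assumptions made for this sketch).
    """
    it_a, it_b = iter(keys_a), iter(keys_b)
    a, b = next(it_a, None), next(it_b, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(it_a, None), next(it_b, None)
        elif a < b:
            a = next(it_a, None)   # a is behind, advance the first input
        else:
            b = next(it_b, None)   # b is behind, advance the second input


# Invented sample keys, already in sorted order
sorted_a = ["0060122000550", "0060122004006", "0060122450000"]
sorted_b = ["0060122000632", "0060122004006", "0060122450000"]
print(list(merge_join(sorted_a, sorted_b)))
# prints ['0060122004006', '0060122450000']
```

Each input is read once, front to back, so a key that appears out of order can be skipped without any error. That is why both files need the same sort before joining.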

Enjoy.


