UNIX for Dummies Questions & Answers: If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome!
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Development Releases: Linux Mint 4.0 Beta "Fluxbox", 4.0 Alpha "Debian" | iBot | UNIX and Linux RSS News | 0 | 01-04-2008 11:00 AM |
| Explain the line "mn_code=`env|grep "..mn"|awk -F"=" '{print $2}'`" | Lokesha | UNIX for Dummies Questions & Answers | 4 | 12-19-2007 09:52 PM |
| No utpmx entry: you must exec "login" from lowest level "shell" | peterpan | UNIX for Dummies Questions & Answers | 0 | 01-18-2006 12:15 AM |
| join two lines when the second line contains "US DOLLAR" | powah | Shell Programming and Scripting | 2 | 10-21-2005 03:30 PM |
| Help~~join and "multijoin" | hyo77 | Shell Programming and Scripting | 1 | 11-18-2003 09:20 PM |
need help using "join"
Dear experts,
I urgently need to know how to join these two files on the records that match. I have one file which looks like: [quoted sample data not preserved]
Sara
Two things. First, I've never used join, so you're probably going to get a better answer from someone else. Second, I didn't find the problem statement very clear, and looking at the data didn't help much. There doesn't appear to be any commonality between the data. For example, are the first entries from the two files supposed to correlate?
1-0-0060122450000 0060122000550
Are the 00060122 numbers what you're trying to join on? If so, what is the desired output you're after? Anyway, I experimented with join a bit and found that its output could easily be piped through awk to print just the fields you need. Maybe you should approach it like that. If you post a more detailed description of what you're after, it may help.
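For anyone not fluent in awk, here is a rough Python equivalent of the "pipe join's output through awk and print only the fields you need" idea, i.e. something like join file_a file_b | awk '{print $1, $3}'. The function name pick_fields and the sample lines are made up for illustration; pick whichever field numbers you actually need.

```python
def pick_fields(lines, fields):
    """Keep only the given 1-based, whitespace-separated fields of each line,
    the way awk '{print $1, $3}' does. A field the line doesn't have (or a
    number below 1) prints as an empty string."""
    for line in lines:
        parts = line.split()
        yield " ".join(parts[n - 1] if 0 < n <= len(parts) else "" for n in fields)


if __name__ == "__main__":
    # Hypothetical joined lines: key, a field from file 1, a field from file 2.
    joined = ["k1 a1 b1", "k2 a2 b2"]
    for out in pick_fields(joined, [1, 3]):
        print(out)
```

Because it works on any iterable of lines, you could feed it an open file or sys.stdin without reading 2 million records into memory first.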
Sorry, I was not precise.
The two files are huge, over 2 million records each, but the format is the same. The only difference is that one file has an additional "1-0-", "1-1-" or "1-3-" prefix. I'm doing it the long way: cutting off characters 1-4 and then using paste and join afterwards. I'd appreciate it if you could let me know how to do it using awk. Thanks!! Sara
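The "long way" described above (cut characters 1-4 off file A, then paste it next to file B) can be sketched in one pass in Python, roughly cut -c5- file_a | paste -d' ' - file_b. This assumes the extra tag is always "1-<digits>-" as in the examples given; the regex is an assumption that also copes with a two-digit middle part, where a fixed four-character cut would not. The function names are hypothetical.

```python
import re
from itertools import zip_longest

# Assumption: the extra tag on file A always looks like "1-<digits>-",
# e.g. "1-0-", "1-1-" or "1-3-".
PREFIX = re.compile(r"^1-\d+-")
_MISSING = object()  # marks "this file ran out of lines first"


def strip_prefix(line):
    """Remove the leading '1-N-' tag from one line of file A, if present."""
    return PREFIX.sub("", line, count=1)


def paste_stripped(lines_a, lines_b):
    """Pair line i of file A (tag removed) with line i of file B, separated
    by a space, like cut -c5- file_a | paste -d' ' - file_b.
    Raises ValueError if one file has more lines than the other."""
    for a, b in zip_longest(lines_a, lines_b, fillvalue=_MISSING):
        if a is _MISSING or b is _MISSING:
            raise ValueError("file_a and file_b have different line counts")
        yield strip_prefix(a.rstrip("\n")) + " " + b.rstrip("\n")


if __name__ == "__main__":
    # The first entries of the two files, as quoted in this thread.
    for row in paste_stripped(["1-0-0060122450000\n"], ["0060122000550\n"]):
        print(row)
```

Like paste, this pairs lines purely by position, so it only makes sense if both files list their records in the same order.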
Well, my assumptions about what you were trying to do were wrong, so my awk solution didn't pan out. However, I gave it a shot with a really short Python script, and I think I may have what you need.
To be honest, you still didn't give me a clear idea of what you wanted your output to look like, so here's what I assumed. If I'm wrong, then sorry, this is my last shot. Using the first four lines of your data, I think you want this output:

    0060122450000 2000550
    0060122450001 2000632
    0060122450002 2001374
    0060122450006 2004006

The first number is from file_a with the first four characters chopped off. The second number is the last seven characters of the matching file_b line. If this is correct, then here's a super simple Python script (Python 3) to get that for you:

script name: foo.py

    # Open both data files; "with" closes them automatically at the end.
    with open('file_a') as fa, open('file_b') as fb:
        # Go through the files line by line, stripping out just the parts
        # that you want to keep. Also strip the newlines from the end.
        for line in fa:
            bita = line[4:].rstrip('\n')       # drop the "1-0-" style prefix
            tmpb = fb.readline().rstrip('\n')
            # You could add a check here to ensure tmpb matches bita. You'd
            # have to do some additional chopping though. With millions of
            # records, I'd do it.
            bitb = tmpb[-7:]                   # keep the last seven characters
            print(bita, bitb)

Run the script like this:

    shellPrompt$ python3 foo.py

The script assumes that your data files are matched up correctly, with the entries matching position-wise all the way through. If they're not, this won't work without modifications: your output will be wrong if the values are shifted. And finally, there are probably more elegant Python or shell techniques for doing this, but this works. Good luck.

Last edited by H2OBoodle; 02-02-2008 at 04:18 AM. Reason: give more info on how to run the script, warn on data corruption if data is not aligned.
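The script above leaves the "check that tmpb matches bita" step as a suggestion. Here is one possible sketch of it. The assumption (hypothetical, labelled KEY_LEN below) is that a correctly aligned pair shares its first seven characters, as the first entries quoted in this thread do ("0060122..." on both sides); adjust that to whatever actually identifies a record in your data.

```python
from itertools import zip_longest

KEY_LEN = 7  # hypothetical: number of leading characters both lines must share


def checked_pairs(lines_a, lines_b, key_len=KEY_LEN):
    """Yield (a[4:], b[-7:]) like foo.py, but raise ValueError as soon as the
    files drift out of alignment or one of them runs out of lines early."""
    pairs = zip_longest(lines_a, lines_b)  # pads the shorter file with None
    for lineno, (raw_a, raw_b) in enumerate(pairs, start=1):
        if raw_a is None or raw_b is None:
            raise ValueError(f"line {lineno}: one file ended early")
        bita = raw_a[4:].rstrip("\n")  # drop the "1-0-" style prefix
        tmpb = raw_b.rstrip("\n")
        if bita[:key_len] != tmpb[:key_len]:
            raise ValueError(f"line {lineno}: {bita!r} and {tmpb!r} do not match")
        yield bita, tmpb[-7:]


if __name__ == "__main__":
    # First entries of the two files, as quoted earlier in the thread.
    for bita, bitb in checked_pairs(["1-0-0060122450000\n"], ["0060122000550\n"]):
        print(bita, bitb)  # prints: 0060122450000 2000550
```

Stopping at the first mismatch, instead of printing shifted values, is what turns silent data corruption into an error you can see.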
join
Aismann
Could you please make clear what you want in the output file? For join to work, both files must be sorted on the join field in the same order. Sort them with plain sort on that field (for example sort -k1,1), because join expects that same lexical order, not the numeric order of sort -n or the dictionary order of sort -d; for fixed-width, zero-padded numbers like yours, lexical and numeric order happen to coincide. Keeping the join field as the first field in both files keeps the command simple: join -1 1 -2 1 -t CHAR -o 1.1,1.2,1.3 file1 file2 > outputfile, where -t CHAR is only needed if your files use a delimiter other than blanks. And your job is done. Enjoy.
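To see why both inputs must be sorted the same way, here is a rough Python sketch of the merge join performs: it walks both files once, advancing whichever side has the smaller key. This is not GNU join itself; it assumes the key is field 1, fields are blank-separated unless sep is given, and keys compare in plain string order (what sort and join use with LC_ALL=C). Other join options are not modelled, and the function names are hypothetical.

```python
from itertools import groupby


def _groups(lines, sep):
    """Split non-empty lines into fields and group consecutive lines that
    share the same first field (the join key)."""
    rows = (ln.rstrip("\n").split(sep) for ln in lines if ln.strip())
    for key, grp in groupby(rows, key=lambda fields: fields[0]):
        yield key, list(grp)


def merge_join(lines1, lines2, out_fields, sep=None):
    """Sketch of join -1 1 -2 1 -o LIST on two inputs sorted on field 1.
    out_fields holds (file, field) pairs, e.g. [(1, 1), (1, 2), (2, 2)]
    for -o 1.1,1.2,2.2. Unmatched lines are dropped, as join does by default."""
    out_sep = " " if sep is None else sep
    g1, g2 = _groups(lines1, sep), _groups(lines2, sep)
    a, b = next(g1, None), next(g2, None)
    while a is not None and b is not None:
        if a[0] < b[0]:
            a = next(g1, None)  # key only in file 1: advance file 1
        elif a[0] > b[0]:
            b = next(g2, None)  # key only in file 2: advance file 2
        else:
            # Same key on both sides: output every pairing, as join does.
            for r1 in a[1]:
                for r2 in b[1]:
                    rows = {1: r1, 2: r2}
                    yield out_sep.join(
                        rows[f][n - 1] if n <= len(rows[f]) else ""
                        for f, n in out_fields
                    )
            a, b = next(g1, None), next(g2, None)


if __name__ == "__main__":
    spec = [(1, 1), (1, 2), (2, 2)]
    # Lexically sorted input, where "10" sorts before "9": the match is found.
    print(list(merge_join(["10 b", "9 a"], ["10 x"], spec)))  # ['10 b x']
    # Numerically sorted input (sort -n): the merge walks past "10" and misses it.
    print(list(merge_join(["9 a", "10 b"], ["10 x"], spec)))  # []
```

The second call shows the failure mode: once one side is in a different order, the single forward pass skips keys it should have matched.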