The UNIX and Linux Forums  

02-02-2008
H2OBoodle
Well, my assumptions about what you were trying to do were wrong, so my awk solution didn't pan out. However, I gave it a shot with a really short Python script, and I think I may have what you need.

To be honest, you still haven't given me a clear idea of what you want your output to look like, so here's what I assumed. If I'm wrong, then sorry, this is my last shot.

Using the first four lines of your data, I think you want this output:
0060122450000 2000550
0060122450001 2000632
0060122450002 2001374
0060122450006 2004006

The first number is the value from file_a with the first four chars chopped off. The second number is the last seven chars of the matching line in file_b, with everything in front of it stripped off.
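To make the two chops concrete, here's a tiny sketch. The input strings are invented placeholders (your real file_a and file_b lines aren't reproduced in this post), and the helper name chop is just something I made up for illustration. Only the slice positions come from the script in this post.

```python
def chop(line_a, line_b):
    """Return (line_a without its first 4 chars, last 7 chars of line_b)."""
    bita = line_a.rstrip('\n')[4:]   # drop the first four characters
    bitb = line_b.rstrip('\n')[-7:]  # keep only the last seven characters
    return bita, bitb

# Hypothetical input lines, made up for illustration only.
sample_a = 'XXXX0060122450000\n'
sample_b = 'YYYY 2000550\n'

print('%s %s' % chop(sample_a, sample_b))
```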

If this is correct, then here's a super simple Python script to get that for you:

script name: foo.py

# open the data files (text mode, so each line is a str)
fa = open('file_a', 'r')
fb = open('file_b', 'r')

# Go through the files line by line, stripping out just the parts
# that you want to keep. Also strip the newlines from the end.

for line in fa:
    # file_a value with its first four characters chopped off
    bita = line[4:].strip('\n')
    tmpb = fb.readline().strip('\n')
    # You could add a check here to ensure tmpb matches bita. You'd have
    # to do some additional chopping though. With millions of records,
    # I'd do it.
    # keep only the last seven characters of the file_b line
    bitb = tmpb[-7:]
    # this form of print works the same in Python 2 and Python 3
    print('%s %s' % (bita, bitb))

# close the files
fa.close()
fb.close()

Run the script like this:
shellPrompt$ python foo.py

The script assumes that your data files line up: line N of file_a must correspond to line N of file_b all the way through. If they don't, this won't work without modifications. Once the values are shifted by even one line, the records after that point get paired with the wrong values.
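If you want the script to complain instead of silently producing bad pairs, something along these lines might do. It's only a sketch under assumptions: it targets Python 3 (itertools.zip_longest), the function name merge is mine, and key_of_b is a hypothetical callable that you would have to write yourself to pull out of a file_b line whatever part should equal the chopped file_a value. I don't know your file_b layout well enough to write that part for you.

```python
from itertools import zip_longest

def merge(lines_a, lines_b, key_of_b=None):
    """Yield 'bita bitb' strings, raising ValueError if the inputs don't line up."""
    missing = object()  # sentinel marking the end of the shorter input
    pairs = zip_longest(lines_a, lines_b, fillvalue=missing)
    for number, (a, b) in enumerate(pairs, 1):
        if a is missing or b is missing:
            raise ValueError('files differ in length at line %d' % number)
        bita = a.rstrip('\n')[4:]
        tmpb = b.rstrip('\n')
        # Optional check: key_of_b is an assumed, user-supplied function.
        if key_of_b is not None and key_of_b(tmpb) != bita:
            raise ValueError('key mismatch at line %d' % number)
        yield '%s %s' % (bita, tmpb[-7:])
```

You would feed it the two open file objects, for example by opening file_a and file_b in a with statement and printing each row that merge(fa, fb) yields. Without key_of_b it only catches files of different lengths, not values that are shifted in the middle.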

And finally, there are probably more elegant Python or shell techniques for doing this, but this works.

Good luck.

Last edited by H2OBoodle; 02-02-2008 at 04:18 AM. Reason: give more info on how to run the script, warn on data corruption if data is not aligned.