Joining Two Files Does not Work as Expected


 
# 8  
Old 08-23-2012
Quote:
Originally Posted by yirgacheffe
Padding with 0 was suggested to me... unfortunately I cannot find a way to do it.
Unix has an abundance of text-manipulating commands: sed, awk, the shell itself, ... Probably any of these can do the padding. Here is a solution based on sed: it changes any number of fewer than 4 digits at the beginning of a line into a 0-padded 4-digit number:

Code:
1    -> 0001
12   -> 0012
123  -> 0123
1234 -> 1234

Code:
sed 's/^[0-9][0-9][0-9][^0-9]/0&/;s/^[0-9][0-9][^0-9]/00&/;s/^[0-9][^0-9]/000&/' /path/to/infile

Explanation:
"s" = "substitute": replace whatever matches the pattern between the first pair of "/" with the text between the second pair.
"/^[0-9][0-9][0-9][^0-9]/" = search pattern: match "^" (beginning of line), followed by 3 digits ("[0-9]"), followed by one non-digit character ("[^0-9]").
"/0&/" = replacement string: a "0", followed by the text that was matched ("&").

The other two commands are analogous.
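
To check the mapping above, here is a small sketch that runs the same sed command on a few made-up sample lines (the data is an assumption, not from the original files). It also shows one caveat: because each pattern requires a non-digit after the number, a line consisting of nothing but a short number is left unpadded.

```shell
# Made-up sample input: keys of 1 to 4 digits followed by a data column,
# plus one line ("12") that holds only a number and nothing after it.
padded=$(printf '1 a\n12 b\n123 c\n1234 d\n12\n' |
    sed 's/^[0-9][0-9][0-9][^0-9]/0&/;s/^[0-9][0-9][^0-9]/00&/;s/^[0-9][^0-9]/000&/')

# The first four lines come out as 0001, 0012, 0123 and 1234;
# the bare "12" is unchanged because no non-digit follows it.
printf '%s\n' "$padded"
```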

I hope this helps.

bakunin

Last edited by bakunin; 08-23-2012 at 08:16 AM..
# 9  
Old 08-23-2012
Code:
awk 'NR==FNR{a[$1]=$2;next}($1 in a){print $0,a[$1]}' file2 file1

The output is:
Code:
950 0.0 1612.0 -163.34
950 212.0 1762.0 -163.34
950 488.0 1912.0 -163.34
950 772.0 2024.0 -163.34
950 1032.0 2199.0 -163.34
950 1308.0 2474.0 -163.34
950 1548.0 2799.0 -163.34
950 1776.0 3062.0 -163.34
950 2028.0 3324.0 -163.34
950 2320.0 3524.0 -163.34
950 3000.0 4000.0 -163.34
950 3500.0 4000.0 -163.34
1000 0.0 1612.0 -162.65
1000 165.0 1855.0 -162.65
1000 288.0 1887.0 -162.65
1000 496.0 1949.0 -162.65
1000 724.0 2024.0 -162.65
1000 896.0 2124.0 -162.65
1000 1052.0 2274.0 -162.65
1000 1320.0 2524.0 -162.65
1000 1548.0 2880.0 -162.65
1000 1684.0 3012.0 -162.65
1000 1880.0 3112.0 -162.65
1000 2264.0 3337.0 -162.65
1000 3000.0 4000.0 -162.65
1000 3500.0 4000.0 -162.65
1050 0.0 1612.0 -162.19
1050 152.0 1780.0 -162.19
1050 248.0 1834.0 -162.19
1050 359.0 1923.0 -162.19
1050 488.0 1962.0 -162.19
1050 652.0 2037.0 -162.19
1050 808.0 2099.0 -162.19
1050 948.0 2212.0 -162.19
1050 1064.0 2324.0 -162.19
1050 1312.0 2574.0 -162.19
1050 1428.0 2712.0 -162.19
1050 1556.0 2837.0 -162.19
1050 1652.0 2924.0 -162.19
1050 1752.0 3037.0 -162.19
1050 1944.0 3187.0 -162.19
1050 2260.0 3424.0 -162.19
1050 3000.0 4000.0 -162.19
1050 3500.0 4000.0 -162.19
.
.
.

Is this the output you required?
# 10  
Old 08-23-2012
You can use something like
Code:
sed '/^1000/,$ !s/^/0/' filen

(if the 1000 line definitely exists) on both input files to produce two temp files, and then use
Code:
sed 's/^0//'

on the output to undo the leading zeros.
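
Put together, the pad, join, unpad sequence might look like the sketch below. The sample data and temp file names are made up for illustration, and it assumes, as stated above, that a line starting with 1000 exists in both files and that every key before it has exactly three digits.

```shell
# Hypothetical sample data; the real files in this thread are much longer.
tmp=$(mktemp -d)
printf '950 0.0 1612.0\n950 212.0 1762.0\n1000 0.0 1612.0\n' > "$tmp/file1"
printf '950 -163.34\n1000 -162.65\n' > "$tmp/file2"

# Prefix every line before the first "1000" line with one zero, so the
# keys are in the lexical order that join expects.
sed '/^1000/,$ !s/^/0/' "$tmp/file1" > "$tmp/file1.pad"
sed '/^1000/,$ !s/^/0/' "$tmp/file2" > "$tmp/file2.pad"

# Join on the first field, then strip the leading zero again.
joined=$(join "$tmp/file1.pad" "$tmp/file2.pad" | sed 's/^0//')
printf '%s\n' "$joined"

rm -r "$tmp"
```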
# 11  
Old 08-23-2012
Thanks a lot everyone, I think raj_saini20 nailed it, this is what I need.
Was going with

Code:
sed 's/^[0-9][0-9][0-9][^0-9]/0&/;s/^[0-9][0-9][^0-9]/00&/;s/^[0-9][^0-9]/000&/' /path/to/infile

to insert leading 0s, but this would have required more help from RudiC, as the columns I join by are not necessarily at the start of the line... I would have needed the padded 0s anywhere between columns 6 and 11...
Thanks again,
cheers
Sid
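
For keys that are not in the first column, one possible sketch is to let awk pad a chosen field instead of anchoring at the start of the line. This assumes "column" means a whitespace-separated field; the helper name pad_field, the field number and the width of 4 are illustrative assumptions. Note that assigning to a field makes awk rebuild the line with single spaces between fields.

```shell
# pad_field N [file...]: zero-pad field N to 4 digits (hypothetical helper).
pad_field() {
    n=$1
    shift
    # Assigning to $n rebuilds the record using single-space separators.
    awk -v n="$n" '{ $n = sprintf("%04d", $n); print }' "$@"
}

# Made-up sample input with the numeric key in the third field.
result=$(printf 'a b 7 x\nc d 950 y\n' | pad_field 3)
printf '%s\n' "$result"
```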


Moderator's Comments:
Please use code tags next time for your code and data, as already requested twice in moderator notes.

Last edited by zaxxon; 08-23-2012 at 08:25 AM.. Reason: code tags
# 12  
Old 08-23-2012
Quote:
Originally Posted by yirgacheffe
to insert leading 0s, but this would have required more help from RudiC, as the columns I join by are not necessarily at the start of the line... I would have needed the padded 0s anywhere between columns 6 and 11...
You are welcome, but it is generally easier on all of us (including you) when you state your requirements as clearly as possible up front. We can only solve the problem we are told about, and it is more work to come up with 3 intermediate solutions that won't work on the real data before arriving at one that does than it is to write the final solution right away.

bakunin
# 13  
Old 08-23-2012
I'm afraid awk with arrays will run out of memory on the large files you mentioned, especially as file2 seems to have many lines. Please report back with the result; I am curious.
# 14  
Old 08-23-2012
Thanks again everyone,

The final awk approach did work; my longest file had ~1500 lines.
And apologies if I did not explain everything properly from the start, such as the fact that my keys do not necessarily start in column 1. I put up my problem and hoped someone would help me solve it.
I am glad you all helped, thanks again,

cheers
Sid