

Replace one column from fixed width file with another column from another file


 
# 1  

Hi Forum.

I searched online for a solution, but in most of the examples I found, the data doesn't quite match what I'm trying to manipulate (the field at column position 134, 12 bytes long), and my code is not working as expected.

The format of the output file should remain the same, and if the data value from File 1 is not found in File 2, it should be replaced with blank spaces.

Sample Input Data - File 1:
Code:
1GTD         03/03/20R20200303010004170USD
23923                     3684761733     000000378183JAN20 ABC                     01/31/20NET20       02/20/20  3605        1007  DR606951-000  1020  Software                    N/A            GST_SELF
23923                     3684787963     000001258323JAN20 BBB C                   01/31/20NET20       02/20/20  3605        1007  DR606951-000  1020  Software                    N/A            GST_SELF
26288                     40169          000000250000PRO SERV                      12/31/18NET20       01/20/19  3605        1007  DR607650-000  1020  Software                     N/A            GST_SELF
26288                     INV-600        000000400000PRO SERV REMOTE               05/31/19NET20       06/20/19  3605        1007  DR607650-000  1020  Software                     N/A            GST_SELF
26731                     26955          000003519000MAR20-FEB21 PRO               01/23/20NET20       02/12/20  3605        1007  DR162010-000  1007  Software                     N/A            GST_SELF
9GTD         03/03/20R20200303010004000000005805506000000000000000000005

Sample Input Data - File 2:
Code:
606951-000|7543
607650-000|9654
100145-050|

Output expected:
Code:
1GTD         03/03/20R20200303010004170USD
23923                     3684761733     000000378183JAN20 ABC                     01/31/20NET20       02/20/20  3605        1007  DR7543        1020  Software                     N/A            GST_SELF
23923                     3684787963     000001258323JAN20 BBB C                   01/31/20NET20       02/20/20  3605        1007  DR7543        1020  Software                     N/A            GST_SELF
26288                     40169          000000250000PRO SERV                      12/31/18NET20       01/20/19  3605        1007  DR9654        1020  Software                     N/A            GST_SELF
26288                     INV-600        000000400000PRO SERV REMOTE               05/31/19NET20       06/20/19  3605        1007  DR9654        1020  Software                     N/A            GST_SELF
26731                     26955          000003519000MAR20-FEB21 PRO               01/23/20NET20       02/12/20  3605        1007  DR            1007  Software                     N/A            GST_SELF
9GTD         03/03/20R20200303010004000000005805506000000000000000000005


Code:
cat File1.txt | while read line
do
  code=`echo "${line}" | awk '{print substr($0,134,12)}' | awk '{ gsub(/[ ]+/,""); print }'` 
  cn=$(awk -v CID=$code '$1==CID {print $2}' FS=\| File2.txt)
  awk -v CN=$cn 'BEGIN {FIELDWIDTHS="1 10 15 15 12 30 8 12 8 2 12 6 2 12 6 15 14 15 8"} {$14=CN} 1'
done

Any help would be greatly appreciated.

Thank you.
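For reference, here is a minimal sketch of the per-line loop from the question with its main bugs fixed. In the original loop, the third awk has no file argument, so it reads the loop's own standard input and swallows the remaining lines of File1.txt; and assigning $14 under FIELDWIDTHS (a gawk extension) rebuilds the record with a single-space OFS between fields, which destroys the fixed widths. The sketch assumes bash (for ${var:offset:length}), ASCII data and keys without embedded spaces; the function name replace_col_loop is hypothetical. It still starts one awk per record, so a single-pass awk program will be much faster on large files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the question's per-line loop, fixed.
# Assumes bash, ASCII data, and lookup keys without embedded spaces.
replace_col_loop() {
  local file1=$1 file2=$2 line code cn
  while IFS= read -r line; do            # IFS= and -r keep blanks and backslashes intact
    if (( ${#line} < 145 )); then        # header/trailer records are too short: copy them
      printf '%s\n' "$line"
      continue
    fi
    code=${line:133:12}                  # bytes 134-145 (bash offsets start at 0)
    code=${code%% *}                     # strip the trailing space padding
    # awk gets the lookup file as an argument, so it cannot read the loop's stdin
    cn=$(awk -F'|' -v cid="$code" '$1 == cid {print $2; exit}' "$file2")
    # left-justify, pad and truncate the new value to exactly 12 bytes
    printf '%s%-12.12s%s\n' "${line:0:133}" "$cn" "${line:145}"
  done < "$file1"
}
```

Usage would be replace_col_loop File1.txt File2.txt > output.txt; a key that is missing from File 2, or present with an empty value, becomes 12 blanks.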
# 2  
Hi
try this
Code:
awk '
NR==FNR  {pat["DR"$1]=$2; next}
NF>12    { sub($(NF-4),"DR" pat[$(NF-4)])}
1' FS='|' File2.txt FS='[[:blank:]]+' File1.txt


Last edited by nezabudka; 03-09-2020 at 04:26 PM.
# 3  
Thanks nezabudka for your response. Your code produced very close results; the only issue is that the updated column value is not 12 bytes long (like the original).

Also, if I read your code correctly, you are looking for a "DR" string. It doesn't always have to be DR; it could be something else. And what would happen if the DR value appears in a different column position?

Code:
1GTD         03/03/20R20200303010004170USD
23923                     3684761733     000000378183JAN20 ABC                     01/31/20NET20       02/20/20  3605        1007  DR7543  1020  Software                     N/A            GST_SELF
23923                     3684787963     000001258323JAN20 BBB C                   01/31/20NET20       02/20/20  3605        1007  DR7543  1020  Software                     N/A            GST_SELF
26288                     40169          000000250000PRO SERV                      12/31/18NET20       01/20/19  3605        1007  DR9654  1020  Software                     N/A            GST_SELF
26288                     INV-600        000000400000PRO SERV REMOTE               05/31/19NET20       06/20/19  3605        1007  DR9654  1020  Software                     N/A            GST_SELF
26731                     26955          000003519000MAR20-FEB21 PRO               01/23/20NET20       02/12/20  3605        1007  DR  1007  Software                     N/A            GST_SELF
9GTD         03/03/20R20200303010004000000005805506000000000000000000005

# 4  
Hi, @pchang
NF>12 selects for modification only those lines that have more than 12 fields (columns).
NF>3 would work just as well: the point is to skip the first and last lines and avoid an error.
The first and last lines contain only 2 fields, so NF-4 evaluates to 2-4 = -2,
and a field $(-2) cannot exist.
Only the fifth field from the end, $(NF-4), is substituted, which means
that a 'DR' appearing in any other field remains untouched.
If a prefix other than 'DR' is expected in that field,
the program must be rewritten.

Last edited by nezabudka; 03-09-2020 at 06:17 PM.
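To see the field counts described above, this small sketch prints how many blank-separated fields each record has and, when there are more than 12, which field $(NF-4) selects. The function name show_fields is hypothetical. It uses awk's default whitespace splitting, which for these sample lines splits the same way as FS='[[:blank:]]+'.

```shell
# Hypothetical helper: field count per record, plus the field $(NF-4)
# picks when the record has more than 12 fields.
show_fields() {
  awk '{
    printf "%d", NF                     # blank-separated fields in this record
    if (NF > 12) printf " %s", $(NF-4)  # fifth field from the end
    print ""                            # finish the output line
  }' "$1"
}
```

On the sample File 1 this should print 2 for the header and trailer and 13 or 14 for the detail records, because free-text values such as "JAN20 BBB C" contain extra blanks. Counting from the end sidesteps that variation, but it still relies on a blank separating the DR field from its neighbours.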
# 5  
Hi nezabudka.

This is an external file that we receive from a third-party vendor, and there's no guarantee that there will always be a space between the fields (so NF-4 might not always work correctly).

I think it would be better to take the 12-byte field value starting at column position 134 and replace it. Then we wouldn't need to worry about whether the prefix is "DR", "CR", or something else.

Unfortunately, I'm stuck on how to go about writing the code.

Let me know if you need any other clarifications.

Thanks
Paul
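The column-position idea reduces to a generic splice: keep columns 1 to start-1, write the new value into columns start to start+width-1, and resume at column start+width. For column 134 and width 12 that is 1-133, 134-145 and 146 onward. This sketch shows only that arithmetic, with a constant replacement value; the function name overwrite_field is hypothetical, and a lookup in File 2 would supply the value per line instead. A format of the form %-W.Ws left-justifies, pads with spaces and also truncates, so an overlong replacement cannot push the following columns to the right.

```shell
# Hypothetical helper (not from the thread): overwrite columns
# start..start+width-1 of every record long enough to hold them.
# The new value is left-justified, space-padded and truncated to width.
overwrite_field() {
  # $1 = 1-based start column, $2 = width, $3 = new value, $4 = input file
  awk -v start="$1" -v width="$2" -v val="$3" '
    BEGIN { fmt = "%-" width "." width "s" }   # e.g. %-12.12s for width 12
    length($0) >= start + width - 1 {           # shorter records are left unchanged
      $0 = substr($0, 1, start - 1) sprintf(fmt, val) substr($0, start + width)
    }
    1' "$4"
}
```

For example, overwrite_field 134 12 "" File1.txt would blank bytes 134-145 of every detail record and leave the shorter header and trailer records alone.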
# 6  
Adapting nezabudka's proposal (untested):


Code:
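awk '
NR==FNR    {pat[$1] = $2                    # File2: key -> replacement value
            next
           }
length($0) >= 145 {                         # header and trailer records are shorter: leave them alone
            key = substr($0, 134, 12)       # the 12-byte field at column 134
            sub(/ +$/, "", key)             # drop its space padding before the lookup
            $0 = substr($0, 1, 133) sprintf("%-12.12s", pat[key]) substr($0, 146)
           }
1
' FS='|' File2.txt  File1.txt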
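Since the requirement is that every record keeps its length, it is worth checking the output mechanically rather than by eye. A minimal sketch, assuming the rewritten output has been saved to a separate file; the function name check_widths is hypothetical. An off-by-one in the suffix offset (for example resuming at column 147 instead of 146) shortens every detail record by one byte and shows up immediately. It does not catch a right-justified value, which keeps the length but puts the padding on the wrong side.

```shell
# Hypothetical helper: compare line lengths of the original and rewritten
# files; prints every record whose length changed and returns non-zero.
check_widths() {
  # $1 = original file, $2 = rewritten file
  awk 'NR == FNR { len[FNR] = length($0); next }   # remember original lengths
       length($0) != len[FNR] {
         printf "line %d: %d -> %d\n", FNR, len[FNR], length($0)
         bad = 1
       }
       END { exit bad }' "$1" "$2"
}
```

For example, check_widths File1.txt File1.new, where File1.new (a hypothetical name) holds the awk output.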

# 7  
Code:
awk '
NR==FNR {pat[$1]=$2; next}
NF>12   {b=a=$(NF-4)
         sub(/[0-9-]+$/, "", a)
         sub(/^[[:alpha:]]+/, "", b)
         sub($(NF-4), a pat[b])}
1' FS='|' File2.txt FS='[[:blank:]]+' File1.txt

--- Post updated at 10:30 ---

Hi, @pchang
I didn't notice your comment. I think it would be better to build on @RudiC's program.

--- Post updated at 10:47 ---

I think I have counted the positions correctly this time:
Code:
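awk '
NR==FNR    {pat[$1] = $2
            next
           }
length($0) >= 143 {                         # skip the short header and trailer records
            # assumes the key fills exactly 10 bytes (134-143); pad the value back to 10
            $0 = substr ($0, 1, 133) sprintf ("%-10.10s", pat[substr ($0, 134, 10)]) substr($0, 144)
           }
1 ' FS='|' File2.txt  File1.txt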

