File updation on matching key


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers File updation on matching key
# 1  
Old 06-09-2016
File updation on matching key

I have input file like Input.dat with below content
Code:
RRD       0Z91YUn000000Lk 9000100001 103020151117   STMT151117155527001            0000 2 000000 000004
RRD       0Z91YUn00000ysj 9000100001 103020151117   STMT151117155527001            0000 3 000000 000003
RRD       0Z91YUn00001vGh 9000100002 103020151117   STMT151117155527002            0000 1 000000 000007
RRD       0Z91YUn00002tng 9000100002 103020151117   STMT151117155527002            0000 0 000000 000006
RRD       0Z91YUn00003qBe 9000100003 103020151117   STMT151117155527003            0000 1 000000 000007
RRD       0Z91YUn00004oid 9000100003 103020151117   STMT151117155527003            0000 0 000000 000006
RRD       0Z91YUn00005l6b 9000100004 103020151117   STMT151117155527004            0000 1 000000 000007
RRD       0Z91YUn00006jda 9000100004 103020151117   STMT151117155527004            0000 0 000000 000006

1				2				3		4					5						6	7	8 		9  <----- column number (This line is not part of file)

Here 3 column is the key column and 7th and 9th are the value columns. I want to do some arithematic operations on 7th & 9th column and update that
value to 8th column on matching of 3rd key column.
example : here first two row has same key column(3rd column) (9000100001) so I want to do
((((7th column value + 9th column value)of row 1)+((7th column value + 9th column value)of row 2))/2)
ie (((2+4)+(3+3))/2)=000006 (leading zeros for complete 6 digit number)
so i have to update the first 2 rows(key is 9000100001)of origal Input.dat files 8th column with 000006 value
So final output of above example is as
Code:
RRD       0Z91YUn000000Lk 9000100001 103020151117    STMT151117155527001            0000 2 000006 000004
RRD       0Z91YUn00000ysj 9000100001 103020151117    STMT151117155527001            0000 3 000006 000003
RRD       0Z91YUn00001vGh 9000100002 103020151117    STMT151117155527002            0000 1 000007 000007
RRD       0Z91YUn00002tng 9000100002 103020151117    STMT151117155527002            0000 0 000007 000006
RRD       0Z91YUn00003qBe 9000100003 103020151117    STMT151117155527003            0000 1 000007 000007
RRD       0Z91YUn00004oid 9000100003 103020151117    STMT151117155527003            0000 0 000007 000006
RRD       0Z91YUn00005l6b 9000100004 103020151117    STMT151117155527004            0000 1 000007 000007
RRD       0Z91YUn00006jda 9000100004 103020151117    STMT151117155527004            0000 0 000007 000006

I have done script for this requirement as below : Final_Count.sh
Code:
#!/bin/sh
JOBFOLDER=/opt/hpx/exstream/StatementAppA/inputs/POC/testSort
input_file=$JOBFOLDER/Input.dat
count_output_file=$JOBFOLDER/Count_output.dat
final_updated_file=$JOBFOLDER/Final_updated.dat
awk ' BEGIN {FS=" "}
function round(A) {
return int( A + 0.5 )
}
 {a[$3]+=$7; b[$3]+=$9} END{for(x in a) print x,round((a[x]+b[x])/2)
}' $input_file >>$count_output_file
echo "--- `basename $count_output_file `---"
cat $count_output_file
echo "---------------- ` basename $final_updated_file ` --------------"
awk ' BEGIN {FS=" "} FNR==NR{a[$1]=$2;next}
{
    if(a[$3] >= 0 && a[$3] <= 9)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00000"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 10 && a[$3] <= 99)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"0000"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 100 && a[$3] <= 999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"000"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 1000 && a[$3] <= 9999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 10000 && a[$3] <= 99999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"0"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 100000 && a[$3] <= 999999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?a[$3]:"NA",$9,$10
    }
}' $count_output_file $input_file >> $final_updated_file
echo "----------------------------------------------------"
cat $final_updated_file
rm $final_updated_file $count_output_file

and output is :
--- Count_output.dat---
Code:
9000100002 7
9000100003 7
9000100004 7
9000100001 6

---------------- Final_updated.dat --------------
----------------------------------------------------
Code:
RRD       0Z91YUn000000Lk 9000100001 103020151117    STMT151117155527001            0000 2 000006 000004
RRD       0Z91YUn00000ysj 9000100001 103020151117    STMT151117155527001            0000 3 000006 000003
RRD       0Z91YUn00001vGh 9000100002 103020151117    STMT151117155527002            0000 1 000007 000007
RRD       0Z91YUn00002tng 9000100002 103020151117    STMT151117155527002            0000 0 000007 000006
RRD       0Z91YUn00003qBe 9000100003 103020151117    STMT151117155527003            0000 1 000007 000007
RRD       0Z91YUn00004oid 9000100003 103020151117    STMT151117155527003            0000 0 000007 000006
RRD       0Z91YUn00005l6b 9000100004 103020151117    STMT151117155527004            0000 1 000007 000007
RRD       0Z91YUn00006jda 9000100004 103020151117    STMT151117155527004            0000 0 000007 000006

-------------------------------------------------------------------------------------------------------------------

Here in this script for final updated file I am using static printing like
Code:
print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00"a[$3]:"NA",$9,$10

so here I am printing spaces after column value manually. I want this as dynamic.
In short print all column with as it is(with space) and just update 8th column.
Please can you help me with this ?

Last edited by Don Cragun; 06-09-2016 at 02:46 AM.. Reason: Add CODE and ICODE tags.
# 2  
Old 06-09-2016
Please refrain double posts in terms of highjacking your own other thread and start using code tags, thanks.
# 3  
Old 06-09-2016
Quote:
Originally Posted by zaxxon
Please refrain double posts in terms of highjacking your own other thread and start using code tags, thanks.
Code:
#!/bin/sh
JOBFOLDER=/opt/hpx/exstream/StatementAppA/inputs/POC/testSort
input_file=$JOBFOLDER/Input.dat
count_output_file=$JOBFOLDER/Count_output.dat
final_updated_file=$JOBFOLDER/Final_updated.dat
awk ' BEGIN {FS=" "}
function round(A) {
return int( A + 0.5 )
}
 {a[$3]+=$7; b[$3]+=$9} END{for(x in a) print x,round((a[x]+b[x])/2)
}' $input_file >>$count_output_file
echo "--- `basename $count_output_file `---"
cat $count_output_file
echo "---------------- ` basename $final_updated_file ` --------------"
awk ' BEGIN {FS=" "} FNR==NR{a[$1]=$2;next}
{
    if(a[$3] >= 0 && a[$3] <= 9)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00000"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 10 && a[$3] <= 99)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"0000"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 100 && a[$3] <= 999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"000"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 1000 && a[$3] <= 9999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 10000 && a[$3] <= 99999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"0"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 100000 && a[$3] <= 999999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?a[$3]:"NA",$9,$10
    }
}' $count_output_file $input_file >> $final_updated_file
echo "----------------------------------------------------"
cat $final_updated_file
rm $final_updated_file $count_output_file

# 4  
Old 06-09-2016
Moderator's Comments:
Mod Comment This thread was split off from a post added into the middle of a discussion on a completely different topic. Please create a new thread when you want to discuss a new topic.

As has been requested many times before, PLEASE use CODE tags when displaying sample input, sample output, and code segments. Do not depend on forum moderators and administrators to clean up your posts for you (hoping that we will guess correctly on how your data should be formatted.

Your calculations seem to assume that there will always be exactly two lines with a given 3rd field value. Are these pairs of lines always adjacent (as in your sample input and output)? Can there be just one or more than two lines with a given 3rd field value (and, if so, should the division be by 2 or by the number of lines with that 3rd field value)?

Why are you printing ten fields to your output file when there are only nine input fields? This is adding a trailing space to each line of your output file that is not present in the sample output that you say you want???

Does the width of any of your input columns vary, or are the column widths for each column constant for files this script will process?
# 5  
Old 06-09-2016
Your calculations seem to assume that there will always be exactly two lines with a given 3rd field value. Are these pairs of lines always adjacent (as in your sample input and output)? Can there be just one or more than two lines with a given 3rd field value (and, if so, should the division be by 2 or by the number of lines with that 3rd field value)?

Quote:
No there will be not exactly two lines with given 3rd field value. It can 1,2 or 3 . Depending on input data.These pairs of lines will always adjacent(because 3rd key column is sorted).
Why are you printing ten fields to your output file when there are only nine input fields? This is adding a trailing space to each line of your output file that is not present in the sample output that you say you want???
Quote:
This 10th value printed by mistake. You can skip that.
Does the width of any of your input columns vary, or are the column widths for each column constant for files this script will process?
Quote:
the width of columns may change. This is the major issue I am facing here.
If you could give the solution for this to be printed dynamically with updated value of column 8 will be very helpful.
# 6  
Old 06-09-2016
You didn't answer the question about whether the calculations done in your 1st awk script should always divide by 2 or should instead divide by the number of lines containing a given 3rd column value.

And, it is past my bedtime and I have an early morning meeting tomorrow. I have working code (with no temp files and only one awk script) now for fixed width columns and two input lines per 3rd column value. I'll work on adjustments for varying field widths and varying numbers of lines per 3rd column value tomorrow afternoon (assuming you answer the questions above and below before I get up later this morning).

Do you need to print the count_output_fie contents you're producing now, or is that just debug information? You are currently removing both of the files your script is creating at the end of the script. Am I correct in assuming that these are temp files that do not need to be kept when your script exits?

My current script just prints the output produced by your 2nd awk script. Is that sufficient (with the other changes discussed above)?
# 7  
Old 06-09-2016
Here is a new example to explain in detail-
new Input.dat will have content like
Code:
RRD       0Z91YUn000000Lk 9000100001 103020151117   STMT151117155527001            0000 2 000000 000004
RRD       0Z91YUn00000ysj 9000100001 103020151117   STMT151117155527001            0000 3 000000 000003
RRD       0Z91YUn00001vGh 9000100002 103020151117   STMT151117155527002            0000 1 000000 000007
RRD       0Z91YUn00002tng 9000100002 103020151117   STMT151117155527002            0000 0 000000 000006
RRD       0Z91YUn00003qBe 9000100003 103020151117   STMT151117155527003            0000 1 000000 000007
RRD       0Z8xgNq000000Lk 9000100005 33820160129    STMT151117155527001            0000 5 000000 000006
RRD       0Z8xgNq00000hs6 9000100005 33820160129    STMT151117155527001            0000 3 000000 000006
RRD       0Z8xgNq00001HxO 9000100006 33820160129    STMT151117155527002            0000 0 000000 000006

Quote:
In above sample input you can see in row 6,7 & 8 the size of column 4 is reduced by 1 digit(ie 11 digit). for columns 1 to 6 the size of 4th column is 12 digits.
Output for above input using my code is
Code:
RRD       0Z91YUn000000Lk 9000100001 103020151117    STMT151117155527001            0000 2 000006 000004
RRD       0Z91YUn00000ysj 9000100001 103020151117    STMT151117155527001            0000 3 000006 000003
RRD       0Z91YUn00001vGh 9000100002 103020151117    STMT151117155527002            0000 1 000007 000007
RRD       0Z91YUn00002tng 9000100002 103020151117    STMT151117155527002            0000 0 000007 000006
RRD       0Z91YUn00003qBe 9000100003 103020151117    STMT151117155527003            0000 1 000004 000007
RRD       0Z8xgNq000000Lk 9000100005 33820160129    STMT151117155527001            0000 5 000010 000006
RRD       0Z8xgNq00000hs6 9000100005 33820160129    STMT151117155527001            0000 3 000010 000006
RRD       0Z8xgNq00001HxO 9000100006 33820160129    STMT151117155527002            0000 0 000003 000006

Quote:
But expected output is :
Code:
RRD       0Z91YUn000000Lk 9000100001 103020151117    STMT151117155527001            0000 2 000006 000004
RRD       0Z91YUn00000ysj 9000100001 103020151117    STMT151117155527001            0000 3 000006 000003
RRD       0Z91YUn00001vGh 9000100002 103020151117    STMT151117155527002            0000 1 000007 000007
RRD       0Z91YUn00002tng 9000100002 103020151117    STMT151117155527002            0000 0 000007 000006
RRD       0Z91YUn00003qBe 9000100003 103020151117    STMT151117155527003            0000 1 000004 000007
RRD       0Z8xgNq000000Lk 9000100005 33820160129     STMT151117155527001            0000 5 000010 000006
RRD       0Z8xgNq00000hs6 9000100005 33820160129     STMT151117155527001            0000 3 000010 000006
RRD       0Z8xgNq00001HxO 9000100006 33820160129     STMT151117155527002            0000 0 000003 000006

---------- Post updated at 02:12 PM ---------- Previous update was at 02:03 PM ----------

Quote:
My latest code is :
Code:
#!/bin/sh
JOBFOLDER=/opt/hpx/exstream/StatementAppA/inputs/POC/testSort
input_file=$JOBFOLDER/Input.dat
count_output_file=$JOBFOLDER/Count_output.dat
final_updated_file=$JOBFOLDER/Final_updated.dat
awk ' BEGIN {FS=" "}
function round(A) {
return int( A + 0.5 )
}
 {a[$3]+=$7; b[$3]+=$9} END{for(x in a) print x,round((a[x]+b[x])/2)
}' $input_file >>$count_output_file
echo "--- `basename $count_output_file `---"
cat $count_output_file
echo "---------------- ` basename $final_updated_file ` --------------"
awk ' BEGIN {FS=" "} FNR==NR{a[$1]=$2;next}
{
    if(a[$3] >= 0 && a[$3] <= 9)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00000"a[$3]:"NA",$9
    }
    else if(a[$3] >= 10 && a[$3] <= 99)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"0000"a[$3]:"NA",$9
    }
    else if(a[$3] >= 100 && a[$3] <= 999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"000"a[$3]:"NA",$9
    }
    else if(a[$3] >= 1000 && a[$3] <= 9999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00"a[$3]:"NA",$9
    }
    else if(a[$3] >= 10000 && a[$3] <= 99999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"0"a[$3]:"NA",$9
    }
    else if(a[$3] >= 100000 && a[$3] <= 999999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?a[$3]:"NA",$9
    }
}' $count_output_file $input_file >> $final_updated_file
echo "----------------------------------------------------"
cat $final_updated_file
rm $final_updated_file $count_output_file

 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Matching 2 files based on key

Hi all I have two files I need to match record from first file and second file on column 1,8 and and output only match records on file1 File1: 020059801803180116130926800002090000800231000245204003160000000002000461OUNCE000000350000100152500BM01007W0000 ... (5 Replies)
Discussion started by: arunkumar_mca
5 Replies

2. UNIX for Dummies Questions & Answers

awk - Print lines if only matching key is found

I am looking to move matching lines (01 - 07) from File1 and 77 tab the matching string from File2, to File3.txt. I am almost done but - Currently, script is not printing lines to File3.txt in order. Thanks a lot. Any help is appreciated. Script I am using: awk 'FNR == NR && ! /^]*$/ {... (9 Replies)
Discussion started by: High-T
9 Replies

3. Shell Programming and Scripting

Rsa public private key matching

Hi All, I have a requirement where i need to check if an rsa public key corresponds to a private key and hence return success or failure. Currently i am using the command diff <( ssh-keygen -y -e -f "$PRIVKEY" ) <( ssh-keygen -y -e -f "$PUBLICKEY" ) and its solving my purpose. This is in... (1 Reply)
Discussion started by: mritusmoi
1 Replies

4. Shell Programming and Scripting

Config file auto-updation

Hello All, I need to update my .cfg file which is used in the script for almost all runs. myfile.cfg file: var=1 var1=1 run=0 script: #! /bin/sh . /mydir/myfile.cfg echo $var"\t" $var1 exit So, the requirement is that the myfile.cfg should update every time I run the... (10 Replies)
Discussion started by: PikK45
10 Replies

5. Shell Programming and Scripting

check files updation

Hi All, Can anyone help to write the script to check files updation? i have files as mentioned below. which will be updated some time. i just want to check the last file is updating the data for last 15 mins or not. if its not updating i want to print NOT OK. if its updating data i want... (1 Reply)
Discussion started by: steve2216
1 Replies

6. AIX

Problem with updation of 'quota'

Hi, We have recently implemented 'quota' concept for the unix users. softlimit - 230MB hardlimit - 250MB We have applied the quota when few of users are more than the hardlimit,issue is that even though the users cleared the space, still its 'quota' was not updating properly. For some... (0 Replies)
Discussion started by: girish_satyam
0 Replies

7. Shell Programming and Scripting

awk should output if one input file doesnt have matching key

nawk -F, 'FNR==NR{a= $3 ;next} $2 in a{print $1, 'Person',$2, a}' OFS=, filea fileb Input filea Input fileb output i am getting : (2 Replies)
Discussion started by: pinnacle
2 Replies

8. Shell Programming and Scripting

Matching by key fields

I have a file (key.dat) that contains two columns: AA|1234| BB|567| CC|8910| I have another file (extract.dat) that contains some data: SD|458|John|Smith| AA|3345|Frank|Williams| AA|1234|Bill|Garner| BD|0098|Yu|Lin| BB|567|Gail|Hansen| CC|8910|Ken|Nielsen| I want to compare the... (5 Replies)
Discussion started by: ChicagoBlues
5 Replies

9. Linux

gcc updation on Linux machine

Hi All, I already have gcc complier installed in my machine. Its version is : gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5) I am not sure whethere it's is latest gcc version available. I want to update my gcc version. Can anyone please suggest me what is the latest and stable gcc... (1 Reply)
Discussion started by: bisla.yogender
1 Replies

10. Programming

Directory updation Notification?

Hi, I'm a UNIX newbie .. so forgive me if this question sounds dumb. :) Is it possible for Unix to notify a process that a particular directory has been updated? Rather that the process constantly polling the directory ... Awaiting your replies .. Thanks, VJ (6 Replies)
Discussion started by: vjsony
6 Replies
Login or Register to Ask a Question