File updation on matching key

06-09-2016

Registered User

14, 0

Join Date: Mar 2016

Last Activity: 21 September 2016, 5:13 AM EDT

Posts: 14

Thanks Given: 4

Thanked 0 Times in 0 Posts

File updation on matching key

I have input file like Input.dat with below content

Code:

RRD       0Z91YUn000000Lk 9000100001 103020151117   STMT151117155527001            0000 2 000000 000004
RRD       0Z91YUn00000ysj 9000100001 103020151117   STMT151117155527001            0000 3 000000 000003
RRD       0Z91YUn00001vGh 9000100002 103020151117   STMT151117155527002            0000 1 000000 000007
RRD       0Z91YUn00002tng 9000100002 103020151117   STMT151117155527002            0000 0 000000 000006
RRD       0Z91YUn00003qBe 9000100003 103020151117   STMT151117155527003            0000 1 000000 000007
RRD       0Z91YUn00004oid 9000100003 103020151117   STMT151117155527003            0000 0 000000 000006
RRD       0Z91YUn00005l6b 9000100004 103020151117   STMT151117155527004            0000 1 000000 000007
RRD       0Z91YUn00006jda 9000100004 103020151117   STMT151117155527004            0000 0 000000 000006

1				2				3		4					5						6	7	8 		9  <----- column number (This line is not part of file)

Here 3 column is the key column and 7th and 9th are the value columns. I want to do some arithematic operations on 7th & 9th column and update that
value to 8th column on matching of 3rd key column.
example : here first two row has same key column(3rd column) (9000100001) so I want to do
((((7th column value + 9th column value)of row 1)+((7th column value + 9th column value)of row 2))/2)
ie (((2+4)+(3+3))/2)=000006 (leading zeros for complete 6 digit number)
so i have to update the first 2 rows(key is 9000100001)of origal Input.dat files 8th column with 000006 value
So final output of above example is as

Code:

RRD       0Z91YUn000000Lk 9000100001 103020151117    STMT151117155527001            0000 2 000006 000004
RRD       0Z91YUn00000ysj 9000100001 103020151117    STMT151117155527001            0000 3 000006 000003
RRD       0Z91YUn00001vGh 9000100002 103020151117    STMT151117155527002            0000 1 000007 000007
RRD       0Z91YUn00002tng 9000100002 103020151117    STMT151117155527002            0000 0 000007 000006
RRD       0Z91YUn00003qBe 9000100003 103020151117    STMT151117155527003            0000 1 000007 000007
RRD       0Z91YUn00004oid 9000100003 103020151117    STMT151117155527003            0000 0 000007 000006
RRD       0Z91YUn00005l6b 9000100004 103020151117    STMT151117155527004            0000 1 000007 000007
RRD       0Z91YUn00006jda 9000100004 103020151117    STMT151117155527004            0000 0 000007 000006

I have done script for this requirement as below : Final_Count.sh

Code:

#!/bin/sh
JOBFOLDER=/opt/hpx/exstream/StatementAppA/inputs/POC/testSort
input_file=$JOBFOLDER/Input.dat
count_output_file=$JOBFOLDER/Count_output.dat
final_updated_file=$JOBFOLDER/Final_updated.dat
awk ' BEGIN {FS=" "}
function round(A) {
return int( A + 0.5 )
}
 {a[$3]+=$7; b[$3]+=$9} END{for(x in a) print x,round((a[x]+b[x])/2)
}' $input_file >>$count_output_file
echo "--- `basename $count_output_file `---"
cat $count_output_file
echo "---------------- ` basename $final_updated_file ` --------------"
awk ' BEGIN {FS=" "} FNR==NR{a[$1]=$2;next}
{
    if(a[$3] >= 0 && a[$3] <= 9)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00000"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 10 && a[$3] <= 99)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"0000"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 100 && a[$3] <= 999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"000"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 1000 && a[$3] <= 9999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 10000 && a[$3] <= 99999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"0"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 100000 && a[$3] <= 999999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?a[$3]:"NA",$9,$10
    }
}' $count_output_file $input_file >> $final_updated_file
echo "----------------------------------------------------"
cat $final_updated_file
rm $final_updated_file $count_output_file

and output is :
--- Count_output.dat---

Code:

9000100002 7
9000100003 7
9000100004 7
9000100001 6

---------------- Final_updated.dat --------------
----------------------------------------------------

Code:

RRD       0Z91YUn000000Lk 9000100001 103020151117    STMT151117155527001            0000 2 000006 000004
RRD       0Z91YUn00000ysj 9000100001 103020151117    STMT151117155527001            0000 3 000006 000003
RRD       0Z91YUn00001vGh 9000100002 103020151117    STMT151117155527002            0000 1 000007 000007
RRD       0Z91YUn00002tng 9000100002 103020151117    STMT151117155527002            0000 0 000007 000006
RRD       0Z91YUn00003qBe 9000100003 103020151117    STMT151117155527003            0000 1 000007 000007
RRD       0Z91YUn00004oid 9000100003 103020151117    STMT151117155527003            0000 0 000007 000006
RRD       0Z91YUn00005l6b 9000100004 103020151117    STMT151117155527004            0000 1 000007 000007
RRD       0Z91YUn00006jda 9000100004 103020151117    STMT151117155527004            0000 0 000007 000006

-------------------------------------------------------------------------------------------------------------------

Here in this script for final updated file I am using static printing like

Code:

print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00"a[$3]:"NA",$9,$10

so here I am printing spaces after column value manually. I want this as dynamic.
In short print all column with as it is(with space) and just update 8th column.
Please can you help me with this ?

Last edited by Don Cragun; 06-09-2016 at 02:46 AM.. Reason: Add CODE and ICODE tags.

PRAMOD 96

View Public Profile for PRAMOD 96

Find all posts by PRAMOD 96

06-09-2016

Registered User

6,575, 572

Join Date: Sep 2007

Last Activity: 5 November 2019, 9:08 AM EST

Location: St. Gallen, Switzerland

Posts: 6,575

Thanks Given: 179

Thanked 572 Times in 484 Posts

Please refrain double posts in terms of highjacking your own other thread and start using code tags, thanks.

zaxxon

View Public Profile for zaxxon

Find all posts by zaxxon

06-09-2016

Registered User

14, 0

Join Date: Mar 2016

Last Activity: 21 September 2016, 5:13 AM EDT

Posts: 14

Thanks Given: 4

Thanked 0 Times in 0 Posts

Quote:

Originally Posted by zaxxon

Please refrain double posts in terms of highjacking your own other thread and start using code tags, thanks.

Code:

#!/bin/sh
JOBFOLDER=/opt/hpx/exstream/StatementAppA/inputs/POC/testSort
input_file=$JOBFOLDER/Input.dat
count_output_file=$JOBFOLDER/Count_output.dat
final_updated_file=$JOBFOLDER/Final_updated.dat
awk ' BEGIN {FS=" "}
function round(A) {
return int( A + 0.5 )
}
 {a[$3]+=$7; b[$3]+=$9} END{for(x in a) print x,round((a[x]+b[x])/2)
}' $input_file >>$count_output_file
echo "--- `basename $count_output_file `---"
cat $count_output_file
echo "---------------- ` basename $final_updated_file ` --------------"
awk ' BEGIN {FS=" "} FNR==NR{a[$1]=$2;next}
{
    if(a[$3] >= 0 && a[$3] <= 9)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00000"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 10 && a[$3] <= 99)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"0000"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 100 && a[$3] <= 999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"000"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 1000 && a[$3] <= 9999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 10000 && a[$3] <= 99999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"0"a[$3]:"NA",$9,$10
    }
    else if(a[$3] >= 100000 && a[$3] <= 999999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?a[$3]:"NA",$9,$10
    }
}' $count_output_file $input_file >> $final_updated_file
echo "----------------------------------------------------"
cat $final_updated_file
rm $final_updated_file $count_output_file

PRAMOD 96

View Public Profile for PRAMOD 96

Find all posts by PRAMOD 96

06-09-2016

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Moderator's Comments:

This thread was split off from a post added into the middle of a discussion on a completely different topic. Please create a new thread when you want to discuss a new topic.

As has been requested many times before, PLEASE use CODE tags when displaying sample input, sample output, and code segments. Do not depend on forum moderators and administrators to clean up your posts for you (hoping that we will guess correctly on how your data should be formatted.

Your calculations seem to assume that there will always be exactly two lines with a given 3rd field value. Are these pairs of lines always adjacent (as in your sample input and output)? Can there be just one or more than two lines with a given 3rd field value (and, if so, should the division be by 2 or by the number of lines with that 3rd field value)?

Why are you printing ten fields to your output file when there are only nine input fields? This is adding a trailing space to each line of your output file that is not present in the sample output that you say you want???

Does the width of any of your input columns vary, or are the column widths for each column constant for files this script will process?

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

06-09-2016

Registered User

14, 0

Join Date: Mar 2016

Last Activity: 21 September 2016, 5:13 AM EDT

Posts: 14

Thanks Given: 4

Thanked 0 Times in 0 Posts

Your calculations seem to assume that there will always be exactly two lines with a given 3rd field value. Are these pairs of lines always adjacent (as in your sample input and output)? Can there be just one or more than two lines with a given 3rd field value (and, if so, should the division be by 2 or by the number of lines with that 3rd field value)?

Quote:

No there will be not exactly two lines with given 3rd field value. It can 1,2 or 3 . Depending on input data.These pairs of lines will always adjacent(because 3rd key column is sorted).

Why are you printing ten fields to your output file when there are only nine input fields? This is adding a trailing space to each line of your output file that is not present in the sample output that you say you want???

Quote:

This 10th value printed by mistake. You can skip that.

Does the width of any of your input columns vary, or are the column widths for each column constant for files this script will process?

Quote:

the width of columns may change. This is the major issue I am facing here.
If you could give the solution for this to be printed dynamically with updated value of column 8 will be very helpful.

PRAMOD 96

View Public Profile for PRAMOD 96

Find all posts by PRAMOD 96

06-09-2016

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

You didn't answer the question about whether the calculations done in your 1st awk script should always divide by 2 or should instead divide by the number of lines containing a given 3rd column value.

And, it is past my bedtime and I have an early morning meeting tomorrow. I have working code (with no temp files and only one awk script) now for fixed width columns and two input lines per 3rd column value. I'll work on adjustments for varying field widths and varying numbers of lines per 3rd column value tomorrow afternoon (assuming you answer the questions above and below before I get up later this morning).

Do you need to print the count_output_fie contents you're producing now, or is that just debug information? You are currently removing both of the files your script is creating at the end of the script. Am I correct in assuming that these are temp files that do not need to be kept when your script exits?

My current script just prints the output produced by your 2nd awk script. Is that sufficient (with the other changes discussed above)?

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

06-09-2016

Registered User

14, 0

Join Date: Mar 2016

Last Activity: 21 September 2016, 5:13 AM EDT

Posts: 14

Thanks Given: 4

Thanked 0 Times in 0 Posts

Here is a new example to explain in detail-
new Input.dat will have content like

Code:

RRD       0Z91YUn000000Lk 9000100001 103020151117   STMT151117155527001            0000 2 000000 000004
RRD       0Z91YUn00000ysj 9000100001 103020151117   STMT151117155527001            0000 3 000000 000003
RRD       0Z91YUn00001vGh 9000100002 103020151117   STMT151117155527002            0000 1 000000 000007
RRD       0Z91YUn00002tng 9000100002 103020151117   STMT151117155527002            0000 0 000000 000006
RRD       0Z91YUn00003qBe 9000100003 103020151117   STMT151117155527003            0000 1 000000 000007
RRD       0Z8xgNq000000Lk 9000100005 33820160129    STMT151117155527001            0000 5 000000 000006
RRD       0Z8xgNq00000hs6 9000100005 33820160129    STMT151117155527001            0000 3 000000 000006
RRD       0Z8xgNq00001HxO 9000100006 33820160129    STMT151117155527002            0000 0 000000 000006

Quote:

In above sample input you can see in row 6,7 & 8 the size of column 4 is reduced by 1 digit(ie 11 digit). for columns 1 to 6 the size of 4th column is 12 digits.
Output for above input using my code is

Code:

RRD       0Z91YUn000000Lk 9000100001 103020151117    STMT151117155527001            0000 2 000006 000004
RRD       0Z91YUn00000ysj 9000100001 103020151117    STMT151117155527001            0000 3 000006 000003
RRD       0Z91YUn00001vGh 9000100002 103020151117    STMT151117155527002            0000 1 000007 000007
RRD       0Z91YUn00002tng 9000100002 103020151117    STMT151117155527002            0000 0 000007 000006
RRD       0Z91YUn00003qBe 9000100003 103020151117    STMT151117155527003            0000 1 000004 000007
RRD       0Z8xgNq000000Lk 9000100005 33820160129    STMT151117155527001            0000 5 000010 000006
RRD       0Z8xgNq00000hs6 9000100005 33820160129    STMT151117155527001            0000 3 000010 000006
RRD       0Z8xgNq00001HxO 9000100006 33820160129    STMT151117155527002            0000 0 000003 000006

Quote:

But expected output is :

Code:

RRD       0Z91YUn000000Lk 9000100001 103020151117    STMT151117155527001            0000 2 000006 000004
RRD       0Z91YUn00000ysj 9000100001 103020151117    STMT151117155527001            0000 3 000006 000003
RRD       0Z91YUn00001vGh 9000100002 103020151117    STMT151117155527002            0000 1 000007 000007
RRD       0Z91YUn00002tng 9000100002 103020151117    STMT151117155527002            0000 0 000007 000006
RRD       0Z91YUn00003qBe 9000100003 103020151117    STMT151117155527003            0000 1 000004 000007
RRD       0Z8xgNq000000Lk 9000100005 33820160129     STMT151117155527001            0000 5 000010 000006
RRD       0Z8xgNq00000hs6 9000100005 33820160129     STMT151117155527001            0000 3 000010 000006
RRD       0Z8xgNq00001HxO 9000100006 33820160129     STMT151117155527002            0000 0 000003 000006

---------- Post updated at 02:12 PM ---------- Previous update was at 02:03 PM ----------

Quote:

My latest code is :

Code:

#!/bin/sh
JOBFOLDER=/opt/hpx/exstream/StatementAppA/inputs/POC/testSort
input_file=$JOBFOLDER/Input.dat
count_output_file=$JOBFOLDER/Count_output.dat
final_updated_file=$JOBFOLDER/Final_updated.dat
awk ' BEGIN {FS=" "}
function round(A) {
return int( A + 0.5 )
}
 {a[$3]+=$7; b[$3]+=$9} END{for(x in a) print x,round((a[x]+b[x])/2)
}' $input_file >>$count_output_file
echo "--- `basename $count_output_file `---"
cat $count_output_file
echo "---------------- ` basename $final_updated_file ` --------------"
awk ' BEGIN {FS=" "} FNR==NR{a[$1]=$2;next}
{
    if(a[$3] >= 0 && a[$3] <= 9)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00000"a[$3]:"NA",$9
    }
    else if(a[$3] >= 10 && a[$3] <= 99)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"0000"a[$3]:"NA",$9
    }
    else if(a[$3] >= 100 && a[$3] <= 999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"000"a[$3]:"NA",$9
    }
    else if(a[$3] >= 1000 && a[$3] <= 9999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"00"a[$3]:"NA",$9
    }
    else if(a[$3] >= 10000 && a[$3] <= 99999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?"0"a[$3]:"NA",$9
    }
    else if(a[$3] >= 100000 && a[$3] <= 999999)
    {
        print $1"      ",$2,$3,$4"   ",$5"           ",$6,$7,a[$3]?a[$3]:"NA",$9
    }
}' $count_output_file $input_file >> $final_updated_file
echo "----------------------------------------------------"
cat $final_updated_file
rm $final_updated_file $count_output_file

PRAMOD 96

View Public Profile for PRAMOD 96

Find all posts by PRAMOD 96

UNIX for Dummies Questions & Answers

File updation on matching key

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Matching 2 files based on key

Discussion started by: arunkumar_mca

2. UNIX for Dummies Questions & Answers

awk - Print lines if only matching key is found

Discussion started by: High-T

3. Shell Programming and Scripting

Rsa public private key matching

Discussion started by: mritusmoi

4. Shell Programming and Scripting

Config file auto-updation

Discussion started by: PikK45

5. Shell Programming and Scripting

check files updation

Discussion started by: steve2216

6. AIX

Problem with updation of 'quota'

Discussion started by: girish_satyam

7. Shell Programming and Scripting

awk should output if one input file doesnt have matching key

Discussion started by: pinnacle

8. Shell Programming and Scripting

Matching by key fields

Discussion started by: ChicagoBlues

9. Linux

gcc updation on Linux machine

Discussion started by: bisla.yogender

10. Programming

Directory updation Notification?

Discussion started by: vjsony