I need some logic that would help to group up some records that fall between two dates:
Input Data
Code:
COL_1 COL_2 COL_3 COL_4 COL_5 COL_6 COL_7 COL_8 COL_9 COL_10 COL_11 COL_12
C ABC ABCD 3 ZZ WLOA 2015-12-01 2015-12-15 975.73 ZZZ P 147018.64
C ABC ABCD 3 ZZ WLOA 2015-12-01 2016-01-31 898.86 ZZZ P 129018.66
C ABC ABCD 3 ZZ WLOA 2015-12-01 2016-02-29 788.81 ZZZ P 110912.18
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-02-29 18106.48 ZZZ P 110912.18
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-05-31 1652.2 ZZZ P 55947.43
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-08-31 650.05 ZZZ P 45500.00
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-09-20 500.15 ZZZ P 37525.00
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-10-01 357.05 ZZZ P 12385.00
We will pass VAR_DATE as a parameter to the script, which we will then use to work out the grouping from this value.
For each value in COL_4 (Grouping Column) we need to group up the records that are within a certain date range.
In the above example therefore, we would like to group records with the below logic:
Code:
COL_7 > VAR_DATE AND COL_8 <= VAR_DATE + 1 Year
Example Output (Year 1)
Code:
C ABC ABCD 3 ZZ WLOA VAR_DATE VAR_DATE + 1 YEAR SUM ALL VALUES ZZZ P FINAL RECORD IN COL_12 BALANCE FOR GIVEN ID (COL_4)
C ABC ABCD 3 ZZ WLOA 2015-12-01 2016-12-01 2663.40 ZZZ P 110912.18
We then need to group up the values for the next year (Year 2)
We would then like to repeat the same, but for year 2, using the below logic:
Code:
COL_7 > VAR_DATE + 1 Year AND COL_8 <= VAR_DATE + 2 Year
Example Output (Year 2)
Code:
C ABC ABCD 3 ZZ WLOA VAR_DATE + 1 YEAR VAR_DATE + 2 YEAR SUM ALL VALUES ZZZ P FINAL RECORD IN COL_12 BALANCE FOR GIVEN ID (COL_4)
C ABC ABCD 3 ZZ WLOA 2016-12-01 2017-12-01 21265.93 ZZZ P 12385.00
We will need to keep doing this logic up to 5 years / groups but this amount of years could change so ideally the loop/amount of groups required needs to be dynamic/parameterised.
The final output would look like below, with the two new records generated from the above logic, appended to the end of the file:
Code:
COL_1 COL_2 COL_3 COL_4 COL_5 COL_6 COL_7 COL_8 COL_9 COL_10 COL_11 COL_12
C ABC ABCD 3 ZZ WLOA 2015-12-01 2015-12-15 975.73 ZZZ P 147018.64
C ABC ABCD 3 ZZ WLOA 2015-12-01 2016-01-31 898.86 ZZZ P 129018.66
C ABC ABCD 3 ZZ WLOA 2015-12-01 2016-02-29 788.81 ZZZ P 110912.18
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-02-29 18106.48 ZZZ P 110912.18
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-05-31 1652.2 ZZZ P 55947.43
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-08-31 650.05 ZZZ P 45500.00
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-09-20 500.15 ZZZ P 37525.00
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-10-01 357.05 ZZZ P 12385.00
C ABC ABCD 3 ZZ WLOA 2015-12-01 2016-12-01 2663.40 ZZZ P 110912.18 ** (New record from above logic - Year 1 ) **
C ABC ABCD 3 ZZ WLOA 2016-12-01 2017-12-01 21265.93 ZZZ P 12385.00 ** (New record from above logic - Year 2 ) **
As the above logic will be running over a large amount of records, I would assume AWK will be the most efficient solution to the above, however my experience of AWK is extremely limited, therefore I am unsure as to how to proceed with starting the above logic.
Last edited by Ads89; 02-16-2016 at 04:20 AM..
Reason: Expanding with further information of requirements.
Thanks for the above, that solution works well for generating the different years I require, but I also need a way to group these records as mentioned above.
We need to group records that fall between VAR_DATE and VAR_DATE +1 year and do some calculations on some of the columns i.e. Sum of all records (COL_9) and Final record value in COL_12
What are the names of the input and output files you want to process?
Give us more detail about your input file:
What is the sort order for your input file?
What is supposed to happen if the input date is something like 2015-01-01 with a line like the 3rd line in your sample input: C ABC ABCD 3 ZZ WLOA 2015-12-01 2016-01-31 898.86 ZZZ P 129018.66
where the apparent start date in COL_7 is within the range of 1 year, but the apparent end date in COL_8 is not in that same range?
Your stated logic isn't clear about what constitutes the final record that determines how COL_12 is supposed to be set. Is it the last line found in the file, the line with the highest COL_7 value, the line with the highest COL_8 value, or something else?
awk 'FNR==NR {a; next} $NF in a' genes.txt refseq_exons.txt > output.txt
I can not figure out how to group the same name in $4 together.
Basically, all the SKI together in separate rows and all the TGFB2. Thank you :).
chr1 2160133 2161174 SKI
chr1 218518675 218520389 TGFB2... (1 Reply)
Hi guys,
I am a complete newbie to unix and have been tasked with creating a script to group the following data (file) by hourly slots so that I can count the transactions completed within the peak hour.
I am not sure how to group data like this in unix. Can anyone please help?
Here is an... (1 Reply)
Hi all,
I am using following command:
perl program.pl input.txt output.txt CUTOFF 3 > groups_3.txt
containing program.pl, two files (input.txt, output.txt) and getting output in groups_3.txt:
But, I wish to have 30 files corresponding to each CUTOFF ranging from 0 to 30 using the same... (1 Reply)
I have a text file in this format.
Group: AAA
Notes: IP : 11.11.11.11
#User xxxxxxxxx
#Password aaaaaaaaaaaaaaaa
Group: AAA
Notes: IP : 11.11.11.22
#User yyyyyyyyyyyyy
#Password bbbbbbbbbbbbb (8 Replies)
Hi,
I am having some troubles using /usr/sbin/logadm to rotate sulog yearly. Can someone please assist with the correct syntax to rotate the sulog yearly? I'd like to maintain up to 3 years of logs.
I am on Solaris 10.
Thanks, (1 Reply)
I have a text file that contains numbers (listed from the smallest to the largest).
For ex.
34
817
1145
1645
1759
1761
3368
3529
4311
4681
5187
5193
5199
5417
5682
.
. (5 Replies)
Hi all,
I have a set data as shown below, and i would like to eliminate the name that no children - boy and girl. What is the appropriate command can i use(other than grep)? Please assist...
My input:
name sex marital status children - boy children - girl ... (3 Replies)
Hi everyone,
I need a help on Unix scripting.
I have a file is like this
Date Amt
20071205 10
20071204 10
20071203 200
20071204 300
20071203 400
20071205 140
20071203 100
20071205 100... (1 Reply)