Modification of Summation Script


 
# 1  
08-26-2012

Hi guys,

I have a script that sums the values of 2 particular columns in a file, grouped by the specified columns.

The script is quite long, but it is mostly repetitive: the same block is repeated for each condition (each number of GROUP BY columns).

This script only accepts one type of input file.

Code:
#!/usr/bin/sh
###############################################################################
#									      #
# Description: 								      #
# This Shell Script sums the 2 columns as grouped by the columns              #
# specified 								      #  
#             								      #
###############################################################################
FILENAME=$1
DELIMITER=$2
FIRST_COL=$3
SECOND_COL=$4
SALESDATE_COL=$5
STOREID=$6
UPC=$7
GTIN=$8
PROMOID=$9
echo ""
echo ".:Summation Tool:."

for FILE in ${FILENAME}
do
	gzip -t ${FILE} 2>/dev/null

	if [ $? -eq 1 ]; 
	then
		comm=cat
	else
		comm=gzcat
	fi

	if [ $# -eq 5 ]; then
		$comm $FILE | awk -v col=$FIRST_COL -v col2=$SECOND_COL -v col3=$SALESDATE_COL -v sourcefile=$FILE -F "$DELIMITER" '
			{
				if(NR!=1)
				{
					if($col3!="" && $col!="" && $col2!="")
					{
						salesdate[$col3] = $col3
						v[$col3] += $col
						d[$col3] += $col2
					}
				}
			}
			END{
				printf("\n%s%s%s%s%s%s\n","sales_date","|","sum(POS_QTY)","|","sum(POS_AMT)|","<source_file>") 
				for (i in v)
				{
					if(salesdate[i]!=1)
					{
						printf("%s%s%d%s%10.4f%s%s\n",salesdate[i],"|",v[i],"|",d[i],"|",sourcefile)
					}
				}
			}'

	elif [ $# -eq 6 ]; then
		$comm $FILE | awk -v col=$FIRST_COL -v col2=$SECOND_COL -v col3=$SALESDATE_COL -v col4=$STOREID -v sourcefile=$FILE -F "$DELIMITER" '
			{
				if(NR!=1)
				{
					if($col3!="" && $col!="" && $col2!="")
					{
						salesdate[$col3$col4] = $col3 "|" $col4
						v[$col3$col4] += $col
						d[$col3$col4] += $col2
					}
				}
			}
			END{
				printf("\n%s%s%s%s%s%s\n","sales_date|cust_id","|","sum(POS_QTY)","|","sum(POS_AMT)|","<source_file>") 
				for (i in v)
				{
					if(salesdate[i]!=1)
					{
						printf("%s%s%d%s%10.4f%s%s\n",salesdate[i],"|",v[i],"|",d[i],"|",sourcefile)
					}
				}
			}'

                         .
                         .   (same pattern for [ $# -eq 7 ] up to [ $# -eq 9 ])
                         .

	elif [ $# -gt 9 ]; then
		echo "Too many parameters passed. Please pass only the file name, delimiter, the column numbers for POS_QTY and POS_AMT, and the column numbers to GROUP BY."
	else
		# 1 to 4 arguments; with no arguments at all the for loop never runs
		echo "Not enough parameters passed. Please pass the file name, delimiter, the column numbers for POS_QTY and POS_AMT, and the column numbers to GROUP BY."
	fi
done


Usage of the script:

Code:
sh <script file name> <file or files> <delimiter> <column # to sum1> <column # to sum2> <column #s for grouping>

So a sample execution would look like this:

Code:
$ sh  getsums.sh ft-gnct-3398-cd-2012-07-07-140112.txt.gz "     " 22 23 1 2 4

.:Summation Tool:.

Code:
sales_date|cust_id|UPC|sum(POS_QTY)|sum(POS_AMT)|<source_file>
2012-07-05|PL_000000000034014|PL_2003007476012|75|   75.9200|ft-gnct-3398-cd-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003000322606|19|   19.5000|ft-gnct-3398-cd-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003005081201|38|   38.9800|ft-gnct-3398-cd-2012-07-07-140112.txt.gz


THE MODIFICATION NEEDED:
1. There is an additional input file (a 2nd file).
   - First, code should be added to check whether a 2nd file was given.
   - If there is no 2nd file, just run the original script as is.
   - If there is a 2nd file:
     - The first column of this file needs to be stored in an array, to be
       compared with the UPC column (UPC=$7) of the first file.
     - Every line whose UPC value matches a value in the first column of
       the 2nd file should be excluded from the summation (the entire line).
     - So the array from the first column of the 2nd file needs to be
       looped over against UPC=$7, checking each line for a match.
     - The values that match will be stored in a separate new array.
     - This new array contains the list of values that will be
       excluded during the summation.


If you guys can come up with or suggest code for this, that would be awesome.
I would really appreciate it.
I would also like suggestions, and for you to guide me through the whole process of building it. I'm quite new to this stuff, so I basically don't know anything,
but I'm willing to learn.

I hope you can all teach me, help me solve this, and come up with a
correctly modified script.

Thanks a lot, guys! Hope to hear from you soon.
I really need this script done as soon as possible.

Thanks a lot.

Last edited by Scrutinizer; 08-26-2012 at 04:40 PM. Reason: code tags
# 2  
08-26-2012
I'd suggest using another variable for the exclude filename, e.g. EXC_FILE, and defaulting it to /dev/null when there is none.

Change the script like this:

Code:
  $comm $FILE | awk -v col=$FIRST_COL -v col2=$SECOND_COL -v col3=$SALESDATE_COL -v sourcefile=$FILE -F "$DELIMITER" '
   FILENAME != "-" {EXC[$1]; next}
   !($7 in EXC){
       if(FNR!=1)   # FNR restarts at 1 for each input file
       ...
}' $EXC_FILE -

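The added parts (shown in red in the original post) are the FILENAME != "-" and !($7 in EXC) lines, plus the $EXC_FILE - arguments after the closing quote.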
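A small, self-contained demo of the idiom in the post above, as a sketch only: the function name count_kept is hypothetical, and field 2 of the sample data stands in for the UPC column. EXC[$1] on its own just references the array element, which is enough to create that key, and next stops processing so exclusion-file lines never reach the other rules. Because the data is read from standard input via "-", FILENAME is "-" only for the data lines. When no exclusion file is given, /dev/null supplies no lines, so nothing is excluded.

```shell
# Hypothetical demo of the two-input idiom: the exclusion file (or /dev/null)
# is named first, and "-" makes awk read the data from standard input second.
count_kept() {
    EXC_FILE=${1:-/dev/null}   # default: empty exclusion list
    awk '
        FILENAME != "-" { EXC[$1]; next }   # 1st input: remember column 1
        !($2 in EXC)    { kept++ }          # stdin: count lines not listed
        END             { print kept + 0 }
    ' "$EXC_FILE" -
}

printf 'a U1\nb U2\nc U3\n' | count_kept    # no exclusion file: counts all 3
```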
# 3  
08-27-2012
Hi there, Chubler_XL, thanks a lot for your reply.

Do I need to add it to every if/elif branch, though?

Also, I'm not quite familiar with this EXC command; what exactly does it do?

Will it exclude all data in $7 of the 1st file before the summation?

So I don't even need to put the first column of the 2nd file into an array and loop over it against $7 anymore?

What do you think?

I appreciate all your suggestions, thanks so much.

---------- Post updated at 09:38 AM ---------- Previous update was at 08:17 AM ----------

If anyone can help me here, I would be truly grateful for your comments/suggestions/recommendations. Thanks a lot, guys.
# 4  
08-27-2012
This thread seems to be a follow-up to the thread titled "HELP with Unix scripts in summing columns in a file", posted by ramneim in the Homework & Coursework Questions forum. So this looks like the next assignment for the same class, but it was not filed in the homework forum.

Is this another assignment for the same class?
# 5  
08-27-2012
No sir, I just need help, that's all. I want to learn how to do this particular modification, and it's a bit different.
I'm honestly just seeking help from you guys.

---------- Post updated at 01:12 PM ---------- Previous update was at 01:10 PM ----------

Hi there, Chubler_XL,

For the code you posted, should I add it to my script like this?

Code:
        if [ $# -eq 5 ]; then
                $comm $FILE | awk -v col=$FIRST_COL -v col2=$SECOND_COL -v col3=$SALESDATE_COL -v sourcefile=$FILE -F "$DELIMITER" '
			FILENAME != "-" {EXC[$1]; next}
			!($7 in EXC) {
                                if(FNR!=1)
                                {
                                        if($col3!="" && $col!="" && $col2!="")
                                        {
                                                salesdate[$col3] = $col3
                                                v[$col3] += $col
                                                d[$col3] += $col2
                                        }
                                }
                        }
                        END{
                                printf("\n%s%s%s%s%s%s\n","sales_date","|","sum(POS_QTY)","|","sum(POS_AMT)|","<source_file>")
                                for (i in v)
                                {
                                        if(salesdate[i]!=1)
                                        {
                                                printf("%s%s%d%s%10.4f%s%s\n",salesdate[i],"|",v[i],"|",d[i],"|",sourcefile)
                                        }
                                }
                        }' $EXC_FILE -

Also, do I need to add another argument for the second file?

I edited the script, but I don't know how to execute it.
Should there be an argument for the second file?

I tried to execute it, but it made me wonder where I should place the second file when I execute the script.

Last edited by ramneim; 08-27-2012 at 03:27 PM.
# 6
08-27-2012
Quote:
Originally Posted by ramneim
it made me wonder where I should place the second file when I execute the script.
So now you see the problem with using the number of arguments to control your script logic: it makes it hard to add optional parameters later.

How about a slight change to your input parameters? Pass a CSV list of column numbers instead of separate variables, e.g.:

Code:
# getsums.sh ft-gnct-3398-cd-2012-07-07-140112.txt.gz " " 22,23,1,2,4

That way we can use argument #4 as the optional exclude filename.

Another advantage is that you end up with only one piece of awk logic for all output options:
Code:
FILES="$1"
DELIMITER="$2"
COL_LIST=$3
EXC_FILE=${4:-/dev/null}
for FILE in $FILES
do
    gzip -t ${FILE} 2>/dev/null
    if [ $? -eq 1 ];
    then
        comm=cat
    else
        comm=gzcat
    fi
    $comm $FILE | awk -v col_list=$COL_LIST -v sourcefile=$FILE -F "$DELIMITER" '
        BEGIN {
            cols=split(col_list, col, ",");
            split("sales_date,cust_id,UPC", titles, ",")
        }
        FILENAME != "-" {EXC[$1]; next}
        !($7 in EXC) {
            if(FNR!=1) {
                if($col[3]!="" && $col[1]!="" && $col[2]!="") {
                   key=$col[3]
                   val=key
                   for(m=4;m<=cols;m++) {
                       key=key $col[m]
                       val=val "|" $col[m]
                   }
                   salesdate[key] = val
                   v[key] += $col[1]
                   d[key] += $col[2]
                }
            }
        }
        END {
            for(i=2;i<cols;i++) printf "%s|", titles[i-1];
            printf("sum(POS_QTY)|sum(POS_AMT)|<source_file>\n")
            for (i in v) {
                if(salesdate[i]!=1) {
                    printf("%s|%d|%10.4f|%s\n",salesdate[i],v[i],d[i],sourcefile)
                }
            }
        }' $EXC_FILE -
done

I've left the parameter validation for you to work on.
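One possible shape for that parameter validation, as a sketch only. It assumes the interface proposed above (FILES DELIMITER COL_LIST [EXC_FILE]) and uses a hypothetical function name, validate_args; the messages and the minimum of three column numbers (POS_QTY, POS_AMT, sales_date) are assumptions. It does not check the data files themselves.

```shell
# Hypothetical validation for: getsums.sh FILES DELIMITER COL_LIST [EXC_FILE]
# Returns 0 when the arguments look usable, 1 (with a message) otherwise.
validate_args() {
    if [ $# -lt 3 ] || [ $# -gt 4 ]; then
        echo "Usage: getsums.sh FILES DELIMITER QTY,AMT,DATE[,GROUP...] [EXC_FILE]" >&2
        return 1
    fi
    # Reject anything other than digits and single commas between numbers
    case $3 in
        *[!0-9,]* | ,* | *, | *,,*)
            echo "Column list must be comma-separated numbers: $3" >&2
            return 1 ;;
    esac
    # Need at least POS_QTY, POS_AMT and sales_date
    ncols=$(printf '%s\n' "$3" | awk -F, '{ print NF }')
    if [ "$ncols" -lt 3 ]; then
        echo "Need at least 3 column numbers, got $ncols" >&2
        return 1
    fi
    if [ $# -eq 4 ] && [ ! -r "$4" ]; then
        echo "Cannot read exclusion file: $4" >&2
        return 1
    fi
    return 0
}
```

In the script it could run once before the loop, e.g. validate_args "$@" || exit 1.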

Note that the column titles are hardcoded: you didn't give an example data file, so I'm not sure whether the first line of the data file could be used for the output titles.
# 7  
08-28-2012
Thanks again, Chubler_XL, for your input; I really appreciate it.

So basically, the arguments would be like this:

$1 = ft-gnct-3398-cd-2012-07-07-140112.txt.gz (1st filename)
$2 = " "
$3 = 22,23,1,2,4
$4 = the 2nd filename, separated from $3 by a space, right?

I'm trying to understand each piece of code that you posted, so I apologize if I ask too many questions.

So this BEGIN block basically separates the CSV into different column numbers, right? (Will it also split it into different arguments?)
I'm not really familiar with the split function, though, so I'm not sure I've understood it correctly yet.

col_list should have 5 columns, right? ($3 = 22,23,1,2,4)
Do we need to add them in like this?
Code:
split("first_col,second_col,sales_date,cust_id,UPC", titles, ",")

Code:
BEGIN {
            cols=split(col_list, col, ",");
            split("sales_date,cust_id,UPC", titles, ",")
        }
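A quick way to see what the BEGIN block does, as a sketch (show_split is a hypothetical name): split(string, array, separator) breaks the string at each separator, stores the pieces in array[1], array[2], and so on, and returns how many pieces there were. It only fills an awk array; it does not create new shell arguments. With 22,23,1,2,4, col[1] and col[2] are the POS_QTY and POS_AMT columns and col[3] onward are the GROUP BY columns, which is why titles only needs the three group-by labels.

```shell
# Hypothetical helper: print what split() produces for a column list.
show_split() {
    awk -v col_list="$1" 'BEGIN {
        cols = split(col_list, col, ",")   # returns the number of pieces
        printf "cols=%d\n", cols
        for (i = 1; i <= cols; i++)
            printf "col[%d]=%s\n", i, col[i]
    }'
}

show_split "22,23,1,2,4"
# cols=5
# col[1]=22
# col[2]=23
# col[3]=1
# col[4]=2
# col[5]=4
```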

And in the code below: will this take the first column of the second file and compare it to $7, which I think is now only part of $3 (the CSV), right?

Code:
FILENAME != "-" {EXC[$1]; next}
        !($7 in EXC)

In this added code, I'm not quite sure what it does:

Code:
{
                   key=$col[3]
                   val=key
                   for(m=4;m<=cols;m++) {
                       key=key $col[m]
                       val=val "|" $col[m]
                   }
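The key/val loop can be tried on a single line, as a sketch (show_key and the sample record are made up for illustration). key glues the GROUP BY fields together and serves as the array index for the sums; val joins the same fields with "|" so they can be printed. Note that $col[m] means "field number col[m]".

```shell
# Hypothetical helper: mimic the key/val loop for one "|"-delimited line.
# col[3..cols] are the GROUP BY columns (here fields 1, 2 and 4).
show_key() {
    printf '%s\n' "$1" | awk -F'|' -v col_list="$2" '{
        cols = split(col_list, col, ",")
        key = $col[3]; val = key
        for (m = 4; m <= cols; m++) {
            key = key $col[m]        # lookup key: fields glued together
            val = val "|" $col[m]    # printable label: fields joined by |
        }
        printf "key=%s\nval=%s\n", key, val
    }'
}

show_key '2012-07-05|C34014|x|U7476' '5,6,1,2,4'
# key=2012-07-05C34014U7476
# val=2012-07-05|C34014|U7476
```

Gluing fields with no separator means two different groups could in principle produce the same key (for example "ab" + "c" and "a" + "bc"); inserting awk's SUBSEP between the pieces, as in key = key SUBSEP $col[m], avoids that.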

Regarding your NOTE at the end: the first line (header row) of both files, the first and the 2nd, should be skipped when reading them.

For this added for loop in the last part of the code,
could you kindly explain this part?
Code:
for(i=2;i<cols;i++) printf "%s|", titles[i-1];
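The header loop prints one title per GROUP BY column. With cols numbers in col_list, the first two are the sum columns, so there are cols - 2 group-by columns; the loop from i=2 while i<cols runs exactly cols - 2 times and prints titles[1] up to titles[cols-2]. A sketch to check this (show_header is a hypothetical name):

```shell
# Hypothetical helper: print the header line for a given column list.
show_header() {
    awk -v col_list="$1" 'BEGIN {
        cols = split(col_list, col, ",")
        split("sales_date,cust_id,UPC", titles, ",")
        # cols-2 GROUP BY columns, so print titles[1] .. titles[cols-2]
        for (i = 2; i < cols; i++) printf "%s|", titles[i-1]
        printf("sum(POS_QTY)|sum(POS_AMT)|<source_file>\n")
    }'
}

show_header "22,23,1"        # sales_date|sum(POS_QTY)|sum(POS_AMT)|<source_file>
show_header "22,23,1,2,4"    # sales_date|cust_id|UPC|sum(POS_QTY)|sum(POS_AMT)|<source_file>
```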

I'm really grateful for all your help.
I hope you'll be patient with me on this. Thanks a lot.

Last edited by ramneim; 08-28-2012 at 09:28 AM.


10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Need small modification in script

Hi All, In the below script, I am calling one sql file test.sql If this file returns any data then I have to generate this file test_$RUN_DATE.FCNA If the sql files returns no data then I dont want to generate this file test_$RUN_DATE.FCNA. I tried one approach like: check the size of FCNA files... (1 Reply)
Discussion started by: praveenk768
1 Replies

2. Shell Programming and Scripting

awk script modification

can someone help me identify what i'm doing wrong here: awk -F'|' 'BEGIN{c=0} /./ && /./ { if ($3 < 2) { print ; c++ } END { print c":OK" } else if (($3 >= 2) && ($3 < 4)) { print ; c++ } END { print c":WARNING" } else if ($3 >= 4) { print ; c++ } END { print c":CRITICAL" } }'... (4 Replies)
Discussion started by: SkySmart
4 Replies

3. Shell Programming and Scripting

Modification in script

Hi, I have below script, i want to monitor that that ntp server listed in setting is under sync or not. I wrote below script but it is not working properly. Here are problems, first it should server under sync if "*" shows and rest if shows "+" it means it is next server in waiting list.... (4 Replies)
Discussion started by: learnbash
4 Replies

4. Shell Programming and Scripting

ksh script modification

Hi I have some list of files in a .dat i need to read them line by line and assing them to variables. For ex: list of files are some,some1 i need two variables g1 as some and g2 as some1. and then need to perform some operations on g1 and g2 for which i can get some o/p, i need to capture... (2 Replies)
Discussion started by: Ravindra Swan
2 Replies

5. Shell Programming and Scripting

awk script modification

I want the below script to omit every chunk of data that contains a specific hostname. here's the scenario. i have a configuration file that contains the configuration of several hosts. a sample of this configuration file is this: define host { address ... (12 Replies)
Discussion started by: SkySmart
12 Replies

6. Shell Programming and Scripting

Modification in shell script

Hello Team, I have prepared script which will check for listening message for ports 1199,1200 and 1201. I need modifcation in script in such a way that if port 1200 is not listening then it should message rmi port 1200 is not listening. Smap for port 1199 and 1201. kindly guide me to acheive... (4 Replies)
Discussion started by: coolguyamy
4 Replies

7. Shell Programming and Scripting

Help with Shell Script Modification

Hi all Iam very new to Shell Scripting, I have to modify a shell script looking at an existing one except that it will query against some table X in A database. Befor Spooling check if there are any reload files if there archive the files. The above scipt executes some abc.sql which will b a new... (2 Replies)
Discussion started by: Varunkv
2 Replies

8. Shell Programming and Scripting

time modification in script

Hi All.. I have a file with a number of non-unique entries as below: 1243 01:42:29,567 --> 01:42:32,108 blah blah .... blah blah .. 1244 01:42:32,709 --> 01:42:34,921 blah blah .... 1245 01:42:35,214 --> 01:42:36,533 blah blah .... blah blah .. blah blah .... blah blah .. (4 Replies)
Discussion started by: UniRock
4 Replies

9. Shell Programming and Scripting

Need a modification on this script

Hi All I have files contains rows which look like this: 2 20090721_16:58:47.173 JSUD2 JD1M1 20 IAM 966591835270 249918113182 b 3610 ACM b 3614 ACM b 3713 CPG b 3717 CPG f 5799 REL b 5815 RLC b 5817 RLC :COMMA: NCI=00,FCI=6101,CPC=0A,TMR=00,OFI=00,USI: :COMMB: BCI=1234: :RELCAUSE:10: ... (1 Reply)
Discussion started by: zanetti321
1 Replies

10. Shell Programming and Scripting

help in script modification

i have the following perl script.but it searches for a given filename. i want to run the same script in my directoy which has subdirectories too and it has to display the file if sreach satisfies along with directory name. can anyone help me: perl script: my $FILE = $ARGV; for zf in... (4 Replies)
Discussion started by: a.suryakumar
4 Replies