Modification of Summation Script


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Modification of Summation Script
# 8  
Old 08-28-2012
here is what the second file looks like.
It is a csv file(comma delimited), the header row/first line of this 2nd file should be ignored/skipped as well.

$ cat flagfile.csv
UPC,name
PL_2003007476012,pl1
PL_2003000322606,pl2
PL_2003005081201,pl3
PL_2003010201151,pl4

so we need to only compare the first column of this second file, should be compared to $7 supposedly of the first file.

i executed it with second file, and it resulted like this

Code:
[cc1771@bdhp4423:/home/cc1771]
$ sh qa_del_2.sh ft-GNCT-3398-CD-2012-07-07-140112.txt.gz "     " 22,23,1,2,4 flagfile.csv
sales_date|cust_id|UPC|sum(POS_QTY)|sum(POS_AMT)|<source_file>
2012-07-05|PL_000000000034014|PL_2003007476012|75|   75.9200|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003000322606|19|   19.5000|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003005081201|38|   38.9800|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003010201151|39|   39.7700|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003000322491|78|   78.0000|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-05|PL_000000000034014|PL_2003008732988|27|   27.6300|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-05|PL_000000000034014|PL_2003008035225|63|   63.2700|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003006044021|58|   58.5000|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
DATE|CUST_ID|PROD_ID|0|    0.0000|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003007529992|51|   51.1700|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003009138383|43|   43.8900|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003011293384|10|   10.5600|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003120054116|43|   43.5100|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003010377276|42|   42.2400|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003006058004|27|   27.6300|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003012060213|13|   13.8100|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003000322293|7|    7.7200|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-05|PL_000000000034014|PL_2003006845246|11|   11.0800|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003120054139|130|  130.5300|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003000427936|25|   25.5900|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003005104689|62|   62.4200|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003000322477|22|   22.7400|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz

i attached a sample of the first file and second file for your reference.

i am so happy, that it seems that the script is close to completion, i owe you everything there is to this script, thanks a lot man.
# 9  
Old 08-29-2012
Sorry about the delay in my reply. Real life caught up with me for a bit there. I'll try an answer your questions in order:

Quote:
Originally Posted by ramneim
so this Begin statement -- it just basically separates the CSV into different columns, right? (will it also split it into different arguments as well?)
Spot on, the variable col_list is "22,23,1,2,4" and the split command separates each of the five values into the col array variable, so:
Code:
col[1]="22"
col[2]="23"
col[3]="1"
col[4]="2"
col[5]="4"

The titles array is just for the output headings the 1 split statement could be replaced with:
Code:
titles[1]="sales_date"
titles[2]="cust_id"
titles[3]="UPC"

Quote:
Originally Posted by ramneim
and also in these code below:will this take the first column of the second file and compare to $7, which i think is now.. only part of $3 (csv), right?

Code:
FILENAME != "-" {EXC[$1]; next}
        !($7 in EXC) {

This stores that first column in the 2nd file into an array named EXC. And only records in the first files where field #7 is not in the EXC array are processed.

Do you really need comma seperator in the 2nd file? if So you might need to do:

Code:
FILENAME != "-" {sub(/,.*$/,"");if(FNR>1)EXC[$0]; next}
        !($7 in EXC) {

Also, I see from your data files that the exclude field appears to be in field 4, not 7 (you might need !($4 in EXC) {)


Quote:
Originally Posted by ramneim
so in this added code, i'm not quite sure what it does? Smilie
This strings column values required from the input line together with pipes into val and directly together into key.

So if your col_list was "1,2,3,5,7,6" and the input line was "one,two,three,four,five,six,seven" then key ends up as "fivesevensix" and val ends up with "five|seven|six".

Quote:
Originally Posted by ramneim
for this added for loop in the last part of the code:
could you kindly enlighten me with this part?
Code:
for(i=2;i<cols;i++) printf "%s|", titles[i-1];

Prints the headings from the titles array for the additional fields you add to the csv (beyond the first 3).

To remove the heading line from the output replace if(NR!=1) { with if(FNR!=1) {

Last edited by Chubler_XL; 08-29-2012 at 12:26 AM..
This User Gave Thanks to Chubler_XL For This Post:
# 10  
Old 08-29-2012
okay, thanks for the answers to those questions. i really appreciate it a lot.

so in the change that you made to the code:
Code:
FILENAME != "-" {sub(/,.*$/,"");if(FNR>1)EXC[$0]; next}
        !($7 in EXC) {

compared to before:
Code:
FILENAME != "-" {EXC[$1]; next}
        !($7 in EXC) {

i have additional questions again, i apologize in advance.
since the second input file is comma delimited, basically each column is separated by commas, right?
so the changes you made to the code was to be able to read the second file as having columns separated by commas? and still get the first column of that second file?

i'm not sure what this additional code does to be quite honest
Code:
{sub(/,.*$/,"")

and as to why
Code:
{EXC[$1]

is changed to
Code:
if(FNR>1)EXC[$0]

and just to add... shall i also change it further to this, given the code changes you suggested to change 7 to 4?
Code:
FILENAME != "-" {sub(/,.*$/,"");if(FNR>1)EXC[$0]; next}
        !($4 in EXC) {

though i have questions regarding that one, though the "exclude" field appears in field 4 of the data file, didn't we already used field 4 in $3 (csv e.g 22,23,1,2,4 ?

because i tried executing having these code changes reflected, and it resulted to this error: (im a bit confused wth this error tbh Smilie )
Code:
$ sh qa_del_3.sh ft-GNCT-3398-CD-2012-07-07-140112.txt.gz "     " 22,23,1,2,4 flagfile.csv
 syntax error The source line is 7.
 The error context is
                        !($4 in EXC) >>>  { <<<
 awk: The statement cannot be correctly parsed.
 The source line is 7.
        awk: There is a missing } character.

please see the current script now, and let me know if i updated it incorrectly
Code:
#!/usr/bin/sh
###########################################################################
###########################################################################
FILES="$1"
DELIMITER="$2"
COL_LIST=$3
EXC_FILE=${4:-/dev/null}
for FILE in $FILES
do
    gzip -t ${FILE} 2>/dev/null
    if [ $? -eq 1 ];
    then
        comm=cat
    else
        comm=gzcat
    fi
    $comm $FILE | awk -v col_list=$COL_LIST -v sourcefile=$FILE -F "$DELIMITER" '
        BEGIN {
            cols=split(col_list, col, ",");
            split("sales_date,cust_id,UPC", titles, ",")
        }
        FILENAME != "-" {sub(/,.$/,"");if(FNR>1){EXC[$0]; next}
        !($4 in EXC) {
            if(NR!=1) {
                if($col[3]!="" && $col[1]!="" && $col[2]!="") {
                   key=$col[3]
                   val=key
                   for(m=4;m<=cols;m++) {
                       key=key $col[m]
                       val=val "|" $col[m]
                   }
                   salesdate[key] = val
                   v[key] += $col[1]
                   d[key] += $col[2]
                }
            }
        }
        END {
            for(i=2;i<cols;i++) printf "%s|", titles[i-1];
            printf("sum(POS_QTY)|sum(POS_AMT)|<source_file>\n")
            for (i in v) {
                if(salesdate[i]!=1) {
                    printf("%s|%d|%10.4f|%s\n",salesdate[i],v[i],d[i],sourcefile)
                }
            }
        }' $EXC_FILE -
done

---------- Post updated at 04:52 AM ---------- Previous update was at 04:26 AM ----------

i did some trial and error for the meantime while waiting for your reply.

i only changed 7 to 4 in EXC...
Code:
#!/usr/bin/sh
###########################################################################
###########################################################################
FILES="$1"
DELIMITER="$2"
COL_LIST=$3
EXC_FILE=${4:-/dev/null}
for FILE in $FILES
do
    gzip -t ${FILE} 2>/dev/null
    if [ $? -eq 1 ];
    then
        comm=cat
    else
        comm=gzcat
    fi
    $comm $FILE | awk -v col_list=$COL_LIST -v sourcefile=$FILE -F "$DELIMITER" '
        BEGIN {
            cols=split(col_list, col, ",");
            split("sales_date,cust_id,UPC", titles, ",")
        }
        FILENAME != "-" {EXC[$1]; next}
        !($4 in EXC) {
            if(NR!=1) {
                if($col[3]!="" && $col[1]!="" && $col[2]!="") {
                   key=$col[3]
                   val=key
                   for(m=4;m<=cols;m++) {
                       key=key $col[m]
                       val=val "|" $col[m]
                   }
                   salesdate[key] = val
                   v[key] += $col[1]
                   d[key] += $col[2]
                }
            }
        }
        END {
            for(i=2;i<cols;i++) printf "%s|", titles[i-1];
            printf("sum(POS_QTY)|sum(POS_AMT)|<source_file>\n")
            for (i in v) {
                if(salesdate[i]!=1) {
                    printf("%s|%d|%10.4f|%s\n",salesdate[i],v[i],d[i],sourcefile)
                }
            }
        }' $EXC_FILE -
done

and also, i changed the 2nd file to .txt file and tab delimited like this.
Code:
$ cat flagfile.txt
PL_2003007476012        pl1
PL_2003000322606        pl2
PL_2003005081201        pl3
PL_2003010201151        pl4

and somehow, when i executed the script, the flagging already worked!!!

as you can see in this result, those values in the 2nd file are not in the list, so they were basicaly excluded correctly. Smilie
Though there is one particular weird line in the result (bold).. do you know where did this come from?
Code:
$ sh qa_del_2_1.sh ft-GNCT-3398-CD-2012-07-07-140112.txt.gz "   " 22,23,1,2,4 flagfile.txt
sales_date|cust_id|UPC|sum(POS_QTY)|sum(POS_AMT)|<source_file>
2012-07-02|PL_000000000034011|PL_2003000322491|78|   78.0000|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-05|PL_000000000034014|PL_2003008732988|27|   27.6300|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-05|PL_000000000034014|PL_2003008035225|63|   63.2700|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003006044021|58|   58.5000|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
DATE|CUST_ID|PROD_ID|0|    0.0000|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003007529992|51|   51.1700|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003009138383|43|   43.8900|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003011293384|10|   10.5600|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003120054116|43|   43.5100|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003010377276|42|   42.2400|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003006058004|27|   27.6300|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003012060213|13|   13.8100|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003000322293|7|    7.7200|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-05|PL_000000000034014|PL_2003006845246|11|   11.0800|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003120054139|130|  130.5300|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003000427936|25|   25.5900|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003005104689|62|   62.4200|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003000322477|22|   22.7400|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz

The flagging works perfectly, however, it seems to be applicable to .TXT file, and tab-delimited for the second file...

can we update the script to accept .csv file for the second file like this?
Code:
$ cat flagfile.csv
UPC,name
PL_2003007476012,pl1
PL_2003000322606,pl2
PL_2003005081201,pl3
PL_2003010201151,pl4

i think that's the only change left, and it's as good as done, right?
thanks a lot for all your help sir, i really couldn't have gone this far without your help.
# 11  
Old 08-29-2012
How about this:
Code:
#!/usr/bin/sh
FILES="$1"
DELIMITER="$2"
COL_LIST=$3
EXC_FILE=${4:-/dev/null}

for FILE in $FILES
do
    gzip -t ${FILE} 2>/dev/null
    if [ $? -eq 1 ]
    then
        comm=cat
    else
        comm=gzcat
    fi
    $comm $FILE | awk -v col_list=$COL_LIST -v sourcefile=$FILE -F "$DELIMITER" '
        BEGIN {
            cols=split(col_list, col, ",");
            split("sales_date,cust_id,UPC", titles, ",")
        }
        FILENAME != "-" {sub(/,.*$/,"");if(FNR>1)EXC[$0]; next}
        !($7 in EXC) {
            if(FNR!=1) {
                if($col[3]!="" && $col[1]!="" && $col[2]!="") {
                   key=$col[3]
                   val=key
                   for(m=4;m<=cols;m++) {
                       key=key $col[m]
                       val=val "|" $col[m]
                   }
                   salesdate[key] = val
                   v[key] += $col[1]
                   d[key] += $col[2]
                }
            }
        }
        END {
            for(i=2;i<cols;i++) printf "%s|", titles[i-1];
            printf("sum(POS_QTY)|sum(POS_AMT)|<source_file>\n")
            for (i in v) {
                if(salesdate[i]!=1) {
                    printf("%s|%d|%10.4f|%s\n",salesdate[i],v[i],d[i],sourcefile)
                }
            }
        }' $EXC_FILE -
done

sub(/,.*$/,"") = remove everything from first comma to end of line.

if(FNR>1)EXC[$0]; = Store new replaced value, from above, in EXC array (unless line number in file is 1, i.e. skip heading line).
This User Gave Thanks to Chubler_XL For This Post:
# 12  
Old 08-29-2012
hi Chubler_XL, my man, this is so cool, i tried executing the code you posted, and i actually changed 7 to 4 in EXC, because it worked the first time,

So for the first scenario, NO second file:
Code:
$ sh qa_del_7.sh ft-GNCT-3398-CD-2012-07-07-140112.txt.gz "     " 22,23,1,2,4
sales_date|cust_id|UPC|sum(POS_QTY)|sum(POS_AMT)|<source_file>
2012-07-05|PL_000000000034014|PL_2003007476012|75|   75.9200|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003000322606|19|   19.5000|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003005081201|38|   38.9800|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003010201151|39|   39.7700|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003000322491|78|   78.0000|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-05|PL_000000000034014|PL_2003008732988|27|   27.6300|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-05|PL_000000000034014|PL_2003008035225|63|   63.2700|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003006044021|58|   58.5000|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003007529992|51|   51.1700|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003009138383|43|   43.8900|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003011293384|10|   10.5600|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003120054116|43|   43.5100|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003010377276|42|   42.2400|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003006058004|27|   27.6300|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003012060213|13|   13.8100|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003000322293|7|    7.7200|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-05|PL_000000000034014|PL_2003006845246|11|   11.0800|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003120054139|130|  130.5300|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003000427936|25|   25.5900|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003005104689|62|   62.4200|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003000322477|22|   22.7400|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz

second scenario: second file? = yes, exclude records from second file from summation based on first column of second file (comma delimited)
Code:
$ sh qa_del_7.sh ft-GNCT-3398-CD-2012-07-07-140112.txt.gz "     " 22,23,1,2,4 flagfile.csv
sales_date|cust_id|UPC|sum(POS_QTY)|sum(POS_AMT)|<source_file>
2012-07-02|PL_000000000034011|PL_2003000322491|78|   78.0000|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-05|PL_000000000034014|PL_2003008732988|27|   27.6300|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-05|PL_000000000034014|PL_2003008035225|63|   63.2700|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003006044021|58|   58.5000|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003007529992|51|   51.1700|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003009138383|43|   43.8900|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003011293384|10|   10.5600|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003120054116|43|   43.5100|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003010377276|42|   42.2400|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003006058004|27|   27.6300|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003012060213|13|   13.8100|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-02|PL_000000000034011|PL_2003000322293|7|    7.7200|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-05|PL_000000000034014|PL_2003006845246|11|   11.0800|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003120054139|130|  130.5300|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003000427936|25|   25.5900|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-04|PL_000000000034013|PL_2003005104689|62|   62.4200|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003000322477|22|   22.7400|ft-GNCT-3398-CD-2012-07-07-140112.txt.gz

it worked, right? this is so AMAZING! you are AMAZING, man, so awesome.

Though for this one,
Code:
sub(/,.*$/,"")

= remove everything from first comma to end of line.

i just found out, that the second column from the second file is needed.. Smilie
i apologize for saying this so late.

so the second file, now looks like this, so we need only the second column of second file to be compared, to still $7, (field 4) of first file
Code:
$ cat flagfile_2.csv
ID,UPC,name
1,PL_2003007476012,pl1
2,PL_2003000322606,pl2
3,PL_2003005081201,pl3
4,PL_2003010201151,pl4

how do we update the codes for it, then, sorry man, for causing too much trouble.

i am truly grateful for all your tremendous help. You are really a lifesaver to be honest. Thanks a lot man.
# 13  
Old 08-29-2012
Try this:
Code:
FILENAME != "-" {gsub(/,/,FS);if(FNR>1) EXC[$2];next}

The gsub replaces all commas with the field seperator character (tab in your instance). The 2nd field in the CSV can then be fetched using $2, just like it was a tab delimited file to start with.
# 14  
Old 08-30-2012
thanks again Chubler_XL, sir, for this.

though please bare with me for my questions again. i really want to learn all these codes that you've helped me with.

so this is the code before, right?
Code:
FILENAME != "-" {sub(/,.*$/,"");if(FNR>1)EXC[$0]; next}

which has been changed to this:
Code:
FILENAME != "-" {gsub(/,/,FS);if(FNR>1) EXC[$2];next}

i'm not really familiar with sub and gsub though, so is it alright with you if you could tell me the difference?

though FS is field separator right? and i'm a bit unsure as to why
Code:
EXC[$0]

is changed to
Quote:
EXC[$2]
i tried executing the codes again, and it's perfect, got all the expected results i wanted! really thanks, man.


Though, there's another but LAST possibility/scenario needed for this script to handle.. Smilie

the possibility of a third file
So basically the second and third file have same purpose, to exclude records from first input file when summing values. they are both csv (comma delimited)

And for the third file, we need the first column, to be compared to 2nd field of first file:

so it's like this:
second file: (2nd column) compared to first file (field 4)
third file: (1st column) compared to first file (field 2) << this we need to add...

is it okay to add another argument for the third file when we execute it?

so basically, now the current usage is like this:
Code:
sh script.sh first_input_file "delimiter" col_1,col_2,key1,key2,key3 second_file.csv

so can we make the usage like this with the addition of the third file?
Code:
sh script.sh first_input_file "delimiter" col_1,col_2,key1,key2,key3 second_file.csv third_file.csv

is it possible to add that?

this code is usable for the third file right? since we need just the first column?
Code:
FILENAME != "-" {sub(/,.*$/,"");if(FNR>1)EXC[$0]; next}
        !($2 in EXC) {

although i don't know how to add it in the codes in a way that the script will still execute properly,

and do we need to declare another argument again
Code:
EXC_FILE=${4:-/dev/null}

for the third file?

or can we just make the fourth argument like a csv as well? like second_file,third_file? which one do you think would be more convenient?

im sorry for the many questions again, sorry for troubling you.
and thanks a lot for your patience so far, and i'm really learning a lot from you. you are really good at this, and your solutions are always spot on.

thanks a lot man. you're the best.

PS. for the first input file, is it possible for the first input file to accept excel (.xls) files as well?

---------- Post updated at 04:19 AM ---------- Previous update was at 03:47 AM ----------

just a curious question, though, is this scenario also possible?

because i'm trying to sum the values across multiple files.

so, in this case i have two input files (same type)

Code:
$ ll *inputfile*
-rw-r--r--   1 cc1771     all_users     6015 Aug 30 04:57 inputfile_comma.csv
-rw-r--r--   1 cc1771     all_users     6015 Aug 30 05:12 inputfile_comma_2.csv

so can the script handle summing the data across these two files as well? like before summing the data, the first input files (multiple) should be like combined in one file first? before we start excluding and summing data?

is that functionality possible in this script?
and the usage be like this?

Code:
sh script.sh *inputfile* "delimiter" col_1,col_2,key1,key2,key3 second_file.csv third_file.csv


Last edited by ramneim; 08-30-2012 at 06:53 AM..
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Need small modification in script

Hi All, In the below script, I am calling one sql file test.sql If this file returns any data then I have to generate this file test_$RUN_DATE.FCNA If the sql files returns no data then I dont want to generate this file test_$RUN_DATE.FCNA. I tried one approach like: check the size of FCNA files... (1 Reply)
Discussion started by: praveenk768
1 Replies

2. Shell Programming and Scripting

awk script modification

can someone help me identify what i'm doing wrong here: awk -F'|' 'BEGIN{c=0} /./ && /./ { if ($3 < 2) { print ; c++ } END { print c":OK" } else if (($3 >= 2) && ($3 < 4)) { print ; c++ } END { print c":WARNING" } else if ($3 >= 4) { print ; c++ } END { print c":CRITICAL" } }'... (4 Replies)
Discussion started by: SkySmart
4 Replies

3. Shell Programming and Scripting

Modification in script

Hi, I have below script, i want to monitor that that ntp server listed in setting is under sync or not. I wrote below script but it is not working properly. Here are problems, first it should server under sync if "*" shows and rest if shows "+" it means it is next server in waiting list.... (4 Replies)
Discussion started by: learnbash
4 Replies

4. Shell Programming and Scripting

ksh script modification

Hi I have some list of files in a .dat i need to read them line by line and assing them to variables. For ex: list of files are some,some1 i need two variables g1 as some and g2 as some1. and then need to perform some operations on g1 and g2 for which i can get some o/p, i need to capture... (2 Replies)
Discussion started by: Ravindra Swan
2 Replies

5. Shell Programming and Scripting

awk script modification

I want the below script to omit every chunk of data that contains a specific hostname. here's the scenario. i have a configuration file that contains the configuration of several hosts. a sample of this configuration file is this: define host { address ... (12 Replies)
Discussion started by: SkySmart
12 Replies

6. Shell Programming and Scripting

Modification in shell script

Hello Team, I have prepared script which will check for listening message for ports 1199,1200 and 1201. I need modifcation in script in such a way that if port 1200 is not listening then it should message rmi port 1200 is not listening. Smap for port 1199 and 1201. kindly guide me to acheive... (4 Replies)
Discussion started by: coolguyamy
4 Replies

7. Shell Programming and Scripting

Help with Shell Script Modification

Hi all Iam very new to Shell Scripting, I have to modify a shell script looking at an existing one except that it will query against some table X in A database. Befor Spooling check if there are any reload files if there archive the files. The above scipt executes some abc.sql which will b a new... (2 Replies)
Discussion started by: Varunkv
2 Replies

8. Shell Programming and Scripting

time modification in script

Hi All.. I have a file with a number of non-unique entries as below: 1243 01:42:29,567 --> 01:42:32,108 blah blah .... blah blah .. 1244 01:42:32,709 --> 01:42:34,921 blah blah .... 1245 01:42:35,214 --> 01:42:36,533 blah blah .... blah blah .. blah blah .... blah blah .. (4 Replies)
Discussion started by: UniRock
4 Replies

9. Shell Programming and Scripting

Need a modification on this script

Hi All I have files contains rows which look like this: 2 20090721_16:58:47.173 JSUD2 JD1M1 20 IAM 966591835270 249918113182 b 3610 ACM b 3614 ACM b 3713 CPG b 3717 CPG f 5799 REL b 5815 RLC b 5817 RLC :COMMA: NCI=00,FCI=6101,CPC=0A,TMR=00,OFI=00,USI: :COMMB: BCI=1234: :RELCAUSE:10: ... (1 Reply)
Discussion started by: zanetti321
1 Replies

10. Shell Programming and Scripting

help in script modification

i have the following perl script.but it searches for a given filename. i want to run the same script in my directoy which has subdirectories too and it has to display the file if sreach satisfies along with directory name. can anyone help me: perl script: my $FILE = $ARGV; for zf in... (4 Replies)
Discussion started by: a.suryakumar
4 Replies
Login or Register to Ask a Question