here is what the second file looks like.
It is a csv file(comma delimited), the header row/first line of this 2nd file should be ignored/skipped as well.
Sorry about the delay in my reply. Real life caught up with me for a bit there. I'll try an answer your questions in order:
Quote:
Originally Posted by ramneim
so this Begin statement -- it just basically separates the CSV into different columns, right? (will it also split it into different arguments as well?)
Spot on, the variable col_list is "22,23,1,2,4" and the split command separates each of the five values into the col array variable, so:
and also in these code below:will this take the first column of the second file and compare to $7, which i think is now.. only part of $3 (csv), right?
Code:
FILENAME != "-" {EXC[$1]; next}
!($7 in EXC) {
This stores that first column in the 2nd file into an array named EXC. And only records in the first files where field #7 is not in the EXC array are processed.
Do you really need comma seperator in the 2nd file? if So you might need to do:
Code:
FILENAME != "-" {sub(/,.*$/,"");if(FNR>1)EXC[$0]; next}
!($7 in EXC) {
Also, I see from your data files that the exclude field appears to be in field 4, not 7 (you might need !($4 in EXC) {)
Quote:
Originally Posted by ramneim
so in this added code, i'm not quite sure what it does?
This strings column values required from the input line together with pipes into val and directly together into key.
So if your col_list was "1,2,3,5,7,6" and the input line was "one,two,three,four,five,six,seven" then key ends up as "fivesevensix" and val ends up with "five|seven|six".
Quote:
Originally Posted by ramneim
for this added for loop in the last part of the code:
could you kindly enlighten me with this part?
Code:
for(i=2;i<cols;i++) printf "%s|", titles[i-1];
Prints the headings from the titles array for the additional fields you add to the csv (beyond the first 3).
To remove the heading line from the output replace if(NR!=1) { with if(FNR!=1) {
Last edited by Chubler_XL; 08-29-2012 at 12:26 AM..
This User Gave Thanks to Chubler_XL For This Post:
okay, thanks for the answers to those questions. i really appreciate it a lot.
so in the change that you made to the code:
Code:
FILENAME != "-" {sub(/,.*$/,"");if(FNR>1)EXC[$0]; next}
!($7 in EXC) {
compared to before:
Code:
FILENAME != "-" {EXC[$1]; next}
!($7 in EXC) {
i have additional questions again, i apologize in advance.
since the second input file is comma delimited, basically each column is separated by commas, right?
so the changes you made to the code was to be able to read the second file as having columns separated by commas? and still get the first column of that second file?
i'm not sure what this additional code does to be quite honest
Code:
{sub(/,.*$/,"")
and as to why
Code:
{EXC[$1]
is changed to
Code:
if(FNR>1)EXC[$0]
and just to add... shall i also change it further to this, given the code changes you suggested to change 7 to 4?
Code:
FILENAME != "-" {sub(/,.*$/,"");if(FNR>1)EXC[$0]; next}
!($4 in EXC) {
though i have questions regarding that one, though the "exclude" field appears in field 4 of the data file, didn't we already used field 4 in $3 (csv e.g 22,23,1,2,4 ?
because i tried executing having these code changes reflected, and it resulted to this error: (im a bit confused wth this error tbh )
Code:
$ sh qa_del_3.sh ft-GNCT-3398-CD-2012-07-07-140112.txt.gz " " 22,23,1,2,4 flagfile.csv
syntax error The source line is 7.
The error context is
!($4 in EXC) >>> { <<<
awk: The statement cannot be correctly parsed.
The source line is 7.
awk: There is a missing } character.
please see the current script now, and let me know if i updated it incorrectly
Code:
#!/usr/bin/sh
###########################################################################
###########################################################################
FILES="$1"
DELIMITER="$2"
COL_LIST=$3
EXC_FILE=${4:-/dev/null}
for FILE in $FILES
do
gzip -t ${FILE} 2>/dev/null
if [ $? -eq 1 ];
then
comm=cat
else
comm=gzcat
fi
$comm $FILE | awk -v col_list=$COL_LIST -v sourcefile=$FILE -F "$DELIMITER" '
BEGIN {
cols=split(col_list, col, ",");
split("sales_date,cust_id,UPC", titles, ",")
}
FILENAME != "-" {sub(/,.$/,"");if(FNR>1){EXC[$0]; next}
!($4 in EXC) {
if(NR!=1) {
if($col[3]!="" && $col[1]!="" && $col[2]!="") {
key=$col[3]
val=key
for(m=4;m<=cols;m++) {
key=key $col[m]
val=val "|" $col[m]
}
salesdate[key] = val
v[key] += $col[1]
d[key] += $col[2]
}
}
}
END {
for(i=2;i<cols;i++) printf "%s|", titles[i-1];
printf("sum(POS_QTY)|sum(POS_AMT)|<source_file>\n")
for (i in v) {
if(salesdate[i]!=1) {
printf("%s|%d|%10.4f|%s\n",salesdate[i],v[i],d[i],sourcefile)
}
}
}' $EXC_FILE -
done
---------- Post updated at 04:52 AM ---------- Previous update was at 04:26 AM ----------
i did some trial and error for the meantime while waiting for your reply.
i only changed 7 to 4 in EXC...
Code:
#!/usr/bin/sh
###########################################################################
###########################################################################
FILES="$1"
DELIMITER="$2"
COL_LIST=$3
EXC_FILE=${4:-/dev/null}
for FILE in $FILES
do
gzip -t ${FILE} 2>/dev/null
if [ $? -eq 1 ];
then
comm=cat
else
comm=gzcat
fi
$comm $FILE | awk -v col_list=$COL_LIST -v sourcefile=$FILE -F "$DELIMITER" '
BEGIN {
cols=split(col_list, col, ",");
split("sales_date,cust_id,UPC", titles, ",")
}
FILENAME != "-" {EXC[$1]; next}
!($4 in EXC) {
if(NR!=1) {
if($col[3]!="" && $col[1]!="" && $col[2]!="") {
key=$col[3]
val=key
for(m=4;m<=cols;m++) {
key=key $col[m]
val=val "|" $col[m]
}
salesdate[key] = val
v[key] += $col[1]
d[key] += $col[2]
}
}
}
END {
for(i=2;i<cols;i++) printf "%s|", titles[i-1];
printf("sum(POS_QTY)|sum(POS_AMT)|<source_file>\n")
for (i in v) {
if(salesdate[i]!=1) {
printf("%s|%d|%10.4f|%s\n",salesdate[i],v[i],d[i],sourcefile)
}
}
}' $EXC_FILE -
done
and also, i changed the 2nd file to .txt file and tab delimited like this.
and somehow, when i executed the script, the flagging already worked!!!
as you can see in this result, those values in the 2nd file are not in the list, so they were basicaly excluded correctly.
Though there is one particular weird line in the result (bold).. do you know where did this come from?
i think that's the only change left, and it's as good as done, right?
thanks a lot for all your help sir, i really couldn't have gone this far without your help.
The gsub replaces all commas with the field seperator character (tab in your instance). The 2nd field in the CSV can then be fetched using $2, just like it was a tab delimited file to start with.
i'm not really familiar with sub and gsub though, so is it alright with you if you could tell me the difference?
though FS is field separator right? and i'm a bit unsure as to why
Code:
EXC[$0]
is changed to
Quote:
EXC[$2]
i tried executing the codes again, and it's perfect, got all the expected results i wanted! really thanks, man.
Though, there's another but LAST possibility/scenario needed for this script to handle..
the possibility of a third file
So basically the second and third file have same purpose, to exclude records from first input file when summing values. they are both csv (comma delimited)
And for the third file, we need the first column, to be compared to 2nd field of first file:
so it's like this:
second file: (2nd column) compared to first file (field 4)
third file: (1st column) compared to first file (field 2) << this we need to add...
is it okay to add another argument for the third file when we execute it?
so basically, now the current usage is like this:
Code:
sh script.sh first_input_file "delimiter" col_1,col_2,key1,key2,key3 second_file.csv
so can we make the usage like this with the addition of the third file?
Code:
sh script.sh first_input_file "delimiter" col_1,col_2,key1,key2,key3 second_file.csv third_file.csv
is it possible to add that?
this code is usable for the third file right? since we need just the first column?
Code:
FILENAME != "-" {sub(/,.*$/,"");if(FNR>1)EXC[$0]; next}
!($2 in EXC) {
although i don't know how to add it in the codes in a way that the script will still execute properly,
and do we need to declare another argument again
Code:
EXC_FILE=${4:-/dev/null}
for the third file?
or can we just make the fourth argument like a csv as well? like second_file,third_file? which one do you think would be more convenient?
im sorry for the many questions again, sorry for troubling you.
and thanks a lot for your patience so far, and i'm really learning a lot from you. you are really good at this, and your solutions are always spot on.
thanks a lot man. you're the best.
PS. for the first input file, is it possible for the first input file to accept excel (.xls) files as well?
---------- Post updated at 04:19 AM ---------- Previous update was at 03:47 AM ----------
just a curious question, though, is this scenario also possible?
because i'm trying to sum the values across multiple files.
so, in this case i have two input files (same type)
so can the script handle summing the data across these two files as well? like before summing the data, the first input files (multiple) should be like combined in one file first? before we start excluding and summing data?
is that functionality possible in this script?
and the usage be like this?
Code:
sh script.sh *inputfile* "delimiter" col_1,col_2,key1,key2,key3 second_file.csv third_file.csv
Hi All,
In the below script, I am calling one sql file test.sql If this file returns any data then I have to generate this file test_$RUN_DATE.FCNA If the sql files returns no data then I dont want to generate this file test_$RUN_DATE.FCNA.
I tried one approach like: check the size of FCNA files... (1 Reply)
Hi,
I have below script, i want to monitor that that ntp server listed in setting is under sync or not. I wrote below script but it is not working properly.
Here are problems, first it should server under sync if "*" shows and rest if shows "+" it means it is next server in waiting list.... (4 Replies)
Hi
I have some list of files in a .dat
i need to read them line by line and assing them to variables.
For ex: list of files are some,some1
i need two variables g1 as some and g2 as some1.
and then need to perform some operations on g1 and g2
for which i can get some o/p, i need to capture... (2 Replies)
I want the below script to omit every chunk of data that contains a specific hostname.
here's the scenario. i have a configuration file that contains the configuration of several hosts. a sample of this configuration file is this:
define host {
address ... (12 Replies)
Hello Team,
I have prepared script which will check for listening message for ports 1199,1200 and 1201. I need modifcation in script in such a way that if port 1200 is not listening then it should message rmi port 1200 is not listening. Smap for port 1199 and 1201.
kindly guide me to acheive... (4 Replies)
Hi all
Iam very new to Shell Scripting, I have to modify a shell script looking at an existing one except that it will query against some table X in A database.
Befor Spooling check if there are any reload files if there archive the files.
The above scipt executes some abc.sql which will b a new... (2 Replies)
Hi All
I have files contains rows which look like this:
2 20090721_16:58:47.173 JSUD2 JD1M1 20 IAM 966591835270 249918113182 b 3610 ACM b 3614 ACM b 3713 CPG b 3717 CPG f 5799 REL b 5815 RLC b 5817 RLC :COMMA: NCI=00,FCI=6101,CPC=0A,TMR=00,OFI=00,USI: :COMMB: BCI=1234: :RELCAUSE:10:
... (1 Reply)
i have the following perl script.but it searches for a given filename.
i want to run the same script in my directoy which has subdirectories too and it has to display the file if sreach satisfies along with directory name.
can anyone help me:
perl script:
my $FILE = $ARGV;
for zf in... (4 Replies)