How to put the command to remove duplicate lines in my awk script?


 
# 1  
Old 08-08-2019

I am creating a CGI in bash/HTML.


My awk script looks like this:

Code:
echo "<table>"
for fn in /var/www/cgi-bin/LPAR_MAP/*;
do
echo "<td>"
echo "<PRE>"
awk -F',|;' -v test="$test" '
     NR==1 { 
        split(FILENAME ,a,"[-.]");
      }
      $0 ~ test {
          if(!header++){
              print "DATE ========================== : " a[4] 
          }
          print ""
          print "LPARS :" $2
          print "RAM : " $5
          print "CPU 1 : " $6
          print "CPU 2 : " $7
          print "" 
          print ""
      }' $fn;



echo "</PRE>"
echo "</td>"
done
echo "</table>"

This script analyzes 276 CSV files that look like this:

Code:
MO2PPC20;mo2vio20b;Running;VIOS 2.2.5.20;7;1.0;2;DefaultPool;shared;uncap;192
MO2PPC20;mo2vio20a;Running;VIOS 2.2.5.20;7;1.0;2;DefaultPool;shared;uncap;192
MO2PPC21;mplaix0311;Running;AIX 7.1 7100-05-02-1832;35;0.6;4;DefaultPool;shared;uncap;64
MO2PPC21;miaibv194;Running;AIX 6.1 6100-09-11-1810;11;0.2;1;DefaultPool;shared;uncap;64
MO2PPC21;mplaix0032;Running;AIX 6.1 6100-09-11-1810;105;4.0;11;DefaultPool;shared;uncap;128
MO2PPC21;mplaix0190;Running;Unknown;243;4.9;30;DefaultPool;shared;uncap;128
MO2PPC21;mo2vio21b;Running;VIOS 2.2.6.10;6;1.5;3;DefaultPool;shared;uncap;192
MO2PPC21;miaibv238;Running;AIX 7.1 7100-05-02-1810;10;0.5;1;DefaultPool;shared;uncap;64
MO2PPC21;mo2vio21a;Running;VIOS 2.2.6.10;6;1.5;3;DefaultPool;shared;uncap;192
MO2PPC21;miaibv193;Running;AIX 6.1 6100-09-11-1810;12;0.2;1;DefaultPool;shared;uncap;64
MO1PPC17;miaibe03;Running;AIX 5.2 5200-10-08-0930;25;null;3;null;ded;share_idle_procs;null
MO1PPC17;miaiba12;Running;AIX 5.2 5200-10-08-0930;17;null;2;null;ded;share_idle_procs;null
MO1PPC17;miaibf03;Running;AIX 5.2 5200-10-08-0930;30;null;3;null;ded;share_idle_procs;null
MO1PPC17;miaibc05;Running;AIX 5.2 5200-10-08-0930;40;null;2;null;ded;share_idle_procs;null

And it displays them in my CGI like this:

[Image: screenshot of the CGI output]

The number of columns is equal to the number of CSV files to analyze.


As you can see in the screenshot, some lines are sometimes the same in each CSV file. The idea is to delete the lines that are the same across all my CSV files.

I know the command:
Code:
awk '!a[$0]++'

But I don't know how to fit it into my awk script... Do you have any idea?
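If I understand it correctly, that one-liner is just a short way of writing something like this:

Code:
# the same idea written the long way: count how many times each whole line
# has been seen, and print it only the first time it appears
awk '{ if (a[$0] == 0) print $0; a[$0]++ }'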

Thank you!

# 2  
Old 08-09-2019
I always do these types of tasks in PHP; but that's just me. We have a lot of AWK lovers here who will help with the AWK code.

My comment is only on the HTML:

Regarding the code above, my only comment is that, for the most part, web developers agree that <table> tags should be avoided and <div> tags should be used instead.

I have mostly eliminated <table> tags here at unix.com, but there are still a few left from decades of legacy code that I need to obsolete someday...
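For example (just a rough sketch, not tested against your CGI), the same side-by-side columns could be produced with <div> tags instead of a <table>:

Code:
echo '<div style="display: flex;">'
for fn in /var/www/cgi-bin/LPAR_MAP/*.csv;
do
echo '<div style="margin-right: 2em;">'
echo '<PRE>'
# ... the same awk command as in your script, run against this file ...
echo '</PRE>'
echo '</div>'
done
echo '</div>'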
# 3  
Old 08-09-2019
So from your description, you seem to be saying that the output you want instead of the output you've shown us would be just the two lines consisting of the one containing the dates and the line that contains the RAM values 99, 99, and .25???

The first line in the image you showed us is not produced by the script you have shown us (so I am not seeing how anything that awk might do will affect that line of output) and every other line in that image has the same value in all three columns. The output described above doesn't seem to be very useful.

Note that the awk code '!a[$0]++' will discard duplicated lines within all of the files processed by a single invocation of awk. But, since you are processing each of your 276 CSV input files in a separate invocation of awk there is no way for any of those invocations of awk to compare any input values from one input file to any input values from any other input file.
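To illustrate the difference (the array name and file names here are only for illustration):

Code:
# one awk per file: the array "a" starts empty for every file, so duplicates
# are only removed within that single file
for fn in /var/www/cgi-bin/LPAR_MAP/*.csv; do
    awk '!a[$0]++' "$fn"
done

# one awk for all of the files: the array "a" is kept for the whole run, so a
# line already seen in an earlier file is also discarded in every later file
awk '!a[$0]++' /var/www/cgi-bin/LPAR_MAP/*.csv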
# 4  
Old 08-09-2019
Hello!

Thank you for your answer!

Quote:
Note that the awk code '!a[$0]++' will discard duplicated lines within all of the files processed by a single invocation of awk. But, since you are processing each of your 276 CSV input files in a separate invocation of awk there is no way for any of those invocations of awk to compare any input values from one input file to any input values from any other input file.
That is why I decided to change my script.

For the moment, I have this:


Code:
awk -F",|;" ' {$0=$1","$2","$5","$6","$7 } /'$test'/ { if (!a[$0]++)
{
print ""
printf "LPARS : %s\n", $2
printf "RAM : %s\n", $3
printf "CPU1 : %s\n", $4
printf "CPU2 : %s\n", $5
print ""
}
}' /var/www/cgi-bin/LPAR_MAP/*.csv ;

And it works.

But I don't know how to fit these in:

Code:
NR==1 { 
        split(FILENAME ,a,"[-.]");
     }


and

Code:
$0 ~ test {
          if(!header++){
              print "DATE ========================== : " a[4] 
          }


Any ideas?
# 5  
Old 08-09-2019
I tried something like this:

Code:
echo "<table>"
for fn in /var/www/cgi-bin/LPAR_MAP/*.csv;
do
echo "<td>"
echo "<PRE>"

awk -F',|;' '{ $0=$1","$2","$5","$6","$7} /'$test'/ { if (!a[$0]++) 
{
print "DATE ================= : " FILENAME 
printf "LPARS : %s\n", $2
printf "RAM : %s\n", $3
printf "CPU1 : %s\n", $4
printf "CPU2 : %s\n", $5
print ""
}
}' $fn

echo "</PRE>"
echo "</td>"
done
echo "</table"

The output is:
[Image: screenshot of the output]

It's almost what I want. I'll keep searching!
# 6  
Old 08-09-2019
I guess I still don't understand what output you're trying to produce.

Everything in the second column of the output you showed us in the image you included in post #5 in this thread is identical to the data in the first column of the output in that same image, with the possible exception of the filename you have in the date field (which is chopped off because you didn't show us the entire image). In the first post you said you wanted to delete output lines where everything matched, but the output you've shown us has everything matching with nothing deleted???

Please try once more to clearly explain what you are trying to do. It is hard to help you come up with a solution to your problem if we can't figure out what you're trying to do!
# 7  
Old 08-12-2019
Hello there!

I will try to be clearer (I'm French, so I don't speak English very well, and I think this is why you are having trouble understanding some things).


With my first script in post #1, I could display the information in columns. For each CSV file, I create a new column with the date of the file (the filename looks like MYCSV-DATE-20190812.csv, which is why I use the split(FILENAME, ...) to keep only the date "20190812"), and under the date I display the information from the CSV file that corresponds to it:

[Image: screenshot showing one column per CSV file, with the date at the top]
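For example, with the full path the split works out roughly like this (a made-up filename; the exact index depends on how many "-" and "." characters are in the whole path, which is why I end up using a[4]):

Code:
awk 'BEGIN {
    n = split("/var/www/cgi-bin/LPAR_MAP/MYCSV-DATE-20190812.csv", a, "[-.]")
    for (i = 1; i <= n; i++) print i, a[i]
}'
# 1 /var/www/cgi
# 2 bin/LPAR_MAP/MYCSV
# 3 DATE
# 4 20190812
# 5 csv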


In the screenshot above, you can see that the information in each column is the same (only 3 columns here, but I have 276 CSV files... so if you follow what I want to do, there should be 276 columns). It's normal that the information is the same, because some lines are the same in each CSV file.

Sometimes some columns are empty, so with this piece of code:

Code:
$0 ~ test {
    if (!header++) {
        print "DATE ========================== : " a[4]
    }

I don't display the empty columns.
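Just to show what I mean, here is a made-up example of how the header++ part behaves: the DATE line is printed only once, and only because at least one line matches; if nothing matched, the column would stay empty:

Code:
printf 'aaa\nbbb MATCH\nccc MATCH\n' | awk -v test="MATCH" '
    $0 ~ test { if (!header++) print "DATE ========================== : 20190812"; print }'
# DATE ========================== : 20190812
# bbb MATCH
# ccc MATCH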


But now I don't want to display the lines that are the same. This will reduce the number of columns. I know this command:

Code:
awk '!a[$0]++'

With this command added to my script, and with the same layout as in my screenshot, only 40 columns with just one or two pieces of information each are displayed... it was 276 columns with my first script.

But (certainly because I'm a beginner and my skills are still pretty bad), I didn't manage to put this command into my script. So... I decided to start from the beginning. I wrote this script:

Code:
 awk -F",|;" ' {$0=$1","$2","$5","$6","$7 } /'$test'/ { if (!a[$0]++)
 { 
print "" printf "LPARS : %s\n", $2 

printf "RAM : %s\n", $3 

printf "CPU1 : %s\n", $4 

printf "CPU2 : %s\n", $5 

print ""
 } 

}' /var/www/cgi-bin/LPAR_MAP/*.csv ;

This script displays only the distinct lines from all the files (so, in my case, the distinct lines represent the moments when the % of RAM and CPU consumption changed, and that is what I want to display). Now I want to display this information... inside a column, like my first script. One file = one date = one column, with the information from the file under the date.

I changed my script to this:

Code:
echo "<table>" 
for fn in /var/www/cgi-bin/LPAR_MAP/*.csv; 
do 
echo "<td>" 
echo "<PRE>"  
awk -F',|;' '{ $0=$1","$2","$5","$6","$7} /'$test'/ { if (!a[$0]++)  
{ 
print "DATE ================= : " FILENAME  
printf "LPARS : %s\n", $2 
printf "RAM : %s\n", $3 
printf "CPU1 : %s\n", $4 
printf "CPU2 : %s\n", $5 
print "" 
} 
}' $fn  
echo "</PRE>" 
echo "</td>" 
done 
echo "</table">

And the output is:

[Image: screenshot of the output]


So yes, there is a mistake somewhere, because each file is displayed in the same column and the content of this column is the same in all the other columns... But there are some good points:

  • Only the distinct lines are displayed
  • The information is displayed under the right date



Now I will change that to create one column per date, and change the FILENAME output to keep only the date.
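Something like this, maybe (not tested yet; the dedup is still only inside each file because of the per-file loop, and on my paths the date lands in d[4], but that index may differ):

Code:
awk -F',|;' -v test="$test" '
    FNR == 1 { split(FILENAME, d, "[-.]") }   # keep only the date part of the file name
    { $0 = $1 "," $2 "," $5 "," $6 "," $7 }
    $0 ~ test && !a[$0]++ {
        if (!header++) print "DATE ================= : " d[4]
        printf "LPARS : %s\n", $2
        printf "RAM : %s\n", $3
        printf "CPU1 : %s\n", $4
        printf "CPU2 : %s\n", $5
        print ""
    }' "$fn"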


I hope that's clearer!

And thank you for your help!