How can I put the command to remove duplicate lines into my awk script?
I am creating a CGI in bash/HTML.
My awk script looks like this:
This script analyzes 276 CSV files that look like this:
and displays them in my CGI like this:
The number of columns is equal to the number of CSV files analyzed.
As you can see in the screenshot, some lines are sometimes identical across the CSV files. The idea is to delete the lines that are the same in all of my CSV files.
I know the command:
But I don't know how to fit it into my awk script... Do you have any idea?
Thank you!
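For reference, the deduplication command under discussion (named explicitly later in the thread) is the awk one-liner `!a[$0]++`. A minimal demonstration of how it behaves on its own: the array `a` counts how many times each whole line (`$0`) has been seen, and the bare pattern is true, so the line is printed, only on the first occurrence.

```shell
# 'x' and 'y' each appear twice in the input; only their first
# occurrences survive, and the original order is preserved.
printf 'x\ny\nx\nz\ny\n' | awk '!a[$0]++'
# prints:
# x
# y
# z
```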
I always do these types of tasks in PHP; but that's just me. We have a lot of AWK lovers here who will help with the AWK code.
My comment is only on the HTML:
Regarding the code above, my only comment is that, for the most part, web developers agree that <table> tags should be avoided and <div> tags used instead.
I have eliminated most <table> tags here at unix.com, but there are still a few left over from decades of legacy code that I need to retire someday.
So from your description, you seem to be saying that the output you want, instead of the output you've shown us, would be just two lines: the one containing the dates, and the line that contains the RAM values 99, 99, and .25?
The first line in the image you showed us is not produced by the script you have shown us (so I don't see how anything awk might do would affect that line of output), and every other line in that image has the same value in all three columns. The output described above doesn't seem very useful.
Note that the awk code '!a[$0]++' will discard duplicated lines within all of the files processed by a single invocation of awk. But since you are processing each of your 276 CSV input files in a separate invocation of awk, there is no way for any of those invocations to compare input values from one file against input values from any other file.
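A minimal sketch of that difference, using two tiny hypothetical files in place of the real CSVs:

```shell
# Two sample files (made-up data) that share one identical line.
printf 'a;1\nb;2\n' > f1.csv
printf 'a;1\nc;3\n' > f2.csv

# Separate invocations: each run gets its own 'a' array, so the
# shared line "a;1" is printed once per file.
for f in f1.csv f2.csv; do awk '!a[$0]++' "$f"; done
# prints: a;1  b;2  a;1  c;3  (one per line)

# One invocation over both files: a single 'a' array spans the whole
# run, so "a;1" in f2.csv is recognised as a duplicate and dropped.
awk '!a[$0]++' f1.csv f2.csv
# prints: a;1  b;2  c;3  (one per line)
```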
I guess I still don't understand what output you're trying to produce.
Everything in the second column of the output you showed us in the image in post #5 is identical to the data in the first column of that same image, with the possible exception of the filename in the date field (which is chopped off because you didn't show us the entire image). In the first post you said you wanted to delete output lines where everything matched, but the output you've shown us has everything matching and nothing deleted?
Please try once more to clearly explain what you are trying to do. It is hard to help you come up with a solution to your problem if we can't figure out what you're trying to do!
I will try to be clearer (I'm French, so I don't speak English very well, and I think that's why you're having trouble understanding some things).
With my first script in post #1, I could display the information in columns. For each CSV file, I create a new column headed with the date of the file (the filename looks like MYCSV-DATE-20190812.csv, which is why I use splitname(FILENAME...) to keep only the date, 20190812), and under the date I display the information from the CSV file that corresponds to it:
On this screenshot, you can see that the information in the columns is the same (only 3 columns here, but I have 276 CSV files... so, if you follow what I want to do, there should be 276 columns...). It's normal that the information is the same, because some lines are identical across the CSV files.
Sometimes some columns are empty, so with this piece of code:
I don't display the empty columns.
But now I don't want to display the lines that are the same. This will reduce the number of columns. I know this command:
With this command added to my script, and with the same frame as my screenshot, only 40 columns with just one or two pieces of information are displayed... compared to 276 with my first script.
But (certainly because I'm a beginner and my skills are still pretty bad) I didn't succeed in putting this command into my script. So... I decided to start from the beginning. I made this script:
This script displays only the differing lines from each file (so, in my case, the differing lines represent the moments where the % of RAM and CPU consumption changed, which is what I want to display). Now I want to display this information inside a column, like my first script: one file = one date = one column, with the file's information under the date.
I changed my script to this:
And the output is :
So yes, there is a mistake somewhere, because every file is displayed in the same column and the content of that column is repeated in the other columns... But there are some good points:
Only the differing lines are displayed
The information is displayed under the right date
Now I will change that to create one column per date, and change the FILENAME output to keep only the date.
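A possible sketch of that last step, assuming filenames of the form MYCSV-DATE-YYYYMMDD.csv as described above (the header format and the field numbering from split are illustrative, not the poster's actual script): at the first line of each file (FNR == 1), print only the date portion of FILENAME, then let the shared '!a[$0]++' array suppress lines already seen in any earlier file.

```shell
# Sample files (made-up data) named like the real ones.
printf 'a;1\nb;2\n' > MYCSV-DATE-20190101.csv
printf 'a;1\nc;3\n' > MYCSV-DATE-20190102.csv

# split(FILENAME, p, /[-.]/) on "MYCSV-DATE-20190101.csv" yields
# p[1]=MYCSV, p[2]=DATE, p[3]=20190101, p[4]=csv; p[3] is the date.
awk 'FNR == 1 { split(FILENAME, p, /[-.]/); print "=== " p[3] " ===" }
     !a[$0]++' MYCSV-DATE-*.csv
# prints:
# === 20190101 ===
# a;1
# b;2
# === 20190102 ===
# c;3
```

Because all 276 files are passed to one awk invocation, the duplicate "a;1" in the second file is dropped, while each file still gets its own date header.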