How to put the command to remove duplicate lines in my awk script?


 
# 8  
Old 08-12-2019
Please show us the exact output you hope to produce from the three sample files that we can assume were used to produce the output shown in the image you showed us in post #1 in this thread (preferably as text in CODE tags rather than as an image).

If you're trying to do what I think you are trying to do, you cannot use !a[$0]++ to filter your input because it will throw away input before you know whether or not you will want to print it. I think you need to read all of your input files and then compare the data for each LPARS value. If and only if all of the entries are identical, then you can decide not to print that row of output.
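A minimal sketch of that read-everything-first idea follows. It assumes semicolon-separated input with the LPAR name in field 2 and the RAM/CPU values in fields 5 to 7, as in the sample posted later in this thread. The function name `changed_lpars`, the file names, and the data are made up for illustration.

```shell
# Hypothetical helper: print the LPARs whose RAM/CPU values differ between files.
# Assumes ';'-separated input: LPAR name in $2, values in $5, $6 and $7.
changed_lpars() {
    awk -F';' '
        {
            key = $2
            val = $5 ";" $6 ";" $7
            if (!(key in first)) {         # first time this LPAR is seen
                first[key] = val
                order[++n] = key           # remember input order
            } else if (first[key] != val) {
                differs[key] = 1           # value changed in a later record
            }
        }
        END {
            # Decide only after all input has been read.
            for (i = 1; i <= n; i++)
                if (order[i] in differs)
                    print order[i]
        }
    ' "$@"
}

# Tiny demonstration with made-up data in temporary files.
tmpdir=$(mktemp -d)
printf 'F1;lparA;Running;AIX;10;0.5;1\nF1;lparB;Running;AIX;20;1.0;2\n' > "$tmpdir/day1.csv"
printf 'F1;lparA;Running;AIX;10;0.5;1\nF1;lparB;Running;AIX;30;1.0;2\n' > "$tmpdir/day2.csv"
result=$(changed_lpars "$tmpdir/day1.csv" "$tmpdir/day2.csv")
echo "$result"
rm -rf "$tmpdir"
```

Unlike `!a[$0]++`, nothing is discarded while reading; the decision is made in the `END` block. This sketch does not treat an LPAR that is missing from one file as "different"; that choice depends on the answer to the question about missing entries.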

Note that in your sample input file shown in post #1 in this thread, you showed us two lines that seem to be in completely different formats. Please explain what the real format is for your input files.

It is also unclear as to whether or not all of the input files will contain an entry for each LPARS value. If a record for a specific LPARS value is not included in a file, should that be treated as a "different" value causing a line to be printed? Or should a file be ignored when determining whether or not to print an LPARS value line if there is no entry for that LPARS value in that file?
# 9  
Old 08-13-2019
Hello there!

I will try to show you what I want.

I would like this output:

Code:
Date ======== 201908XX    Date ======== 201908XX    Date ======== 201908XX

LPARS : XX                LPARS : XX                LPARS : XX
RAM : XX                  RAM : XX                  RAM : XX
CPU 1 : XX                CPU 1 : XX                CPU 1 : XX
CPU 2 : XX                CPU 2 : XX                CPU 2 : XX

Each date corresponds to one file.


Quote:
Note that in your sample input file shown in post #1 in this thread, you showed us two lines that seem to be in completely different formats. Please explain what the real format is for your input files.
Oh, I didn't see that. It was just an example of my CSV files; a layout problem, I guess. The correct layout is:

Code:
MO2PPC20;mo2vio20b;Running;VIOS 2.2.5.20;7;1.0;2;DefaultPool;shared;uncap;192
MO2PPC20;mo2vio20a;Running;VIOS 2.2.5.20;7;1.0;2;DefaultPool;shared;uncap;192 
MO2PPC21;mplaix0311;Running;AIX 7.1 7100-05-02-1832;35;0.6;4;DefaultPool;shared;uncap;64 
MO2PPC21;miaibv194;Running;AIX 6.1 6100-09-11-1810;11;0.2;1;DefaultPool;shared;uncap;64 
MO2PPC21;mplaix0032;Running;AIX 6.1 6100-09-11-1810;105;4.0;11;DefaultPool;shared;uncap;128 
MO2PPC21;mplaix0190;Running;Unknown;243;4.9;30;DefaultPool;shared;uncap;128 
MO2PPC21;mo2vio21b;Running;VIOS 2.2.6.10;6;1.5;3;DefaultPool;shared;uncap;192 
MO2PPC21;miaibv238;Running;AIX 7.1 7100-05-02-1810;10;0.5;1;DefaultPool;shared;uncap;64 
MO2PPC21;mo2vio21a;Running;VIOS 2.2.6.10;6;1.5;3;DefaultPool;shared;uncap;192 
MO2PPC21;miaibv193;Running;AIX 6.1 6100-09-11-1810;12;0.2;1;DefaultPool;shared;uncap;64 
MO1PPC17;miaibe03;Running;AIX 5.2 5200-10-08-0930;25;null;3;null;ded;share_idle_procs;null 
MO1PPC17;miaiba12;Running;AIX 5.2 5200-10-08-0930;17;null;2;null;ded;share_idle_procs;null 
MO1PPC17;miaibf03;Running;AIX 5.2 5200-10-08-0930;30;null;3;null;ded;share_idle_procs;null 
MO1PPC17;miaibc05;Running;AIX 5.2 5200-10-08-0930;40;null;2;null;ded;share_idle_procs;null

In my script, I keep only columns 1, 2, 5, 6 and 7, using awk.

Quote:
It is also unclear as to whether or not all of the input files will contain an entry for each LPARS value. If a record for a specific LPARS value is not included in a file, should that be treated as a "different" value causing a line to be printed? Or should a file be ignored when determining whether or not to print an LPARS value line if there is no entry for that LPARS value in that file?
There is a value for each LPARS. And even if a value is missing, that is not a problem. For example, if there is no value for the RAM, nothing will be displayed next to "RAM":

Code:
 Date ======== 201908XX      
                         
 LPARS : foo                            
 RAM :                                
 CPU 1 : 4                            
 CPU 2 : 2

I just need to put the content of the CSV file next to its "key name" (LPARS, RAM, CPU 1 or CPU 2). So if there is no information, nothing will be displayed.


I don't think this is difficult, but I can't see the solution. My first script worked, and all I needed was to delete the duplicate lines. Now that I have managed to delete the duplicate lines, I just need to put the output in the right layout.

I hope this is clearer.

Have a nice day!
# 10  
Old 08-14-2019
Your first script never referenced input file field #1 in any way. Your remaining scripts keep it in reformatted input lines but never reference it.

Your scripts show a variable named test being used to filter input, but give no indication of how it is set, what it is used to match, or why it is there.

Please show us the code that you have hidden from us. Am I correct in guessing that you are setting the shell variable test to a value that will be identical to one of the values that will be found in field #1 in each of your input files?

From the image you supplied in post #1 in this thread I thought the output you wanted would be something like:
Code:
DATE ===== 20180122    DATE ===== 20180124    DATE ===== 20180125
RAM : 99               RAM : 99               RAM : 0.25

which are the only two lines in your output that do not have identical values in all three columns. I would have thought that it would be more useful to also show the rest of the information lines in the output related to LPARS value miaibg04. But, since the data you say you want in post #9 has three input files with the same date (201908XX) and identical values for all of the other fields (XX), I am still just guessing at what output you want to produce.

It is after 1:00am here, so I am going to bed. When I get up I will see if I can manufacture some input file data that I can use to test something that might or might not be similar to three of your input files and then see if I can create an awk script that will produce output that I might find useful. Since you are making this so difficult for any of us who are trying to help you, this may take a while and will not be high on my priority list.
# 11  
Old 08-14-2019
Hello!

Quote:
Your scripts show a variable named test being used to filter input, but give no indication of how it is set, what it is used to match, or why it is there.

Please show us the code that you have hidden from us. Am I correct in guessing that you are setting the shell variable test to a value that will be identical to one of the values that will be found in field #1 in each of your input files?
You're right.

The complete code is :

Code:
read a
test=$( echo $a | cut -d'=' -f2)


echo "<p><h2>FRAME : $test</h2></p>"

echo "<table>"
for fn in /var/www/cgi-bin/LPAR_MAP/*;
do
echo "<td>"
echo "<PRE>"
awk -F',|;' -v test="$test" ' 
     NR==1 {
        split(FILENAME ,a,"[-.]");
      }
     $0 ~ test  {
          if(!header++){
              print "DATE ========================== : " a[4] 
          }
          print ""
          print "RAM : " $5
          print "CPU 1 : " $6
          print "CPU 2 : " $7
          print "" 
          print ""
      }' $fn;

echo "</PRE>"
echo "</td>"
done
echo "</table>"

echo "<table>"
echo "<td>"
echo "<PRE>"

read a retrieves the query string, and test=$( echo $a | cut -d'=' -f2) extracts the value from it. The raw query string is FRAME_NAME=MIAIBYE00. It is generated from a listbox in my index page, which contains the list of my FRAMES. I use the cut command to keep only the right-hand side of the =, so my variable $test holds just that value. The awk script then keeps only the lines that contain it.
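As a small sketch, the same extraction can be done with the shell's own parameter expansion instead of `echo | cut`. The query string below is the one quoted in the post, hard-coded here for illustration in place of `read a`.

```shell
# Query string in the form shown in the post (hard-coded for this example).
a='FRAME_NAME=MIAIBYE00'

# For a string with a single "=", same result as: echo "$a" | cut -d'=' -f2
test=${a#*=}    # remove the shortest prefix ending in "="

echo "$test"
```

This avoids two extra processes and the unquoted `$a`. Because the value comes straight from a web request and is later used as a regular expression by `$0 ~ test`, validating it before use would be a sensible extra step.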


Quote:
From the image you supplied in post #1 in this thread I thought the output you wanted would be something like:

which are the only two lines in your output that do not have identical values in all three columns. I would have thought that it would be more useful to also show the rest of the information lines in the output related to LPARS value miaibg04. But, since the data you say you want in post #9 has three input files with the same date (201908XX) and identical values for all of the other fields (XX), I am still just guessing at what output you want to produce.
In post #1, I made a screenshot of only three columns because I can't show more. The date comes from the filename, so yes, in this example there are only three columns; but as I have 276 CSV files and each date comes from a filename, there are really 276 columns. That's why there are only 3 columns here.
As with the screenshot, the lines from my CSV files are just an example. In reality, I have 226442 lines. You will understand that I can't post all these lines as an example.

So, in a nutshell:

- I have many CSV files (276 files, 226442 lines in total).
- I use awk to keep only columns 1, 2, 5, 6 and 7. I would like to keep only the lines that are not identical, so I use if (!a[$0]++) to delete the duplicate lines. (By eliminating the duplicate lines, I reduce the number of columns too.)
- I would like to display this information as follows, using an HTML table:
Code:
DATE ===== XXXXXXXX    DATE ===== XXXXXXXX    DATE ===== XXXXXXXX
LPARS :  XXX           LPARS :  XXX           LPARS :  XXX
RAM : XX               RAM : XX               RAM : XXX
CPU1 : XX              CPU 1 : XX             CPU 1: XX
CPU 2 : XX             CPU 2 : XX             CPU2 : XX

LPARS :  XXX           LPARS :  XXX           LPARS :  XXX
RAM : XX               RAM : XX               RAM : XXX
CPU1 : XX              CPU 1 : XX             CPU 1: XX
CPU 2 : XX             CPU 2 : XX             CPU2 : XX
 
...

As in my first script, and as you can see in the example in the screenshot.

LPARS : the content of the column 2 kept by the awk command
RAM : the content of the column 5 kept by the awk command
CPU 1 : the content of the column 6 kept by the awk command
CPU 2 : the content of the column 7 kept by the awk command
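One way the requested side-by-side layout could be sketched in awk is shown below: read every file first, then print one row per label with one fixed-width column per file. The function name `side_by_side`, the 20-character column width, and the sample data are assumptions. The DATE header is left out because the exact file name format is not shown in the thread.

```shell
# Hypothetical helper: one output column per input file, one block per LPAR.
# Assumes ';'-separated input: LPAR name in $2, RAM in $5, CPU values in $6 and $7.
side_by_side() {
    awk -F';' '
        FNR == 1 { nfiles++ }                  # each new file becomes a new column
        {
            if (!($2 in seen)) {               # remember LPARs in first-seen order
                seen[$2] = 1
                lpar[++nlpars] = $2
            }
            ram[$2, nfiles]  = $5
            cpu1[$2, nfiles] = $6
            cpu2[$2, nfiles] = $7
        }
        END {
            for (i = 1; i <= nlpars; i++) {
                k = lpar[i]
                l1 = l2 = l3 = l4 = ""
                for (f = 1; f <= nfiles; f++) {
                    # A missing entry is simply an empty value, so nothing is shown.
                    l1 = l1 sprintf("%-20s", "LPARS : " k)
                    l2 = l2 sprintf("%-20s", "RAM : " ram[k, f])
                    l3 = l3 sprintf("%-20s", "CPU 1 : " cpu1[k, f])
                    l4 = l4 sprintf("%-20s", "CPU 2 : " cpu2[k, f])
                }
                print l1; print l2; print l3; print l4; print ""
            }
        }
    ' "$@"
}

# Tiny demonstration with made-up data in temporary files.
tmpdir=$(mktemp -d)
printf 'F1;lparA;Running;AIX;10;0.5;1\nF1;lparB;Running;AIX;20;1.0;2\n' > "$tmpdir/day1.csv"
printf 'F1;lparA;Running;AIX;10;0.5;1\nF1;lparB;Running;AIX;30;1.0;2\n' > "$tmpdir/day2.csv"
out=$(side_by_side "$tmpdir/day1.csv" "$tmpdir/day2.csv")
printf '%s\n' "$out"
rm -rf "$tmpdir"
```

With 276 files, a single awk call like this would replace the per-file loop, and the output could still be wrapped in one `<PRE>` block. Filtering on the selected frame and dropping unchanged rows are not included in this sketch.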


Quote:
It is after 1:00am here, so I am going to bed. When I get up I will see if I can manufacture some input file data that I can use to test something that might or might not be similar to three of your input files and then see if I can create an awk script that will produce output that I might find useful. Since you are making this so difficult for any of us who are trying to help you, this may take a while and will not be high on my priority list.
There is no problem with that. It's just a simple request. If you don't have time to answer me, or if you can't find a solution, never mind! I will keep looking for a solution on my side.

Have a nice day!
# 12  
Old 08-14-2019
Quote:
Originally Posted by Tim2424
Hello!

You're right.

The complete code is :

Code:
read a
test=$( echo $a | cut -d'=' -f2)


echo "<p><h2>FRAME : $test</h2></p>"

echo "<table>"
for fn in /var/www/cgi-bin/LPAR_MAP/*;
do
echo "<td>"
echo "<PRE>"
awk -F',|;' -v test="$test" ' 
     NR==1 {
        split(FILENAME ,a,"[-.]");
      }
     $0 ~ test  {
          if(!header++){
              print "DATE ========================== : " a[4] 
          }
          print ""
          print "RAM : " $5
          print "CPU 1 : " $6
          print "CPU 2 : " $7
          print "" 
          print ""
      }' $fn;

echo "</PRE>"
echo "</td>"
done
echo "</table>"

echo "<table>"
echo "<td>"
echo "<PRE>"

read a retrieves the query string, and test=$( echo $a | cut -d'=' -f2) extracts the value from it. The raw query string is FRAME_NAME=MIAIBYE00. It is generated from a listbox in my index page, which contains the list of my FRAMES. I use the cut command to keep only the right-hand side of the =, so my variable $test holds just that value. The awk script then keeps only the lines that contain it.
OK. I know that the value stored in the shell variable test is used to filter the input. You still haven't clearly answered the question: Is the value stored in test a string that is an exact match for a $1 value in your input files? Assuming that it is, the awk test $1 == test would be a better test than using $0 ~ test. The $1 == test will only match exactly the value you want to match. The $0 ~ test will match the lines you do want, but could also match lines that you do not want.
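The difference between the two tests can be seen in a small sketch. The frame name `MO2PPC2` and the sample records are made up for illustration; the point is only that one frame name can be a substring of another.

```shell
# Made-up sample: the frame name MO2PPC2 is a prefix of MO2PPC20 and MO2PPC21.
sample='MO2PPC2;lparA;Running;AIX;10;0.5;1
MO2PPC20;lparB;Running;AIX;20;1.0;2
MO2PPC21;lparC;Running;AIX;30;1.5;3'

test_value='MO2PPC2'

# Regex match against the whole line: also matches MO2PPC20 and MO2PPC21.
loose=$(printf '%s\n' "$sample" | awk -F';' -v test="$test_value" '$0 ~ test { print $2 }')

# Exact string comparison on field 1: matches only MO2PPC2.
strict=$(printf '%s\n' "$sample" | awk -F';' -v test="$test_value" '$1 == test { print $2 }')

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```

The regex form would also match the value if it appeared in any other field, and any regex metacharacters in the query string would be interpreted rather than matched literally.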
Quote:
In post #1, I made a screenshot of only three columns because I can't show more. The date comes from the filename, so yes, in this example there are only three columns; but as I have 276 CSV files and each date comes from a filename, there are really 276 columns. That's why there are only 3 columns here.
As with the screenshot, the lines from my CSV files are just an example. In reality, I have 226442 lines. You will understand that I can't post all these lines as an example.

So, in a nutshell:

- I have many CSV files (276 files, 226442 lines in total).
- I use awk to keep only columns 1, 2, 5, 6 and 7. I would like to keep only the lines that are not identical, so I use if (!a[$0]++) to delete the duplicate lines. (By eliminating the duplicate lines, I reduce the number of columns too.)
- I would like to display this information as follows, using an HTML table:
Code:
DATE ===== XXXXXXXX    DATE ===== XXXXXXXX    DATE ===== XXXXXXXX
LPARS :  XXX           LPARS :  XXX           LPARS :  XXX
RAM : XX               RAM : XX               RAM : XXX
CPU1 : XX              CPU 1 : XX             CPU 1: XX
CPU 2 : XX             CPU 2 : XX             CPU2 : XX

LPARS :  XXX           LPARS :  XXX           LPARS :  XXX
RAM : XX               RAM : XX               RAM : XXX
CPU1 : XX              CPU 1 : XX             CPU 1: XX
CPU 2 : XX             CPU 2 : XX             CPU2 : XX
 
...

As in my first script, and as you can see in the example in the screenshot.

LPARS : the content of the column 2 kept by the awk command
RAM : the content of the column 5 kept by the awk command
CPU 1 : the content of the column 6 kept by the awk command
CPU 2 : the content of the column 7 kept by the awk command
No, we cannot see that from your example in post #1. Your example in post #1 shows the output you would have gotten if you had run your script asking it to process three input files. It does not show us the output you want to get when you run your script with those same three input files. And, your refusal to show us the output you want to get from those three sample input files makes many of your later statements ambiguous.

PLEASE look at your sample output in post #1 and show us exactly what output you want to produce. (DO NOT use XX to hide the data you want; use the data that is in the image in post #1.) I assume that there will be somewhere between two and seven lines of output and I would have thought that you want three columns of output, but maybe you only want two columns of output. If you are unwilling to do this simple task for me, I don't think I will be able to figure out what you are trying to do. The language barrier is making it difficult for me to determine what you are trying to do. I need to see the actual output you are trying to produce from the three files used in your example.

Quote:
There is no problem with that. It's just a simple request. If you don't have time to answer me, or if you can't find a solution, never mind! I will keep looking for a solution on my side.

Have a nice day!
I want to help, but without a clear example of the output you are trying to produce I can't write the code you need.
# 13  
Old 09-13-2019
Programming Assignments Help

I am creating a CGI in bash/HTML.

My awk script looks like this:
Code:

echo "<table>"
echo "<table>"
for fn in /var/www/cgi-bin/LPAR_MAP/*;
do
echo "<td>"
echo "<PRE>"

awk -F',|;' -v test="$test" '
NR==1 {
split(FILENAME ,a,"[-.]");
}
$0 ~ test {
if(!header++){
print "DATE ========================== : " a[4]
}
print ""
print "LPARS :" $2
print "RAM : " $5
print "CPU 1 : " $6
print "CPU 2 : " $7
print ""
print ""
}' $fn;



echo "</PRE>"
echo "</td>"
done
echo "</table>"