How can I put the command to remove duplicate lines into my awk script?
I am creating a CGI in bash/HTML.
My awk script looks like this:
This script analyzes 276 CSV files that look like this:
and displays them in my CGI like this:
The number of columns is equal to the number of CSV files analyzed.
As you can see in the screenshot, some lines are sometimes identical across the CSV files. The idea is to delete the lines that are the same in all of my CSV files.
I know the command:
But I don't know how to fit it into my awk script... Do you have any idea?
Thank you!
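For reference, the deduplication command under discussion (named explicitly later in the thread) is the awk one-liner `!a[$0]++`. A minimal demonstration of how it behaves on its own: the array `a` counts how many times each whole line (`$0`) has been seen, and the bare pattern is true, so the line is printed, only on the first occurrence.

```shell
# 'x' and 'y' each appear twice in the input; only their first
# occurrences survive, and the original order is preserved.
printf 'x\ny\nx\nz\ny\n' | awk '!a[$0]++'
# prints:
# x
# y
# z
```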
I always do these types of tasks in PHP; but that's just me. We have a lot of AWK lovers here who will help with the AWK code.
My comment is only on the HTML:
Regarding the code above, my only comment is that, for the most part, web developers agree that <table> tags should be avoided and <div> tags used instead.
I have eliminated most <table> tags here at unix.com, but there are still a few left over from decades of legacy code that I need to retire someday.
So from your description, you seem to be saying that the output you want, instead of the output you've shown us, would be just two lines: the one containing the dates, and the line that contains the RAM values 99, 99, and .25?
The first line in the image you showed us is not produced by the script you have shown us (so I don't see how anything awk might do would affect that line of output), and every other line in that image has the same value in all three columns. The output described above doesn't seem very useful.
Note that the awk code '!a[$0]++' will discard duplicated lines within all of the files processed by a single invocation of awk. But since you are processing each of your 276 CSV input files in a separate invocation of awk, there is no way for any of those invocations to compare input values from one file against input values from any other file.
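A minimal sketch of that difference, using two tiny hypothetical files in place of the real CSVs:

```shell
# Two sample files (made-up data) that share one identical line.
printf 'a;1\nb;2\n' > f1.csv
printf 'a;1\nc;3\n' > f2.csv

# Separate invocations: each run gets its own 'a' array, so the
# shared line "a;1" is printed once per file.
for f in f1.csv f2.csv; do awk '!a[$0]++' "$f"; done
# prints: a;1  b;2  a;1  c;3  (one per line)

# One invocation over both files: a single 'a' array spans the whole
# run, so "a;1" in f2.csv is recognised as a duplicate and dropped.
awk '!a[$0]++' f1.csv f2.csv
# prints: a;1  b;2  c;3  (one per line)
```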
I guess I still don't understand what output you're trying to produce.
Everything in the second column of the output you showed us in the image in post #5 is identical to the data in the first column of that same image, with the possible exception of the filename in the date field (which is chopped off because you didn't show us the entire image). In the first post you said you wanted to delete output lines where everything matched, but the output you've shown us has everything matching and nothing deleted?
Please try once more to clearly explain what you are trying to do. It is hard to help you come up with a solution to your problem if we can't figure out what you're trying to do!
I will try to be clearer (I'm French, so I don't speak English very well, and I think that's why you're having trouble understanding some things).
With my first script in post #1, I could display the information in columns. For each CSV file, I create a new column headed with the date of the file (the filename looks like MYCSV-DATE-20190812.csv, which is why I use splitname(FILENAME...) to keep only the date, 20190812), and under the date I display the information from the CSV file that corresponds to it:
On this screenshot, you can see that the information in the columns is the same (only 3 columns here, but I have 276 CSV files... so, if you follow what I want to do, there should be 276 columns...). It's normal that the information is the same, because some lines are identical across the CSV files.
Sometimes some columns are empty, so with this piece of code:
I don't display the empty columns.
But now I don't want to display the lines that are the same. This will reduce the number of columns. I know this command:
With this command added to my script, and with the same frame as my screenshot, only 40 columns with just one or two pieces of information are displayed... compared to 276 with my first script.
But (certainly because I'm a beginner and my skills are still pretty bad) I didn't succeed in putting this command into my script. So... I decided to start from the beginning. I made this script:
This script displays only the differing lines from each file (so, in my case, the differing lines represent the moments where the % of RAM and CPU consumption changed, which is what I want to display). Now I want to display this information inside a column, like my first script: one file = one date = one column, with the file's information under the date.
I changed my script to this:
And the output is :
So yes, there is a mistake somewhere, because every file is displayed in the same column and the content of that column is repeated in the other columns... But there are some good points:
Only the differing lines are displayed
The information is displayed under the right date
Now I will change that to create one column per date, and change the FILENAME output to keep only the date.
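A possible sketch of that last step, assuming filenames of the form MYCSV-DATE-YYYYMMDD.csv as described above (the header format and the field numbering from split are illustrative, not the poster's actual script): at the first line of each file (FNR == 1), print only the date portion of FILENAME, then let the shared '!a[$0]++' array suppress lines already seen in any earlier file.

```shell
# Sample files (made-up data) named like the real ones.
printf 'a;1\nb;2\n' > MYCSV-DATE-20190101.csv
printf 'a;1\nc;3\n' > MYCSV-DATE-20190102.csv

# split(FILENAME, p, /[-.]/) on "MYCSV-DATE-20190101.csv" yields
# p[1]=MYCSV, p[2]=DATE, p[3]=20190101, p[4]=csv; p[3] is the date.
awk 'FNR == 1 { split(FILENAME, p, /[-.]/); print "=== " p[3] " ===" }
     !a[$0]++' MYCSV-DATE-*.csv
# prints:
# === 20190101 ===
# a;1
# b;2
# === 20190102 ===
# c;3
```

Because all 276 files are passed to one awk invocation, the duplicate "a;1" in the second file is dropped, while each file still gets its own date header.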