remove duplicating lines


 
# 1  
Old 09-22-2009

Hi,

I have a webserver logfile and want to count how many page views there have been.
I was thinking about removing lines that begin with the same user and the same date and time, because those indicate that the user was looking at a single page and multiple hits were counted for it.
My question is: how do I do this? I know how to remove lines that are completely identical (with uniq), but my lines aren't exactly alike, only their beginnings are.
For example, if you have:
i don't know how.
i don't know this.
how do I make sure the second line gets removed because it begins with the same word(s)?
# 2  
Old 09-22-2009
For your example:
Code:
$ more kk2.txt
i don't know how.
i don't know this.
i don't know other
i don't know one more
$ sort -t" " -k3,3 -u kk2.txt
i don't know how.
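
A minimal sketch of what that sort call compares, using the same four sample lines written to a temporary file. The last part shows one possible way to end up with two lines (a tab instead of a space in one line); this is only an assumption for illustration, not a diagnosis of anyone's actual file.

```shell
# Same four sample lines as in the post above, written to a temp file.
f=$(mktemp)
printf '%s\n' "i don't know how." "i don't know this." \
  "i don't know other" "i don't know one more" > "$f"

# Field 3 (split on single spaces) is "know" on every line, so -u keeps one.
kept=$(sort -t' ' -k3,3 -u "$f")
printf '%s\n' "$kept"

# Hypothetical extra line with tabs: with -t' ' the whole line is field 1,
# so field 3 is empty and the line survives as a second "unique" entry.
printf "i\tdon't\tknow\tthis.\n" >> "$f"
count=$(sort -t' ' -k3,3 -u "$f" | wc -l | tr -d ' ')
echo "$count"
rm -f "$f"
```

Which of the equal lines sort -u keeps can differ between implementations, so only the number of lines is worth relying on here.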

# 3  
Old 09-22-2009
If I do that, I get two lines back:
$ sort -t" " -k3,3 -u kk2.txt

i don't know how.\
i don't know this.\

I tried it with my logfile and it does the same, giving two lines back instead of one. Why?
# 4  
Old 09-22-2009
Can you copy/paste the content of your file?
Look at the man page for the sort command on your system:
Code:
man sort

The command works in my environment:
Code:
$ cat /etc/issue; echo $SHELL
Red Hat Enterprise Linux AS release 3 (Taroon Update 6)
Kernel \r on an \m
/bin/bash

# 5  
Old 09-22-2009
Assuming you want uniqueness on the first 4 fields:
Code:
awk '!arr[$1 $2 $3 $4]++'  filename > uniq.file

Add or remove $n field specifiers as needed.
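
For anyone wondering why this one-liner works: the first time a key is seen, arr[key] is 0, so !arr[key]++ is true and the line is printed; the post-increment then makes the test false for every later line with the same key. A small sketch with made-up sample lines (the data and the variable names are hypothetical), using a comma between the fields so awk joins them with SUBSEP instead of running them together:

```shell
# Hypothetical sample data: three lines share the first two fields.
sample=$(mktemp)
cat > "$sample" <<'EOF'
alice 10:00 /index.html
alice 10:00 /style.css
bob 10:00 /index.html
alice 10:00 /logo.png
EOF

# Print a line only the first time its ($1, $2) key is seen.
deduped=$(awk '!seen[$1, $2]++' "$sample")
printf '%s\n' "$deduped"
rm -f "$sample"
```

This should print the first alice line and the bob line. Unlike sort -u, the input does not have to be sorted and the original line order is kept.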
# 6  
Old 09-22-2009
It still gives me two lines back, also when I use these commands:

Code:
$ awk '!arr[$1 $2 $3 $4]++' kk1.txt > uniq.file
or
awk '!arr[$1 $2 $3 $4 $5]++' kk1.txt > uniq.file
or
awk '!arr[$1 $2 $3]++' kk1.txt > uniq.file

the content of my file is quite big, but it pretty much comes down to this:

Code:
38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/javascript.js HTTP/1.1" 200 343 "http://ilps.science.uva.nl/MoodViews/index.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4"
38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/images/menu_bg.png HTTP/1.1" 200 153 "http://ilps.science.uva.nl/MoodViews/mv-fp-styles.css" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4"
38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/images/button_selected.png HTTP/1.1" 200 511 "http://ilps.science.uva.nl/MoodViews/mv-fp-styles.css" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4"
38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/images/button_normal.png HTTP/1.1" 200 507 "http://ilps.science.uva.nl/MoodViews/mv-fp-styles.css" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4"
38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/Moodfeeds/Moodstickers/sticker-linear-bottom-tiny.png HTTP/1.1" 200 2070 "http://ilps.science.uva.nl/MoodViews/index.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4"

I want to remove the last 4 lines, because they start with the exact same IP address and date and time as the first line.

Last edited by Franklin52; 09-22-2009 at 09:14 AM.. Reason: Please use code tags!
# 7  
Old 09-22-2009
Try:

Code:
awk '!a[$1$4]++' file
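
To get the page-view count directly, the same idea can be sketched as below. In this log layout $1 is the client IP and $4 is the "[date:time" part. The sample is abbreviated from the lines posted above, and the second IP address and the 07:56:26 request are made up for illustration. The comma in the key is my own addition: it keeps the two fields separated so that different IP and time pairs cannot accidentally join into the same string.

```shell
# Abbreviated sample in the same layout as the log above; the second
# IP address and the last request are hypothetical.
log=$(mktemp)
cat > "$log" <<'EOF'
38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/javascript.js HTTP/1.1" 200 343
38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/images/menu_bg.png HTTP/1.1" 200 153
192.0.2.10 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/index.html HTTP/1.1" 200 1024
38.119.128.203 - - [19/Jul/2006:07:56:26 +0200] "GET /MoodViews/index.html HTTP/1.1" 200 1024
EOF

# Count each (IP, timestamp) pair once; n + 0 prints 0 for an empty file.
views=$(awk '!seen[$1, $4]++ { n++ } END { print n + 0 }' "$log")
echo "$views"
rm -f "$log"
```

Note that this treats every distinct second as a new page view, so a page whose images finish loading a second later would be counted twice.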

 