

remove duplicating lines


 
# 1  
Old 09-22-2009

Hi,

I have a webserver logfile and want to count how many page views there have been.
I was thinking about removing lines that begin with the same user and the same date and time, because those indicate that the user was looking at a single page and multiple hits were logged for it.
My question is: how do I do this? I know how to remove lines that are completely identical (with uniq), but my lines aren't exactly alike, only their beginnings.
For example, if you have:
i don't know how.
i don't know this.
How do I make sure the second line gets removed because it begins with the same word(s)?
# 2  
Old 09-22-2009
For your example:
Code:
$more kk2.txt 
i don't know how.
i don't know this.
i don't know other
i don't know one more
$sort -t" " -k3,3 -u kk2.txt 
i don't know how.

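A note on why this works: `-k3,3` restricts the comparison to the third field, and `-u` prints one line for each distinct key, so every line whose third word is "know" collapses into one. To match on the first three words instead of only the third, the key can span fields 1 to 3. A minimal sketch, assuming single spaces between the words and feeding the sample lines through a pipe rather than reading kk2.txt:

```shell
# -t' '  : fields are separated by a single space
# -k1,3  : the sort key spans fields 1 through 3 only
# -u     : keep one line per distinct key
result=$(printf '%s\n' \
  "i don't know how." \
  "i don't know this." \
  "i don't know other" \
  "i don't know one more" |
  sort -t' ' -k1,3 -u)

# All four lines share the key "i don't know", so a single line remains.
printf '%s\n' "$result"
```

Which of the matching lines is kept can depend on the sort implementation, so the sketch only relies on there being exactly one. Note that a blank line in the input has an empty key and would survive as a separate line.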
# 3  
Old 09-22-2009
If I do that, I get two lines back:
$ sort -t" " -k3,3 -u kk2.txt

i don't know how.\
i don't know this.\

I tried it with my logfile and it does the same thing, giving two lines back instead of one. Why?
# 4  
Old 09-22-2009
Can you copy and paste the content of your file?
Also check the man page for the sort command on your system:
Code:
man sort

The command works in my environment:
Code:
$cat /etc/issue; echo $SHELL
Red Hat Enterprise Linux AS release 3 (Taroon Update 6)
Kernel \r on an \m
/bin/bash

# 5  
Old 09-22-2009
Assuming you want uniqueness on the first 4 fields:
Code:
awk '!arr[$1 $2 $3 $4]++'  filename > uniq.file

Add or remove $n field specifiers as needed.
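For anyone unfamiliar with the idiom: the array element for a key starts out empty (zero), so the negated test is true the first time a key appears and the line is printed; the `++` then bumps the counter, so later lines with the same key are skipped. Below is a small sketch on made-up input. It is a variant of the command above, not the same command: the fields are joined with commas, which makes awk insert its SUBSEP separator between them so that keys such as "ab" "c" and "a" "bc" cannot collide. The array name `seen` and the sample lines are only for illustration.

```shell
# Print a line only the first time its first four fields are seen.
result=$(printf '%s\n' \
  "a b c d first" \
  "a b c d second" \
  "a b c e third" |
  awk '!seen[$1, $2, $3, $4]++')

# The second line repeats the key of the first and is dropped.
printf '%s\n' "$result"
```

Unlike sort -u, this keeps the input order and always keeps the first line of each group.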
# 6  
Old 09-22-2009
It still gives me two lines back, even when I use these commands:

Code:
$ awk '!arr[$1 $2 $3 $4]++' kk1.txt > uniq.file
or
awk '!arr[$1 $2 $3 $4 $5]++' kk1.txt > uniq.file
or
awk '!arr[$1 $2 $3]++' kk1.txt > uniq.file

My file is quite big, but it pretty much comes down to this:

Code:
38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/javascript.js HTTP/1.1" 200 343 "http://ilps.science.uva.nl/MoodViews/index.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4"
38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/images/menu_bg.png HTTP/1.1" 200 153 "http://ilps.science.uva.nl/MoodViews/mv-fp-styles.css" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4"
38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/images/button_selected.png HTTP/1.1" 200 511 "http://ilps.science.uva.nl/MoodViews/mv-fp-styles.css" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4"
38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/images/button_normal.png HTTP/1.1" 200 507 "http://ilps.science.uva.nl/MoodViews/mv-fp-styles.css" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4"
38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/Moodfeeds/Moodstickers/sticker-linear-bottom-tiny.png HTTP/1.1" 200 2070 "http://ilps.science.uva.nl/MoodViews/index.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4"

I want to remove the last 4 lines, because they start with the exact same IP address and date and time as the first line.

Last edited by Franklin52; 09-22-2009 at 10:14 AM.. Reason: Please use code tags!
# 7  
Old 09-22-2009
Try:

Code:
awk '!a[$1$4]++' file

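In this log format, $1 is the IP address and $4 is the opening bracket plus date and time, so keying on those two fields treats all hits from one address within the same second as one page view. To get the count directly instead of the filtered lines, something like the following sketch could work. The three log lines are shortened from the sample above, and the third one (with a later timestamp and an index.html request) is invented here purely to have a second page view to count.

```shell
# Count distinct (IP, timestamp) pairs instead of printing the lines.
views=$(printf '%s\n' \
  '38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/javascript.js HTTP/1.1" 200 343' \
  '38.119.128.203 - - [19/Jul/2006:07:56:25 +0200] "GET /MoodViews/images/menu_bg.png HTTP/1.1" 200 153' \
  '38.119.128.203 - - [19/Jul/2006:07:56:31 +0200] "GET /MoodViews/index.html HTTP/1.1" 200 511' |
  awk '!seen[$1, $4]++ { n++ } END { print n + 0 }')

# Two distinct timestamps for the same address, so two page views.
echo "$views"
```

The `n + 0` in the END block makes awk print 0 rather than an empty string when the input is empty.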
 
