05-09-2006
faster way to loop?
Sample Log file
IP.address Date&TimeStamp GET/POST URL ETC
123.45.67.89 MMDDYYYYHHMM GET myURL
http://ABC.com
123.45.67.90 MMDDYYYYHHMM GET myURL
http://XYZ.com
I have a huge web server log file (about 1.3GB) that contains entries like the ones above. I need to get the last entry for each distinct IP address that requested myURL. Is there a quick way of looping? My idea was:
# Get all the Unique IP addresses and then proceed to check each
awk '{print $1}' weblog | sort -u > ip.list
for i in `cat ip.list`
do
grep "$i" weblog | grep myURL | tail -1 >> lastpages.list
done
Each day has around 3000+ unique IP entries, and a day's log is about 48MB. With this process it takes around 30 minutes to handle a day's worth of data. Is there a faster way to do this?
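One way to avoid rescanning the log once per IP is a single awk pass that remembers the most recent matching line for each address. This is a sketch, assuming the client IP is the first field and using the names "weblog", "myURL", and "lastpages.list" from the question above:

```shell
# One pass over the log: for every line mentioning myURL,
# overwrite the stored entry for that IP, so only the last
# matching line per IP survives to the END block.
awk '/myURL/ { last[$1] = $0 }
     END { for (ip in last) print last[ip] }' weblog > lastpages.list
```

Since awk reads the 48MB file once instead of 3000+ times, this should cut the runtime from minutes to seconds. The output order of `for (ip in last)` is arbitrary, so pipe the result through sort if ordering matters.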
9 More Discussions You Might Find Interesting
1. IP Networking
For some reason 8.1 Mandrake Linux seems much slower than Windows 2000 with my cable modem, even though the DSL Reports speed test shows comparable speed under Windows 2000.
This is consistently slow with both of my boxes at the same time. Linux used to be faster, but not with Mandrake. Any way to fix this? (17 Replies)
Discussion started by: lancest
2. Shell Programming and Scripting
Hi ,
I need to copy every day about 35GB of files from one file system to another.
I'm using the cp command and it takes me about 25 min.
I also tried the dd command, but it took much longer.
Is there a better option?
Regards. (6 Replies)
Discussion started by: yoavbe
3. UNIX for Dummies Questions & Answers
Hi, I have to grep for 2000 strings in a file, one after the other. Say the file name is Snxx.out, which contains these strings.
I have to search for all the strings in the file Snxx.out one after the other.
What is the fastest way to do it?
Note: The current grep process is taking a lot of time per... (7 Replies)
Discussion started by: preethgideon
4. UNIX for Dummies Questions & Answers
i) wc -c /etc/passwd | awk '{print $1}'
ii) ls -al /etc/passwd | awk '{print $5}' (4 Replies)
Discussion started by: karthi_g
5. UNIX for Dummies Questions & Answers
I have read anecdotes about people installing RAID 0 on some of their machines because it gives a performance boost. Because bandwidth on the motherboard is limited, can someone explain exactly why it should be faster? (7 Replies)
Discussion started by: figaro
6. Shell Programming and Scripting
Hi all,
In bash scripting, I use to read files:
cat $file | while read line; do
...
done
However, it's a very slow way to read a file line by line.
E.g. In a file that has 3 columns, and less than 400 rows, like this:
I run next script:
cat $file | while read line; do ## Reads each... (10 Replies)
Discussion started by: AlbertGM
7. Shell Programming and Scripting
I have the following code running against a file. The file can have upwards of 10000 lines.
Problem is, the for loop takes a while to go through all those lines. Is there a faster way to go about it?
for line in `grep -P "${MONTH} ${DAY}," file | ${AWK} -F" " '{print $4}' | awk -F":"... (2 Replies)
Discussion started by: SkySmart
8. UNIX for Dummies Questions & Answers
I'm trying to decide whether to move operations from one of these hosts to the other, but I can't decide which one of them is the most powerful.
Each host has 8 CPUs.
HOSTA
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU ... (6 Replies)
Discussion started by: SkySmart
9. Shell Programming and Scripting
Hello,
I am under Ubuntu 18.04 Bionic.
I have one shell script run.sh (which is out of my topic) to run files under multiple directories and one file to control all processes running under those directories (control.sh).
I set a cronjob task to check each of them with two minutes of intervals.... (3 Replies)
Discussion started by: baris35
LEARN ABOUT DEBIAN
mergelogs
MERGELOGS(1) General Commands Manual MERGELOGS(1)
NAME
mergelogs - merge and consolidate web server logs
SYNOPSIS
mergelogs -p penlog [-c] [-d] [-j jitter] [-t seconds] server1:logfile1 [server2:logfile2 ...]
EXAMPLES
mergelogs -p pen.log 10.0.0.1:access_log.1 10.0.0.2:access_log.2
mergelogs -p pen.log 10.0.18.6:access_log-10.0.18.6 10.0.18.8:access_log-10.0.18.8
DESCRIPTION
When pen is used to load balance web servers, the web server log file lists all accesses as coming from the host running pen. This makes it
more difficult to analyze the log file.
To solve this, pen creates its own log file, which contains the real client address, the time of the access, the target server address and
the first few bytes of the requests.
Mergelogs reads pen's log file and the log files of all load balanced web servers, compares each entry and creates a combined log file that
looks as if the web server cluster were a single physical server. Client addresses are replaced with the real client addresses.
In the event that no matching client address can be found in the pen log, the server address is used instead. This should never happen, and
is meant as a debugging tool. A large number of these indicates that the server system date needs to be set, or that the jitter value is
too small.
You probably don't want to use this program. Penlog is a much more elegant and functional solution.
OPTIONS
-c Do not cache pen log entries. The use of this option is not recommended, as it will make mergelogs search the entire pen log for
every line in the web server logs.
-d Debugging (repeat for more).
-p penlog
Log file from pen.
-j jitter
Jitter in seconds (default 600). This is the maximum variation in time stamps in the pen and web server log files. A smaller value
will result in a smaller pen log cache and faster processing, at the risk of missed entries.
-t seconds
The difference in seconds between the time on the pen server and UTC. For example, this is 7200 (two hours) in Finland.
server:logfile
Web server address and name of log file.
AUTHOR
Copyright (C) 2001-2003 Ulric Eriksson, <ulric@siag.nu>.
SEE ALSO
pen(1), webresolve(1), penlog(1), penlogd(1)