Sponsored Content
Top Forums Shell Programming and Scripting Gawk --- produce the output in pattern space instead of END space Post 303023843 by busyboy on Monday 24th of September 2018 04:16:20 AM
Old 09-24-2018
Gawk --- produce the output in pattern space instead of END space

hi,

I'm trying to calculate IP addresses and their respective calls to our apache Server. The standard format of the input is

Code:
HOST IP DATE/TIME - - "GET/POST reuest" "User Agent"
HOST IP DATE/TIME - - "GET/POST reuest" "User Agent"
HOST IP DATE/TIME - - "GET/POST reuest" "User Agent"
HOST IP DATE/TIME - - "GET/POST reuest" "User Agent"
HOST IP DATE/TIME - - "GET/POST reuest" "User Agent"


I'm using below given gawk code to do this ( that is accumulating all requests for all IPs in a given input file.

Code:
gawk --re-interval -F\"  '
 /./  { split($1,IP," "); IPPP[IP[2]]++;}
 /./  { split($1,IP," "); LINE[IP[2]]=LINE[IP[2]]"<br>"$2; } 
END  { for(i in LINE){{  printf("\n\n%s\t%s",i,LINE[i]) }} }' other_vhosts_access.log


the problem:

input-file is actually around 47Gib in size and when I return the LINE array in END space of gawk, The process consumes all the available memory of the system and the system starts running out of memory for all other processes.

Question:

Can i return the LINE array in our pattern space rather than END space so that every IP matched is returned -- instead of adding it into array and then displaying the result.

------ Post updated at 02:16 PM ------

BTW, this code works fine for smaller file ( when I split the file into smaller chunks, which doesn't satisfy the requirement, as all the file must be scanned at once, so that I get all IPs list )
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Removing Space at the end of file

Hi.... I have a situation...I have a data file...that has space(an extra row with no data) at the end of file. I am trying to remove that spaces only if the file has a space at the end of file and if there is no space I don't want to do anything. Can you please help me in this regards. ... (4 Replies)
Discussion started by: rkumar28
4 Replies

2. Shell Programming and Scripting

remove space in front or end of each field

Hi, I have a txt file called a.txt which contain over 10,000 records and I would like to remove space before comma or after comma....like below: The input (for example two record 00001,00002): 00001,client,card limited ,02292,N ,162:41 , 192, ... (6 Replies)
Discussion started by: happyv
6 Replies

3. Shell Programming and Scripting

to see space, tab, end of the line chracters

what can I use ?? In vi, I can use :set list <-- and see end of line $.. or use cat -A but I am wondering if there is command or program that allows me to see all the hidden characters( space, tab and etc) Please help thanks. (3 Replies)
Discussion started by: convenientstore
3 Replies

4. UNIX for Advanced & Expert Users

how can I read the space in the end of line

cat file1|while read i do echo "$i"|wc done with this command the space in the end of the line not considered how can solve that for example: read h "hgyr " echo "$h"|wc 4 (2 Replies)
Discussion started by: Ehab
2 Replies

5. Shell Programming and Scripting

Calculate total space, total used space and total free space in filesystem names matching keyword

Good afternoon! Im new at scripting and Im trying to write a script to calculate total space, total used space and total free space in filesystem names matching a keyword (in this one we will use keyword virginia). Please dont be mean or harsh, like I said Im new and trying my best. Scripting... (4 Replies)
Discussion started by: bigben1220
4 Replies

6. Shell Programming and Scripting

Add a space at end of file

Hi I guess this is very simple.... I want to add a space at the last line in a file. The space has to be the last charachter on the last line, not at a new line. Anyone ?? (7 Replies)
Discussion started by: disel
7 Replies

7. Shell Programming and Scripting

Replace end of line with a space

for eg: i have i/p file as: ================ i wnt to change end of line ================= my require ouput is like: i wnt to change end of line ==================== (7 Replies)
Discussion started by: RahulJoshi
7 Replies

8. UNIX for Dummies Questions & Answers

removal of space from the end

HI, I need the help from the experts like I have created one file with text like: Code: a b c de f g hi j k l So my question is that i have to write the script in which like in the first sentence it will take only one space after d and remove all the extra space in the end.I dont... (0 Replies)
Discussion started by: bhanudhingra
0 Replies

9. Shell Programming and Scripting

Pad space at the end of string and reformat

I need to read in the string from input file and reform it by cut each segment and check the last segement lenght. If the last segment length is not as expected (see below segment file or table. It is predefined), then pad enough space. Old string FU22222222CA6666666666AKxvbFMddreeadBP999... (11 Replies)
Discussion started by: menglm
11 Replies

10. Shell Programming and Scripting

Remove trailing space in Gawk

Hi, I have simply made a shell script to convert *.csv to *.xml file. Xml file is required for input to one tool. But i am getting space after last field. How can i remove it. Shell script is as follows :- if then echo "" echo "Wrong syntax, Databse_update.sh... (6 Replies)
Discussion started by: stillrules
6 Replies
acctcom(8)						      System Manager's Manual							acctcom(8)

NAME
acctcom - Displays selected process accounting record summaries SYNOPSIS
/usr/bin/acctcom [-abfhikmqrtv] [-C seconds] [-e time] [-E time] [-g group] [-H factor] [-I number] [-l line] [select_flag -o file] [-n pattern] [-O seconds] [-s time] [-S time] [-u username] [file ...] FLAGS
Displays average statistics about the selected processes. Statistics are displayed at the end of the output records in the format var=# (for example, CMDS=439), where the value (#) is given to the nearest hundredth. The var specifies the following: Value Total number of commands listed in the specified file Average real time per process Average CPU time per process Average user CPU time per process Average system CPU time per process Average number of characters transferred Average number of blocks transferred Average CPU factor (average user time divided by total CPU time) Average hog factor (average CPU time divided by average elapsed time) Displays information about the most recently executed commands first. This flag has no effect when the acctcom command reads from the default input device or if more than one process accounting file is specified. The column heading format is the same as the default column heading format. Lists processes whose total CPU time (system time + user time) is greater than the value specified by seconds. The column heading format is the same as the default column heading format. Displays information only about processes that start at or before the specified time, which is specified as hh[:mm[:ss]]. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy START BEFORE: day mon date hh:mm:ss yy COMMAND START END REAL CPU MEAN NAME USER TTYNAME TIME TIME (SECS) (SECS) SIZE(K) Displays information only about processes that end at or before the specified time, which is specified as hh[:mm[:ss]]. If you specify the same time for both the -E and -S flags, the acctcom command displays processes that existed at the specified time. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy END BEFORE : day mon date hh:mm:ss yy COMMAND START END REAL CPU MEAN NAME USER TTYNAME TIME TIME (SECS) (SECS) SIZE(K) Displays information about the fork/exec flag (used to execute another process) in the F column and the system exit value STAT, which can be zero (0) or an error code, in the STAT column in addition to the default column heading format. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy COMMAND START END REAL CPU MEAN NAME USER TTYNAME TIME TIME (SECS) (SECS) SIZE(K) F STAT Displays only the processes that belong to the specified group. You may specify either the group ID or the group name. The column heading format is the same as the default column heading format. Displays the hog factor instead of the mean memory size. The hog factor is the CPU time used by the process divided by the real time. The output is the same as the default column format output except the MEAN SIZE(K) column heading is replaced by the .HOG FACTOR heading. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy COMMAND START END REAL CPU HOG NAME USER TTYNAME TIME TIME (SECS) (SECS) FACTOR Displays information about processes that exceed the value specified by the hogfactor variable. The output format is the same as the default column heading format. Displays the number of characters and blocks transferred during read or write I/O operations. The output is similar to the default column heading format, except the CHARS TRANSFD column replaces the MEAN SIZE(K) column, and the BLOCKS READ column is added to the output. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy COMMAND START END REAL CPU CHARS BLOCKS NAME USER TTYNAME TIME TIME (SECS) (SECS) TRANSFD READ Displays information about the processes that transfer more than the number of characters specified by the number variable. The output format is the same as the default column heading format. Displays the total number of K-core min- utes, which is the number of kilobytes of memory used by the process multiplied by the buffer time used. The output format is the same as the default column heading format, except the KCORE MIN column heading replaces the MEAN SIZE(K) column heading. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy COMMAND START END REAL CPU KCORE NAME USER TTYNAME TIME TIME (SECS) (SECS) MIN Displays information about the processes that belong to the workstation whose tty line is specified by the line variable (for example, ttyp0). The heading format is the same as the default column heading format. Displays the median amount of process memory used. If you also specify the -h or -k flag, the -m flag is ignored. The output format is the same as the default column heading format. Displays information only about the processes whose names include the regular expression specified by the pattern variable. The output format is the same as the default column heading format. Copies selected process records to the specified filename. The select_flag variable specifies the following process selection flags: -C, -e, -E, -g, -H, -I, -l, -n, -O, -s, -S, and -u. If you do not specify a selection flag with the select_flag variable, all process records are copied to filename. The output format includes only the date and time: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy Displays information about the processes that have a CPU system time exceeding the time specified by the seconds variable. The output format is the same as the default column heading format. Displays only the average statistics, which are shown at the end of the command output when you use the -a flag. Displays the CPU factor, which is the user time divided by the total CPU time. The output format is the same as the default column heading format, except the CPU FACTOR column replaces the MEAN SIZE(K) column. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy COMMAND START END REAL CPU CPU NAME USER TTYNAME TIME TIME (SECS) (SECS) FACTOR Displays information about the processes that existed on or after time, which is specified as hh[:mm[:ss]]. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy END AFTER : day mon date hh:mm:ss yy COMMAND START END REAL CPU MEAN NAME USER TTYNAME TIME TIME (SECS) (SECS) SIZE(K) Displays information only about the processes that started at or after time, which is specified as hh[:mm[:ss]]. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy START AFTER : day mon date hh:mm:ss yy COMMAND START END REAL CPU MEAN NAME USER TTYNAME TIME TIME (SECS) (SECS) SIZE(K) Displays system and user CPU times under separate column headings. The CPU SYS column heading, which shows the system CPU time, replaces the CPU (SECS) default column heading. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy COMMAND START END REAL CPU (SECS) NAME USER TTYNAME TIME TIME (SECS) SYS USER Displays information about the processes that are owned by user username. You can specify a user identification number, a login name converted to a user identification number, a number sign (#) to specify root or a question mark (?), which selects pro- cesses associated with unknown user identification numbers. The output format is the same as the default column heading format. Removes column headings from the output; otherwise the output is the same as the default column heading format. DESCRIPTION
The acctcom command displays process accounting records from files specified by the file parameter, from standard input, or from the /var/adm/pacct file. If you do not specify a file and if standard input is assigned to a workstation or to /dev/null (for example, if a process runs in the background), the acctcom command reads the /var/adm/pacct file. You do not have to be root to use the acctcom command, which is located in the /usr/bin directory. If you specify more than one filename, the acctcom process reads each file chronologically in time-descending order according to process completion time. Usually, the /var/adm/pacct file is used, but you can have several /var/adm/pacct/*Vn files, which are created by the ckpacct command. Each record specifies the execution times for a completed process. The default output format includes the command name, user name, tty name, process start time, process end time, real seconds, CPU seconds, and mean memory size (in kilobytes). The process summary output has the following default column heading format: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yyyy COMMAND START END REAL CPU MEAN NAME USER TTYNAME TIME TIME (SECS) (SECS) SIZE(K) If a specified time is later than the current time, it is interpreted as occurring on the previous day. You can use flags to display the state of the fork/exec flag, F column; the system exit value, STAT column; the ratio of total CPU time to elapsed time, HOG FACTOR column; the product of memory used and elapsed time, KCORE MIN column; the ratio of user time to total (system plus user) time, CPU FACTOR column; the number of characters transferred during I/O operations, CHARS TRNSFD column; and the total number of blocks read or written, BLOCKS READ column. If a process is run under root or su authority, the command name is prefixed with a number sign (#). If a process is not assigned to a known tty (for example, if the cron daemon runs the process), a question mark (?) is displayed in the TTYNAME column. The acctcom command reports only on processes that have completed. Use the ps command to examine the status of active processes. For any flag value that produces a timestamp in an output heading, the order of date and time information is locale dependent. The time- stamps shown in the examples use the default format for date and time values. EXAMPLES
The following command displays information about processes that exceed 2.0 seconds of CPU time: /usr/sbin/acct/acctcom -O 2 < /var/adm/pacct The following command displays information about processes belonging to the Finance group: /usr/sbin/acct/acctcom -g Finance < /var/adm/pacct The following command displays information about processes belonging to tty /dev/console that run after 5:00 p.m.: /usr/sbin/acct/acctcom -l /dev/console -s 17:00 FILES
Specifies the command path. The active process accounting database file. User and group database files. Accounting header files that define formats for writing accounting files. RELATED INFORMATION
Commands: ed(1), ps(1), su(1), acct(8), cron(8), runacct(8) Functions: acct(2) delim off acctcom(8)
All times are GMT -4. The time now is 05:50 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy