Extracting high frequency data-lines


 
# 1  
01-04-2011

Hi,

I have a very large log file in the following format:
Code:
198.28.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 348
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 211
198.28.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 191
4.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
4.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 211
4.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131

The first column is the source IP address. The file contains entries from a finite set of source IP addresses, each occurring with some frequency. In the example above, 198.28.0.0 appears 2 times, 244.48.0.0 appears 4 times and 4.48.0.0 appears 3 times.
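A minimal sketch for checking these per-IP frequencies. It assumes the log is in a file with the hypothetical name `access.log`, and it first writes the sample lines above to that file so the snippet runs on its own:

Code:
```shell
# Write the sample lines from the post to a hypothetical file, access.log.
cat > access.log <<'EOF'
198.28.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 348
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 211
198.28.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 191
4.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
4.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 211
4.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
EOF

# Count how often each source IP (field 1) occurs and print "IP count".
# awk's for-in order is unspecified, so the result is piped through sort.
awk '{ count[$1]++ } END { for (ip in count) print ip, count[ip] }' access.log | sort
```

For the sample data this reports 2, 4 and 3 occurrences for 198.28.0.0, 244.48.0.0 and 4.48.0.0, matching the counts given above.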

I need a sed/awk script that takes the frequency as user input (it can also be hard-coded) and extracts all lines whose source IP occurs at least that many times. For example, if the user enters 3, the entries from source IPs appearing 3 or more times should be written to the output file. The output file should therefore contain the entries from source IPs 244.48.0.0 and 4.48.0.0, i.e.:
Code:
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 211
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 211
4.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
4.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
4.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131

Thanks and Regards


Moderator's comment: Please use code tags when posting data and code samples!

Last edited by Franklin52; 01-04-2011 at 03:19 AM.
# 2  
01-04-2011
One way:
Code:
awk 'NR==FNR{c[$1]++;next}c[$1]>=3' file file | sort
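This command reads the file twice: the first pass (`NR==FNR`) only counts each IP, and the second prints the lines whose IP reached the threshold. That needs a regular file, because input arriving on a pipe cannot be reread. Below is a single-pass sketch that buffers every line in memory instead; the function name `filter_by_freq` is hypothetical. Its memory use grows with the size of the input, so for a very large log the two-pass version, which keeps only one counter per IP, is usually the better choice.

Code:
```shell
# filter_by_freq (hypothetical name): read stdin once, keep every line in
# memory, and print, in input order, the lines whose first field occurs
# at least $1 times.
filter_by_freq() {
    awk -v n="$1" '
        { line[NR] = $0; key[NR] = $1; count[$1]++ }
        END {
            # NR still holds the total record count inside END.
            for (i = 1; i <= NR; i++)
                if (count[key[i]] >= n) print line[i]
        }'
}

# Usage: feed the sample lines through a pipe and keep IPs seen 3+ times.
cat <<'EOF' | filter_by_freq 3
198.28.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 348
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 211
198.28.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 191
4.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
4.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
244.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 211
4.48.0.0 - - [08/Jul/1998:19:00:01 +0000]  200 1131
EOF
```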

# 3  
01-04-2011
Try this:
Code:
awk 'NR==FNR{A[$1]++;next}A[$1]>=n' n=3 infile infile
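Since the question asks for the frequency as user input, the two-pass command can be wrapped in a small shell function that takes the threshold and the file name as arguments. In this sketch the name `freqfilter` and its argument order are hypothetical, and the threshold is passed with `awk -v` rather than the `n=3` operand form. The output is piped through `sort -s -k1,1`, a stable sort on the first field that groups the lines by IP while keeping their original order within each group. `-s` is supported by GNU and BSD `sort` but is not part of POSIX.

Code:
```shell
# freqfilter (hypothetical name): print the lines of FILE whose first field
# (the source IP) occurs at least MIN times in FILE.
# Usage: freqfilter MIN FILE
freqfilter() {
    if [ "$#" -ne 2 ]; then
        echo "usage: freqfilter MIN FILE" >&2
        return 2
    fi
    min=$1
    file=$2
    # MIN must be a non-negative integer (digits only).
    case $min in
        '' | *[!0-9]*)
            echo "freqfilter: MIN must be a non-negative integer" >&2
            return 2
            ;;
    esac
    # Pass 1 (NR == FNR): count each source IP in field 1.
    # Pass 2: print the lines whose IP was counted at least n times.
    awk -v n="$min" 'NR == FNR { c[$1]++; next } c[$1] >= n' "$file" "$file" |
        sort -s -k1,1   # group by IP; stable, so input order is kept per IP
}
```

For example, `freqfilter 3 access.log > output.txt`, where both file names are placeholders. Dropping the trailing `sort` leaves the lines in their original order, interleaved across IPs.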
