Select only top "N" records based on column value


 
# 1  
11-02-2010

Hi Gurus,
I know this will be a simple task for all the geeks out here, but as a newbie I'm finding it hard to crack.

OK, coming to the task: I have a delimited file as below.


I have to select the top "N" records for each value of column 4 (e.g. "aaaaa"),
so the output should be as below if N is 2:

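A minimal sketch of the per-key "top N" filter, assuming (as in the replies below) that each line is split on the double-quote character so that the key lands in `$4`. The function name `top_n_per_key` and the sample lines are hypothetical, since the original data is not preserved. Note that awk keeps the first N lines per key in file order; it does not sort anything.

```shell
# Hypothetical helper: print at most N lines (N = first argument) for each
# distinct value of $4, keeping them in input order. Fields are split on the
# double-quote character, as in the replies in this thread.
top_n_per_key() {
  awk -F'"' -v n="$1" 'count[$4]++ < n'
}

# Hypothetical sample: with -F'"', a line "id","key","text" puts id in $2,
# key in $4 and text in $6.
printf '%s\n' \
  '"1","aaaaa","SQL one"' \
  '"2","aaaaa","SELECT two"' \
  '"3","aaaaa","other"' \
  '"4","bbbbb","SQL four"' |
  top_n_per_key 2
# prints lines 1, 2 and 4: the third "aaaaa" line exceeds N=2
```

The pattern `count[$4]++ < n` compares the counter before incrementing it, so the first N lines for each key pass and later ones are skipped.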

The next task: if the first 3 characters of column 6 are "SQL", column 6 should be replaced with "SQL"; otherwise, if column 6 contains "SELECT", it should be replaced with "SELECT"; if neither condition is satisfied, column 6 should be replaced with "NULL".
So the output will be something like below:

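A sketch of the column 6 rule exactly as worded above (first three characters "SQL", otherwise "SELECT" anywhere in the field, otherwise "NULL"), again assuming double-quote-separated fields. The function name `classify_field6` and the sample lines are hypothetical. It uses only POSIX `substr()` and `index()`, so no regular expressions are needed; note that "contains SELECT" here means anywhere in the field, not only at the start.

```shell
# Hypothetical helper: rewrite $6 following the rule as stated in the post.
classify_field6() {
  awk -F'"' '
    BEGIN { OFS = FS }                  # rebuild records with the same separator
    {
      if (substr($6, 1, 3) == "SQL")    # first three characters are "SQL"
        $6 = "SQL"
      else if (index($6, "SELECT") > 0) # "SELECT" appears anywhere in $6
        $6 = "SELECT"
      else
        $6 = "NULL"
      print
    }'
}

printf '%s\n' \
  '"1","aaaaa","SQL one"' \
  '"2","aaaaa","x SELECT y"' \
  '"3","bbbbb","other"' |
  classify_field6
# "1","aaaaa","SQL"
# "2","aaaaa","SELECT"
# "3","bbbbb","NULL"
```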

Any help is greatly appreciated.

Thanks

Last edited by asandy1234; 12-06-2010 at 01:30 PM.. Reason: Please use code tags!
# 2  
11-02-2010
Code:
awk -F\" '
count[$4]++ < n {
  if ($6 ~ /^(SQL|SELECT)/)
    $6 = $6 ~ /^SQL/ ? "SQL" : "SELECT"
  else 
    $6 = "NULL"
  print 
  }' OFS=\" n=2 infile

# 3  
11-02-2010
Thank you so much, radoulov, you are a genius...
# 4  
11-08-2010
Hi radoulov,
Will the above code also work for the example below, so that it displays SELECT as the output?

132""

Thanks

Last edited by asandy1234; 12-06-2010 at 01:31 PM..
# 5  
11-08-2010
No.
Do you want the script to classify the above as a SELECT statement?

You could use something like this:

Code:
awk -F\" '
count[$4]++ < n {
  if ($6 ~ /^(SQL|SELECT)|select/)
    $6 = $6 ~ /^SQL/ ? "SQL" : "SELECT"
  else 
    $6 = "NULL"
  print 
  }' OFS=\" n=2 infile

Or, if you want to match the patterns sql and select anywhere in the sixth field, case-insensitively, with GNU awk:

Code:
awk -F\" '
count[$4]++ < n {
  if ($6 ~ /(sql|select)/)
    $6 = $6 ~ /^sql/ ? "SQL" : "SELECT"
  else 
    $6 = "NULL"
  print 
  }' OFS=\" n=2 IGNORECASE=1 infile

The IGNORECASE variable is a GNU awk extension!
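If GNU awk is not available, a portable sketch of the same case-insensitive test can lower-case a copy of `$6` with `tolower()`, which POSIX awk provides, instead of setting `IGNORECASE`. The function name `classify_ci` and the sample lines are hypothetical; the matching logic mirrors the GNU awk version above.

```shell
# Hypothetical helper: first N lines per $4, with a case-insensitive test on $6
# done via tolower() (POSIX) instead of the GNU-only IGNORECASE variable.
classify_ci() {
  awk -F'"' -v n="$1" '
    BEGIN { OFS = FS }
    count[$4]++ < n {
      f = tolower($6)                   # lower-cased copy, used only for matching
      if (f ~ /(sql|select)/)
        $6 = (f ~ /^sql/) ? "SQL" : "SELECT"
      else
        $6 = "NULL"
      print
    }'
}

printf '%s\n' \
  '"1","aaaaa","sql one"' \
  '"2","aaaaa","Select two"' \
  '"3","bbbbb","other"' |
  classify_ci 2
# "1","aaaaa","SQL"
# "2","aaaaa","SELECT"
# "3","bbbbb","NULL"
```

Matching against the copy `f` leaves the original `$6` untouched until it is overwritten with the label.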

Last edited by radoulov; 11-08-2010 at 12:56 PM..
# 6  
11-08-2010
Again and again you prove that you are a genius. I really appreciate your valuable contribution.

Thanks once again...
# 7  
11-09-2010
Hi radoulov,
I have one final query: in the logic that removes records whose count is more than 'n' based on column 4, if I have to include one more column, say column 2, will the condition change to count[$2,$4]++ < n?

Thanks,
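A sketch of the two-column variant asked about above, assuming the same double-quote field splitting. In awk, `count[$2, $4]` is a single array subscript built as `$2 SUBSEP $4`, so each distinct (column 2, column 4) pair gets its own counter. This assumes the fields never contain `SUBSEP` (by default `"\034"`). The function name `top_n_per_pair` and the sample lines are hypothetical.

```shell
# Hypothetical helper: at most N lines per distinct ($2, $4) pair.
# count[$2, $4] is one subscript, $2 SUBSEP $4, so each pair has its own counter.
top_n_per_pair() {
  awk -F'"' -v n="$1" 'count[$2, $4]++ < n'
}

printf '%s\n' \
  '"1","aaaaa","x"' \
  '"1","aaaaa","y"' \
  '"2","aaaaa","z"' |
  top_n_per_pair 1
# "1","aaaaa","x"
# "2","aaaaa","z"
```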