The UNIX and Linux Forums  

#1 kthri (Registered User), 11-04-2005
Need help with the syntax using awk+grep

Hi,
I need to extract information from a 4 GB file based on the following conditions:

1) Check for the presence of a set of account numbers

Each account number appears, along with other information, inside a block
delimited by PAGESTART and PAGEEND.

The file looks like this:
PAGESTART
ACCOUNT NO 123
DATE 10-01-2004
money 10982
PAGEEND
PAGESTART
ACCOUNT NO 245
DATE 10-03-2005
MONEY 254
PAGEEND


2) If an account number is present, the entire block from the enclosing PAGESTART to PAGEEND must be extracted.

For example, if one of the specified account numbers is 123,
I need the following output:
PAGESTART
ACCOUNT NO 123
DATE 10-01-2004
money 10982
PAGEEND


Can anyone help with this?

#2 Abhishek Ghose (Registered User), 11-05-2005
This may not be efficient, but it works.
I am assuming that all the account numbers are listed in a file, one per line. For example, if you need data for accounts 123 and 456, the file looks like this:

acct_file:
123
456

Use sed to reformat the file like this:
sed 's/^/ACCOUNT NO /; s/$/,/' acct_file > temp

temp:
ACCOUNT NO 123,
ACCOUNT NO 456,


Use this as a pattern file for grep, and run paste on the data file (let's call it acct_info):

paste -s -d",,,,\n" acct_info|grep -f temp

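(paste -s joins the lines of the file, using the delimiters ",", ",", ",", "," and then a newline in turn, so every five consecutive lines become one line; this assumes each block is exactly five lines long. grep then uses temp as its pattern file.)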
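Because the fixed delimiter list only works when every block has the same number of lines, here is a minimal sketch that instead joins each PAGESTART..PAGEEND block onto one comma-separated line whatever its length, so the same temp pattern file still applies. The function name join_blocks is hypothetical, and it assumes PAGESTART and PAGEEND each start their own line.

```shell
# Hypothetical helper: print each PAGESTART..PAGEEND block of file $1
# as a single comma-joined line, however many lines the block has.
join_blocks() {
  awk '/^PAGESTART/ { line = $0; next }          # start a new joined line
       /^PAGEEND/   { print line "," $0; next }  # finish and print it
                    { line = line "," $0 }       # append any other line
      ' "$1"
}

# Usage, with temp built by the sed command above:
#   join_blocks acct_info | grep -F -f temp
```

The trailing comma in each pattern ("ACCOUNT NO 123,") keeps 123 from also matching 1234, since in the joined line every field is followed by a comma.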

Hope this helps.

#3 kthri (Registered User), 11-05-2005
Thanks for your response, Abhishek.

The separator is a tag (the | character), for example:

ACCOUNT NO |12345
INVOICE NO |578

There are about 80 fields between PAGESTART and PAGEEND, all of which
have to be retrieved for a matching account:

PAGESTART
...
ACCOUNT NAME | Business Level
ACCOUNT NO |1234
MONEY |54
...
PAGEEND
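In this layout both the key and the value can carry stray spaces (e.g. "ACCOUNT NAME | Business Level"), so they need trimming before comparison. A minimal sketch of splitting such lines, assuming every field line has the form key|value and that only the first | separates the two; split_kv is a hypothetical name.

```shell
# Hypothetical helper: read key|value lines on stdin and print
# "key<TAB>value" with surrounding blanks removed from both parts.
split_kv() {
  awk '{
    i = index($0, "|")             # position of the first "|"
    if (i == 0) next               # skip PAGESTART, PAGEEND, etc.
    key = substr($0, 1, i - 1)
    val = substr($0, i + 1)        # any later "|" stays in the value
    sub(/^[ \t]+/, "", key); sub(/[ \t]+$/, "", key)
    sub(/^[ \t]+/, "", val); sub(/[ \t]+$/, "", val)
    printf "%s\t%s\n", key, val
  }'
}

# Usage:
#   split_kv < acct_info
```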

#4 ranj@chn (Forum Advisor), 11-05-2005
Try this at the command line:
awk -F'|' -f awkfile s=acct_no ip_filename

where awkfile contains:
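# Buffer every PAGESTART..PAGEEND block; print it only if its ACCOUNT NO equals s.
/^ *PAGESTART *$/ { buf = ""; keep = 0 }
{ buf = buf $0 "\n" }
$1 ~ /^ *ACCOUNT NO *$/ { v = $2; gsub(/ /, "", v); if (v == s) keep = 1 }
/^ *PAGEEND *$/ { if (keep) printf "%s", buf }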

Using this, you can loop over the required account numbers and print each matching block.
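A shell loop would read the 4 GB file once per account number. A minimal sketch of a single-pass alternative, assuming the account numbers are listed one per line in a separate file and the pipe-separated layout from post #3; the NR == FNR idiom loads that list into an awk array before the data file is read. extract_accounts and its arguments are hypothetical.

```shell
# Hypothetical helper: $1 = file of wanted account numbers (one per line),
# $2 = data file. Prints every PAGESTART..PAGEEND block whose ACCOUNT NO
# value is in the list, reading the data file only once.
extract_accounts() {
  awk -F'|' '
    # Remove leading and trailing blanks.
    function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }

    # First file: load the wanted numbers into an array.
    NR == FNR { n = trim($0); if (n != "") want[n] = 1; next }

    # Data file: buffer the current block and note whether it matches.
    trim($0) == "PAGESTART" { buf = ""; keep = 0 }
    { buf = buf $0 "\n" }
    trim($1) == "ACCOUNT NO" { a = trim($2); if (a in want) keep = 1 }
    trim($0) == "PAGEEND" { if (keep) printf "%s", buf; buf = ""; keep = 0 }
  ' "$1" "$2"
}

# Usage:
#   extract_accounts acct_nos acct_info > matches.txt
```

If the account-number file is empty, NR == FNR also holds while the data file is read, so nothing is printed; make sure the list is non-empty.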

#5 Abhishek Ghose (Registered User), 11-05-2005
Guys, try Perl!

#6 Ygor (Moderator), 11-06-2005
Try this (it needs an awk that treats a multi-character RS as a regular expression, such as gawk or mawk):
Code:
awk -v RS=PAGEEND '/ACCOUNT NO 123/{print $0 RS}' file1

#7 Abhishek Ghose (Registered User), 11-07-2005
Hope this helps

This is a quick and dirty method (I doubt its efficiency for data of your size):
(This assumes the account numbers are in a file named acct_nos, one per line, like this:
acct_nos:
123
456
and that the account information is in a file named acct_info.)


The following commands should work. They rely on two special characters, again assuming that your data does not contain "#" or "@"; if it does, replace them with characters that do not occur in the data.

sed 's/^/ACCOUNT NO |/; s/$/@/' acct_nos > temp_nos
sed 's/PAGESTART/#PAGESTART/g' acct_info|tr '\n' '@'|tr '#' '\n'>temp_info
grep -f temp_nos temp_info|tr '@' '\n'
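If the data might contain "#" or "@", one option is the ASCII record-separator (octal 036) and unit-separator (octal 037) control characters, which rarely occur in text. A minimal sketch of the same pipeline under that assumption; extract_sep and its two file arguments are hypothetical, and the pattern file is a temporary file rather than temp_nos.

```shell
# Hypothetical variant: $1 = account-number file, $2 = data file.
# \036 marks the start of each block and \037 stands in for newlines,
# assuming neither control character occurs in the data.
extract_sep() {
  rs=$(printf '\036')
  us=$(printf '\037')
  pat_file=$(mktemp) || return 1
  # One fixed-string pattern per account: "ACCOUNT NO |<number><US>".
  sed "s/^/ACCOUNT NO |/; s/\$/$us/" "$1" > "$pat_file"
  # Mark block starts, then swap newline -> \037 and \036 -> newline so
  # each block becomes one line; keep matches and restore the newlines.
  sed "s/^PAGESTART/${rs}PAGESTART/" "$2" | tr '\n\036' '\037\n' |
    grep -F -f "$pat_file" | tr '\037' '\n'
  rm -f "$pat_file"
}

# Usage:
#   extract_sep acct_nos acct_info
```

As with the "#"/"@" version, each printed block is followed by an empty line.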



Here's another way, with Perl. It should ideally be faster, and it copes better with stray whitespace. For example, it is not affected if the numbers in acct_nos carry extra surrounding whitespace, as in:
123
456
It also works whether or not the ACCOUNT NO line has whitespace at the start or before the pipe ("tag") delimiter, though it assumes that "ACCOUNT" and "NO" are separated by exactly one space. The same goes for the account number itself.


find_acct.pl:
#!/usr/bin/perl
use strict;
use warnings;

# Load the wanted account numbers (one per line), trimming whitespace.
open(my $nos_fh, '<', 'acct_nos') or die "cannot open acct_nos: $!\n";
my %wanted;
while (my $n = <$nos_fh>) {
    $n =~ s/^\s+|\s+$//g;
    $wanted{$n} = 1 if length $n;
}
close($nos_fh);

open(my $info_fh, '<', 'acct_info') or die "cannot open acct_info: $!\n";

my @buffer;            # lines of the current PAGESTART..PAGEEND block
my $acct_present = 0;  # does the current block hold a wanted account?

# Print the buffered block if it matched, then start a new one.
sub flush_block {
    print @buffer if $acct_present;
    @buffer       = ();
    $acct_present = 0;
}

while (my $line = <$info_fh>) {
    chomp $line;
    my ($key, $value) = split /\|/, $line, 2;
    $key = '' unless defined $key;

    if ($key =~ /^\s*PAGESTART\s*$/) {
        flush_block();
    }
    elsif ($key =~ /^\s*ACCOUNT NO\s*$/ and defined $value) {
        $value =~ s/^\s+|\s+$//g;
        $acct_present = exists $wanted{$value} ? 1 : 0;
    }
    push @buffer, "$line\n";
}
flush_block();    # the last block is not followed by a PAGESTART
close($info_fh);

The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.