Need help with the syntax using awk+grep


 
# 1  
Old 11-04-2005

Hi,
I need to extract information from a 4 GB file based on the following conditions:

1) Check for the presence of a set of account numbers

Each account number appears, along with other information, between
a PAGESTART line and a PAGEEND line.

The file looks like this:
PAGESTART
ACCOUNT NO 123
DATE 10-01-2004
money 10982
PAGEEND
PAGESTART
ACCOUNT NO 245
DATE 10-03-2005
MONEY 254
PAGEEND


2) If any of the account numbers is present, the information between the corresponding PAGESTART and PAGEEND must be extracted.

For example, if one of the specified account numbers is 123,
I require the following information:
PAGESTART
ACCOUNT NO 123
DATE 10-01-2004
money 10982
PAGEEND


Can anyone help with this?
# 2  
Old 11-05-2005
This may not be efficient, but it works.
I am assuming that all the account numbers are listed in a file, one per line. For example, if you need data for accounts 123 and 456, the file looks like this:

acct_file:
123
456

Use sed to reformat the file like this:
sed 's/^/ACCOUNT NO /g' acct_file|sed 's/$/,/g' >temp

temp:
ACCOUNT NO 123,
ACCOUNT NO 456,


Use this as a pattern file for grep, and use paste on the file containing the information (let's call it acct_info):

paste -s -d",,,,\n" acct_info|grep -f temp

(paste joins every 5 lines of the file into one comma-separated line, so this assumes each page is exactly 5 lines long. grep then uses temp as its pattern file.)
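
The steps above can be put together roughly as follows. This is only a sketch under the same assumptions (exactly 5 lines per page, no commas in the data). The function name pages_by_paste and the mktemp temporary file are illustrative, grep -F is swapped in so the patterns are treated as fixed strings, and a final tr turns the commas back into newlines.

```shell
# Sketch: print the 5-line pages whose account number is listed in a file.
# pages_by_paste is a hypothetical helper name, not part of the original post.
pages_by_paste() {
    # $1 = file with one account number per line
    # $2 = data file with exactly 5 lines per page and no commas in the data
    patterns=$(mktemp) || return 1

    # Turn "123" into the fixed-string pattern "ACCOUNT NO 123,".
    sed 's/^/ACCOUNT NO /; s/$/,/' "$1" > "$patterns"

    # Join every 5 lines with commas, keep the matching pages,
    # then restore one field per line.
    paste -s -d ',,,,\n' "$2" | grep -F -f "$patterns" | tr ',' '\n'

    rm -f "$patterns"
}
```

Called as pages_by_paste acct_file acct_info, it prints each matching page in its original layout.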

Hope this helps.
# 3  
Old 11-05-2005
Thanks Abhishek for your response,

The separator is a tag ("|"):

ACCOUNT NO |12345
INVOICE NO |578

There are about 80 fields between PAGESTART and PAGEEND,
all of which have to be retrieved for a matching account.

PAGESTART
...
ACCOUNT NAME | Business Level
ACCOUNT NO |1234
MONEY |54
...
PAGEEND
# 4  
Old 11-05-2005
Try this at the command line:
awk -F'|' -f awkfile s=acct_no ip_filename

where awkfile contains:
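# Buffer each page; print it only if its account number equals s exactly.
$1 ~ /PAGESTART/ {n = 0; found = 0}
{page[++n] = $0}
$1 ~ /ACCOUNT NO/ {acct = $2; gsub(/[ \t]/, "", acct); if (acct == s) found = 1}
$1 ~ /PAGEEND/ && found {for (i = 1; i <= n; i++) print page[i]; exit}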

Maybe you can use this in a loop over the required numbers and print each page out.
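
Instead of running awk once per account number, the loop can be folded into a single pass over the big file. The following is only a sketch: the function name extract_pages is made up, and it assumes the account numbers are in a file with one number per line and that the data uses the "|" separated layout shown earlier in the thread.

```shell
# Sketch: print every PAGESTART..PAGEEND page whose account number is listed
# in a file. extract_pages is a hypothetical helper name.
extract_pages() {
    # $1 = file with one account number per line, $2 = data file
    awk -F'|' '
        # First file: remember each wanted account number, whitespace removed.
        NR == FNR { gsub(/[ \t\r]/, ""); if ($0 != "") want[$0] = 1; next }

        # Start of a page: reset the buffer and the match flag.
        /^[ \t]*PAGESTART[ \t]*$/ { n = 0; hit = 0 }

        # Buffer every line of the current page.
        { page[++n] = $0 }

        # Compare the account number exactly, so 123 does not match 1234.
        $1 ~ /^[ \t]*ACCOUNT NO[ \t]*$/ {
            acct = $2
            gsub(/[ \t\r]/, "", acct)
            if (acct in want) hit = 1
        }

        # End of a page: print the buffer only if the account matched.
        /^[ \t]*PAGEEND[ \t]*$/ {
            if (hit) for (i = 1; i <= n; i++) print page[i]
            n = 0; hit = 0
        }
    ' "$1" "$2"
}
```

Called as extract_pages acct_nos ip_filename, it reads the data file only once, which matters for a 4 GB input, and it keeps just one page in memory at a time.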
# 5  
Old 11-05-2005
Guys try perl!
# 6  
Old 11-06-2005
Try....
Code:
awk -v RS=PAGEEND '/ACCOUNT NO 123/{print $0 RS}' file1
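
A variant of the same record-separator idea for the "|" separated layout, as a rough sketch. The function name page_for_account is made up. It assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX only guarantees the first character is used) and that every PAGEEND sits on its own line followed by a newline. The account number is compared exactly, so 123 does not also select 1234.

```shell
# Sketch: print the page for one account number, using PAGEEND as the
# record separator. page_for_account is a hypothetical helper name.
page_for_account() {
    # $1 = account number, $2 = data file
    awk -v RS='PAGEEND\n' -v acct="$1" '
        {
            # Each record is one page without its PAGEEND line.
            n = split($0, line, "\n")
            for (i = 1; i <= n; i++) {
                # Look for the ACCOUNT NO line and compare its value exactly.
                if (split(line[i], f, "|") >= 2 && f[1] ~ /^[ \t]*ACCOUNT NO[ \t]*$/) {
                    v = f[2]
                    gsub(/[ \t\r]/, "", v)
                    if (v == acct) {
                        # Put back the PAGEEND line that RS consumed.
                        printf "%sPAGEEND\n", $0
                        break
                    }
                }
            }
        }
    ' "$2"
}
```

For example, page_for_account 123 file1 prints only the page for account 123.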

# 7  
Old 11-07-2005
Hope this helps

This is a quick and dirty method (I doubt its efficiency for data of your size).
I am assuming the account numbers are in a file named acct_nos, one per line, like this:
acct_nos:
123
456
I am also assuming that the file containing the account information is named "acct_info".

The following statements should work. They rely on two special characters, so I am again assuming that your data does not contain "#" or "@". If it does, replace them with characters that are not used in the data.

sed 's/^/ACCOUNT NO |/g' acct_nos|sed 's/$/@/g' >temp_nos
sed 's/PAGESTART/#PAGESTART/g' acct_info|tr '\n' '@'|tr '#' '\n'>temp_info
grep -f temp_nos temp_info|tr '@' '\n'



Here's another way, with Perl. This should ideally be faster, and it is also more robust because it takes care of a lot of whitespace worries. For example, if the acct_nos file lists the numbers as:
123
456
with extra whitespace around them, it would not be affected. The script also works regardless of whether the ACCOUNT NO line has whitespace at the start or before the "pipe" (or tag, as you might say) delimiter, although it assumes that "ACCOUNT" and "NO" are separated by exactly one space. The same goes for the account number itself.


find_acct.pl:
#!/usr/bin/perl
use strict;
use warnings;

# Load the wanted account numbers into a hash for exact lookups.
open(my $nos_fh, '<', 'acct_nos') or die "Cannot open acct_nos: $!";
my %wanted;
while (my $no = <$nos_fh>) {
    $no =~ s/^\s+|\s+$//g;    # ignore whitespace around the number
    $wanted{$no} = 1 if length $no;
}
close($nos_fh);

open(my $info_fh, '<', 'acct_info') or die "Cannot open acct_info: $!";

my @buffer;              # lines of the page currently being read
my $acct_present = 0;    # true if the current page's account is wanted

while (my $line = <$info_fh>) {
    my ($name, $value) = split(/\|/, $line, 3);

    if ($name =~ /^\s*PAGESTART\s*$/) {
        # A new page starts: print the previous page if it matched.
        print @buffer if $acct_present;
        @buffer = ();
        $acct_present = 0;
    }
    elsif ($name =~ /^\s*ACCOUNT NO\s*$/ && defined $value) {
        $value =~ s/^\s+|\s+$//g;
        $acct_present = exists $wanted{$value} ? 1 : 0;
    }

    push(@buffer, $line);
}

# Print the last page if it matched.
print @buffer if $acct_present;

close($info_fh);