Need help with the syntax using awk+grep
Hi,
I need to extract information from a 4 GB file based on the following conditions:

1) Check for the presence of a set of account numbers. Each account number appears, along with other information, between a PAGESTART and a PAGEEND. The file looks like this:

    PAGESTART
    ACCOUNT NO 123
    DATE 10-01-2004
    money 10982
    PAGEEND
    PAGESTART
    ACCOUNT NO 245
    DATE 10-03-2005
    MONEY 254
    PAGEEND

2) If an account number is present, everything between the corresponding PAGESTART and PAGEEND must be extracted. For example, if one of the specified account numbers is 123, I require the following:

    PAGESTART
    ACCOUNT NO 123
    DATE 10-01-2004
    money 10982
    PAGEEND

Can anyone help with this?

This may not be efficient, but it works:
I am assuming that all the account numbers are listed in a file, one per line. For example, if you need data for accounts 123 and 456, the file looks like this:

acct_file:

    123
    456

Use sed to reformat the file like this:

    sed 's/^/ACCOUNT NO /' acct_file | sed 's/$/,/' > temp

temp:

    ACCOUNT NO 123,
    ACCOUNT NO 456,

Use this as a pattern file for grep, and run paste on the file containing the information (let's call it acct_info):

    paste -s -d",,,,\n" acct_info | grep -f temp

(paste joins every 5 lines of the file into one comma-separated line; grep uses temp as a pattern file.)

Hope this helps.

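The sed, paste and grep steps above can be tried end to end on the sample from the question. This is only a sketch under a few assumptions: every page is exactly five lines long, the data contains no commas, and the final `tr` step (not part of the suggestion above) is added to turn the joined row back into separate lines.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the question: every page is exactly five lines.
cat > acct_info <<'EOF'
PAGESTART
ACCOUNT NO 123
DATE 10-01-2004
money 10982
PAGEEND
PAGESTART
ACCOUNT NO 245
DATE 10-03-2005
MONEY 254
PAGEEND
EOF

# Wanted account numbers, one per line.
printf '%s\n' 123 456 > acct_file

# Build the pattern file: "ACCOUNT NO 123," and so on. The trailing comma
# keeps 123 from also matching 1234 once a page is joined with commas.
sed -e 's/^/ACCOUNT NO /' -e 's/$/,/' acct_file > temp

# Join every five lines into one comma-separated row, keep the matching rows,
# then turn the commas back into newlines to restore the page layout
# (this last step assumes the data itself contains no commas).
result=$(paste -s -d ',,,,\n' acct_info | grep -f temp | tr ',' '\n')
printf '%s\n' "$result"
```

Only the page for account 123 is selected here, because account 456 does not occur in the sample. If pages vary in length, as with about 80 fields per page, the fixed five-line join no longer lines up with page boundaries.
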
Thanks Abhishek for your response,
The separator is a tag:

    ACCOUNT NO |12345
    INVOICE NO |578

There are about 80 fields between PAGESTART and PAGEEND, all of which have to be retrieved for a matching account:

    PAGESTART
    ...
    ACCOUNT NAME | Business Level
    ACCOUNT NO |1234
    MONEY |54
    ...
    PAGEEND

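Try this

At the command line:

    awk -F'|' -f awkfile s=acct_no ip_filename

where acct_no is the account number to look for and awkfile contains:

    # A new page starts: clear the buffer and the match flag.
    $1 ~ /PAGESTART/ { page = ""; found = 0 }
    # Collect every line of the current page.
    { page = page $0 "\n" }
    # Compare the account number exactly, ignoring surrounding whitespace.
    $1 ~ /ACCOUNT NO/ { no = $2; gsub(/^[ \t]+|[ \t]+$/, "", no); if (no == s) found = 1 }
    # At the end of a matching page, print it and stop reading.
    $1 ~ /PAGEEND/ && found { printf "%s", page; exit }

Maybe using this you can go through the required numbers in a loop and print them out.
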
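Looping over the required numbers means reading the 4 GB file once per account. A possible alternative, sketched here as an assumption rather than as the poster's method, is to load all the account numbers first and then scan the data file a single time. The file name `acct_nos` and the sample records (apart from the fields quoted earlier in the thread) are made up for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical pipe-separated sample; real pages would hold about 80 fields.
cat > ip_filename <<'EOF'
PAGESTART
ACCOUNT NAME | Business Level
ACCOUNT NO |1234
MONEY |54
PAGEEND
PAGESTART
ACCOUNT NAME | Other
ACCOUNT NO |123
MONEY |7
PAGEEND
PAGESTART
ACCOUNT NAME | Third
ACCOUNT NO |99
MONEY |1
PAGEEND
EOF

# Wanted account numbers, one per line (must not be empty for NR == FNR to work).
printf '%s\n' 123 99 > acct_nos

result=$(awk -F'|' '
    # First file: remember each wanted account number with whitespace removed.
    NR == FNR { gsub(/[ \t]/, "", $0); if ($0 != "") wanted[$0] = 1; next }
    # Second file: a new page starts, so reset the buffer and the match flag.
    $1 ~ /^[ \t]*PAGESTART[ \t]*$/ { page = ""; found = 0 }
    # Collect every line of the current page.
    { page = page $0 "\n" }
    # Exact lookup of the trimmed account number, so 123 does not match 1234.
    $1 ~ /^[ \t]*ACCOUNT NO[ \t]*$/ {
        no = $2
        gsub(/^[ \t]+|[ \t]+$/, "", no)
        if (no in wanted) found = 1
    }
    # End of page: print it if it matched, then forget it.
    $1 ~ /^[ \t]*PAGEEND[ \t]*$/ { if (found) printf "%s", page; page = ""; found = 0 }
' acct_nos ip_filename)
printf '%s\n' "$result"
```

Only one page is held in memory at a time, so the size of the data file should not matter much; the account list is the only thing kept for the whole run.
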
Hope this helps
This is a quick and dirty method (I doubt its efficiency for data of your size).

I am assuming that the file containing the account numbers is called acct_nos, with one number per line, like this:

acct_nos:

    123
    456

and that the file containing the account information is called acct_info.

The following commands should work. They rely on two special characters, so they assume that your data does not contain "#" or "@". If it does, replace these with characters that are not in use.

    sed 's/^/ACCOUNT NO |/' acct_nos | sed 's/$/@/' > temp_nos
    sed 's/PAGESTART/#PAGESTART/g' acct_info | tr '\n' '@' | tr '#' '\n' > temp_info
    grep -f temp_nos temp_info | tr '@' '\n'

Here's another way, with Perl. This should ideally be faster, and better, because it takes care of a lot of whitespace worries. For example, extra whitespace around the numbers listed in acct_nos does not affect it. The script also works regardless of how much whitespace the ACCOUNT NO line has at the start or before the "pipe" (or tag, as you might say) delimiter, though it assumes that "ACCOUNT" and "NO" are separated by exactly one space. The same goes for the account number itself.

find_acct.pl:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Load the wanted account numbers, one per line; surrounding whitespace is ignored.
    open(my $nos_fh, '<', 'acct_nos') or die "Cannot open acct_nos: $!";
    my %wanted;
    while (my $no = <$nos_fh>) {
        $no =~ s/^\s+|\s+$//g;
        $wanted{$no} = 1 if length $no;
    }
    close($nos_fh);

    open(my $info_fh, '<', 'acct_info') or die "Cannot open acct_info: $!";
    my @buffer;             # lines of the page currently being read
    my $acct_present = 0;   # true once this page's account number is wanted

    while (my $line = <$info_fh>) {
        chomp $line;
        my @fields = split /\|/, $line;
        if (@fields && $fields[0] =~ /^\s*PAGESTART\s*$/) {
            # A new page begins: print the previous page if it matched, then start over.
            print @buffer if $acct_present;
            @buffer = ();
            $acct_present = 0;
        }
        elsif (@fields > 1 && $fields[0] =~ /^\s*ACCOUNT NO\s*$/) {
            (my $no = $fields[1]) =~ s/^\s+|\s+$//g;
            $acct_present = exists $wanted{$no} ? 1 : 0;
        }
        push @buffer, "$line\n";
    }
    # Print the last page, which has no following PAGESTART to trigger it.
    print @buffer if $acct_present;
    close($info_fh);
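The placeholder-character commands from the start of this post can be checked on a small sample before running them on the full file. The sketch below makes a few assumptions: the sample records are invented, a guard is added that stops if the data already contains "#" or "@", and `grep -F` is used so the patterns are treated as fixed strings rather than regular expressions.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data in the pipe-separated layout from the thread.
cat > acct_info <<'EOF'
PAGESTART
ACCOUNT NAME | Business Level
ACCOUNT NO |1234
MONEY |54
PAGEEND
PAGESTART
ACCOUNT NAME | Other
ACCOUNT NO |123
MONEY |7
PAGEEND
EOF
printf '%s\n' 123 456 > acct_nos

# The method breaks if the data already contains a placeholder character,
# so refuse to continue in that case.
if grep -q '[#@]' acct_info acct_nos; then
    echo "data contains # or @; choose other placeholder characters" >&2
    exit 1
fi

# One pattern per account: "ACCOUNT NO |123@". The trailing @ stands for the
# end of the line, so 123 does not match 1234.
sed -e 's/^/ACCOUNT NO |/' -e 's/$/@/' acct_nos > temp_nos

# Flatten each page onto one line: mark page starts with #, turn newlines
# into @, then turn each # into a newline.
sed 's/PAGESTART/#PAGESTART/g' acct_info | tr '\n' '@' | tr '#' '\n' > temp_info

# Select the flattened pages that match and restore the line breaks.
result=$(grep -F -f temp_nos temp_info | tr '@' '\n')
printf '%s\n' "$result"
```

When several pages match, each one is followed by an empty line in the output, because the "@" that ended the PAGEEND line becomes a second newline.
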