The UNIX and Linux Forums  
#1  04-27-2008  srsahu75
Extraction of various lines from a huge file

Dear Members,
I have a huge file generated by running the 'whois' command for hundreds of IPs. Each section in the file starts with [Querying whois

From each section, I want to extract the lines that start with any of these words: [Querying whois, OrgName, NetRange, inetnum, descr, owner, Country.

Input:

[Querying whois.XJHIOUIIOOPIOP]


OrgName: University of C
OrgID: U1
Address: OIT
Address: NH
City: BC
StateProv: XY
PostalCode: 000000
Country: MN

NetRange: XXX.YYY.M.N - XXX.YYY.M.Q
CIDR: LMANERIE
NetName: UC


[Querying whois.ABCE.TSD]

% Rights restricted by copyright.
% See

% Note: This output has been filtered.
% To receive output for a database update, use the "-B" flag


inetnum: XXX.YYY.M.N - XXX.YYY.M.Q
netname: NET-C
descr: HB
descr: The University
country: PQ
admin-c: TYE
tech-c: SDF
status: FGRG
mnt-by: FSDGFG
source: FGDFSG

role: OPRROKROTR
address: The University
address: DJFIEJRE
address: DIJAIRJEJ
address: EIREROERE

Required output:

[Querying whois.BUHIOUJIOU]
OrgName: HHHHHHHHHH (May or may not present)
NetRange:TTTTTTTTT (May or may not present)
inetnum: FTYFYYYUII (May or may not present)
descr: HIJKJKLLKL (It will be better if only first occurrence)
owner: JHKJOJOIPI (May or may not present)
Country: OIOPOPOP (1st occurrence)

Thanking you
With regards
#2  04-27-2008  era
Different registrars use different output formats. So unless you are querying a very restricted set of domains, for example domains all registered by one person, or for other reasons all registered with the same registrar or only a small set of registrars, this may turn out to be more complex than you thought.

Perhaps it would be useful as a first step to separate the entries to different files depending on the [Querying ... line? Try the csplit command for that. Then you can create a parser for each of the formats you find in there.
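
As a rough sketch of that first step, assuming GNU csplit (the -z option and the '{*}' repeat count are GNU extensions, so other csplit versions may need different arguments), something like this should write one file per [Querying ... section. The function name split_sections and the "section." file prefix are made up for illustration.

```shell
# split_sections DUMP OUTDIR  (hypothetical helper, not part of the original post)
# Writes OUTDIR/section.00, OUTDIR/section.01, ... one per "[Querying" section.
split_sections() {
  mkdir -p "$2" &&
  # -s: do not print byte counts
  # -z: drop empty pieces (GNU extension), e.g. the empty piece before the first section
  # '{*}': repeat the pattern as often as it matches (GNU extension)
  csplit -s -z -f "$2/section." "$1" '/^\[Querying/' '{*}'
}
```

Called as split_sections whois.txt sections, this leaves the input file untouched, and each piece can then be inspected to see which registrar format it uses.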

How do you know when to stop? Often a record will include hierarchical information (especially for the ARIN information, which is what your ABCE.TSD example looks like) in which the later lines are more specific than the earlier ones. Then you often want the later lines, not the earlier ones. (But this depends on what you need this for, of course.)

Anyway, here's an attempt at implementing your current spec. This simply picks out the first of anything after the Querying line:

Code:
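perl -ne 'if (/^\[Querying/) {
  # New section: print the header and reset the list of fields still wanted
  print; @wanted = qw(OrgName NetRange inetnum descr owner Country);
  $wanted = &wanted(@wanted);
}
sub wanted {
  # Build a regex such as ^(OrgName|NetRange|...):
  return "^(" . join ("|", map { quotemeta $_ } @_) . "):";
}
if ($wanted && $_  =~ m/$wanted/i) {
  print;
  # Drop the field just printed; compare in lower case because /i
  # lets "country:" match the "Country" entry
  $seen = lc $1;
  @wanted = grep { lc($_) ne $seen } @wanted;
  $wanted = @wanted ? &wanted(@wanted) : "";
}' file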
This came out a little more monstrous than I'd like it to be, but maybe you can use it as a starting point.

(In retrospect, maybe it would have been better to use a hash to keep track of which values are already captured, and not capture if the hash says we already have the one we are looking at. Push the captured ones to an array if preserving order is important.)
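
A minimal sketch of that hash idea, written in awk inside a shell function rather than in Perl, so treat it as an illustration and not as a drop-in replacement. The function name extract_fields is hypothetical; the field list is the one from the original question, lower-cased so the match ignores case. Lines are printed in input order, which takes the place of the separate array.

```shell
# extract_fields DUMP  (hypothetical helper name)
# Prints each "[Querying" header and the first occurrence of every wanted field.
extract_fields() {
  awk -F: '
    BEGIN {
      # wanted[] is the set of field names we care about, in lower case
      n = split("orgname netrange inetnum descr owner country", w, " ")
      for (i = 1; i <= n; i++) wanted[w[i]] = 1
    }
    /^\[Querying/ {
      print
      split("", seen)    # portable way to empty the seen[] hash for a new section
      insec = 1
      next
    }
    insec {
      key = tolower($1)  # text before the first colon
      if (index($0, ":") > 0 && (key in wanted) && !(key in seen)) {
        seen[key] = 1    # remember that this field is already captured
        print
      }
    }
  ' "$1"
}
```

Because seen[] is reset at every [Querying line, each section gets its own first occurrence of descr, Country and so on.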

Last edited by era; 04-27-2008 at 07:53 AM.. Reason: Add /i flag to make matching ignore case
#3  04-29-2008  srsahu75
Hi,
Thank you very much for the help. The script covers about 70% of what I need. I will try to work out the remaining 30% myself.

Thanking you
With regards
Satya
#4  05-05-2008  srsahu75
Dear Era,
I would like the script to take the input file, as well as the output file, as a variable. I have two text files: (1) a list of folders in which the script should work and (2) a list of input files on which the script should work.
Because I lack Perl knowledge, my attempts were unsuccessful. In a shell script I use:

for i in `(cat countries.txt)`
do
  for j in `(cat year.txt)`
  do
    for k in `(cat countries/$i/$j)`
    do

In the same way, I want the Perl script to take the input file as a variable.

Thanks
#5  05-05-2008  era
As a matter of shell coding style, the parentheses are completely unnecessary, and stuff in backticks works badly if there's a file name with spaces in it.
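
For what it's worth, here is one way the outer two loops could look without backticks, as a sketch only. The function name process_all and its argument order are made up for illustration; it reads the two list files line by line, so names containing spaces survive, and it simply prints the path of each file that exists.

```shell
# process_all DIR COUNTRYFILE YEARFILE  (hypothetical helper name and argument order)
# Prints DIR/country/year for every combination that exists as a regular file.
process_all() {
  # read -r with an empty IFS keeps each line intact, spaces and backslashes included
  while IFS= read -r country; do
    while IFS= read -r year; do
      f="$1/$country/$year"
      if [ -f "$f" ]; then
        printf '%s\n' "$f"
      fi
    done < "$3"
  done < "$2"
}
```

With the layout from the question this would be called as process_all countries countries.txt year.txt, and the printf line is where a per-file command would go instead.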

I don't see why you couldn't use that shell script to wrap the Perl code; there's nothing much there which Perl does better than the shell, other than not having to read the country file over and over again (but you could optimize that in the shell script, too). But anyway, here goes. I'm afraid this is completely untested.

Code:
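#!/usr/bin/perl
use strict;
use warnings;

die "Usage: $0 dir yearfile countryfile\n" unless (@ARGV == 3);
my ($dir, $yearfile, $countryfile) = @ARGV;

open (my $y, "<", $yearfile) || die "$0: Could not open $yearfile: $!\n";
open (my $c, "<", $countryfile) || die "$0: Could not open $countryfile: $!\n";
chomp (my @countries = <$c>);   # strip newlines so they do not end up in the path
close $c;
while (my $year = <$y>) {
  chomp $year;
  for my $country (@countries) {
    # The layout in post #4 is countries/$i/$j, that is dir/country/year
    handle ("$dir/$country/$year");
  }
}
close $y;

sub handle {
  my ($file) = @_;
  my (@wanted, $wanted);
  open (my $f, "<", $file) || die "$0: Could not open $file: $!\n";
  while (<$f>) {
    if (/^\[Querying/) {
      # New section: print the header and reset the list of fields still wanted
      print; @wanted = qw(OrgName NetRange inetnum descr owner Country);
      $wanted = wanted(@wanted);
    }
    if ($wanted && $_  =~ m/$wanted/i) {
      print;
      # Compare in lower case because /i lets "country:" match "Country"
      my $seen = lc $1;
      @wanted = grep { lc($_) ne $seen } @wanted;
      $wanted = @wanted ? wanted(@wanted) : "";
    }
  }
  close $f;   # close once the whole file has been read, not inside the loop
}
sub wanted {
  return "^(" . join ("|", map { quotemeta $_ } @_) . "):";
}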
#6  05-07-2008  srsahu75
Thank you very much for the code

Regards