Different registrars use different output formats. So unless you are querying a very restricted set of domains, for example domains all registered by one person, or for other reasons all registered with the same registrar or only a small set of registrars, this may turn out to be more complex than you thought.
Perhaps it would be useful as a first step to separate the entries to different files depending on the
[Querying ... line? Try the
csplit command for that. Then you can create a parser for each of the formats you find in there.
How do you know when to stop? Often a record will include hierarchical information (especially for the ARIN information, which is what your ABCE.TSD example looks like) in which the later lines are more specific than the earlier ones. Then you often want the later lines, not the earlier ones. (But this depends on what you need this for, of course.)
Anyway, here's an attempt at implementing your current spec. This simply picks out the first of anything after the Querying line:
Code:
perl -ne 'if (/^\[Querying/) {
print; @wanted = qw(OrgName NetRange inetnum descr owner Country);
$wanted = &wanted(@wanted);
}
sub wanted {
return "^(" . join ("|", map { quotemeta $_ } @_) . "):";
}
if ($wanted && $_ =~ m/$wanted/i) {
print;
@wanted = grep { $_ ne $1 } @wanted;
$wanted = @wanted ? &wanted(@wanted) : "";
}' file
This came out a little more monstrous than I'd like it to be, but maybe you can use it as a starting point.
(In retrospect, maybe it would have been better to use a hash to keep track of which values are already captured, and not capture if the hash says we already have the one we are looking at. Push the captured ones to an array if preserving order is important.)