Shell Programming and Scripting: post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and other shell scripts here.
Extracting records with unique fields from a fixed width txt file
Greetings,
I would like to extract records from a fixed-width text file that have unique field elements. The data is structured like this:

John      A Smith      NY
Mary      C Jones      WA
Adam      J Clark      PA
Mary        Jones      WA

Field name / start-end position:
Firstname 1-10
MI 11-12
Lastname 13-23
State 24-25

I want to compare the first name and last name fields exclusively and output the unique records to a new file:

John A Smith NY
Adam J Clark PA

Any assistance would be greatly appreciated.

Last edited by sitney; 02-08-2008 at 10:39 PM.
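The field table maps directly onto a Perl `unpack` template: each field is `end - start + 1` characters wide. A minimal sketch, assuming the positions above are 1-based, inclusive, and contiguous (as they are here); `template_from_positions` is a hypothetical helper name, not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn 1-based, inclusive [start, end] pairs into an
# unpack template of 'a' fields. Assumes the fields are contiguous, so no
# 'x' (skip) entries are needed between them.
sub template_from_positions {
    my @fields = @_;
    return join '', map { my ($start, $end) = @$_; 'a' . ($end - $start + 1) } @fields;
}

# Layout from the post: Firstname 1-10, MI 11-12, Lastname 13-23, State 24-25
my $template = template_from_positions([1, 10], [11, 12], [13, 23], [24, 25]);
print "$template\n";    # prints "a10a2a11a2"
```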
Your requirements are a bit vague, but here is a possible Perl solution:
Code:
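#!/usr/bin/perl
use warnings;
use strict;
#use Data::Dumper; #uncomment for debugging

# Expect exactly two arguments: the input file and the output file.
unless (scalar @ARGV == 2){
    die "Usage: perl scriptname.pl inputfile outputfile\n";
}
# Take the output file off @ARGV so that <> reads only the input file.
my $outfile = pop @ARGV;
my %names = ();
my %count = ();
while (<>){
    chomp;
    # Split the record into its fixed-width fields:
    # first name (10), middle initial (2), last name (11), state (2).
    my ($first,$mi,$last,$state) = unpack("a10a2a11a2",$_);
    # Trim leading and trailing whitespace from each field.
    (s/^\s*//, s/\s*$//) for ($first,$mi,$last,$state);
    # Key on first + last name; store how many times the pair has been
    # seen so far and the trimmed record to print.
    $names{"$first,$last"}={count => ++$count{"$first,$last"},
                            name  => "$first $mi $last $state",
    };
}
#print Dumper \%names; #uncomment for debugging
open my $out , '>' , $outfile or die "Cannot open $outfile: $!";
# Hash order is unspecified, so output lines may not follow input order.
foreach my $person (keys %names) {
    next if $names{$person}{count}>1;   # skip names that occur more than once
    print $out $names{$person}{name},"\n";
}
close $out;
print STDOUT "finished\n";
exit(0);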
Run it like this:

perl scriptname.pl path/to/inputfile path/to/outputfile

Last edited by KevinADC; 02-09-2008 at 01:08 AM.
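Because the script walks `keys %names`, the order of the output lines is not guaranteed to follow the input. A minimal sketch of an alternative that keeps input order by remembering when each name pair is first seen, assuming the same column layout; `unique_records` is a hypothetical name, and this is a variation on the script above rather than a copy of it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep records whose first+last name pair occurs exactly once,
# returning them in the order they first appear in the input.
sub unique_records {
    my @lines = @_;
    my (%count, %record, @order);
    for my $line (@lines) {
        my ($first, $mi, $last, $state) = unpack 'a10a2a11a2', $line;
        (s/^\s*//, s/\s*$//) for ($first, $mi, $last, $state);
        my $key = "$first,$last";
        push @order, $key unless $count{$key}++;   # remember first sighting
        $record{$key} = "$first $mi $last $state";
    }
    return map { $record{$_} } grep { $count{$_} == 1 } @order;
}

# Only read input when file names are given, e.g.
#   perl unique.pl path/to/inputfile > path/to/outputfile
if (@ARGV) {
    chomp(my @lines = <>);
    print "$_\n" for unique_records(@lines);
}
```

The `@order` array costs one entry per distinct name pair; in exchange the output is deterministic.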
KevinADC - I really appreciate your response here.
It works! When I run your Perl script, I get these results:

$ cat newnames.txt
John A Smith NY
Adam J Clark PA

Despite my vague requirements, you understood them perfectly. I am trying to decipher the workhorse part of the script you wrote:

Code:
while (<>){
    chomp;
    #Assign variables to fixed width sections using unpack.
    my ($first,$mi,$last,$state) = unpack("a10a2a11a2",$_);
    #Remove whitespace from variables.
    (s/^\s*//, s/\s*$//) for ($first,$mi,$last,$state);
    #Please describe what is going on here.
    $names{"$first,$last"}={count => ++$count{"$first,$last"},
                            name => "$first $mi $last $state",
    };
}

Thanks again.
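The `unpack` and trim lines can be tried in isolation. A minimal sketch, assuming one record padded to the column widths from the first post; it contrasts the script's `a` template plus trim step with the `A` template, which strips trailing spaces during `unpack` (leading spaces would still need trimming).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One record laid out as in the post: first name in columns 1-10,
# middle initial 11-12, last name 13-23, state 24-25.
my $line = sprintf '%-10s%-2s%-11s%-2s', 'John', 'A', 'Smith', 'NY';

# 'a' fields keep the space padding exactly as it appears in the file.
my @raw = unpack 'a10a2a11a2', $line;

# The script's trim step: the for modifier aliases $_ to each element,
# so both substitutions modify @trimmed in place.
my @trimmed = @raw;
(s/^\s*//, s/\s*$//) for @trimmed;

# 'A' fields strip trailing spaces (and NULs) during unpack itself.
my @stripped = unpack 'A10A2A11A2', $line;

printf "raw:      [%s]\n", join '][', @raw;
printf "trimmed:  [%s]\n", join '][', @trimmed;
printf "stripped: [%s]\n", join '][', @stripped;
```

For this record `@trimmed` and `@stripped` agree; only `@raw` still carries the padding.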
Quote:
$names{"$first,$last"} creates a hash key from the first and last names. Its value is, in turn, a reference to an anonymous hash:

Code:
$names{"$first,$last"} = {count=>'' , name => '' };
Code:
++$count{"$first,$last"}
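That expression increments a separate counter for the first/last name pair, and the new total becomes the value of the "count" key, overwriting any count stored by an earlier record with the same name. The "name" key holds the record rebuilt from the trimmed fields, which we print to the output file if the value of the "count" key is 1 (one). You can uncomment the lines marked "uncomment for debugging" to see the data structure of %names printed once the input has been read.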
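To see the structure concretely, here is a minimal sketch that wraps the same loop in a hypothetical `build_names` sub and dumps the result for the four sample records. The records are padded to the column widths from the first post, which is an assumption about the exact file layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Build the same structure as the script: one entry per "first,last" key,
# each holding a reference to an anonymous hash with 'count' and 'name'.
sub build_names {
    my @lines = @_;
    my (%names, %count);
    for my $line (@lines) {
        my ($first, $mi, $last, $state) = unpack 'a10a2a11a2', $line;
        (s/^\s*//, s/\s*$//) for ($first, $mi, $last, $state);
        my $key = "$first,$last";
        # ++ bumps the running total for this pair; the new total replaces
        # the 'count' stored by any earlier record with the same key.
        $names{$key} = { count => ++$count{$key},
                         name  => "$first $mi $last $state" };
    }
    return \%names;
}

# Sample records from the thread, padded to the stated column widths.
my @records = map { sprintf '%-10s%-2s%-11s%-2s', @$_ }
    ['John', 'A', 'Smith', 'NY'], ['Mary', 'C', 'Jones', 'WA'],
    ['Adam', 'J', 'Clark', 'PA'], ['Mary', '',  'Jones', 'WA'];

my $names = build_names(@records);
$Data::Dumper::Sortkeys = 1;    # stable key order for readable output
print Dumper($names);
```

Only the "John,Smith" and "Adam,Clark" entries have a count of 1. The "Mary,Jones" entry keeps the second record's name, and its blank middle initial leaves a double space in that string.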
You have here:
Quote:
John W "Van Johnson" (last name in quotes to show it is one field)
John W VanJohnson

This is probably a rare circumstance (and not a very good example), but it is possible, especially if the names are not in English.
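Assuming the concern here is whether to remove all whitespace from the name fields rather than only the leading and trailing padding, a minimal sketch shows the difference. `trimmed_key` mirrors the script's trim step; `squashed_key` is a hypothetical alternative, and both sub names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the de-duplication key the way the script does: trim only the
# leading and trailing padding, so spaces inside a field survive.
sub trimmed_key {
    my ($first, $last) = @_;
    (s/^\s*//, s/\s*$//) for ($first, $last);
    return "$first,$last";
}

# Hypothetical alternative: remove every whitespace character.
sub squashed_key {
    my ($first, $last) = @_;
    s/\s+//g for ($first, $last);
    return "$first,$last";
}

my @spaced = ('John      ', 'Van Johnson');   # last name with an internal space
my @joined = ('John      ', 'VanJohnson ');

printf "trimmed:  %s vs %s\n", trimmed_key(@spaced), trimmed_key(@joined);
printf "squashed: %s vs %s\n", squashed_key(@spaced), squashed_key(@joined);
```

With trimming only, the two people get different keys; squashing all whitespace merges them into one key.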
You said,
Quote:
However, the hash structure you used:

Code:
$names{"$first,$last"}={count => ++$count{"$first,$last"},
name => "$first $mi $last $state",
};
Even though I don't fully grasp this data structure, I can use it, modify it, and apply it. So thanks again, KevinADC!
You're welcome. Actually that data structure could have been a bit simpler:
Code:
while (<>){
    chomp;
    my ($first,$mi,$last,$state) = unpack("a10a2a11a2",$_);
    (s/^\s*//, s/\s*$//) for ($first,$mi,$last,$state);
    # No separate %count hash: the inner hash is created on first use.
    $names{"$first,$last"}{count}++;
    $names{"$first,$last"}{name} = "$first $mi $last $state";
}
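The simpler loop relies on autovivification: `$names{$key}{count}++` creates the inner hash the first time a key is seen. A minimal self-contained sketch wrapping it in a hypothetical `filter_unique` sub, with sample records padded to the column widths from the first post; sorting the keys is an addition for repeatable output, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The simplified loop: $names{$key}{count}++ autovivifies the inner hash
# the first time a key is seen, so no separate %count hash is needed.
sub filter_unique {
    my @lines = @_;
    my %names;
    for (@lines) {
        my ($first, $mi, $last, $state) = unpack 'a10a2a11a2', $_;
        (s/^\s*//, s/\s*$//) for ($first, $mi, $last, $state);
        $names{"$first,$last"}{count}++;
        $names{"$first,$last"}{name} = "$first $mi $last $state";
    }
    # Keep names seen exactly once; sort keys only for a repeatable order.
    return map  { $names{$_}{name} }
           grep { $names{$_}{count} == 1 } sort keys %names;
}

my @records = map { sprintf '%-10s%-2s%-11s%-2s', @$_ }
    ['John', 'A', 'Smith', 'NY'], ['Mary', 'C', 'Jones', 'WA'],
    ['Adam', 'J', 'Clark', 'PA'], ['Mary', '',  'Jones', 'WA'];

print "$_\n" for filter_unique(@records);
```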
The UNIX and Linux Forums