How to store info from a txt file into a hash?

    #1  Eric1 (Registered User), 1 Week Ago

I'm trying to write a Perl script that uses the "open" function to open and read a file and store the information from that file in a hash.

This is what is inside my file:


Code:
Celena Standard  F 01/24/94 Cancer 
Jeniffer Orlowski  F 06/24/86 None
Brent Koehler  M 12/05/97  HIV
Mao Schleich  M 04/17/60  Cancer
Goldie Moultrie  F 04/05/96  None
Silva Rizzo  F 10/26/78  Amyloidosis
Leatha Papenfuss  F 10/15/97  CREST
Vita Sabb  F 05/28/87  Autism
Alyce Ugarte  F 12/21/64  HIV
Ela Prout  F 12/05/57  Autism
Mohamed Buchannon  M 07/24/91  Caner
Lael Stall  M 12/05/97  None

The first column is a name, the second is gender, third is birthdate, fourth is disease. The name is supposed to be the key while the other three columns are the values.

Also, how would I allow the user to change information and output the information to another file?


Moderator's Comments:
Please use CODE tags for data as well, as required by forum rules!

Last edited by RudiC; 1 Week Ago at 12:28 AM.. Reason: Added CODE tags.
    #2  Don Cragun (Administrator, Forum Staff), 1 Week Ago
Since the "columns" in your file seem to be separated by one or more spaces, how do you know where the name "column" ends and the gender "column" starts? If more than one disease is associated with a name, does that add more <space>s to the last "column" in your file? If a disease has more than one word (e.g., diabetes mellitus or mitral valve prolapse), how are diseases separated from each other in the last "column"?

What have you tried to solve this problem on your own?
    #3  Aia (Registered User), 1 Week Ago
Hello Eric1,

Let me give you a few examples.
If I were to implement what you are asking at face value, this would be the result:

Code:
$ cat read_list.pl
#!/usr/bin/perl
#
use strict;
use warnings;
use Data::Dumper;

my %patient;
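while (<>) {
    # The first two words form the name (hash key); the rest of the line is the value.
    my ($name, $info) = /^(\w+\s\w+)\s+(.+)$/
        or next;    # skip blank or malformed lines
    $patient{$name} = $info;
}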
print Dumper \%patient;


Output:


Code:
$VAR1 = {
          'Leatha Papenfuss' => 'F 10/15/97  CREST',
          'Celena Standard' => 'F 01/24/94 Cancer ',
          'Vita Sabb' => 'F 05/28/87  Autism',
          'Jeniffer Orlowski' => 'F 06/24/86 None',
          'Alyce Ugarte' => 'F 12/21/64  HIV',
          'Silva Rizzo' => 'F 10/26/78  Amyloidosis',
          'Lael Stall' => 'M 12/05/97  None',
          'Mao Schleich' => 'M 04/17/60  Cancer',
          'Brent Koehler' => 'M 12/05/97  HIV',
          'Mohamed Buchannon' => 'M 07/24/91  Caner',
          'Ela Prout' => 'F 12/05/57  Autism',
          'Goldie Moultrie' => 'F 04/05/96  None'
        };

But I suspect that's not what you want. You would probably like something more like this:

Code:
$ cat read_names.pl
#!/usr/bin/perl
#
use strict;
use warnings;
use Data::Dumper;

my %patient;
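while (<>) {
    my @record = split;
    next unless @record >= 5;    # skip blank or incomplete lines
    # Key: first and last name; value: a hash of the remaining fields.
    $patient{"@record[0..1]"} = {
        'gender'   => $record[2],
        'birthday' => $record[3],
        'disease'  => "@record[4..$#record]",
    };
}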
print Dumper \%patient;

Output:

Code:
$ perl read_names.pl people.list
$VAR1 = {
          'Leatha Papenfuss' => {
                                  'disease' => 'CREST',
                                  'birthday' => '10/15/97',
                                  'gender' => 'F'
                                },
          'Celena Standard' => {
                                 'disease' => 'Cancer',
                                 'birthday' => '01/24/94',
                                 'gender' => 'F'
                               },
          'Vita Sabb' => {
                           'disease' => 'Autism',
                           'birthday' => '05/28/87',
                           'gender' => 'F'
                         },
          'Jeniffer Orlowski' => {
                                   'disease' => 'None',
                                   'birthday' => '06/24/86',
                                   'gender' => 'F'
                                 },
          'Alyce Ugarte' => {
                              'disease' => 'HIV',
                              'birthday' => '12/21/64',
                              'gender' => 'F'
                            },
          'Silva Rizzo' => {
                             'disease' => 'Amyloidosis',
                             'birthday' => '10/26/78',
                             'gender' => 'F'
                           },
          'Lael Stall' => {
                            'disease' => 'None',
                            'birthday' => '12/05/97',
                            'gender' => 'M'
                          },
          'Mao Schleich' => {
                              'disease' => 'Cancer',
                              'birthday' => '04/17/60',
                              'gender' => 'M'
                            },
          'Brent Koehler' => {
                               'disease' => 'HIV',
                               'birthday' => '12/05/97',
                               'gender' => 'M'
                             },
          'Mohamed Buchannon' => {
                                   'disease' => 'Caner',
                                   'birthday' => '07/24/91',
                                   'gender' => 'M'
                                 },
          'Ela Prout' => {
                           'disease' => 'Autism',
                           'birthday' => '12/05/57',
                           'gender' => 'F'
                         },
          'Goldie Moultrie' => {
                                 'disease' => 'None',
                                 'birthday' => '04/05/96',
                                 'gender' => 'F'
                               }
        };
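
Once the data is in this nested shape, individual fields can be read back by name. The following is only a minimal sketch: it hard-codes two of the sample records instead of reading the file, and the loop layout is my own choice, not something the original question asked for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two sample records in the same shape the script above builds.
my %patient = (
    'Vita Sabb'  => { gender => 'F', birthday => '05/28/87', disease => 'Autism' },
    'Lael Stall' => { gender => 'M', birthday => '12/05/97', disease => 'None' },
);

# Look up a single field for one person.
print "Vita Sabb: $patient{'Vita Sabb'}{disease}\n";

# Walk every record; sort the keys because hash order is not predictable.
for my $name (sort keys %patient) {
    my $rec = $patient{$name};
    # A hash slice pulls several fields out in a fixed order.
    printf "%-12s %s %s %s\n", $name, @{$rec}{qw(gender birthday disease)};
}
```

Sorting the keys is optional, but it keeps the output stable from one run to the next.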

However, depending on the real input, that might have a serious flaw: first name plus last name is not unique enough. There is a strong possibility that two or more entries contain the same first and last name even though they refer to different people. Translation: you lose data, since a hash keeps only the last record read for a given key.
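
To see whether your real data is affected, one option is to check for an existing key before storing a record. This is only a sketch under a few assumptions: the three input lines are embedded in the script, the second "Silva Rizzo" line is made up for the demonstration, and skipping the later record is just one possible policy.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input with a repeated name; the second "Silva Rizzo" is made up.
my $text = <<'END';
Silva Rizzo  F 10/26/78  Amyloidosis
Vita Sabb  F 05/28/87  Autism
Silva Rizzo  F 02/11/81  None
END

# Read from the string as if it were a file (in-memory filehandle).
open my $in, '<', \$text or die "Could not open in-memory data: $!\n";

my (%patient, @collisions);
while (my $line = <$in>) {
    my @record = split ' ', $line;
    next unless @record >= 5;          # skip blank or short lines
    my $key = "@record[0..1]";
    # Record the clash instead of silently overwriting the earlier entry.
    if (exists $patient{$key}) {
        push @collisions, $key;
        next;
    }
    $patient{$key} = {
        gender   => $record[2],
        birthday => $record[3],
        disease  => "@record[4..$#record]",
    };
}
close $in;

warn "Duplicate key skipped: $_\n" for @collisions;
print scalar(keys %patient), " records kept\n";
```

If the list of collisions is not empty, the name alone is not a safe key for that file.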

Adding the birthday to the key might help prevent that. Here's a modification of the previous code, with a modified input that demonstrates handling of a name collision and a multi-word disease:

INPUT:



Code:
$ cat name.list
Celena Standard  F 01/24/94 Cancer
Jeniffer Orlowski  F 06/24/86 None
Brent Koehler  M 12/05/97  HIV
Mao Schleich  M 04/17/60  Cancer
Goldie Moultrie  F 04/05/96  None
Silva Rizzo  F 10/26/78  Amyloidosis
Leatha Papenfuss  F 10/15/97  CREST
Vita Sabb  F 05/28/87  Autism
Alyce Ugarte  F 12/21/64  HIV
Ela Prout  F 12/05/57  Autism
Silva Rizzo  F 22/5/81  Dissociative Indentity Disorder
Mohamed Buchannon  M 07/24/91  Caner
Lael Stall  M 12/05/97  None


Code:
$ cat read_names.pl
#!/usr/bin/perl
#
use strict;
use warnings;
use Data::Dumper;

my %patient;
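while (<>) {
    my @record = split;
    next unless @record >= 5;    # skip blank or incomplete lines
    # Key: first name, last name and birthday; value: a hash of the remaining fields.
    $patient{"@record[0..1,3]"} = {
        'gender'   => $record[2],
        'birthday' => $record[3],
        'disease'  => "@record[4..$#record]",
    };
}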
print Dumper \%patient;

Output:


Code:
$ perl read_names.pl name.list
$VAR1 = {
          'Ela Prout 12/05/57' => {
                                    'disease' => 'Autism',
                                    'birthday' => '12/05/57',
                                    'gender' => 'F'
                                  },
          'Silva Rizzo 10/26/78' => {
                                      'disease' => 'Amyloidosis',
                                      'birthday' => '10/26/78',
                                      'gender' => 'F'
                                    },
          'Mohamed Buchannon 07/24/91' => {
                                            'disease' => 'Caner',
                                            'birthday' => '07/24/91',
                                            'gender' => 'M'
                                          },
          'Vita Sabb 05/28/87' => {
                                    'disease' => 'Autism',
                                    'birthday' => '05/28/87',
                                    'gender' => 'F'
                                  },
          'Mao Schleich 04/17/60' => {
                                       'disease' => 'Cancer',
                                       'birthday' => '04/17/60',
                                       'gender' => 'M'
                                     },
          'Brent Koehler 12/05/97' => {
                                        'disease' => 'HIV',
                                        'birthday' => '12/05/97',
                                        'gender' => 'M'
                                      },
          'Jeniffer Orlowski 06/24/86' => {
                                            'disease' => 'None',
                                            'birthday' => '06/24/86',
                                            'gender' => 'F'
                                          },
          'Lael Stall 12/05/97' => {
                                     'disease' => 'None',
                                     'birthday' => '12/05/97',
                                     'gender' => 'M'
                                   },
          'Leatha Papenfuss 10/15/97' => {
                                           'disease' => 'CREST',
                                           'birthday' => '10/15/97',
                                           'gender' => 'F'
                                         },
          'Silva Rizzo 22/5/81' => {
                                     'disease' => 'Dissociative Indentity Disorder',
                                     'birthday' => '22/5/81',
                                     'gender' => 'F'
                                   },
          'Alyce Ugarte 12/21/64' => {
                                       'disease' => 'HIV',
                                       'birthday' => '12/21/64',
                                       'gender' => 'F'
                                     },
          'Goldie Moultrie 04/05/96' => {
                                          'disease' => 'None',
                                          'birthday' => '04/05/96',
                                          'gender' => 'F'
                                        },
          'Celena Standard 01/24/94' => {
                                          'disease' => 'Cancer',
                                          'birthday' => '01/24/94',
                                          'gender' => 'F'
                                        }
        };

Note:
The code assumes that every patient is listed with exactly a first name and a last name, and not a variation such as a single name or a first, middle, and last name.
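
If that assumption does not hold, one possible way around it is to anchor the match on the gender and date columns and treat everything before them as the name. This is a rough sketch, not a tested parser for your real data: parse_line is a hypothetical helper, the two sample lines are made up, and it assumes gender is always a single M or F and the date always looks like digits/digits/two digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one line into name, gender, birthday, disease.
# Assumes gender is a single M or F and the date looks like 12/05/97.
sub parse_line {
    my ($line) = @_;
    my ($name, $gender, $birthday, $disease) =
        $line =~ m{^\s*(.+?)\s+([MF])\s+(\d{1,2}/\d{1,2}/\d{2})\s+(.*?)\s*$}
        or return;    # no match: return nothing
    return {
        name     => join(' ', split(' ', $name)),       # collapse inner spaces
        gender   => $gender,
        birthday => $birthday,
        disease  => join(' ', split(' ', $disease)),
    };
}

# Made-up sample lines: a three-word name and a one-word name.
for my $line ('Ada Mae Lovell  F 03/09/88  Mitral Valve Prolapse',
              'Zed  M 11/02/70 None') {
    my $rec = parse_line($line) or next;
    print "$rec->{name} | $rec->{gender} | $rec->{birthday} | $rec->{disease}\n";
}
```

A name that itself ends in a lone M or F followed by something date-like would still confuse it, so a real delimiter such as a tab or comma in the input file remains the more reliable fix.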

Once you have decided how to extract the data and have practiced on your actual data, you could show your effort and follow up with your second question.

By the way, I hope the example does not contain real people's names and birthdays that you happen to be trusted with. That would be a 'terrible' thing to post.

Last edited by Aia; 1 Week Ago at 01:22 PM..
    #4  Eric1 (Registered User), 1 Week Ago
Don't worry, the info I posted isn't about real people. I'll try out your code in a bit, Aia; thank you for the examples. Is there no way for me to accomplish this task with the open command, though? As in something like:

open($patient_names, "+<Patient_Names.txt");
    #5  Aia (Registered User), 1 Week Ago
Here's an example of how you might open one file to read and another to write.
It opens the patient file, searches for cancer records, and writes the result to another file.



Code:
$ cat read_and_write_names.pl
#!/usr/bin/perl
#
use strict;
use warnings;

my $patient_names = 'patient.list';

#
# open patient.list to read, or exit with an error
#
open my $in_file, '<', $patient_names or die "Could not open file $patient_names: $!\n";

#
# structure the database
#
my %patients;
while (<$in_file>) {
    my @record = split;
    next unless @record >= 5;    # skip blank or incomplete lines
    $patients{"@record[0..1,3]"} = {
        'name'     => $record[0],
        'lastname' => $record[1],
        'gender'   => $record[2],
        'birthday' => $record[3],
        'disease'  => "@record[4..$#record]",
    };
}
close $in_file;

#
# to reassemble the original order of fields
#
my @fields = qw(name lastname gender birthday disease);

#
# open new processed list of names or exit
#
my $new_patient_names = 'processed_names.list';
open my $out_file, '>', $new_patient_names or die "Could not open file $new_patient_names: $!\n";

#
# save only patients with cancer. Since the word Cancer also appears misspelled
# as Caner, the pattern makes the second "c" optional so that both are matched.
#
for my $record (keys %patients) {
    if ($patients{$record}{'disease'} =~ /^Canc?er/i) {
      print $out_file join (" ", @{$patients{$record}}{@fields}) . "\n";
    }
}
close $out_file;

Output:

Code:
$ cat processed_names.list
Mohamed Buchannon M 07/24/91 Caner
Mao Schleich M 04/17/60 Cancer
Celena Standard F 01/24/94 Cancer
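
As for the second question (changing information and writing it out), here is one way it could look. Treat it as a sketch under assumptions: update_field and write_patients are hypothetical helpers of my own, the single record is hard-coded, and the output goes to an in-memory buffer so the script runs without touching any file. For real use you would open a file name for writing instead, and read the key, field, and new value from the user.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Original order of the fields, used when writing records back out.
my @fields = qw(name lastname gender birthday disease);

# Hypothetical helper: change one field of one record, if both exist.
sub update_field {
    my ($patients, $key, $field, $value) = @_;
    return 0 unless exists $patients->{$key}
                 && exists $patients->{$key}{$field};
    $patients->{$key}{$field} = $value;
    return 1;
}

# Hypothetical helper: write every record, one per line, in field order.
sub write_patients {
    my ($patients, $fh) = @_;
    for my $key (sort keys %$patients) {
        print {$fh} join(' ', @{ $patients->{$key} }{@fields}), "\n";
    }
}

my %patients = (
    'Mohamed Buchannon 07/24/91' => {
        name => 'Mohamed', lastname => 'Buchannon', gender => 'M',
        birthday => '07/24/91', disease => 'Caner',
    },
);

# Correct the misspelled disease, then write to an in-memory "file".
update_field(\%patients, 'Mohamed Buchannon 07/24/91', 'disease', 'Cancer')
    or warn "No such record or field\n";

my $buffer = '';
open my $out, '>', \$buffer or die "Could not open output: $!\n";
write_patients(\%patients, $out);
close $out or die "Could not close output: $!\n";
print $buffer;
```

Checking that the key and field exist before assigning keeps a mistyped name from quietly creating a new, half-empty record.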
