Perl, open multiple files with wildcards


 
# 1  
Old 06-11-2010

I have a question regarding Perl scripting.

Say I want to open a set of files that all look like the ones below, assign each one to a filehandle, and then assign each filehandle to a variable. How do I do this?

The file names are

strand1.fa.gz.tmp
strand2.fa.gz.tmp
strand3.fa.gz.tmp
strand4.fa.gz.tmp
...
...
strand19.fa.gz.tmp

I want to assign them to file handles similar to their name
strand1.fa.gz.tmp.fh
strand2.fa.gz.tmp.fh
...
...
etc

Then I want to set variables like this:

$strand1=<strand1.fa.gz.tmp.fh>
...
...

So far I have this

Code:
#! /usr/bin/perl

@files=glob("*.tmp");

foreach $data (@files) {
        open($data.fh, "<$data") || die "failed $data\n";
}

I don't know if using $data.fh is right or not. Should I push the filehandles into an array (is that possible)?

I want to do this so I don't have to open the file 19 times by writing 19 lines and then writing an additional 19 lines to assign them to a variable.

Thanks.

Last edited by pludi; 06-11-2010 at 03:35 PM.. Reason: code tags, please...
# 2  
Old 06-11-2010
Do all 19 need to be open simultaneously? Can you revisit your algorithm or what you are trying to do? Since ultimately the results of your 19 file handles are ending up in memory (with your statement of $strand1=<strand1.fa.gz.tmp.fh>), why not just loop through the 19 files, read them into memory and set up appropriate data structures in memory?

As of Perl 5.6, you can pass an undefined scalar variable to open in place of a file handle, and Perl will store a reference to a new anonymous filehandle in it if the file can be opened. Read the section on indirect file handles in Perldoc Open Tutorial (perlopentut).

In theory, you can loop through your file names, pass an uninitialized scalar to open, and push the result onto a list. There may be issues with this in terms of maximum number of handles per process or total open files on your platform.
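A minimal sketch of that loop, assuming the files are plain text and that a hash keyed by file name is an acceptable stand-in for the "similar name" handles asked for in the question. `open_all` is a hypothetical helper, not something from perlopentut:

```perl
use strict;
use warnings;

# Hypothetical helper: open every file matching a glob pattern and
# return a reference to a hash of file name => lexical filehandle.
sub open_all {
    my ($pattern) = @_;
    my %fh_for;
    for my $name (glob $pattern) {
        # $fh starts out undefined; open() stores a new handle in it.
        open(my $fh, '<', $name) or die "Cannot open $name: $!\n";
        $fh_for{$name} = $fh;
    }
    return \%fh_for;
}

my $handles = open_all('strand*.fa.gz.tmp');
for my $name (sort keys %$handles) {
    # Read a single line; <$fh> in list context would read the whole file.
    my $first_line = readline $handles->{$name};
    print "$name: ", (defined $first_line ? $first_line : "(empty)\n");
}
```

All handles stay open for as long as the hash is alive, so the per-process limit on open files mentioned above still applies.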

Alternatively, if you want to go the Perl OO route, you can use IO::Handle (in practice its subclass IO::File, which is the one that opens files by name).
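For the object-oriented route, a short sketch using IO::File, the core subclass of IO::Handle that opens named files. The helper name `open_object` is made up for illustration, and the file pattern is taken from the question:

```perl
use strict;
use warnings;
use IO::File;

# Hypothetical helper: return an IO::File object opened for reading.
sub open_object {
    my ($name) = @_;
    my $fh = IO::File->new($name, 'r')
        or die "Cannot open $name: $!\n";
    return $fh;
}

# The objects can be collected in an ordinary array.
my @handles = map { open_object($_) } glob 'strand*.fa.gz.tmp';
for my $fh (@handles) {
    my $line = $fh->getline;      # method call instead of <$fh>
    print $line if defined $line;
    $fh->close;
}
```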

A third alternative (in Perl, there are ALWAYS other ways to do it!) is to directly insert references into the symbol table that are associated with your file handles. This method was common before Perl 5.6 but is discouraged now (i.e., don't do it unless you really know what you are doing, and note that symbolic references do not work under use strict unless you switch off strict refs). This method is outlined in an article by Randal Schwartz.

Several things to note on your code snippet:
  • Add use strict and use warnings;
  • Use the 3-argument form of open;
  • Include Perl's error variable $! in the die message to see why the file did not open;
  • Use lexically scoped file handles (my $fh) so you will not collide with an open file in other parts of your program. You will also not need an explicit 'close', since Perl closes the file automatically when the handle goes out of scope.
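Applied to the snippet from the question, those four notes could look roughly like this. It is a sketch only: `count_lines` is a hypothetical helper, and counting lines is a placeholder for whatever the real processing is:

```perl
#!/usr/bin/perl
use strict;      # catches undeclared variables such as @files and $data
use warnings;

# Hypothetical helper: count the lines in one file.
sub count_lines {
    my ($name) = @_;
    # Three-argument open: the mode is separate from the file name.
    # $! holds the operating system's reason if open fails.
    open(my $fh, '<', $name) or die "Cannot open $name: $!\n";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    # $fh goes out of scope when the sub returns, so Perl closes the file.
    return $count;
}

my @files = glob '*.tmp';
for my $data (@files) {
    printf "%s: %d lines\n", $data, count_lines($data);
}
```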

If you move the globbing to the command line, you may be able to avoid the complexity. With the -n (or -p) switch, Perl opens each file named on the command line in turn and loops over its lines. So:

Code:
perl -nle '[your script]' *.tmp

handles all the dirty bits for you, and you just need to focus on what you want done to the files. This does not work if you want all files open simultaneously. Look at Perldoc perlrun for the details.
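For reference, here is roughly what the -n switch sets up, written out as a script. The empty `<>` operator and the `$ARGV` variable are standard Perl; `lines_per_file` is a hypothetical wrapper added only so the loop can be called with an explicit file list:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The empty <> operator opens each file named in @ARGV in turn,
# one at a time, which is what -n arranges behind the scenes.
# Hypothetical helper: returns a hash ref of file name => line count.
sub lines_per_file {
    local @ARGV = @_;
    my %count;
    while (my $line = <>) {
        chomp $line;              # the -l switch does this for you
        $count{$ARGV}++;          # $ARGV is the name of the current file
    }
    return \%count;
}

# Only run when file names were given, e.g.: perl script.pl *.tmp
if (@ARGV) {
    my $count = lines_per_file(@ARGV);
    print "$_: $count->{$_}\n" for sort keys %$count;
}
```

Only one file is open at any moment, which is why this approach cannot serve a design that needs all 19 handles at once.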

Last edited by drewk; 06-11-2010 at 02:49 PM..
# 3  
Old 06-11-2010
thanks for the reply, sorry for not being clear, all 19 files need to be opened simultaneously =/
# 4  
Old 06-11-2010
Quote:
Originally Posted by japaneseguitars
thanks for the reply, sorry for not being clear, all 19 files need to be opened simultaneously =/
What is it you want to do with the 19 files?
# 5  
Old 06-11-2010
Quote:
Originally Posted by drewk
What is it you want to do with the 19 files?
I have a file that contains information about genes, and I need to obtain the sequences upstream and downstream of these genes and analyze them. These 19 files are the chromosome sequences. I have sorted the gene file by chromosome location, so I can essentially use one chromosome file at a time, since I just tried to open all of them and Perl ran out of memory =/
# 6  
Old 06-11-2010
Quote:
Originally Posted by japaneseguitars
I have a file that contains information about genes, and I need to obtain the sequences upstream and downstream of these genes and analyze them. These 19 files are the chromosome sequences. I have sorted the gene file by chromosome location, so I can essentially use one chromosome file at a time, since I just tried to open all of them and Perl ran out of memory =/
So why do all 19 files need to be open if you are processing 1 file at a time?

Do you have simple logic and maybe the Perl experts here can help? ie, present pseudo code of what you want to do, size of files, output desired, intermediate processing needed, platform.
# 7  
Old 06-14-2010
Quote:
Originally Posted by drewk
So why do all 19 files need to be open if you are processing 1 file at a time?

Do you have simple logic and maybe the Perl experts here can help? ie, present pseudo code of what you want to do, size of files, output desired, intermediate processing needed, platform.
Hey drewk,

I realized I could process them one at a time, especially after I ran out of memory. Before, all I wanted was to open them all at once instead of writing switch or if-else statements to open each file. Thanks for all the help =)
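A sketch of the one-at-a-time approach the thread settled on. It assumes the .tmp files are plain text despite the .gz in their names, and `by_strand_number` is a hypothetical helper that puts strand2 before strand19. The gene lookup itself is left as a comment because the thread does not describe it. The `//` operator needs Perl 5.10 or later:

```perl
use strict;
use warnings;

# Hypothetical helper: sort names like strand2.fa.gz.tmp by their number.
sub by_strand_number {
    my @names = @_;
    my %num;
    for my $name (@names) {
        ($num{$name}) = $name =~ /strand(\d+)\./;   # undef if no number
    }
    return sort { ($num{$a} // 0) <=> ($num{$b} // 0) } @names;
}

for my $name (by_strand_number(glob 'strand*.fa.gz.tmp')) {
    open(my $fh, '<', $name) or die "Cannot open $name: $!\n";
    my $sequence = do { local $/; <$fh> };   # slurp this one file only
    close $fh;
    # ... look up the genes on this chromosome in $sequence here ...
    printf "%s: %d characters\n", $name, length($sequence // '');
}   # $sequence is released before the next file is read
```

Because `$sequence` is declared inside the loop, at most one chromosome is held in memory at a time.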