11-04-2009
Deleting files that don't contain particular text strings / more than one instance of a string
Hi all,
I have a directory containing many subdirectories each named like KOG#### where # represents any digit 0-9. There are several files in each KOG#### folder but the one I care about is named like KOG####_final.fasta. I am trying to write a script to copy all of the KOG####_final.fasta files to the same directory and then apply some filters to them.
For the filters, I want to go through each of the KOG####_final.fasta files and remove any of them that don't contain at least 10 different text strings that are specified in a text file or somewhere in the script. I'd also like to have a filter that removes files that have more than one instance of any one string.
I know this is a lot but I'm really stumped as to where to start on this one. Any assistance in getting started with this would be much appreciated!
Thanks!
Kevin
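One possible starting point, sketched in Python rather than as a finished solution. It assumes the folder names are exactly "KOG" plus four digits, that the strings of interest are plain substrings (for example taxon labels in the FASTA headers), and that a file is kept only if it contains at least 10 distinct strings from the list and none of them more than once. The function names (collect_final_fastas, keep_file, filter_fastas) are made up for illustration.

```python
import re
import shutil
from pathlib import Path


def collect_final_fastas(src_root, dest_dir):
    """Copy each KOG####/KOG####_final.fasta under src_root into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for sub in sorted(Path(src_root).iterdir()):
        # Assumption: folder names are "KOG" followed by exactly four digits.
        if sub.is_dir() and re.fullmatch(r"KOG\d{4}", sub.name):
            src = sub / f"{sub.name}_final.fasta"
            if src.is_file():
                copied.append(Path(shutil.copy2(src, dest / src.name)))
    return copied


def keep_file(text, strings, min_distinct=10):
    """True if text has >= min_distinct of the strings and none more than once."""
    # str.count does plain, non-overlapping substring matching.
    counts = {s: text.count(s) for s in strings}
    distinct = sum(1 for n in counts.values() if n > 0)
    no_repeats = all(n <= 1 for n in counts.values())
    return distinct >= min_distinct and no_repeats


def filter_fastas(dest_dir, strings, min_distinct=10):
    """Delete collected files that fail keep_file; return the removed names."""
    removed = []
    for path in sorted(Path(dest_dir).glob("KOG*_final.fasta")):
        if not keep_file(path.read_text(), strings, min_distinct):
            path.unlink()
            removed.append(path.name)
    return removed
```

Because the matching is by substring, a string such as "sp1" would also be counted inside "sp10"; if your strings can overlap like that, match on whole header fields instead. The filter only deletes the copies in the destination directory, so the originals inside the KOG#### folders are left untouched, and it is worth printing the list returned by filter_fastas on a test copy before trusting it.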
Bio::AlignIO::fasta(3pm) User Contributed Perl Documentation Bio::AlignIO::fasta(3pm)
NAME
Bio::AlignIO::fasta - fasta MSA Sequence input/output stream
SYNOPSIS
Do not use this module directly. Use it via the Bio::AlignIO class.
DESCRIPTION
This object can transform Bio::SimpleAlign objects to and from fasta flat file databases. This is for the fasta alignment format, not for
the FastA sequence analysis program. To process the alignments from FastA (FastX, FastN, FastP, tFastA, etc) use the Bio::SearchIO module.
FEEDBACK
Support
Please direct usage questions or support issues to the mailing list:
bioperl-l@bioperl.org
rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address
it. Please include a thorough description of the problem with code and data examples if at all possible.
Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the
web:
https://redmine.open-bio.org/projects/bioperl/
AUTHORS
Peter Schattner
APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _
next_aln
Title : next_aln
Usage : $aln = $stream->next_aln
Function: returns the next alignment in the stream.
Returns : Bio::Align::AlignI object - returns 0 on end of file
or on error
Args : -width => optional argument to specify the width sequence
will be written (60 chars by default)
See Bio::Align::AlignI
write_aln
Title : write_aln
Usage : $stream->write_aln(@aln)
Function: writes the $aln object into the stream in fasta format
Returns : 1 for success and 0 for error
Args : Bio::Align::AlignI object
See Bio::Align::AlignI
_get_len
Title : _get_len
Usage :
Function: determine number of alphabetic chars
Returns : integer
Args : sequence string
width
Title : width
Usage : $obj->width($newwidth)
$width = $obj->width;
Function: Get/set width of alignment
Returns : integer value of width
Args : on set, new value (a scalar or undef, optional)
perl v5.14.2 2012-03-02 Bio::AlignIO::fasta(3pm)