The KOG*_final.fasta files look like the example below. There is a one-line header that always begins with a greater-than sign, followed by a 3-4 letter species abbreviation and a sequence identifier. The next line contains the corresponding amino-acid sequence, which is always on a single line and never wraps, no matter how long it is.
I'm trying to write a script that will go through each of these files and check whether they meet certain criteria. For example, I want to move all files containing fewer than 10 greater-than signs (i.e. fewer than 10 sequences) into a "trash" folder. I've played around with if and grep -c \> for this part, but I haven't figured it out yet. Is there a better way to go about this?
I'd also like to trash any files that have more than 1 sequence for any one species (although I'd like to be able to vary this number if it turns out to be too strict). Would I have to use an array for this? Or another file that lists all of the taxon names?
Thanks!
Kevin
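On the array question: the species abbreviations can be read straight from the header lines, so a separate file of taxon names shouldn't be needed, and awk's associative arrays can do the bookkeeping. A minimal sketch, assuming every header looks like >ACAL_12345 (abbreviation, underscore, identifier, as the ACAL_ and HROB_ patterns later in the thread suggest); the function name species_counts is made up for illustration:

```shell
# Print "<species> <count>" for every species found in one FASTA file.
# Assumption (hypothetical): headers look like ">ABBR_identifier".
species_counts() {
    awk '
        /^>/ {                       # header lines only
            sp = substr($0, 2)       # drop the leading ">"
            sub(/_.*/, "", sp)       # keep the text before the first "_"
            n[sp]++                  # count sequences per species
        }
        END { for (s in n) print s, n[s] }
    ' "$1" | sort
}
```

Running species_counts on a file prints one line per species, so any count above 1 marks a file with more than one sequence for that species.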
---------- Post updated 11-06-09 at 04:29 PM ---------- Previous update was 11-05-09 at 06:38 PM ----------
I figured out the first filter:
Code:
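cutoff=6                        # minimum number of sequences to keep a file
mkdir -p ./rejected_few_seq     # make sure the destination folder exists
for FileName in *.fa
do
    # count header lines (lines that begin with ">")
    sequences=$(grep -c '^>' "$FileName")
    echo "$FileName $sequences"
    if [ "$sequences" -lt "$cutoff" ] ; then
        printf 'Too few sequences in file %s\n' "$FileName"
        mv "$FileName" ./rejected_few_seq/
    fi
done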
I'm having trouble figuring out the other part. Here's what I've got so far:
Code:
for FileName in *.fa
do
grep -c '^>ACAL_' "$FileName" >> taxon_count.txt
grep -c '^>HROB_' "$FileName" >> taxon_count.txt
(...repeated for all species abbreviations)
?
done
I am trying to figure out how to sum the values written to the taxon_count.txt file and remove $FileName if that total is smaller than a desired value. I'd also like to set a maximum number of sequences per taxon and, if that is exceeded, remove $FileName. Any guidance would be greatly appreciated.
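Both checks can probably be done without taxon_count.txt or a per-species grep. Here is a rough sketch that makes one awk pass per file, takes the total number of headers and the largest per-species count, and compares them with two adjustable thresholds. Assumptions: headers of the form >ABBR_identifier, input files matching *.fa as in the first filter, and filter_kog_files and the rejected_paralogs folder are hypothetical names invented for this example:

```shell
# Sketch: move a FASTA file to a reject folder if it has fewer than
# min_seqs sequences in total, or more than max_per_taxon sequences for
# any one species. Hypothetical assumptions: headers look like
# ">ABBR_identifier", input files match *.fa, and the folder name
# rejected_paralogs is invented for this example.
filter_kog_files() {
    min_seqs=${1:-10}        # minimum total number of sequences per file
    max_per_taxon=${2:-1}    # maximum sequences allowed for any one species
    mkdir -p ./rejected_few_seq ./rejected_paralogs

    for FileName in *.fa; do
        [ -e "$FileName" ] || continue   # glob matched nothing

        # One awk pass prints "<total> <largest per-species count>".
        counts=$(awk '
            /^>/ {
                sp = substr($0, 2)       # header without the ">"
                sub(/_.*/, "", sp)       # species = text before the first "_"
                n[sp]++
                total++
            }
            END {
                m = 0
                for (s in n) if (n[s] > m) m = n[s]
                print total + 0, m
            }
        ' "$FileName")
        total=${counts% *}               # first field
        max=${counts#* }                 # second field

        if [ "$total" -lt "$min_seqs" ]; then
            printf 'Too few sequences (%s) in file %s\n' "$total" "$FileName"
            mv "$FileName" ./rejected_few_seq/
        elif [ "$max" -gt "$max_per_taxon" ]; then
            printf 'One species has %s sequences in file %s\n' "$max" "$FileName"
            mv "$FileName" ./rejected_paralogs/
        fi
    done
}
```

Run it from the directory that holds the files, e.g. filter_kog_files 10 1; raising the second argument to 2 would tolerate two sequences per species. Because the total is tested first, a file that fails both checks ends up in rejected_few_seq.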
LEARN ABOUT ULTRIX
refile(1mh)
Name
refile - file message in other folders
Syntax
refile [ msgs ] [ +folder ] [ options ]
Description
Use the refile command to move the specified messages from the current folder to another folder. You can refile messages in more than one folder
by giving multiple folder names as arguments.
If you do not specify a message, the current message is refiled. You can refile a message other than the current message by giving its
number as a msgs argument. You can also refile more than one message at a time by specifying more than one message number, or a range of
message numbers, or a message sequence. See for more information on sequences.
The current folder remains the same unless the -src option is specified; in that case, the source folder becomes current. Normally, the
last message specified becomes the current message. However, if the -link option is used, the current message is not changed.
If the Previous-Sequence: entry is set in the user profile, then in addition to defining the named sequences from the source folder, refile will also define
those sequences for the destination folders. See for information concerning the previous sequence.
Options
-draft Refiles the draft message, or the current message in your draft folder, if you have one set up. You cannot give a msgs argument when
you use this option.
-file filename
Moves a file into a folder. This option takes a file from its directory and places it in the named folder, as the next message
in the folder. The file must be formatted as a legal mail message. This means that the message must have the minimum header
fields separated from the body of the message by a blank line or a line of dashes.
-help Prints a list of the valid options to this command.
-link
-nolink
The -link option keeps a copy of the message in the source folder, as well as filing a copy in the new folder. Normally (-nolink), refile
removes the messages from the original folder when it refiles them.
-preserve
-nopreserve
Preserves the number of a message in the new folder. Normally, when a message is refiled into another folder, it is given the
next available number in that folder. The -preserve option keeps the number of the message the same in the new folder as it had
been in the old one.
You cannot have two messages with the same number in one folder, so you should use this option with care.
-src +folder
Specifies the source folder to take messages from. Normally, messages are refiled from the current folder into another folder.
However, you can take messages from a different folder by using the -src +folder option to specify the alternative source folder.
Examples
The following example refiles messages 3 and 5 in the records folder:
% refile 3 5 +records
The next example files the current message into two folders:
% refile +jones +map
The next example takes message 13 in the current folder and refiles it in the test folder. The message remains in the current folder as well as
appearing in the test folder:
% refile -link 13 +test
The next example takes message 3 from the test folder when test is not the current folder, and places it in the outbox folder:
% refile 3 -src +test +outbox
Profile Components
Path: To determine your Mail directory
Folder-Protect: To set protections when creating a new folder
rmmproc: Program to delete the message
Files
The user profile.
See Also
folder(1mh), mark(1mh), mh_profile(5mh)