Help with a bash loop script Post: 303037350

LEARN ABOUT DEBIAN

mlv-smile

SMILE(1)						      General Commands Manual							  SMILE(1)

NAME

       mlv-smile - inference of structured signals in multiple sequences

SYNOPSIS

       mlv-smile <parameter_file>
       mlv-smile [-g number]

DESCRIPTION

       This manual page briefly documents the mlv-smile command. For more details and examples, see the documentation files
       installed with it.

       mlv-smile is a program that was primarily made to extract promoter sequences from DNA sequences. Its value lies in inferring
       several motifs (called boxes) simultaneously, subject to distance constraints. The user writes in a parameter_file the list of
       criteria that the signal must satisfy. In a first extraction step, all signals satisfying these criteria are found. In a second
       step, they are all statistically evaluated, to detect the ones that are exceptionally represented in the original sequences. Since
       version 1.4, mlv-smile can extract such signals over any alphabet in any kind of sequence.

OPTIONS

       The program normally expects a parameter file containing all the required criteria. The only option is:

       -g number
	      writes to the standard output a generic parameter file for extracting signals made of number boxes.

HOW TO

   How to use mlv-smile?
       The only command you'll use is 'mlv-smile'. You give it a single argument: the name of a parameter file that describes the
       characteristics of the motifs you want to extract.

   How to start?
       You first have to write an alphabet file, which contains the alphabet used to describe the motifs. Then you have to write a parameter file,
       and you're ready to use mlv-smile.

   What should I write in the alphabet file?
       The first line should contain the type of the alphabet's elements, chosen from "Nucleotides", "Proteins", or "Others". This
       allows mlv-smile to change, for instance, the "A or G" symbol into an R in DNA sequences. Then, on each line, write the
       elements of the motif alphabet.

       Example: if you want to extract simple motifs (A,C,G,T) from clean DNA sequences written with a four-letter alphabet (A,C,G,T), then you
       may write an alphabet file containing:
	   Type:Nucleotides
	   A
	   C
	   G
	   T
       Let's call this file 'alpha'.
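
       The layout described above can be checked with a short script. The following is only a sketch: parse_alphabet is a
       hypothetical helper, not part of mlv-smile, and it assumes the file contains nothing but the "Type:" line and one
       symbol line per motif symbol.

```python
# Sketch: parse alphabet-file text laid out as the man page describes.
# parse_alphabet is a hypothetical helper, not something mlv-smile provides.

ALPHA_TEXT = """Type:Nucleotides
A
C
G
T
"""

ALLOWED_TYPES = ("Nucleotides", "Proteins", "Others")


def parse_alphabet(text):
    """Return (alphabet_type, symbol_lines) from alphabet-file text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("Type:"):
        raise ValueError("first line must start with 'Type:'")
    alphabet_type = lines[0][len("Type:"):].strip()
    if alphabet_type not in ALLOWED_TYPES:
        raise ValueError("unknown alphabet type: " + alphabet_type)
    # Each remaining line lists the sequence symbols matched by one motif symbol.
    return alphabet_type, lines[1:]


alphabet_type, symbols = parse_alphabet(ALPHA_TEXT)
print(alphabet_type, symbols)
```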

   How to write a simple parameter file?
       You first have to write an alphabet file. You also need a sequence file in FASTA format. Then you can create a parameter file, using
       the "mlv-smile -g number_of_boxes" command to help you.

       Example: Let's write a parameter file to extract simple motifs. If you don't already have one, let's first create a small DNA file in FASTA
       format, containing several sequences:

	   > Seq A
	   AGGCTAGTCAGGGCATGCGATCAGCAGGCATCAGGCGAGCATCGACAGCA
	   > Seq B
	   GGAGAGCGCAGAGCGAGCATCATCATGCAGCATCAGAGATCTTTCT
       Let's call this file 'seq'.
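
       Before running an extraction, it can help to confirm that the sequence file holds at least two sequences. A minimal
       sketch, assuming a plain FASTA layout with ">" header lines; read_fasta is a hypothetical helper written for this
       example:

```python
# Sketch: read FASTA text into (name, sequence) pairs and check the sequence count.
# read_fasta is a hypothetical helper, not part of mlv-smile.

SEQ_TEXT = """> Seq A
AGGCTAGTCAGGGCATGCGATCAGCAGGCATCAGGCGAGCATCGACAGCA
> Seq B
GGAGAGCGCAGAGCGAGCATCATCATGCAGCATCAGAGATCTTTCT
"""


def read_fasta(text):
    """Return a list of (name, sequence) tuples from FASTA-formatted text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            records.append([line[1:].strip(), ""])
        elif records:
            # Sequence data may span several lines; join them.
            records[-1][1] += line
        else:
            raise ValueError("sequence data found before the first header")
    return [(name, seq) for name, seq in records]


records = read_fasta(SEQ_TEXT)
for name, seq in records:
    print(name, len(seq))
```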

       Our purpose is now to extract from these sequences all motifs of length 13 that appear at least once in 100% of the sequences,
       allowing one substitution. We may write the following parameter file (with the help of the 'mlv-smile -g 1' command):
	   FASTA file	       seq     // previously created
	   Output file	       results

	   Alphabet file       alpha   //previously created
	   Quorum	       100
	   Total min length    13
	   Total max length    13
	   Total substitutions 1
	   Boxes	       1
       Let's call this file 'param'.
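
       If you generate many parameter files, the same content can be produced from a script. This is a sketch only:
       render_params is a hypothetical helper, and the column padding it uses is an assumption, since the exact separator
       mlv-smile expects is not stated here. Starting from the output of 'mlv-smile -g 1' remains the safer route.

```python
# Sketch: render the simple parameter file from a list of (criterion, value) pairs.
# render_params is hypothetical; the padding between name and value is an assumption.

SIMPLE_PARAMS = [
    ("FASTA file", "seq"),
    ("Output file", "results"),
    ("Alphabet file", "alpha"),
    ("Quorum", 100),
    ("Total min length", 13),
    ("Total max length", 13),
    ("Total substitutions", 1),
    ("Boxes", 1),
]


def render_params(settings):
    """Return parameter-file text with values aligned in one column."""
    width = max(len(key) for key, _ in settings) + 1
    return "".join(f"{key.ljust(width)}{value}\n" for key, value in settings)


text = render_params(SIMPLE_PARAMS)
print(text, end="")
```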

   How to extract a simple motif?
       You can launch  "mlv-smile" after having created the alphabet and parameter files.

       Example: With the previous alphabet, sequences and parameter files, you can now launch mlv-smile: "mlv-smile param". You  will  obtain  the
       following motifs in the "results" file:
	   GCGAGCATCAACA 2120210310010 2
	   Seq	   1   Pos    12
	   Seq	   0   Pos    34
	   2
	   GCGAGCATCGTCA 2120210312310 2
	   Seq	   1   Pos    12
	   Seq	   0   Pos    34
	   2
       The first motif found, GCGAGCATCAACA, appears at position 12 in the second sequence and position 34 in the first one (all position
       and sequence counts start at zero).
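
       The reported positions can be checked by hand: each motif should differ from the sequence window at that position by
       at most one symbol. A small sketch using the sequences and results above; the substitutions function is a
       hypothetical helper that counts mismatches (Hamming distance):

```python
# Sketch: confirm that each reported occurrence is within one substitution of the motif.
# substitutions() is a hypothetical helper, not part of mlv-smile.

SEQUENCES = [
    "AGGCTAGTCAGGGCATGCGATCAGCAGGCATCAGGCGAGCATCGACAGCA",  # Seq 0 ("Seq A")
    "GGAGAGCGCAGAGCGAGCATCATCATGCAGCATCAGAGATCTTTCT",      # Seq 1 ("Seq B")
]


def substitutions(motif, sequence, pos):
    """Count mismatches between motif and the window of sequence starting at pos."""
    window = sequence[pos:pos + len(motif)]
    if len(window) != len(motif):
        raise ValueError("motif does not fit at this position")
    return sum(1 for a, b in zip(motif, window) if a != b)


# (sequence index, position) pairs as listed in the "results" file, zero-based.
reported = [(1, 12), (0, 34)]

mismatches = {
    motif: [substitutions(motif, SEQUENCES[s], p) for s, p in reported]
    for motif in ("GCGAGCATCAACA", "GCGAGCATCGTCA")
}
print(mismatches)  # {'GCGAGCATCAACA': [1, 1], 'GCGAGCATCGTCA': [1, 1]}
```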

   How to evaluate the significance of the motifs found?
       You have to add some evaluation lines at the end of the parameter file.

       Example: At the bottom of the previous "param" parameter file, you can add:
	   Shufflings  100
	   Size k-mer  2
       which means that the original sequences will be shuffled 100 times, conserving dinucleotides. The significance of the motifs found
       previously will be computed from their frequency of occurrence in the shuffled sequences. The more shufflings you do, the more stable
       the results are, but the longer the computation takes.

       For this example, you may find results such as these (in the "results.shuffle" file):
	   STATISTICS ON THE NUMBER OF SEQUENCES HAVING AT LEAST ONE OCCURRENCE
	   Model	  %right  #right %shfl. #shfl. Sigma Chi2 Z-score
	   ==============================================================
	   GCGAGCATCGTCA  100.00%    2	 0.50%	 0.01  0.10  3.96   19.90
	   GCGAGCATCAACA  100.00%    2	 1.00%	 0.02  0.14  3.92   14.07

	   STATISTICS ON THE TOTAL NUMBER OF OCCURRENCES
	   Model	    #right  #shfl. Sigma   Chi2    Z-score
	   =======================================================
	   GCGAGCATCGTCA	2   0.01   0.10    1.99    19.90
	   GCGAGCATCAACA	2   0.02   0.14    1.96    14.07

       The first block of results shows the statistics on the number of sequences having at least one occurrence. For each motif
       found, you can read its frequency of occurrence in the original and shuffled sequences, and the two statistical scores (Chi2 and
       Z-score) deduced from them. Motifs are sorted by decreasing Z-score. A high Z-score means that the motif appears surprisingly
       often in the original sequences.
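
       The Z-scores in the table are consistent with (observed - mean over shufflings) / sigma, for instance
       (2 - 0.01) / 0.10 = 19.90. The sketch below reproduces both values under two assumptions that this page does not
       confirm: that sigma is the sample standard deviation, and that the per-shuffling counts are the hypothetical ones
       listed in the code (chosen only because they give the means 0.01 and 0.02 shown in the table).

```python
# Sketch: a Z-score computed as (observed - mean) / sample standard deviation.
# The shuffle counts are hypothetical; whether mlv-smile computes sigma this way is an assumption.
from statistics import mean, stdev


def z_score(observed, shuffled_counts):
    """Return how many standard deviations observed lies above the shuffled mean."""
    mu = mean(shuffled_counts)
    sigma = stdev(shuffled_counts)  # sample standard deviation (divides by n - 1)
    return (observed - mu) / sigma


# Number of sequences containing the motif in each of 100 shufflings (hypothetical).
counts_a = [1] + [0] * 99      # mean 0.01, as in the GCGAGCATCGTCA row
counts_b = [1, 1] + [0] * 98   # mean 0.02, as in the GCGAGCATCAACA row

z_a = z_score(2, counts_a)
z_b = z_score(2, counts_b)
print(round(z_a, 2), round(z_b, 2))
```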

   How to extract structured motifs?
       The parameter file should be modified to indicate the characteristics of the structured motifs to infer. You have to write global
       parameters for the whole motif, and local parameters for each of its boxes.

       Example: Let's extract from the previous "seq" sequences structured motifs composed of 2 boxes, each of length 5 to 6, with a total
       motif length of 11. The two boxes may be separated by 10 to 15 nucleotides. At most one substitution is allowed in each box, and at
       least one occurrence of a motif must appear in 100% of the sequences. You may write the following parameter file:
	   FASTA file	       seq
	   Output file	       results

	   Alphabet file       alpha
	   Quorum	       100
	   Total min length    11
	   Total max length    11
	   Total substitutions 2
	   Boxes	       2

	   BOX 1 ================
	   Min length	       5
	   Max length	       6
	   Substitutions       1
	   Min spacer length   10
	   Max spacer length   15

	   BOX 2 ================
	   Min length	       5
	   Max length	       6
	   Substitutions       1
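
       Since global and per-box lengths are set separately, it is easy to write values that cannot be satisfied together.
       The sketch below checks one reading of "coherent" lengths: the box ranges must be able to add up to a total within
       the global range, with 0 treated as "no upper bound". lengths_are_coherent is a hypothetical helper, and this rule
       is an assumption rather than the check mlv-smile itself performs.

```python
# Sketch: check that per-box length ranges can add up to a total inside the global range.
# lengths_are_coherent is hypothetical; the rule it applies is an assumption.


def lengths_are_coherent(total_min, total_max, boxes):
    """boxes is a list of (min_len, max_len); a max of 0 means no upper bound."""
    if total_max and total_min > total_max:
        return False
    if any(hi and lo > hi for lo, hi in boxes):
        return False
    smallest = sum(lo for lo, _ in boxes)
    if total_max and smallest > total_max:
        return False  # even the shortest boxes exceed the global maximum
    if all(hi for _, hi in boxes):
        largest = sum(hi for _, hi in boxes)
        if largest < total_min:
            return False  # even the longest boxes fall short of the global minimum
    return True


# The structured example: two boxes of length 5 to 6, total length 11.
print(lengths_are_coherent(11, 11, [(5, 6), (5, 6)]))
# Two boxes of at most 6 symbols can never reach a total of 13.
print(lengths_are_coherent(13, 13, [(5, 6), (5, 6)]))
```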

PARAMETER FILE CRITERIA

       FASTA File <filename>
	      The name of the file that contains the sequences to use for inference. These sequences must be in FASTA format. This file must
	      contain at least two sequences, as you cannot detect motifs common to several sequences in a single sequence!

       Output file <filename>
	      The name of the file where results of extraction will be written.

       Alphabet file <filename>
	      The name of the file that tells mlv-smile on which alphabet it will infer motifs. The first line of this file contains
	      "Type:" followed by the type of symbols you use, chosen from "Nucleotides", "Proteins" or "Others". Then each line of the
	      file lists the sequence symbols that may be matched by one symbol of a motif. A line containing "ANR" means that
	      there is a symbol in the motif alphabet which matches A, N or R in the sequences. If Type is set to Nucleotides, mlv-smile
	      will change this ANR symbol into an A to make it more readable. These associations are printed at the beginning of the
	      execution.

       Quorum <number>
	      The percentage of sequences in which at least one occurrence of a motif must appear for the motif to be valid. 100 means that a
	      motif must have occurrences in every sequence.

       Total min length <number>
	      The minimal length of the whole motif, counted over its boxes only. Warning: the length of the gaps between boxes
	      mustn't be taken into account. The total minimal length may differ from the sum of the boxes' minimal lengths: you can, for
	      instance, infer motifs made of two boxes, each with a min length of 4, and a total min length of 10.

       Total max length <number>
	      Same explanation as "Total min length", except that a length of 0 means "infinity".

       Total substitutions <number>
	      Total maximum number of substitutions for the whole motif. As for the total length, this is not necessarily the sum  of  each  box's
	      substitution number.

       Boxes <number>
	      The number of boxes that compose the motifs to infer. When inferring simple one-box motifs, it's not necessary to use local
	      criteria, as global and local criteria will be the same.

       Composition in <symbol> <number> [OPTIONAL]
	      The number of occurrences of a given symbol of the motif alphabet may be limited to a maximum by this criterion.

       BOX <number>
	      Begins the description of the criteria of a given box of the motif.

       Min length <number>
	      Minimum length for the current box.

       Max length <number>
	      Same explanation as "Min length", except that a length of 0 means "infinity".

       Substitution <number>
	      Maximum number of substitutions allowed for the current box.

       Composition in <symbol> <number> [OPTIONAL]
	      Same as the global composition, but for the current box.

       Min spacer length <number>
	      Minimum number of symbols between the end of the current box and the beginning of the next one. This parameter mustn't appear in the
	      last box's criteria, which has no next box!

       Max spacer length <number>
	      Same explanation as "Min spacer length", but for the maximum number of symbols.

       Delta <number>  [OPTIONAL]
	      This criterion makes it possible to infer motifs composed of several boxes without knowing the exact distance between the boxes.
	      The min and max spacer lengths are used as a "large" interval, and the delta value defines the size of the small intervals within
	      this large one. An inference of two-box motifs with a [10-20] range of distance between the boxes produces motifs whose
	      occurrences respect this range. With "Delta" set to 2, for instance, the same inference is run on every possible range
	      [i-delta, i+delta] (here: [10-14], [11-15], ...). One output file is produced per range.
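
	      The list of sub-ranges can be enumerated ahead of time to see how many output files to expect. A minimal sketch,
	      assuming every sub-range must stay inside the large interval; delta_ranges is a hypothetical helper:

```python
# Sketch: enumerate the [i - delta, i + delta] windows that fit inside the spacer interval.
# delta_ranges is hypothetical; keeping windows inside the large interval is an assumption.


def delta_ranges(min_spacer, max_spacer, delta):
    """Return the (low, high) sub-ranges of width 2 * delta within the spacer interval."""
    centers = range(min_spacer + delta, max_spacer - delta + 1)
    return [(i - delta, i + delta) for i in centers]


ranges = delta_ranges(10, 20, 2)
print(ranges)
# [(10, 14), (11, 15), (12, 16), (13, 17), (14, 18), (15, 19), (16, 20)]
```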

       Palindrome of box <number>  [OPTIONAL]
	      Indicates that the box concerned must be the biological palindrome of one of the previous boxes.
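
	      This page does not define "biological palindrome". If it is read as the reverse complement on a nucleotide
	      alphabet, which is an assumption here, the expected second box can be computed as follows (reverse_complement is
	      a hypothetical helper):

```python
# Sketch: reverse complement of a DNA box, one reading of "biological palindrome".
# That mlv-smile uses exactly this definition is an assumption.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(box):
    """Complement each nucleotide, then reverse the result."""
    return box.translate(COMPLEMENT)[::-1]


print(reverse_complement("AGGCT"))   # AGCCT
print(reverse_complement("GAATTC"))  # GAATTC, a site equal to its own reverse complement
```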

       Shufflings <number> [OPTIONAL]
	      The number of shufflings of the original sequences to perform when evaluating the statistical significance of the motifs
	      found.

       Size k-mer <number> [OPTIONAL, always with shuffling]
	      Length of the words to conserve during shufflings (usually 2).

       Against wrong sequences <filename> [OPTIONAL]
	      Another method to evaluate the significance of the motifs (not compatible with the shuffling method). If you have a
	      sequence file in which you believe the motifs you are looking for in the first sequence set won't appear, you can give that
	      file to mlv-smile. The statistical evaluation of the motifs found will be made by computing their frequency in the "wrong sequences".

WARNING

       mlv-smile is an exact combinatorial algorithm. It is not made to infer any kind of motifs. The amount of data on which the extraction
       is run can be very large, but some criteria (in particular the number of substitutions) must be kept to reasonable values: one or two
       substitutions allowed in a motif of length 10 is fine, but not 6 or 8 substitutions. The notion of spacers exists to avoid the use of
       too many substitutions.

BUGS

       A bug has been found in version 1.46, which could generate wrong results in some particular cases. In particular, results may be wrong
       for incoherent length criteria. There are probably still a lot of bugs in mlv-smile. This 1.47 version is quite stable, but do not
       hesitate to report any bug to <lama AT prism.uvsq DOT fr>.

SEE ALSO

       This software has been implemented from an algorithm proposed in

       L. Marsan and M.-F. Sagot, "Algorithms for extracting structured motifs using a suffix tree with application to promoter and regulatory
       site consensus identification", J. of Comput. Biol. 7, 2001, 345-360

       You should refer to this paper for algorithmic details. If such things bore you, just note that the extraction step of mlv-smile is
       exact, which means that all motifs respecting the given criteria are found. Please cite this article if you publish results produced
       with mlv-smile.

       For some examples of applications we made on biological data (with good results), refer to

       A. Vanet and L. Marsan and M.-F. Sagot, "Promoter sequences and algorithmical methods for identifying them", Research in Microbiology
       150, 1999, 779-799

       and

       A. Vanet and L. Marsan and A. Labigne and M.-F. Sagot, "Inferring regulatory elements from a whole genome. An application to the
       analysis of genome of Helicobacter Pylori Sigma 80 family of promoter signals", J. Mol. Biol. 297, 2000, 335-353

AUTHOR

       This manual page was written by	Laurent Marsan <lama AT prism.uvsq DOT fr>, for the Debian GNU/Linux system (but may be used by others).

																	  SMILE(1)
