Comparing the minimum values of a character in lines


# 1  
05-16-2013

Hello,

I have files as follows:

Code:
ACTGCCCTG
ACCGGCTCC
ACAAATTTC
ACCCGGGTT

I want to do the following:
I want to find certain strings in each line, for example CT and TT. Then I want the script to give me the number of characters before the string: 6 for the first line, 5 for the second line, 5 for the third line and 7 for the last line.
I then want it to take the minimum of all these numbers (here 5) and cut every line down to that many characters, so the result will be:

Code:
ACTGC
ACCGG
ACAAA
ACCCG

If you can help me with this in awk, it would be great.
# 2  
05-16-2013
Looks like homework / classwork!?
Code:
man awk

mentions the index function. Store its results in variables and compare each one against a running minimum.
Then you need a second pass through the file in order to cut it.
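Following the man awk hint, here is a quick way to see what index() returns on the sample lines. It is a minimal sketch: the helper name show_index and the pats variable are made up for illustration. index(s, t) gives the 1-based position of the first occurrence of t in s (0 if there is none), so the number of characters before the match is one less than that.

```shell
# Hypothetical helper: show_index  (reads lines on standard input)
show_index() {
    awk -v pats="CT TT" '
    BEGIN { np = split(pats, P, " ") }      # the patterns to look for
    {
        for (i = 1; i <= np; i++) {
            pos = index($0, P[i])           # 1-based position of the first match, 0 if none
            if (pos)
                printf "%s %s at %d, %d chars before\n", $0, P[i], pos, pos - 1
        }
    }'
}

printf '%s\n' ACTGCCCTG ACCGGCTCC ACAAATTTC ACCCGGGTT | show_index
```

On the four sample lines this reports 1, 5, 5 and 7 characters before the first match. The first line gives 1 rather than the 6 the question expects, because index() finds the CT that starts at the second character.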
# 3  
05-16-2013
We frequently get requests from biology researchers for things like this (A, C, G and T being DNA bases), so I don't see a problem.
# 4  
05-16-2013
Here is an awk program that might help:
Code:
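awk '
        BEGIN {
                # patterns to look for
                A["CT"]
                A["TT"]
        }
        {
                S[++c] = $0
                n = 0; L = 0; m = ""        # reset for every line
                for ( k in A )
                {
                        if ( $0 ~ k )
                        {
                                n = split ( $0, V, k )
                                m = k       # remember which pattern matched
                        }
                }
                if ( n >= 2 )
                {
                        # characters before the last match: every piece except
                        # the last one, plus the n - 2 separators between them
                        for ( i = 1; i <= ( n - 1 ); i++ )
                                L += length(V[i])
                        L += ( n - 2 ) * length(m)
                        # track the smallest count among lines that matched
                        if ( !found++ || L < min )
                                min = L
                }
                printf "String: %s Characters Before: %d\n", $0, L
        }
        END {
                print "--------------------------------------"
                # cut every stored line to the minimum (unchanged if nothing matched)
                for ( i = 1; i <= c; i++ )
                        print ( found ? substr(S[i], 1, min) : S[i] )
        }
' file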

Produces the following output:
Code:
String: ACTGCCCTG Characters Before: 6
String: ACCGGCTCC Characters Before: 5
String: ACAAATTTC Characters Before: 5
String: ACCCGGGTT Characters Before: 7
--------------------------------------
ACTGC
ACCGG
ACAAA
ACCCG
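The counting behind the split() approach can be checked on its own. If split(s, V, sep) returns n pieces, the last occurrence of sep is preceded by pieces V[1] to V[n-1] and by the n - 2 copies of sep that sit between them; for the first sample line that is 1 + 3 characters of pieces plus one 2-character CT, i.e. 6. This is a minimal sketch: the names before_last and chars_before_last are made up, and it assumes the pattern contains no regex metacharacters.

```shell
# Hypothetical helper: before_last PATTERN  (reads lines on standard input)
before_last() {
    awk -v pat="$1" '
    # Characters before the LAST occurrence of t in s, or -1 if t does not occur.
    # Assumes t has no regex metacharacters: split() treats a separator longer
    # than one character as a regular expression.
    function chars_before_last(s, t,    V, n, i, L) {
        n = split(s, V, t)
        if (n < 2) return -1
        L = 0
        for (i = 1; i <= n - 1; i++)       # every piece before the last separator
            L += length(V[i])
        return L + (n - 2) * length(t)    # plus the n - 2 separators between them
    }
    { print $0, chars_before_last($0, pat) }'
}

printf '%s\n' ACTGCCCTG CTxCTxCT ACAAATTTC | before_last CT
```

This prints 6, 6 and -1. The made-up line CTxCTxCT (n = 4) is there to show that the separator term has to grow with the number of matches; adding a fixed 2 would give 4 instead of 6.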

# 5  
05-16-2013
Not sure why CT didn't match at char 2 on line 1; I'm assuming the first 2 chars are to be ignored when searching (the search below starts at char 3).

If the input file has a large number of records, it should scale better not to store every line in an array (memory use stays constant), so I've used a simple two-pass approach that reads the file twice:

Code:
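awk -v NEED=CT,TT '
BEGIN{split(NEED,N,",");min=999}
# first pass (FNR==NR only while reading the first copy of infile):
# keep the smallest match position, searching from character 3 onwards
FNR==NR {
  for (need in N) {
    pos=index(substr($0,3),N[need])
    if (pos && pos < min) min=pos
  }
  next
}
# second pass: pos counts from character 3, so pos+1 characters precede
# the match; awk strings start at position 1, not 0
{ print substr($0,1,min+1) }' infile infile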
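A variant of the same two-pass idea, sketched with match() and a regex alternation instead of looping over the patterns with index(). For literal patterns, the RSTART that match() sets for the leftmost match of CT|TT is the smallest index() over the individual patterns. Assumptions: the helper name trim_to_min and its optional arguments (a regex defaulting to CT|TT, and a number of leading characters to skip defaulting to 2 to mirror substr($0,3)) are made up for illustration, and the input must be a regular file because it is read twice.

```shell
# Hypothetical helper: trim_to_min FILE [REGEX] [SKIP]
#   REGEX defaults to CT|TT; SKIP (leading characters not searched) defaults to 2.
# FILE is named twice, so it must be a regular file, not a pipe.
trim_to_min() {
    awk -v re="${2:-CT|TT}" -v skip="${3:-2}" '
    # Pass 1: FNR == NR holds only while reading the first copy of the file.
    FNR == NR {
        # match() sets RSTART to the start of the leftmost match of any alternative
        if (match(substr($0, skip + 1), re)) {
            before = RSTART - 1 + skip       # characters before the match in the whole line
            if (!found++ || before < min)
                min = before
        }
        next
    }
    # Pass 2: cut each line to min characters (leave it whole if nothing matched)
    { print (found ? substr($0, 1, min) : $0) }' "$1" "$1"
}
```

With the sample saved as infile, trim_to_min infile prints ACTGC, ACCGG, ACAAA and ACCCG. Running trim_to_min infile 'CT|TT' 0 searches from the first character instead, and the CT at char 2 of line 1 cuts every line down to A.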

This User Gave Thanks to Chubler_XL For This Post:

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extracting and comparing values

I was trying to extract value of g1 and p1 only inside the tags where t1 is "Reading C (bytes)" and comparing them to make sure p1 is always less than g1. Here is the Json file I'm using - File:- { "g1" : 1482568, "n1" : "v_4", "p1" : 0, "s1" : "RC", "t1" : "LM", } { "g1" :... (3 Replies)
Discussion started by: Mannu2525

2. Shell Programming and Scripting

Find minimum and maximum values based on column with associative array

Hello, I need to find out the minimum and maximum values based on specific column, and then print out the entire row with the max value. Infile.txt: scf6 290173 290416 . + X_047241 T_00113118-1 scf6 290491 290957 . + X_047241 T_00113118-2 scf6 290898 290957 . + X_047241 T_00113119-3 scf6... (2 Replies)
Discussion started by: yifangt

3. Shell Programming and Scripting

Filter all the lines with minimum specified length of words of a text file

Hi Can someone tell me which script will work best (in terms of speed and simplicity to write and run) for a large text file to filter all the lines with a minimum specified length of words ? A sample script with be definitely of great help !!! Thanks in advance. :) (4 Replies)
Discussion started by: my_Perl

4. Shell Programming and Scripting

Output minimum and maximum values for replicates ID

Hi All I hope that someone could help me! I have an input file like this, with 4 colum(ID, feature1, start, end): a x 1 5 b x 3 10 b x 4 9 b x 5 16 c x 5 9 c x 4 8 And my output file should be like this: a x 1 5 b x 3 16 c x 4 9 What I would like to do is to output for each ID... (2 Replies)
Discussion started by: giuliangiuseppe

5. Programming

Computations using minimum values

I have the following code and into into trying to simplifying it. Any suggestions please? pmin = min (p(1), p(2), p(3), p(4), p(5), p(6)) ni = 0 xint = 0.0 yint = 0.0 zint = 0.0 !--------------------------------------------- ! if ((0.99999 * p(1)) <= pmin) then ... (3 Replies)
Discussion started by: kristinu

6. Shell Programming and Scripting

Print minimum and maximum values using awk

Can I print the minimum and maximum values of values in first 4 columns ? input 3038669 3038743 3037800 3038400 m101c 3218627 3218709 3217600 3219800 m290 ............. output 3037800 3038743 m101c 3217600 3219800 m290 (2 Replies)
Discussion started by: quincyjones

7. UNIX for Dummies Questions & Answers

Extract minimum values among 3 columns

Hi. I would like to ask for helps on extracting a minimum values among three columns using gawk in tab separator. input file: as1 10 20 30 as2 22 21 23 as3 300 391 567 as4 19 20 15 Output file: as1 10 as2 21 as3 300 as4 15 I am extremely appreciate your helps and comments.... (2 Replies)
Discussion started by: Amanda Low

8. Programming

Select several minimum values from row (MySQL)

Hello there. I've got the query like that SELECT count(tour_id) AS cnt FROM orders JOIN tours ON orders.tour_id=tours.id GROUP BY tour_id The result Is cnt 1 4 2 1 1 Now i have to select all records with minimum values in field "cnt" MySQL function min() returns only one.... (2 Replies)
Discussion started by: Trump

9. Shell Programming and Scripting

comparing values

i have two file and i am comparing both.. in cmp1 ,the content is : the nu of file is : <some integer value> in cmp2 ,the content is : the nu of file is : so want a script which will take value (2) when cmp1 is compared with cmp2.. i mean cmp cmp1 cmp2 the the output will be he nu of... (1 Reply)
Discussion started by: Aditya.Gurgaon

10. Shell Programming and Scripting

comparing two float values

I am trying to compare 2 float values using if the foll code does not work a=1.4 b=1.6 if test $a -gt $b then echo "$a is max" else echo "$b is max" fi does -gt work for floating point numbers, if not how do go about for my requirement? can i use bc ? pls help thanks in advance... (2 Replies)
Discussion started by: kavitha