String condensation


 
# 1  
Old 05-20-2012

Hello,

if this input:

Code:
gff     art ex    2833    2966    .       +       .       ID=A_172736
gff     art ex    2976    3165    .       +       .       ID=A_172736
gff     art ex    3195    3941    .       +       .       ID=A_713953
gff     art ex    8615    8753    .       +       .       ID=A_172736


has several ID strings that repeat on lines scattered randomly around the file, how can one generate this output?

Code:
gff     art ex    2833    8753    .       +       .       ID=A_172736
gff     art ex    3195    3941    .       +       .       ID=A_713953

Notice that the string ID=A_172736 appears only once in the output, carrying the smallest value of column 4 and the largest value of column 5 seen for that string, in the same format as the input. How can this be done?

thanks
# 2  
Old 05-20-2012
Do you need to retain the spaces? Or can the output be like this:
Code:
gff art ex 2833 8753 . + . ID=A_172736
gff art ex 3195 3941 . + . ID=A_713953

?
# 3  
Old 05-20-2012
That output is fine
# 4  
Old 05-20-2012
Try:
Code:
awk '{min[$9]=min[$9]?min[$9]:$4;min[$9]=$4<min[$9]?$4:min[$9];max[$9]=$5>max[$9]?$5:max[$9]}END{for (i in max) {$4=min[i];$5=max[i];$9=i;print}}' file

Shorter:
Code:
awk '{m[$9]=m[$9]?m[$9]:$4;m[$9]=$4<m[$9]?$4:m[$9];M[$9]=$5>M[$9]?$5:M[$9]}END{for (i in M) {$4=m[i];$5=M[i];$9=i;print}}' file
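For a quick check, the shorter one-liner can be run against the sample input from the first post. This is only a sketch: the variable names sample and result are made up for illustration, it assumes an awk that still has the last record available in END (gawk and mawk do), and it pipes through sort only because the order of for (i in M) is unspecified.

```shell
# Sample input from the first post, with the whitespace squeezed to single spaces.
sample='gff art ex 2833 2966 . + . ID=A_172736
gff art ex 2976 3165 . + . ID=A_172736
gff art ex 3195 3941 . + . ID=A_713953
gff art ex 8615 8753 . + . ID=A_172736'

# Run the one-liner on stdin instead of a file; sort makes the output order repeatable.
result=$(printf '%s\n' "$sample" |
    awk '{m[$9]=m[$9]?m[$9]:$4;m[$9]=$4<m[$9]?$4:m[$9];M[$9]=$5>M[$9]?$5:M[$9]}END{for (i in M) {$4=m[i];$5=M[i];$9=i;print}}' |
    sort)

printf '%s\n' "$result"
```

This should print one line per ID, with fields separated by single spaces because assigning to a field makes awk rebuild the record using OFS.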

# 5  
Old 05-20-2012
thank you so much. Can you explain what you did?
# 6  
Old 05-20-2012
I've broken the first awk posted into individual statements with comments; hope it helps explain things.

Code:
# create two hashes (min and max) keyed on the contents of the ninth field ($9)
awk '
    {       # this block of code is executed for each input record
        min[$9] = min[$9] ? min[$9] : $4;       # capture field 4 as the initial default if min is not yet set
                                                # this assumes that values in field 4 will always be non-zero

        # max does not need to be initialised as it defaults to zero (provided field 5 values are always >= 0)

        min[$9] = $4 < min[$9] ? $4 : min[$9];  # capture current field 4 if it is smaller than the minimum
        max[$9] = $5 > max[$9] ? $5 : max[$9]   # capture field 5 if it is larger than the maximum
    }

    # this code block is executed after the last input record has been read and processed above.
    # It assumes that the last input record is still in the buffer (might not be true for old awk versions,
    # but this is not likely to be an issue).  The programme also assumes that the only changing fields
    # in the input data are fields 4, 5, and 9.
    END {
        for (i in max)  # for each value of field 9 that was observed (in no particular order)
        {
            $4=min[i];      # substitute the min and max values and field 9 into the current record.
            $5=max[i];
            $9=i;
            print           # print the record with the substituted values.
        }
    }' file
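
The assumptions called out in the comments above (non-zero field 4 values, the last record still being available in END, and only fields 4, 5 and 9 changing) can be avoided with a little more bookkeeping. What follows is only a sketch, not the code posted earlier in the thread: the function name condense and the array names lo, hi, rec and ids are made up for illustration. It keeps one full record per ID as a template, tests membership with "in" rather than truthiness, and prints the IDs in the order they were first seen.

```shell
# condense: hypothetical helper; reads the named files, or stdin if none are given
condense() {
    awk '
    {
        id = $9
        if (!(id in lo)) {          # first record seen for this ID
            ids[++n] = id           # remember first-seen order
            lo[id] = $4             # start value (field 4), zero is fine here
            hi[id] = $5             # end value (field 5)
            rec[id] = $0            # keep the whole record as a template
            next
        }
        if ($4 + 0 < lo[id] + 0) lo[id] = $4    # force numeric comparison
        if ($5 + 0 > hi[id] + 0) hi[id] = $5
    }
    END {
        for (k = 1; k <= n; k++) {
            id = ids[k]
            $0 = rec[id]            # restore the template and re-split it into fields
            $4 = lo[id]             # assigning a field rebuilds the record with OFS
            $5 = hi[id]
            print
        }
    }' "$@"
}
```

Call it as condense file. The output fields are joined with single spaces, as in the earlier posts; adding BEGIN { OFS = "\t" } to the awk program would give tab-separated output instead.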

 