print specific strings only


 
# 1  
Old 09-30-2009

Hello,
I have a file like this:
Code:
2    168611167    STK39    STK39    ---    27347    "serine threonine kinase 39 (STE20/SPS1 homolog, yeast)"    YES    SNP_A-2086192    rs16854601    0.001558882
6    13670256    SIRT5 /// RPS4X    SIRT5    ---    23408 /// 6191    "sirtuin (silent mating type information regulation 2 homolog) 5 (S. cerevisiae) /// ribosomal protein S4, X-linked"    YES    SNP_A-8405097    rs16874223    0.00156082
2    105439878    NCK2 /// FHL2    FHL2 /// NCK2    ---    8440 /// 2274    NCK adaptor protein 2 /// four and a half LIM domains 2    ---    SNP_A-2034891    rs41322544    0.001562043
12    80373503    PPFIA2    PPFIA2    ---    8499    "protein tyrosine phosphatase, receptor type, f polypeptide (PTPRF), interacting protein (liprin), alpha 2"    YES    SNP_A-8542673    rs17008588    0.001565901
15    41547066    TP53BP1 /// TP53BP1 /// TP53BP1    TP53BP1    ---    7158 /// 7158 /// 7158    tumor protein p53 binding protein 1 /// tumor protein p53 binding protein 1 /// tumor protein p53 binding protein 1    YES    SNP_A-1782700    rs1814538    0.001573326

I need to sort this file in ascending order on the last column.
Then I need output with two columns: the first with only the word that starts with SNP_A, and the second with the word immediately to the right of it.

Example output:
SNP_A-2086192 rs16854601
SNP_A-8405097 rs16874223
Can you show me how to do this with awk?

Thanks for reading
# 2  
Old 09-30-2009
Does the file always have a fixed number of fields?

If yes, find the number of fields, sort on the last one, cut the required fields, and filter them, like this:

sort -k 12,12 file | cut -f 10,11 | awk '$1 ~ /SNP/'

Change the field numbers accordingly.
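
Before picking field numbers, it can help to check whether every line really has the same number of fields. A minimal sketch, assuming whitespace-separated fields; sample.txt and field_counts.txt are hypothetical file names, and the rows are shortened versions of the sample lines in the first post:

```shell
# Hypothetical sample file; rows are abbreviated from the thread's example.
cat > sample.txt <<'EOF'
2 168611167 STK39 --- YES SNP_A-2086192 rs16854601 0.001558882
12 80373503 PPFIA2 --- YES SNP_A-8542673 rs17008588 0.001565901
6 13670256 SIRT5 /// RPS4X --- YES SNP_A-8405097 rs16874223 0.00156082
EOF

# Tally how many lines have each field count: "<fields> <lines>".
awk '{ count[NF]++ } END { for (n in count) print n, count[n] }' sample.txt |
    sort -n > field_counts.txt

cat field_counts.txt
```

If field_counts.txt has more than one line, the field count varies, and fixed positions for sort and cut will not be reliable.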

Cheers
# 3  
Old 09-30-2009
The number of fields is not the same on every line, and the field delimiters are not consistent either, as the example above shows.
# 4  
Old 09-30-2009
It would do you a world of good to generate the file with the same number of fields on every line; that would make it easier both to process and to understand.
If a field does not exist for some record, leave it empty and use a standard delimiter, e.g. a tab.

Cheers
# 5  
Old 09-30-2009
For the sample given above, this will work with a space as separator and with a variable number of fields:

parse.awk

Code:
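# For each line, find the first field starting with SNP_A and remember
# it, the field to its right, and the last field (the sort key).
{
    for (i = 1; i < NF; i++) {
        if ($i ~ /^SNP_A/) {
            a_str[NR] = sprintf("%s %s", $i, $(i+1))
            a_val[NR] = $NF
            break
        }
    }
}
# Print "SNP_A-... rs... value"; for-in order is unspecified, so sort afterwards.
END {
    for (i in a_str) print a_str[i], a_val[i]
}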

That was for the parsing. Now the sort.

Code:
$ awk -f parse.awk yourFile | sort -k3,3
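
The pipeline above still prints the sort key as a third column and compares it as text. A sketch of the whole job, assuming the last field is a plain decimal number: it uses a simplified inline variant of the script that prints directly instead of collecting into arrays, sorts numerically, and then drops the key. The file names snp_sample.txt and snp_sorted.txt are hypothetical, and the rows are shortened from the sample in the first post:

```shell
# Hypothetical sample file; rows are abbreviated from the thread's example.
cat > snp_sample.txt <<'EOF'
12 80373503 PPFIA2 --- YES SNP_A-8542673 rs17008588 0.001565901
2 168611167 STK39 --- YES SNP_A-2086192 rs16854601 0.001558882
6 13670256 SIRT5 /// RPS4X --- YES SNP_A-8405097 rs16874223 0.00156082
EOF

# 1. awk: emit "SNP_A-id next-word last-field" for each matching line.
# 2. sort: order numerically on the third column (LC_ALL=C keeps "." as the decimal point).
# 3. cut: keep only the first two columns.
awk '{
    for (i = 1; i < NF; i++) {
        if ($i ~ /^SNP_A/) {
            print $i, $(i+1), $NF
            break
        }
    }
}' snp_sample.txt | LC_ALL=C sort -k3,3n | cut -d' ' -f1,2 > snp_sorted.txt

cat snp_sorted.txt
```

If the last column can contain scientific notation such as 1e-05, GNU sort's -g option is one way to handle it in place of -n.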



---------- Post updated at 08:00 AM ---------- Previous update was at 07:58 AM ----------

If you have GNU awk, you can do the sort in awk, but it is a little bit tricky (read: I hate gawk sort functions).

Code:
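# Requires GNU awk (asorti).
{
    for (i = 1; i < NF; i++) {
        if ($i ~ /^SNP_A/) {
            # Key on the last field; lines sharing the same value overwrite each other.
            a_str[$NF] = sprintf("%s %s", $i, $(i+1))
            break
        }
    }
}
END {
    # asorti() sorts the indices as strings by default; with gawk 4.0+,
    # pass "@ind_num_asc" as a third argument for a numeric sort.
    n = asorti(a_str, a_copy)
    for (i = 1; i <= n; i++) print a_str[a_copy[i]]
}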
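
One caveat with keying the array on $NF: two lines with the same last-column value overwrite each other, so only one of them is printed. A portable sketch that avoids this and does not need GNU awk is to let awk pipe its own output through sort and cut. The file names are hypothetical, the first two rows are shortened from the sample in the first post, and the third row is made up purely to create a tie:

```shell
# Hypothetical sample; the SNP_A-EXAMPLE row is invented to show a duplicate key.
cat > dup_sample.txt <<'EOF'
12 80373503 PPFIA2 --- YES SNP_A-8542673 rs17008588 0.001565901
2 168611167 STK39 --- YES SNP_A-2086192 rs16854601 0.001558882
9 1 EXAMPLE --- YES SNP_A-EXAMPLE rsEXAMPLE 0.001558882
EOF

awk '
BEGIN {
    # Sort numerically on the key in column 1, then drop the key.
    cmd = "LC_ALL=C sort -k1,1n | cut -d\" \" -f2,3"
}
{
    for (i = 1; i < NF; i++) {
        if ($i ~ /^SNP_A/) {
            # Send "key id next-word" to the sort pipeline; nothing is stored in awk.
            print $NF, $i, $(i+1) | cmd
            break
        }
    }
}
END { close(cmd) }
' dup_sample.txt > snp_dups.txt

cat snp_dups.txt
```

Because every line is passed to sort as its own record, both rows with the value 0.001558882 survive.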


Last edited by ripat; 09-30-2009 at 03:18 AM..
# 6  
Old 10-14-2009
Ripat,
Worked great!
Thank you so much