Extract lines from text files


 
# 1  
Old 01-25-2014

I have some files containing the following data

Code:
 #  RESIDUE AA STRUCTURE BP1 BP2  ACC     N-H-->O    O-->H-N    N-H-->O    O-->H-N    TCO  KAPPA ALPHA  PHI   PSI    X-CA   Y-CA   Z-CA 
    1  196 A M              0   0  230      0, 0.0     2,-0.2     0, 0.0     0, 0.0   0.000 360.0 360.0 360.0  76.4   21.7   -6.8   11.3
    2  197 A D        +     0   0  175      1,-0.1     2,-0.1     0, 0.0     0, 0.0  -0.193 360.0 151.5 -46.2  99.1   23.2   -9.3   13.8
    3  198 A E        -     0   0  170     -2,-0.2    -1,-0.1     0, 0.0     0, 0.0  -0.622  29.3-158.9-134.6  66.9   26.9   -9.0   13.0
    4  199 A K        -     0   0  161      1,-0.1     0, 0.0    -2,-0.1     0, 0.0   0.037  18.7-134.6 -43.9 157.1   28.8   -9.8   16.3
    5  200 A R        +     0   0  174      3,-0.0     2,-1.6     2,-0.0    -1,-0.1   0.294  60.4 134.1 -97.8   0.9   32.4   -8.5   16.6
    6  201 A R        +     0   0  178      1,-0.1    -2,-0.1     2,-0.1     0, 0.0  -0.429  24.5 143.8 -54.0  86.9   33.5  -11.9   17.9
    7  202 A A        +     0   0   76     -2,-1.6    -1,-0.1     2,-0.1    -2,-0.0  -0.471  24.7 108.8-134.5  48.7   36.5  -11.8   15.5
    8  203 A Q  S    S+     0   0  149      3,-0.0     2,-0.1     4,-0.0    -2,-0.1  -0.694  77.8  88.8-115.4  54.1   39.3  -13.4   17.4
    9  204 A H  S >> S-     0   0  121      4,-0.0     3,-2.1     0, 0.0     4,-0.7  -0.341  88.3  -9.7-128.0-146.1   38.5  -16.0   14.8
   10  205 A N  H 3> S+     0   0  145      1,-0.3     4,-0.8     2,-0.2     5,-0.2   0.673 125.2  50.8 -27.9 -50.8   39.4  -17.0   11.2
   11  206 A E  H 34 S+     0   0  159      1,-0.2     4,-0.3     2,-0.1    -1,-0.3   0.843 106.1  59.4 -64.2 -34.5   41.5  -13.9   10.2
   12  207 A V  H X4 S+     0   0   60     -3,-2.1     3,-0.5     2,-0.1     4,-0.4   0.982 107.8  32.9 -62.8 -61.2   43.7  -14.0   13.3
   13  208 A E  H >X S+     0   0   78     -4,-0.7     3,-4.0     1,-0.2     4,-0.9   0.950 109.6  53.5 -70.0 -62.3   45.4  -17.4   13.2

Desired output
Code:
ASG  ILE A   99    2    C          Coil    -82.86    141.16      97.1      1N8W
ASG  LEU A  146   48    C          Coil    -68.82    158.46       0.0      1N8W
ASG  LEU A  302  167    E        Strand    -98.11    143.77      19.7      1N8W

I want to extract only the lines whose values in the phi and psi columns satisfy -67<=phi<=-99 and 100<=psi<=165.
I would like to save the output in another folder, f2, using the input file names. I would highly appreciate your suggestions.

Thanks a lot.

Last edited by edweena; 01-26-2014 at 08:08 AM..
# 2  
Old 01-25-2014
Code:
$ awk 'NR==1{print;next}$15>=-67 && $15<=-99 && $16>=100 && $16<=165' file

# 3  
Old 01-25-2014
Code:
awk 'NR==1||($(NF-3)>=100&&$(NF-3)<=165&&$(NF-4)>=-67&&$(NF-4)<=-99)' file

# 4  
Old 01-25-2014
Quote:
Originally Posted by edweena
I have some files containing the following data

Code:
 #  RESIDUE AA STRUCTURE BP1 BP2  ACC     N-H-->O    O-->H-N    N-H-->O    O-->H-N    TCO  KAPPA ALPHA  PHI   PSI    X-CA   Y-CA   Z-CA 
    1  196 A M              0   0  230      0, 0.0     2,-0.2     0, 0.0     0, 0.0   0.000 360.0 360.0 360.0  76.4   21.7   -6.8   11.3
    2  197 A D        +     0   0  175      1,-0.1     2,-0.1     0, 0.0     0, 0.0  -0.193 360.0 151.5 -46.2  99.1   23.2   -9.3   13.8
    3  198 A E        -     0   0  170     -2,-0.2    -1,-0.1     0, 0.0     0, 0.0  -0.622  29.3-158.9-134.6  66.9   26.9   -9.0   13.0
    4  199 A K        -     0   0  161      1,-0.1     0, 0.0    -2,-0.1     0, 0.0   0.037  18.7-134.6 -43.9 157.1   28.8   -9.8   16.3
    5  200 A R        +     0   0  174      3,-0.0     2,-1.6     2,-0.0    -1,-0.1   0.294  60.4 134.1 -97.8   0.9   32.4   -8.5   16.6
    6  201 A R        +     0   0  178      1,-0.1    -2,-0.1     2,-0.1     0, 0.0  -0.429  24.5 143.8 -54.0  86.9   33.5  -11.9   17.9
    7  202 A A        +     0   0   76     -2,-1.6    -1,-0.1     2,-0.1    -2,-0.0  -0.471  24.7 108.8-134.5  48.7   36.5  -11.8   15.5
    8  203 A Q  S    S+     0   0  149      3,-0.0     2,-0.1     4,-0.0    -2,-0.1  -0.694  77.8  88.8-115.4  54.1   39.3  -13.4   17.4
    9  204 A H  S >> S-     0   0  121      4,-0.0     3,-2.1     0, 0.0     4,-0.7  -0.341  88.3  -9.7-128.0-146.1   38.5  -16.0   14.8
   10  205 A N  H 3> S+     0   0  145      1,-0.3     4,-0.8     2,-0.2     5,-0.2   0.673 125.2  50.8 -27.9 -50.8   39.4  -17.0   11.2
   11  206 A E  H 34 S+     0   0  159      1,-0.2     4,-0.3     2,-0.1    -1,-0.3   0.843 106.1  59.4 -64.2 -34.5   41.5  -13.9   10.2
   12  207 A V  H X4 S+     0   0   60     -3,-2.1     3,-0.5     2,-0.1     4,-0.4   0.982 107.8  32.9 -62.8 -61.2   43.7  -14.0   13.3
   13  208 A E  H >X S+     0   0   78     -4,-0.7     3,-4.0     1,-0.2     4,-0.9   0.950 109.6  53.5 -70.0 -62.3   45.4  -17.4   13.2

I want to extract only the lines whose values in the phi and psi columns satisfy -67<=phi<=-99 and 100<=psi<=165.
I would like to save the output in another folder, using the input file names. I would highly appreciate your suggestions.

Thanks a lot.
There are several problems here. First, and most importantly, your specification requiring a value for PHI that is greater than -67 and simultaneously less than -99 (-67<=phi<= -99) always yields the empty set.

If we assume that you meant -99 <= PHI <= -67, your sample data still produces no output (except for the heading) because only the fifth line of your input file (residue 199) has a PSI value between 100 and 165 (157.1), and the PHI value on that line is -43.9 (which is out of range).

When Akshay provided his suggested code, he apparently didn't notice that the data under the heading "STRUCTURE" looks like 0, 1, 2, or 3 fields to awk (when using the default field delimiter). Yoda compensated for that problem, but apparently didn't notice that sometimes there are no field delimiters between values under the headings KAPPA, ALPHA, PHI, and PSI. For example, on the line for residue 198 these values run together as 29.3-158.9-134.6. So, rather than using field delimiters, any code processing these lines will have to be based on column positions in the input file, not field counts.
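
A minimal sketch of the column-position approach, assuming the real files keep the fixed-width layout of the quoted sample: PHI in character columns 104-109 and PSI in columns 110-115. These offsets were counted from the sample above and have not been confirmed by the poster. It also assumes the corrected range -99 <= PHI <= -67. The function name select_by_columns is made up for illustration.

```shell
# Hypothetical helper: select rows by fixed character columns, not awk fields.
# Assumption: PHI occupies columns 104-109 and PSI columns 110-115, as counted
# from the quoted sample; check these offsets against your own files.
select_by_columns() {
    awk '
        NR == 1 { print; next }            # keep the heading line
        {
            phi = substr($0, 104, 6) + 0   # "+ 0" forces a numeric comparison
            psi = substr($0, 110, 6) + 0
        }
        phi >= -99 && phi <= -67 && psi >= 100 && psi <= 165
    ' "$@"
}
```

Called as select_by_columns file, this prints the heading plus any matching rows. Because substr() reads by position, it still works when a negative value is glued to the previous column. With the sample data above it prints only the heading, for the reason already given.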

Are there ever any <tab> characters in your input files? Or, are all of the spaces between fields just sequences of <space> characters?

Please provide us with a specification that doesn't always produce an empty set, and provide us some sample input that includes some lines that will be selected as well as some lines that will be rejected. And, show us the sample output you expect to be produced for that sample input.

And, please tell us how the name of the directory to contain the new files will be passed to your script.

Last edited by Don Cragun; 01-25-2014 at 04:39 PM.. Reason: fix typo
# 5  
Old 01-25-2014
Yup! We didn't notice. Thank you Don
# 6  
Old 01-26-2014
Hi Cragun,

Thank you very much for your suggestions. I have rephrased my question below and changed the data. Please have a look at the question.

I have a folder f1 that contains some files. The contents of the files are shown below.

Code:
REM  |---Residue---|    |--Structure--|   |-Phi-|   |-Psi-|  |-Area-|      1N8W
ASG  GLU A   98    1    C          Coil    360.00    145.18     236.2      1N8W
ASG  ILE A   99    2    C          Coil    -82.86    141.16      97.1      1N8W
ASG  ILE A  100    3    C          Coil   -115.85    140.04      33.4      1N8W
ASG  GLN A  101    4    E        Strand   -114.08    115.71      61.8      1N8W
ASG  GLY A  127   29    C          Coil    149.12    153.69      21.1      1N8W
ASG  GLU A  128   30    T          Turn    -81.07    168.08     150.8      1N8W
ASG  PHE A  129   31    T          Turn    -55.84    139.19      85.7      1N8W
ASG  CYS A  144   46    H    AlphaHelix    -67.95    -16.88       0.0      1N8W
ASG  GLN A  145   47    C          Coil    -86.59    -11.10      29.5      1N8W
ASG  LEU A  146   48    C          Coil    -68.82    158.46       0.0      1N8W
ASG  PRO A  147   49    C          Coil    -61.30    150.63      46.7      1N8W
ASG  ILE A  148   50    G      310Helix    -57.27    -35.92      84.1      1N8W
ASG  TYR A  301  166    E        Strand   -110.40    111.53      75.1      1N8W
ASG  LEU A  302  167    E        Strand    -98.11    143.77      19.7      1N8W

Desired output
Code:
ASG  ILE A   99    2    C          Coil    -82.86    141.16      97.1      1N8W
ASG  LEU A  146   48    C          Coil    -68.82    158.46       0.0      1N8W
ASG  LEU A  302  167    E        Strand    -98.11    143.77      19.7      1N8W

I want to extract only the lines whose values in the phi and psi columns satisfy -67<=phi<=-99 and 100<=psi<=165.
I would like to save the output in another folder, f2, using the input file names. I would highly appreciate your suggestions.

Thanks a lot.

Moderator's Comments:
I have restored post #1 to its original content and added the new data into this post, otherwise the thread would be difficult to follow.

Moderator's Comments:
I have picked up some more of the updates to the original posting so this message gives a complete picture of what is now being requested.

Last edited by Don Cragun; 01-26-2014 at 03:44 PM.. Reason: Pick up the requested selection criteria and delete the out of range output.
# 7  
Old 01-26-2014
Quote:
Originally Posted by edweena
Hi Cragun,

Thank you very much for your suggestions. I have rephrased my question below and changed the data. Please have a look at the question.

I have a folder f1 that contains some files. The content of the files are shown below.

Code:
REM  |---Residue---|    |--Structure--|   |-Phi-|   |-Psi-|  |-Area-|      1N8W
ASG  GLU A   98    1    C          Coil    360.00    145.18     236.2      1N8W
ASG  ILE A   99    2    C          Coil    -82.86    141.16      97.1      1N8W
ASG  ILE A  100    3    C          Coil   -115.85    140.04      33.4      1N8W
ASG  GLN A  101    4    E        Strand   -114.08    115.71      61.8      1N8W
ASG  GLY A  127   29    C          Coil    149.12    153.69      21.1      1N8W
ASG  GLU A  128   30    T          Turn    -81.07    168.08     150.8      1N8W
ASG  PHE A  129   31    T          Turn    -55.84    139.19      85.7      1N8W
ASG  CYS A  144   46    H    AlphaHelix    -67.95    -16.88       0.0      1N8W
ASG  GLN A  145   47    C          Coil    -86.59    -11.10      29.5      1N8W
ASG  LEU A  146   48    C          Coil    -68.82    158.46       0.0      1N8W
ASG  PRO A  147   49    C          Coil    -61.30    150.63      46.7      1N8W
ASG  ILE A  148   50    G      310Helix    -57.27    -35.92      84.1      1N8W
ASG  TYR A  301  166    E        Strand   -110.40    111.53      75.1      1N8W
ASG  LEU A  302  167    E        Strand    -98.11    143.77      19.7      1N8W

Desired output
Code:
ASG  ILE A   99    2    C          Coil    -82.86    141.16      97.1      1N8W
ASG  GLU A  128   30    T          Turn    -81.07    168.08     150.8      1N8W
ASG  LEU A  146   48    C          Coil    -68.82    158.46       0.0      1N8W
ASG  LEU A  302  167    E        Strand    -98.11    143.77      19.7      1N8W

Moderator's Comments:
I have restored post #1 to its original content and added the new data into this post, otherwise the thread would be difficult to follow.


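This matches the selection criteria, read as -99 <= phi <= -67 and 100 <= psi <= 165:
Code:
$ awk '$8>=-99 && $8<=-67 && $9>=100 && $9<=165' file

This gives the desired output shown in the quote above, which includes the line with psi = 168.08, by raising the upper psi limit to 169:
Code:
$ awk '$8>=-99 && $8<=-67 && $9>=100 && $9<=169' file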
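
The question also asks for the results to be saved in folder f2 under the input file names. A rough sketch of one way to do that, assuming every regular file in f1 should be processed and that phi and psi are whitespace-separated fields 8 and 9, as in the sample. The function name filter_dir is made up for illustration, and the limits are those of the first command above.

```shell
# Hypothetical helper: filter every file in a source directory and write the
# result to a destination directory under the same file name.
# Assumption: phi is field 8 and psi is field 9, as in the sample rows.
filter_dir() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue            # skip subdirectories and empty globs
        awk '$8 >= -99 && $8 <= -67 && $9 >= 100 && $9 <= 165' "$f" \
            > "$dst/$(basename "$f")"
    done
}

# Usage, with the directory names from the question:
# filter_dir f1 f2
```

The REM heading line has fewer than eight fields, so it fails the phi test and is left out of the output, which matches the desired output shown in the thread.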
