Filter file by length, looking only at lines that don't begin with ">"


 
# 1  
Old 07-23-2013

I have a file that stores data in pairs of lines, following this format:
line 1: header (preceded by ">")
line 2: sequence

Example.txt:
Code:
>seq1 name
GATTGATGTTTGAGTTTTGGTTTTT
>seq2 name
TTTTCTTC

I want to extract every sequence that is shorter than 11 characters, together with its header. Desired output:
Code:
>seq2 name
TTTTCTTC

I can find lines shorter than 11 characters and print each one along with the line before it (its header). The problem I'm having is excluding the headers themselves (i.e. lines beginning with ">") from the length test.

For example
Code:
awk '{lines[NR] = $0} length($0) < 11 {print lines [NR-1]; print lines [NR]} ' example.txt

Gives me
Code:
>seq1 name
GATTGATGTTTGAGTTTTGGTTTTT
>seq2 name
>seq2 name
TTTTCTTC

How do I tell awk to ignore lines beginning with ">" when it checks the length?

Last edited by Scrutinizer; 07-23-2013 at 02:55 AM.. Reason: code tags also for data samples. Do not use quote tags
# 2  
Old 07-23-2013
Try this:

Code:
 
awk '{lines[NR] = $0} !/^>/ && length($0) < 11 {print lines[NR-1]; print lines[NR]}' example.txt
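
The original command misfires because the header lines in the sample (">seq1 name" is 10 characters) are themselves shorter than 11, so they pass the length test too. As a rough sketch of the same idea that avoids storing every line in an array, the variant below remembers only the most recent header and skips header lines before the length test. The variable name `header` is my own choice, and the sketch assumes each sequence sits on a single line directly after its header, as in the sample.

```shell
# Recreate the sample file from the question.
printf '%s\n' '>seq1 name' 'GATTGATGTTTGAGTTTTGGTTTTT' '>seq2 name' 'TTTTCTTC' > example.txt

# Lines starting with ">" are stored as the current header and skipped.
# Any other line is a sequence: print it with its header if it is
# shorter than 11 characters.
awk '/^>/ { header = $0; next }
     length($0) < 11 { print header; print }' example.txt
```

Because the pattern is anchored with `^`, a ">" appearing later in a line would not cause that line to be treated as a header.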

These 2 Users Gave Thanks to vidyadhar85 For This Post:
# 3  
Old 07-23-2013
That does the trick, thanks.