Forum: Shell Programming and Scripting
Post 302835661 by pathunkathunk on Monday 22nd of July 2013 11:20:54 PM
Filter file by length, looking only at lines that don't begin with ">"

I have a file that stores data in pairs of lines, following this format:
line 1: header (preceded by ">")
line 2: sequence

example.txt:
Code:
>seq1 name
GATTGATGTTTGAGTTTTGGTTTTT
>seq2 name
TTTTCTTC

I want to extract every sequence shorter than 11 characters, together with its header. Desired output:
Code:
>seq2 name
TTTTCTTC

I can test the length of each line and print any line shorter than 11 characters along with the line before it (its header). The problem I'm having is skipping the headers (i.e. lines beginning with ">") when I do the length test, because the headers themselves are often shorter than 11 characters.

For example
Code:
awk '{lines[NR] = $0} length($0) < 11 {print lines [NR-1]; print lines [NR]} ' example.txt

Gives me
Code:
>seq1 name
GATTGATGTTTGAGTTTTGGTTTTT
>seq2 name
>seq2 name
TTTTCTTC

How do I tell awk to ignore lines beginning with ">" when it tests the length?
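
One possible approach, offered as a sketch rather than the definitive answer: remember each header as it goes by and skip straight to the next line, so only sequence lines ever reach the length test. The function name `short_seqs` and the variable names `max` and `header` are made up for illustration. It assumes every sequence sits on a single line directly after its header and that the file has no blank lines.

```shell
# Hypothetical helper: print header/sequence pairs whose sequence
# is shorter than 11 characters. Reads the named files, or stdin if none.
short_seqs() {
  awk -v max=11 '
    /^>/ { header = $0; next }                 # header: store it, skip the length test
    length($0) < max { print header; print }   # short sequence: print the pair
  ' "$@"
}
```

Called as `short_seqs example.txt` on the sample data above, this should print only the `>seq2 name` / `TTTTCTTC` pair. Because `next` stops processing of header lines, there is no need to store every line in an array.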
