Full Discussion: extracting data
Post 302537326 by birei on Thursday 7th of July 2011 06:33:47 PM
Hi,

I have not tested Franklin52's solution, but there is a subtle difference between your first post and your last one: in the first, each header begins with '>', but in the last it does not.
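
If the headers do begin with '>' as in your first post, finding them is much simpler. Here is a minimal sketch of that case. The sample lines and header names are made up, and I am assuming the name follows the '>' directly with no space:

```perl
use strict;
use warnings;

# Made-up sample in the '>' header style; not the poster's real data.
my @lines  = ( '>BA', 'GTATAC', 'ACGTTA', '>BB', 'TTTTTT', '>BC', 'GGGCGT' );
my %wanted = map { $_ => 1 } qw(BA BC);

my @kept;
my $printing = 0;
for my $line (@lines) {
    # A header line decides whether the record that follows is kept.
    if ( $line =~ /^>(\S+)/ ) {
        $printing = exists $wanted{$1};
    }
    push @kept, $line if $printing;
}
print "$_\n" for @kept;
```

With that marker no length guess is needed: every line after a requested header is kept until the next '>' line.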

I will give it a try, but when parsing your file, how can I know where each header begins and ends? I am assuming that each header is shorter than 20 characters while sequence lines are at least that long, but I may be wrong.

Code:
$ cat infile
(data of your last post)
$ cat script.pl
use warnings;
use strict;
use constant HEADER_LINE_LENGTH => 20;

die "Usage: perl $0 <input-file> <output-file> <headers>\n" unless @ARGV > 2;

my $infile = shift;
my $outfile = shift;
my %header = map { $_ => 1 } @ARGV;

open my $fh, "<", $infile or die "Cannot open file $infile: $!\n";
open my $ofh, ">", $outfile or die "Cannot open file $outfile: $!\n";

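while ( my $line = <$fh> ) {
        chomp $line;
        # Flip-flop: turns on at a requested header and off at the next short
        # line, which is assumed to be the following header.
        if ( my $flip = ( exists $header{ $line } ... length( $line ) < HEADER_LINE_LENGTH ) ) {
                if ( $flip =~ /E/ ) {
                        # The closing line is itself a header: run the body again
                        # on the same line so it can open a new range if requested.
                        redo;
                } else {
                        printf $ofh "%s\n", $line;
                }
        }
}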

close $fh or warn "Cannot close $infile: $!\n";
close $ofh or warn "Cannot close $outfile: $!\n";
$ perl script.pl
Usage: perl script.pl <input-file> <output-file> <headers>
$ perl script.pl infile outfile BA BC BC23_
$ cat outfile
BA
GTATACATTATTGATGAAGTCCACATGCTTTCTATGGGTGCCTTCAATGCGCTTTTAAAA
ACGTTAGAAGAGCCGCCAGGACATGTTATCTTTATTTTGGCGACAACAGAACCGCATAAG
ATACCGCCTACAATCATTTCGCGTTGCCAACGTTTCGAATTTCGAAAAATATCAGTAAAT
GATATTGTTGAGAGATTGTCCACGGTTGTGACTAATGAAGGTACGCAAGTAGAAGATGAG
GCTTTACAAATTGTTGCGCGTGCCGCTGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAAGCGATATCTTATAGTGATGAGAGGGTTACGACAGAAGATGTATTAGCTGTAACG
GGTCGTGATATGTTCCGTATGTTAAGTGAA
BC23_
GTATACATTATTGATGAAGTTCACATGCTTTCTATGGGTGCATTCAATGCGCTTTTAAAA
ACCTTAGAAGAGCCGCCAGGACATGTTATCTTTATTTTGGCGACAACAGAACCTCATAAG
ATCCCACCTACAATCATTTCACGTTGTCAGCGCTTTGAATTCCGAAAAATATCAGTGAAT
GATATTGTTGAGAGATTATCAACGGTCGTGACAAATGAAGGTACGCAAGTGGAAGGTGAA
GCATTACAAATTGTTGCGCGTGCTGCCGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAGGCTATATCTTATAGTGATGAGATTGTTACGACAGAAGATGTATTGGCCGTAACA
GGACGTGATATGTTCCGTAAGTTGAGTGAA
BC
GTATACATTATTGATGAAGTTCACATGCTTTCTATGGGTGCCTTCAATGCGCTTTTAAAA
ACGTTAGAAGAACCGCCAGGACATGTCATCTTTATTTTGGCGACAACAGAACCGCATAAG
ATACCGCCTACAATTATTTCGCGTTGCCAACGTTTCGAATTTCGAAAGATATCAGTAAAT
GATATTGTTGAGAGATTATCGACAGTTGTAAACAATGAAGGTACGCAAGTAGAAGATGAA
GCGTTACAAATCGTTGCACGTGCCGCTGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAGGCAATATCTTATAGTGATGAGACTGTTACGACAGAAGATGTATTAGCTGTAACA
GGGCGTGATATGTTCCGAATGTTAAGTGAA
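
In case the '...' and 'redo' combination in the script looks odd, here is a small stand-alone sketch of the same idea on made-up data. The header names BA, BB and BD and the dummy sequences are only placeholders, and the 20-character limit is the same assumption as above:

```perl
use strict;
use warnings;

# Made-up sample: header names and sequences are placeholders, not real data.
my @lines  = ( 'BA', 'A' x 30, 'C' x 30, 'BB', 'G' x 30, 'BD', 'T' x 30 );
my %wanted = map { $_ => 1 } qw(BA BD);

my @kept;
for my $line (@lines) {
    # While the range is off only the left operand is tested; while it is
    # on only the right operand is tested.
    my $flip = ( exists $wanted{$line} ... length($line) < 20 );
    next unless $flip;
    if ( $flip =~ /E/ ) {
        # The range just closed on a short line, i.e. the next header.
        # redo runs the body again on the same $line so that this header
        # gets a chance to open a new range.
        redo;
    }
    push @kept, $line;
}
print "$_\n" for @kept;
```

The last value returned by a range has "E0" appended, which is how the script detects the closing line. Here the BA and BD records are kept and the BB record is skipped.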

Regards,
Birei
 
