Extraction of sequences from files: Post 302952217 by Don Cragun on Saturday 15th of August 2015 02:26:18 AM
If the input files are as you described, and you used the code RudiC suggested in post #2, you should get the output he listed in that same post.

If you're running this script on a Solaris/SunOS system, change awk in his suggestion to /usr/xpg4/bin/awk. (Since you say there was no output, this should not be your problem.)

If your input files are in DOS format (with <carriage-return><linefeed> character pair line terminators instead of the normal <newline> character line terminators expected by UNIX and Linux system utilities) or have extraneous spaces and/or tabs at the end of input lines, change RudiC's suggestion to:
Code:
awk '{sub("[[:space:]]*\r*$","")} FNR==NR {T[$1];next} $1 in T {P=NR+1} NR<=P' file1 file2
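
To see what the added sub() call does, here is a small self-contained sketch. The file contents and the ID-001 style identifiers are made up for illustration (the real file1 and file2 from this thread are not reproduced here), and it assumes an awk that supports POSIX character classes such as [[:space:]]:

```shell
# Hypothetical sample data, not the poster's real files.
tmp=$(mktemp -d)

# file1: IDs to look for, DOS line endings, one line with a trailing space
printf 'ID-001 \r\nID-003\r\n' > "$tmp/file1"

# file2: each ID line is followed by one sequence line, DOS line endings
printf 'ID-001\r\nACGT\r\nID-002\r\nTTTT\r\nID-003\r\nGGCC\r\n' > "$tmp/file2"

# sub() strips trailing whitespace and carriage returns from every line
# before the first field is stored (file1) or looked up (file2).
out=$(awk '{sub("[[:space:]]*\r*$","")} FNR==NR {T[$1];next} $1 in T {P=NR+1} NR<=P' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$out"
# expected with a POSIX awk:
# ID-001
# ACGT
# ID-003
# GGCC

rm -rf "$tmp"
```

Each ID line in file2 that also appears in file1 is printed together with the line that follows it; that is what {P=NR+1} and NR<=P do.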

 
