Match patterns between two files and extract certain range of strings Post: 303041966

Post 303041966 by RudiC on Monday 9th of December 2019 01:59:52 PM (UNIX for Beginners Questions & Answers)

Try also

Code:

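awk '
NR==FNR         {PAT[$1,$2,$3]                  # 1st file: store each ID/start/end triple as a key
                 next
                }
                {IX  = $1                       # ID (header) of this FASTA record
                 L1  = length ($1) + 1          # offset of the sequence within the rebuilt $0
                 $1 = $1 "|"                    # rebuilds $0 with OFS="": ID, "|", joined sequence
                 $0 = $0
                 for (p in PAT) {split (p, T)   # T[1] = ID, T[2] = start, T[3] = end
                                 if (IX == T[1]) print RS p ORS substr ($0, T[2]+L1, T[3]-T[2]+1)
                                }
                }
' SUBSEP="\t" inputfile2.txt   RS=">"  OFS="" inputfile1.fa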

>l-WR24-1:1    1    71
GCCGGCGTCGCGGTTGCTCGCGCTCTGGGCGCTGGCGGCTGTGGCTCTACCCGGCTCCGGGGCGGAGGGCG
>l-ZF385A-2:1    33    105
TGAGCTTCGGGTCACCGCCCCTCCAGAGGCTGAGTACTCAGGACTCGTCAGACACCCAGGGGTGAGATGAGAC
>l-YJC-1:1    1    161
GTCCCGCCCTCGCATGCGCCTGGTGGTCACCGCGGACGACTTTGGTTACTGCCCGCGACGCGATGAGGGTATCGTGGAGGCCTTTCTGGCCGGGGCTGTGACCAGCGTGTCCCTGCTGGTCAACGGTGCGGCCACGGAGAGCGCGGCGGAGCTGGCCCGCA
>l-YJC-1:1    1    165
GTCCCGCCCTCGCATGCGCCTGGTGGTCACCGCGGACGACTTTGGTTACTGCCCGCGACGCGATGAGGGTATCGTGGAGGCCTTTCTGGCCGGGGCTGTGACCAGCGTGTCCCTGCTGGTCAACGGTGCGGCCACGGAGAGCGCGGCGGAGCTGGCCCGCAGGCA
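
To see how the offsets work, here is a small self-contained run of the same approach. The file names, IDs and sequences are made up for illustration (they are not the thread's data), and the function name extract_ranges is hypothetical. It is a sketch that assumes FASTA headers contain no whitespace, since the ID is taken from $1.

```shell
# Hypothetical wrapper around the awk approach above: $1 = ranges file, $2 = FASTA file
extract_ranges() {
    awk '
    NR == FNR { PAT[$1, $2, $3]; next }   # ranges file: remember ID/start/end triples
    {
        IX = $1                           # ID of the current FASTA record
        L1 = length($1) + 1               # the sequence starts after the ID plus the "|"
        $1 = $1 "|"                       # with OFS="" this joins all sequence lines
        for (p in PAT) {
            split(p, T)                   # T[1] = ID, T[2] = start, T[3] = end
            if (IX == T[1]) print RS p ORS substr($0, T[2] + L1, T[3] - T[2] + 1)
        }
    }
    ' SUBSEP="\t" "$1" RS=">" OFS="" "$2"
}

# Made-up sample data (assumption, for illustration only)
tmp=$(mktemp -d)
printf 'seqA\t3\t8\nseqB\t1\t4\n' > "$tmp/ranges.txt"
printf '>seqA\nACGTAC\nGTTTGA\n>seqB\nTTTTCCCC\n' > "$tmp/seqs.fa"

extract_ranges "$tmp/ranges.txt" "$tmp/seqs.fa"
rm -r "$tmp"
```

With this sample data the run should print ">seqA 3 8" (tab-separated) followed by GTACGT, then ">seqB 1 4" followed by TTTT. Positions 3 to 8 of seqA span the line break in the FASTA file; that works because rebuilding $0 with an empty OFS concatenates the sequence lines before substr is applied.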

If you really, really need the output wrapped into lines of 60 characters, replace the if statement inside the for loop with

Code:

                 if (IX == T[1])    {print RS p
                                     TMP = substr ($0, T[2]+L1, T[3]-T[2]+1)
                                     PTR = 1
                                     while (PTR <= length (TMP))   {print substr (TMP, PTR, 60)     # <= so a final 1-char chunk is not dropped
                                                                    PTR += 60
                                                                   }
                                    }
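
The wrapping step can also be tried on its own. Below is a minimal sketch with a hypothetical helper, wrap_seq, that reads one sequence per line and prints it in chunks of a given width; the name and the short test strings are assumptions for illustration. The loop condition uses <= so that a trailing chunk of a single character is still printed.

```shell
# Hypothetical helper: wrap_seq WIDTH
# Reads one sequence per line on stdin and prints it in chunks of WIDTH characters.
wrap_seq() {
    awk -v W="$1" '{
        PTR = 1
        while (PTR <= length($0)) {      # <= keeps a final chunk of length 1
            print substr($0, PTR, W)
            PTR += W
        }
    }'
}

# 10 characters wrapped at width 4 give chunks of 4, 4 and 2 characters
printf 'ACGTACGTAC\n' | wrap_seq 4
```

For the thread's case the width would be 60, e.g. piping a single-line sequence through wrap_seq 60.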

Last edited by RudiC; 12-09-2019 at 03:20 PM.
