Match patterns between two files and extract certain range of strings
Post 303041964 by vgersh99 on Monday 9th of December 2019 11:44:45 AM
A bit verbose, but a possible starter.
Run it as awk -f bunny.awk inputfile2.txt inputfile1.fa (the ranges file must come first, the FASTA file second), where bunny.awk is:
Code:
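# Print every requested range that belongs to the FASTA record just read.
# i and t are declared as extra parameters so that they stay local.
function printRec(   i, t) {
   for (i in s) {
      split(i, t, OFS)               # t[1] = id, t[2] = start, t[3] = end
      if (f == (">" t[1]))
         print ">" i ORS substr(a[f], s[i], e[i] - s[i] + 1)
   }
   f = ""
   split("", a)                      # portable way to empty the array
}
# First file (ranges): remember start and end under the key "id start end".
FNR == NR {
   idx = $1 OFS $2 OFS $3
   s[idx] = $2
   e[idx] = $3
   next
}
# Second file (FASTA): a new header ends the previous record, so flush it.
/^>/ && f {
   printRec()
}
# Sequence line: append it, so wrapped sequences are joined into one string.
f { a[f] = (f in a) ? a[f] $1 : $1 }
/^>/ { f = $1 }
END { printRec() }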

results in:
Code:
>l-WR24-1:1 1 71
GCCGGCGTCGCGGTTGCTCGCGCTCTGGGCGCTGGCGGCTGTGGCTCTACCCGGCTCCGGGGCGGAGGGCG
>l-ZF385A-2:1 33 105
TGAGCTTCGGGTCACCGCCCCTCCAGAGGCTGAGTACTCAGGACTCGTCAGACACCCAGGGGTGAGATGAGAC
>l-YJC-1:1 1 161
GTCCCGCCCTCGCATGCGCCTGGTGGTCACCGCGGACGACTTTGGTTACTGCCCGCGACGCGATGAGGGTATCGTGGAGGCCTTTCTGGCCGGGGCTGTGACCAGCGTGTCCCTGCTGGTCAACGGTGCGGCCACGGAGAGCGCGGCGGAGCTGGCCCGCA
>l-YJC-1:1 1 165
GTCCCGCCCTCGCATGCGCCTGGTGGTCACCGCGGACGACTTTGGTTACTGCCCGCGACGCGATGAGGGTATCGTGGAGGCCTTTCTGGCCGGGGCTGTGACCAGCGTGTCCCTGCTGGTCAACGGTGCGGCCACGGAGAGCGCGGCGGAGCTGGCCCGCAGGCA
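
To make the range arithmetic concrete, here is a minimal sketch on made-up data. The file names ranges.txt and seqs.fa, the ids seqA and seqB, and the helper emit() are all assumptions for illustration, not the original poster's files. It is also simplified: it keeps one range per id, whereas bunny.awk above keys on "id start end" and so allows several ranges per sequence. The point it shows is that substr(seq, start, end - start + 1) takes a 1-based, inclusive range, and that wrapped sequence lines have to be joined first.

```shell
#!/bin/sh
# Hypothetical sample data, NOT the files from the thread.
tmp=$(mktemp -d)

# Ranges file: id, 1-based start, inclusive end.
printf '%s\n' 'seqA 3 6' 'seqB 1 4' > "$tmp/ranges.txt"

# FASTA file: the seqA sequence is wrapped over two lines on purpose.
printf '%s\n' '>seqA' 'ACGT' 'TTGA' '>seqB' 'GGCCAATT' > "$tmp/seqs.fa"

out=$(awk '
# Print the requested range of the record collected so far, then reset.
function emit() {
   if (id in s)
      print ">" id, s[id], e[id] ORS substr(seq, s[id], e[id] - s[id] + 1)
   seq = ""
}
FNR == NR { s[$1] = $2; e[$1] = $3; next }       # first file: store the ranges
/^>/      { emit(); id = substr($1, 2); next }   # header: flush previous record
          { seq = seq $1 }                       # sequence line: join wrapped lines
END       { emit() }                             # flush the last record
' "$tmp/ranges.txt" "$tmp/seqs.fa")

printf '%s\n' "$out"
rm -rf "$tmp"
```

This should print >seqA 3 6 followed by GTTT (positions 3 to 6 span the line wrap), then >seqB 1 4 followed by GGCC. Records are emitted as each FASTA record ends, so the output follows the order of the FASTA file; a for (i in s) loop, by contrast, visits keys in an unspecified order.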


Last edited by vgersh99; 12-09-2019 at 12:51 PM.