Match patterns between two files and extract certain range of strings
Post 303041966 by RudiC on Monday 9th of December 2019 01:59:52 PM (UNIX for Beginners Questions & Answers)
Try also
Code:
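awk '
NR==FNR         {PAT[$1,$2,$3]                      # 1st file: key = ID SUBSEP start SUBSEP end
                 next
                }
                {IX  = $1                            # 2nd file (RS=">"): $1 is the FASTA ID
                 L1  = length ($1) + 1               # position of the "|" appended below
                 $1 = $1 "|"                         # rebuilds $0; OFS="" joins all sequence lines
                 $0 = $0                             # re-split the rebuilt record
                 for (p in PAT) {split (p, T)        # T[1] = ID, T[2] = start, T[3] = end
                                 if (IX == T[1]) print RS p ORS substr ($0, T[2]+L1, T[3]-T[2]+1)
                                }
                }
' SUBSEP="\t" inputfile2.txt   RS=">"  OFS="" inputfile1.fa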

>l-WR24-1:1    1    71
GCCGGCGTCGCGGTTGCTCGCGCTCTGGGCGCTGGCGGCTGTGGCTCTACCCGGCTCCGGGGCGGAGGGCG
>l-ZF385A-2:1    33    105
TGAGCTTCGGGTCACCGCCCCTCCAGAGGCTGAGTACTCAGGACTCGTCAGACACCCAGGGGTGAGATGAGAC
>l-YJC-1:1    1    161
GTCCCGCCCTCGCATGCGCCTGGTGGTCACCGCGGACGACTTTGGTTACTGCCCGCGACGCGATGAGGGTATCGTGGAGGCCTTTCTGGCCGGGGCTGTGACCAGCGTGTCCCTGCTGGTCAACGGTGCGGCCACGGAGAGCGCGGCGGAGCTGGCCCGCA
>l-YJC-1:1    1    165
GTCCCGCCCTCGCATGCGCCTGGTGGTCACCGCGGACGACTTTGGTTACTGCCCGCGACGCGATGAGGGTATCGTGGAGGCCTTTCTGGCCGGGGCTGTGACCAGCGTGTCCCTGCTGGTCAACGGTGCGGCCACGGAGAGCGCGGCGGAGCTGGCCCGCAGGCA
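The post does not show the two input files. As a minimal sketch, assuming inputfile2.txt holds one whitespace-separated "ID start end" triple per line and inputfile1.fa is a FASTA file whose sequences may span several lines, the same awk program can be wrapped in a shell function and tried on a tiny made-up data set. The function name extract_ranges and the sample IDs s1/s2 are hypothetical.

```shell
# Sketch: the awk program from the post wrapped in a reusable shell function.
# extract_ranges is a hypothetical name; arguments are RANGES_FILE FASTA_FILE.
extract_ranges() {
    awk '
    NR == FNR { PAT[$1, $2, $3]; next }      # ranges file: one "ID start end" per line
    {
        IX = $1                              # FASTA record (RS=">"): first field is the ID
        L1 = length($1) + 1                  # position of the "|" marker appended next
        $1 = $1 "|"                          # rebuild $0; OFS="" glues the sequence lines together
        for (p in PAT) {
            split(p, T)                      # T[1]=ID, T[2]=start, T[3]=end (1-based, inclusive)
            if (IX == T[1])
                print RS p ORS substr($0, T[2] + L1, T[3] - T[2] + 1)
        }
    }
    ' SUBSEP="\t" "$1" RS=">" OFS="" "$2"
}
```

For example, with a made-up seqs.fa containing the record >s1 split over the lines ACGTACGTAC and GGTTAACC, and a ranges.txt line "s1 3 12", running extract_ranges ranges.txt seqs.fa prints >s1, 3 and 12 separated by tabs, followed by GTACGTACGG.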

If you really need the extracted sequences wrapped at 60 characters per line, replace the if (IX == T[1]) ... line inside the for loop with
Code:
                 if (IX == T[1])    {print RS p
                                     TMP = substr ($0, T[2]+L1, T[3]-T[2]+1)
                                     PTR = 1
                                     # "<=" so a final chunk of a single character is not dropped
                                     while (PTR <= length (TMP))   {print substr (TMP, PTR, 60)
                                                                    PTR += 60
                                                                   }
                                    }
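A variant sketch with the wrap width as a parameter. The function name extract_ranges_wrapped and the awk variable W (set with -v, defaulting to 60) are assumptions, not part of the post. It uses a <= loop bound so that a last chunk of any length, including a single base, is still printed.

```shell
# Sketch: like the post's script, but wraps each extracted sequence at W chars.
# extract_ranges_wrapped is a hypothetical name; args: RANGES_FILE FASTA_FILE [WIDTH]
extract_ranges_wrapped() {
    awk -v W="${3:-60}" '
    NR == FNR { PAT[$1, $2, $3]; next }       # ranges file: "ID start end"
    {
        IX = $1                               # FASTA ID
        L1 = length($1) + 1                   # position of the "|" marker
        $1 = $1 "|"                           # OFS="" joins the sequence lines
        for (p in PAT) {
            split(p, T)                       # T[1]=ID, T[2]=start, T[3]=end
            if (IX != T[1]) continue
            print RS p
            TMP = substr($0, T[2] + L1, T[3] - T[2] + 1)
            # "<=" so that a final chunk shorter than W (even 1 char) is printed
            for (PTR = 1; PTR <= length(TMP); PTR += W)
                print substr(TMP, PTR, W)
        }
    }
    ' SUBSEP="\t" "$1" RS=">" OFS="" "$2"
}
```

For instance, a 7-base range wrapped with width 3 comes out as ACG, TAC and G on separate lines; with the default width of 60 it stays on one line.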


Last edited by RudiC; 12-09-2019 at 03:20 PM.

strmerge(1int)															    strmerge(1int)

Name
       strmerge - batch string replacement

Syntax
       strmerge [ -m prefix ] [ -p patternfile ] [ -s string ] source-program...

Description
       The strmerge command reads the strings specified in the message file produced by strextract and replaces those strings with calls to the
       message catalog in the source program to create a new source program. The new version of the source program has the same name as the
       input source program, with the prefix nl_. For example, if the input source program is named prog.c, the output source program is named
       nl_prog.c. You use this command to replace hard-coded messages (text strings identified by the strextract command) with calls to the
       catgets function and to create a source message catalog file. The source message catalog contains the text for each message extracted
       from your input source program. The strmerge command names this file by appending .msf to the name of the input source program; for
       example, the source message catalog for prog.c is named prog.msf. You can use the source message catalog as input to the gencat command.

       At run time, the program reads the message text from the message catalog. By storing messages in a message catalog instead of in your
       program, you allow the text of messages to be translated to a new language or modified without changing the source program.
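       The page does not show what the rewritten source or the .msf file looks like. Purely as a hypothetical illustration of the idea (not
       the real strmerge algorithm or output format), the awk sketch below replaces each double-quoted literal in a C file with a POSIX
       catgets() call and writes the literal text to a gencat-style message source file. The function name toy_strmerge, the catalog
       descriptor name catd, and the use of message set 1 are assumptions; escaped quotes inside literals and strings in comments are not
       handled.

```shell
# Toy illustration only; NOT the real strmerge. toy_strmerge, catd and set 1
# are hypothetical choices. Usage: toy_strmerge prog.c
# Writes nl_prog.c (rewritten source) and prog.msf (gencat-style messages)
# next to the input file.
toy_strmerge() {
    src=$1
    dir=$(dirname "$src")
    name=$(basename "$src" .c)
    awk -v MSF="$dir/$name.msf" '
    BEGIN { n = 0; print "$set 1" > MSF }      # all messages go into set 1
    /^[ \t]*#/ { print; next }                 # leave preprocessor lines alone
    {
        out = ""
        # Replace each "..." literal (escaped quotes not handled) with catgets()
        while (match($0, /"[^"]*"/)) {
            lit = substr($0, RSTART, RLENGTH)
            n++
            print n, substr(lit, 2, RLENGTH - 2) > MSF   # "N text" catalog line
            out = out substr($0, 1, RSTART - 1) "catgets(catd, 1, " n ", " lit ")"
            $0 = substr($0, RSTART + RLENGTH)
        }
        print out $0
    }
    ' "$src" > "$dir/nl_$name.c"
}
```

       If puts("Bye"); holds the first literal in the file, the sketch emits puts(catgets(catd, 1, 1, "Bye")); and adds the line 1 Bye
       after $set 1 in the .msf file.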

       In the source-program argument, you name one or more source programs for which you want strings replaced. The strmerge command does not
       replace messages in source programs included using the #include directive. Therefore, you might want to name a source program and all
       the source programs it includes on a single command line.

       You can create a patterns file (specified by patternfile) to control how strmerge replaces text. The patterns file is divided into
       several sections, each of which is identified by a keyword. The keyword must start at the beginning of a new line, and its first
       character must be a dollar sign ($). Following the keyword, you specify a number of patterns. Each pattern begins on a new line and
       follows the regular expression syntax you use in the editor. For more information on the patterns file, see the patterns(5int)
       reference page.

Options
       -m   Add prefix to message numbers in the output source program and source message catalog. You can use this prefix as a mnemonic. You
            must process source message catalogs that contain number prefixes using the option. Message numbers will be in the form:
            <prefix><msg_num>

            Set numbers will be in the form:
            S_<prefix><set_num>

            If you process your input source program with this option, the resulting source program and source message catalog may not be
            portable. For more information, see the Guide to Developing International Software.

       -p   Use  patternfile  to  match strings in the input source program.  By default, the command searches for the pattern file in the current
	    directory, your home directory and finally

	    If you omit the option, the command uses a default patterns file that is stored in

       -s   Write string at the top of the source message catalog. If you omit the option, strmerge uses the string specified in the section
            of the patterns file.

Restrictions
       You can specify only one rewrite string for all classes of pattern matches.

       The strmerge command does not verify whether the message text file matches the source file being rewritten.

       The strmerge command does not replace strings in files included with the #include directive. You must run the command on these files
       separately.

Examples
       The following example produces a message file, prog.msg, for a program made up of the source files prog.c and prog2.c:
       % strextract -p c_patterns prog.c prog2.c
       % vi prog.msg
       % strmerge -p c_patterns prog.c prog2.c
       % gencat prog.cat prog.msf
       % vi nl_prog.c
       % vi nl_prog2.c
       % cc nl_prog.c nl_prog2.c -li

       In this example, the strextract and strmerge commands use the c_patterns file to determine which strings to match. The input source
       programs are named prog.c and prog2.c.

       If you need to remove any of the messages or extract one of the created strings, edit the resulting message file, prog.msg. Under no
       conditions should you add to this file; doing so could result in unpredictable behavior.

       You issue the strmerge command to replace the extracted strings with calls to the message catalog. In response to this command, strmerge
       creates the source message catalogs, prog.msf and prog2.msf, and the output source programs, nl_prog.c and nl_prog2.c.

       Before compiling the source programs, you must edit nl_prog.c and nl_prog2.c to include the appropriate message catalog function calls.

       The gencat command creates a message catalog, and the cc command creates an executable program.

See Also
       intro(3int), extract(1int), gencat(1int), strextract(1int), trans(1int), regex(3), catopen(3int), catgets(3int), patterns(5int)
       Guide to Developing International Software

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.