Sponsored Content
Top Forums UNIX for Beginners Questions & Answers Identifiyng pattern within a block Post 302995747 by Xterra on Tuesday 11th of April 2017 02:56:28 PM
Old 04-11-2017
Identifiyng pattern within a block

I have the following file with @M at the beginning of the line as a RS:
Code:
 @M04961:22:000000000-B5VGJ:1:1101:9280:7106 1:N:0:86
 GGCATGAAAACATACAAACCGTCTTTCCAGAAATTGTTCCAAGTATCGGCAACAGCTTTATCAATACCATGAAAAATATCAACCACACCAGAAGCAGCAT
 +
 GGGGGGGGGGGGGGGGGCCGGGGGF,EDFFGEDFG,@DGGCGGEGGG7DCGGGF68CGFFFGGGG@CGDGFFDFEFEFF:30CGAFFDFEFF8CAF;;8F
 @M04961:22:000000000-B5VGJ:1:1101:14258:7136 1:N:0:86
 GGCATGAAAACATACAACAGCGGCTTTAACCGGACGCTCGACGCCATTAATAATGTTTTCCGTAAATTCAGCGCCTTCCATGATGAGACAGGCCGTTTGA
 +
 CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDEGGEGEGEGGG
 @M04961:22:000000000-B5VGJ:1:1101:15671:7305 1:N:0:86
 GGCATGAAAACATACAAAGTAAGGGGCCGAAGCCCCTGCAATTAAAATTGTTGACCACCTACATACCAAAGACGAGCGCCTTTACGCTTGCCTTTAGTAC
 +
 CCCC@CCFFGFGEGGGGGFGGGGGGGGFGGGGGGEFGGGGGGGGGCGGGGGGGGCFFG@GFFGGGGGCCGCGFGGGGGGGGGGGFFBEGG:CFF9>CGEG
 @M04961:22:000000000-B5VGJ:1:1101:10817:7690 1:N:0:86
 ACGAGCATCATCTTGATTAAGCTCATTAGGGTTAGCCTCGGTACGGTCAGGCATCCACGGCGCTTTAAAATAGTTGTTATAGATATTCAAATAACCCTGA
 +
 CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEEFFFGGGGGG
 @M04961:22:000000000-B5VGJ:1:1101:10091:7763 1:N:0:86
 GAGCACATTGTAGCATTGTGCCAATTCATCCATTAACTTCTCAGTAACAGATACAAACTCATCACGAACGTCAGAAGCAGCCTTATGGCCGTCAACATAC
 +
 :=@FGEFFFGGGGGGGFBB@BEFGG?F,EFCCF@FGGGGGGECFGFG9,><3>FC@DFFGG9:383@FC9,>;,>78FC=FCDECFFDGFFCFFGGC?FF
 @M04961:22:000000000-B5VGJ:1:1101:14783:7784 1:N:0:86
 TCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGACTCCTTCGAGATGGACGCCGTTGGCGCTCTCCGTCTTTCTCCATTGCGTCGTGGCCTTGCT
 +
 CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGCGGGGFGG
 @M04961:22:000000000-B5VGJ:1:1101:26069:7790 1:N:0:86
 CAGAACGTGAAAAAGCGTCCTGCGTGTAGCGAACTGCGATGGGCATACTGTAACCATAAGGCCACGTATTTTGCAAGCTGGCATGAAAACATACAT
 +
 CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

And I am using the following script to extract sequences with a specific string(GGCATGAAAACATACA):
Code:
 awk -vRS="@M" '/GGCATGAAAACATACA/ { print "@M"$0 }' infile

The problem I have is that the string should be at the beginning of the second line. Thus, the desire output file should include only three records:
Code:
 @M04961:22:000000000-B5VGJ:1:1101:9280:7106 1:N:0:86
 GGCATGAAAACATACAAACCGTCTTTCCAGAAATTGTTCCAAGTATCGGCAACAGCTTTATCAATACCATGAAAAATATCAACCACACCAGAAGCAGCAT
 +
 GGGGGGGGGGGGGGGGGCCGGGGGF,EDFFGEDFG,@DGGCGGEGGG7DCGGGF68CGFFFGGGG@CGDGFFDFEFEFF:30CGAFFDFEFF8CAF;;8F
 @M04961:22:000000000-B5VGJ:1:1101:14258:7136 1:N:0:86
 GGCATGAAAACATACAACAGCGGCTTTAACCGGACGCTCGACGCCATTAATAATGTTTTCCGTAAATTCAGCGCCTTCCATGATGAGACAGGCCGTTTGA
 +
 CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDEGGEGEGEGGG
 @M04961:22:000000000-B5VGJ:1:1101:15671:7305 1:N:0:86
 GGCATGAAAACATACAAAGTAAGGGGCCGAAGCCCCTGCAATTAAAATTGTTGACCACCTACATACCAAAGACGAGCGCCTTTACGCTTGCCTTTAGTAC
 +
 CCCC@CCFFGFGEGGGGGFGGGGGGGGFGGGGGGEFGGGGGGGGGCGGGGGGGGCFFG@GFFGGGGGCCGCGFGGGGGGGGGGGFFBEGG:CFF9>CGEG

My script, however, is outputting an extra record containing the string somewhere in the middle of the second line and a blank line between each record:
Code:
 @M04961:22:000000000-B5VGJ:1:1101:9280:7106 1:N:0:86
 GGCATGAAAACATACAAACCGTCTTTCCAGAAATTGTTCCAAGTATCGGCAACAGCTTTATCAATACCATGAAAAATATCAACCACACCAGAAGCAGCAT
 +
 GGGGGGGGGGGGGGGGGCCGGGGGF,EDFFGEDFG,@DGGCGGEGGG7DCGGGF68CGFFFGGGG@CGDGFFDFEFEFF:30CGAFFDFEFF8CAF;;8F
  
 @M04961:22:000000000-B5VGJ:1:1101:14258:7136 1:N:0:86
 GGCATGAAAACATACAACAGCGGCTTTAACCGGACGCTCGACGCCATTAATAATGTTTTCCGTAAATTCAGCGCCTTCCATGATGAGACAGGCCGTTTGA
 +
 CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDEGGEGEGEGGG
  
 @M04961:22:000000000-B5VGJ:1:1101:15671:7305 1:N:0:86
 GGCATGAAAACATACAAAGTAAGGGGCCGAAGCCCCTGCAATTAAAATTGTTGACCACCTACATACCAAAGACGAGCGCCTTTACGCTTGCCTTTAGTAC
 +
 CCCC@CCFFGFGEGGGGGFGGGGGGGGFGGGGGGEFGGGGGGGGGCGGGGGGGGCFFG@GFFGGGGGCCGCGFGGGGGGGGGGGFFBEGG:CFF9>CGEG
  
 @M04961:22:000000000-B5VGJ:1:1101:26069:7790 1:N:0:86
 CAGAACGTGAAAAAGCGTCCTGCGTGTAGCGAACTGCGATGGGCATACTGTAACCATAAGGCCACGTATTTTGCAAGCTGGCATGAAAACATACAT
 +
 CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

I tried adding ^ to the string(/^GGCATGAAAACATACA/), but that obviously does not work.
Any help will be greatly appreciated!
PS. Ideally I would like to use | sed '/^$/d' to eliminate the blank lines if at all possible

Last edited by Xterra; 04-11-2017 at 04:06 PM..
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Delete a block of text delimited by blank lines when pattern is found

I have a file which contains blocks of text - each block is a multi-lines text delimited by blank lines eg. <blank line> several lines of text ... pattern found on this line several more lines of text ... <blank line> How do you delete the block of text (including the blank lines) when... (17 Replies)
Discussion started by: gleu
17 Replies

2. Shell Programming and Scripting

Print block of lines matching a pattern

Hi :), I am using the script to search "MYPATTERN" in MYFILE and print that block of lines containing the pattern starting with HEADER upto FOOTER. But my problem is that at some occurrence my footer is different e.g. ";". How to modify the script so that MYPATTERN between HEADER and different... (1 Reply)
Discussion started by: vanand420
1 Replies

3. Shell Programming and Scripting

Need help in sed command (adding a blank line btw each block generated by pattern)

Hello friends, I have a C source code containing sql statements. I use the following sed command to print all the sql blocks in the source code.... sed -n "/exec sql/,/;/p" Sample.cpp The above sed command will print the sql blocks based on the pattern "exec sql" & ";"... (2 Replies)
Discussion started by: frozensmilz
2 Replies

4. Shell Programming and Scripting

delete block of lines when pattern does not match

I have this input file that I need to remove lines which represents more than 30 days of processing. Input file: On 11/17/2009 at 12:30:00, Program started processing...argc=7 Total number of bytes in file being processed is 390 Message buffer of length=390 was allocated successfully... (1 Reply)
Discussion started by: udelalv
1 Replies

5. Shell Programming and Scripting

[Awk] Extract block of with a particular pattern

Hi, I have some CVS log files, which are divided into blocks. Each block has many fields of information and I want to extract those blocks with a pattern. Here is the sample input. RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/DebugPlugin.java,v head: 1.174... (7 Replies)
Discussion started by: sandeepk1611
7 Replies

6. Shell Programming and Scripting

Using sed to pattern match within a particular multiline block and take action

Hi all, This is my first post, so please go easy if I broke some rules. Not accustomed to posting in forums... :) I'm looking for help on pattern matching within a multiline block and looking to highlight blocks/block-ids that do NOT contain a particular pattern. For example an input file... (5 Replies)
Discussion started by: tirodad
5 Replies

7. Shell Programming and Scripting

How to grab a block of data in a file with repeating pattern?

I need to send email to receipient in each block of data in a file which has the sender address under TO and just send that block of data where it ends as COMPANY. I tried to work this out by getting line numbers of the string HELLO but unable to grab the next block of data to send the next... (5 Replies)
Discussion started by: loggedout
5 Replies

8. Shell Programming and Scripting

sed -- Find pattern -- print remainder -- plus lines up to pattern -- Minus pattern

The intended result should be : PDF converters 'empty line' gpdftext and pdftotext?xml version="1.0"?> xml:space="preserve"><note-content version="0.1" xmlns:/tomboy/link" xmlns:size="http://beatniksoftware.com/tomboy/size">PDF converters gpdftext and pdftotext</note-content>... (9 Replies)
Discussion started by: Klasform
9 Replies

9. UNIX for Beginners Questions & Answers

Search a string inside a pattern matched block of a file

How to grep for searching a string within a begin and end pattern of a file. Sent from my Redmi 3S using Tapatalk (8 Replies)
Discussion started by: Baishali
8 Replies

10. Shell Programming and Scripting

Find specific pattern and change some of block values using awk

Hi, Could you please help me finding a way to replace a specific value in a text block when matching a key pattern ? I got the keys and the values from a command similar to: echo -e "key01 Nvalue01-1 Nvalue01-2 Nvalue01-3\nkey02 Nvalue02-1 Nvalue02-2 Nvalue02-3 \nkey03 Nvalue03-1... (2 Replies)
Discussion started by: alex2005
2 Replies
All times are GMT -4. The time now is 03:30 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy