Search for sequential pattern

# 1  
09-22-2016

Input file:

Code:
4
5
1
A
1
2
3
4
s
8

(input file can be many millions of lines long)

I want to search the example input file above and, when I find 4 consecutive rows with the values 1,2,3,4, return those values plus the two rows before them.
In this case it should return:
Code:
1,A,1,2,3,4

I know this can be done on various platforms, but I'd like to use awk in this case. I'm fairly certain I'll end up using a six element array, but y'all will probably figure this out before I do. Thanks in advance, brain too old to figure this stuff out anymore...
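A minimal sketch of the six-element array idea, assuming single-column input and string comparison against the literal values "1" to "4": each line goes into slot NR % 6 of a circular buffer, and whenever the newest four slots hold 1,2,3,4 the whole buffer is printed oldest-first. Only six lines are held in memory at any time, so file size does not matter, and because every line is checked as a possible end of a run, overlapping candidates are not skipped. The function name find_seq is hypothetical.

```shell
# find_seq (hypothetical name): print every 1,2,3,4 run together with the
# two lines before it, joined by commas. Reads files given as arguments, or stdin.
find_seq() {
    awk '
    { buf[NR % 6] = $0 }                        # newest line overwrites the oldest slot
    NR >= 6 &&
    buf[(NR - 3) % 6] == "1" && buf[(NR - 2) % 6] == "2" &&
    buf[(NR - 1) % 6] == "3" && buf[NR % 6] == "4" {
        out = buf[(NR - 5) % 6]                 # oldest of the six lines
        for (i = 4; i >= 0; i--) out = out "," buf[(NR - i) % 6]
        print out
    }
    ' "$@"
}

printf '4\n5\n1\nA\n1\n2\n3\n4\ns\n8\n' | find_seq
```

On the sample input this prints `1,A,1,2,3,4`. A run that starts on line 1 or 2 has no two leading rows, so the sketch only reports runs from line 6 onward.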

---------- Post updated at 07:47 PM ---------- Previous update was at 04:47 PM ----------

I started down the path of using grep to pull out the rows I need: 2 before the match and 3 after it. I was going to simplify the match to only finding the first entry I needed, and filter out the extra ones later. After that it was a simple matter of formatting. That is, until the case where we had overlapping matches, like so.

Say I'm looking for rows with 1,2,3,4: then I was only going to grep for "1" and extract the leading and following rows. Even if I got a lot of entries that were not a perfect match, I could easily filter those out. Here is the case that ruined it.

Code:
5
A
1
1
2
3
4

grep misbehaves here because it won't print the value "1" more than once. In this case the "1" belongs to the "before" part of one selection and the "after" part of another, and grep reports it only once. So unless there is a way to tell grep not to do this, I can't use grep....
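This is how grep's context output normally works: when the context windows of two matches overlap, they are printed as one group and each line appears only once. A quick way to see it, assuming a grep with the -A/-B context options (GNU grep has them; POSIX grep does not):

```shell
# Sample from the post above: both "1" lines match, and their -B2/-A3 windows
# (lines 1-6 and 2-7) overlap, so grep prints lines 1-7 once, as a single group.
demo_input() { printf '5\nA\n1\n1\n2\n3\n4\n'; }

demo_input | grep -x -B2 -A3 '1'
```

The output is just the seven input lines with no `--` separator, so the two six-line selections cannot be told apart afterwards; tracking state line by line, as awk can, is a better fit for this problem.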


Moderator's Comments:
Please wrap all code, files, input & output/errors in CODE tags.
It makes it far easier to read and preserves multiple spaces for indenting or fixed-width data.

Last edited by rbatte1; 09-23-2016 at 08:30 AM. Reason: Added CODE tags
# 2  
09-22-2016
How about this, using awk:

Code:
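awk '
  # sliding window: A and B are the two lines before the current one (C)
  {A=B; B=C; C=$0}
  # N is the next value expected in the 1,2,3,4 run; report as soon as 4 is seen
  N&&$0==N { F=F","N++; if (N==5) { print F " found from row " NR-5; exit }; next}
  # a "1" starts (or restarts) a candidate run, prefixed by the two previous lines
  $0==1 { F=A","B",1";N=2;next}
  # any other line breaks the run
  {N=x}
' infile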

# 3  
09-23-2016
I can sort of follow this. Is it hardcoded to use "1,2,3,4" as the search criteria? Or at least 4 sequential numbers?
I need to have a little flexibility in selecting the 4 values to search for (I used 1,2,3,4 just as an oversimplified example).

I have confirmed that it works great for 1,2,3,4.......

Thanks for the first response!

Last edited by cedenker; 09-23-2016 at 12:35 AM. Reason: clarify my follow up question
# 4  
09-23-2016
If you are looking for different strings (not "1" thru "4") a slightly different solution is required:

Code:
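awk '
  # M[1..L] holds the strings to look for, in order
  BEGIN{ L=split("one,two,three,four", M, ",") }
  # sliding window: A and B are the two lines before the current one (C)
  {A=B; B=C; C=$0}
  # N is the index of the next expected string; report as soon as M[L] is seen
  N&&$0==M[N] { F=F","M[N++]; if (N>L) { print F " found from row " NR-L-1; exit }; next}
  # the first string starts (or restarts) a candidate run
  $0==M[1] { F=A","B","M[1];N=2;next}
  # any other line breaks the run
  {N=x}
' infile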

This version searches for "one", "two", "three" and then "four", and can easily be converted to search for your list of specific strings. The split() call builds the array M[], which is used to match each line in turn.
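If the list changes often, it can be passed in from the shell with awk's -v option instead of being edited into the BEGIN block. The sketch below keeps the same state-machine idea (N is the index of the next expected string) and reports the run as soon as its last string is seen, so a match ending on the final line is not missed. find_run and the pats variable are hypothetical names, and the list is assumed to hold at least two comma-separated strings.

```shell
# find_run (hypothetical): $1 is a comma-separated list of strings to find on
# consecutive lines; the remaining arguments are input files (default: stdin).
find_run() {
    pats=$1; shift
    awk -v pats="$pats" '
    BEGIN { L = split(pats, M, ",") }
    { A = B; B = C; C = $0 }                    # A and B are the two lines before this one
    N && $0 == M[N] {                           # the next expected string
        F = F "," M[N++]
        if (N > L) { print F " found from row " NR - L - 1; exit }
        next
    }
    $0 == M[1] { F = A "," B "," M[1]; N = 2; next }   # start (or restart) a run
    { N = 0 }                                   # anything else breaks the run
    ' "$@"
}

printf 'cow\nbird\nhorse\none\ntwo\nthree\nfour\nfff\n' | find_run one,two,three,four
```

This prints `bird,horse,one,two,three,four found from row 2`. Switching lists is then just a matter of the argument, e.g. `find_run 1,2,3,4 infile`.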
# 5  
09-23-2016
Initial test works fine. Let me add some of the other things I oversimplified into the script and see if I can break it. Thanks!

---------- Post updated 09-23-16 at 12:16 AM ---------- Previous update was 09-22-16 at 11:04 PM ----------

I should have made this part of the initial requirement, but thought I could add it myself after the original problem was solved. I can't wrap my head around what the script is actually doing, so unfortunately I can't really add to it.

The additional requirement is as follows.
There is an extra column in the input file:
Code:
1  cow
2  bird
3  horse
4  one
5  two
6  three
7  four
8  fff

The additional output would be the value in column 1 for the first row of the match. In this case (looking for one,two,three,four) the output should be:

Code:
2, bird,horse,one,two,three,four

So I understood enough to read $2 instead of $0, and the script works the same now, just basically ignoring the first of the two input columns. I'm assuming all we need is a 2nd array to store the first column values, updating itself at the same time the 1st array updates. Then when it comes time to print out, just print the first array element of the 1st column.
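The second-array idea can be sketched roughly like this, assuming whitespace-separated columns with the search strings in column 2: c1[] and c2[] hold columns 1 and 2 keyed by line number, entries older than the window are deleted so memory stays small on a multi-million-line file, and N counts how far into the list the current run has got. find_run2 is a hypothetical name, and the sketch prints the comma after the first field without the space shown in the requested output.

```shell
# find_run2 (hypothetical): $1 is the comma-separated list to find in column 2;
# prints column 1 of the first leading row, then column 2 of the two leading
# rows and of the matched run, and stops after the first match.
find_run2() {
    pats=$1; shift
    awk -v pats="$pats" '
    BEGIN { L = split(pats, M, ","); OFS = "," }
    {
        c1[NR] = $1; c2[NR] = $2
        delete c1[NR - L - 2]; delete c2[NR - L - 2]   # keep only the last L+2 rows

        if (N && $2 == M[N]) {          # extends the current run
            N++
        } else if ($2 == M[1]) {        # starts (or restarts) a run
            N = 2
        } else {                        # anything else breaks the run
            N = 0
        }

        if (N > L) {                    # rows NR-L+1 .. NR matched the whole list
            s = NR - L - 1              # first of the two leading rows
            out = c1[s] OFS c2[s]
            for (r = s + 1; r <= NR; r++) out = out OFS c2[r]
            print out
            exit
        }
    }
    ' "$@"
}

printf '1  cow\n2  bird\n3  horse\n4  one\n5  two\n6  three\n7  four\n8  fff\n' |
    find_run2 one,two,three,four
```

On the sample above this prints `2,bird,horse,one,two,three,four`.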

I should have included this in the initial requirement, sorry about that....


Moderator's Comments:
Please use CODE tags as required by forum rules!

Last edited by RudiC; 09-23-2016 at 06:03 AM. Reason: Added CODE tags.
# 6  
09-23-2016
Wouldn't it be easier using grep?

For instance (assuming every line consists of exactly one character, as in your example, and that the line terminator is just a newline character), the following command would work:

Code:
grep -zo '....1.2.3.4' your_data.txt


Last edited by rovf; 09-23-2016 at 04:01 AM. Reason: Removing unnecessary -E switch
# 7  
09-23-2016
Hello cedenker,

Let's say our Input_file is as follows, where I am assuming that the strings one, two, etc. could come in any order.
Code:
cat Input_file
1 cow
2 bird
3 horse
4 one
5 two
6 three
7 four
8 fff
9 one
10 two
11 one
12 two
13 one
14 two
15 three
16 four
11 one
12 two
13 three
14 one

Then the code would be as follows.
Code:
awk 'BEGIN{num=split("one,two,three,four", A,",");for(i=1;i<=num;i++){B[A[i]]=i}} {;while(($2 in B) && ++e == B[$2]){A[FNR]=$2;W=W?W OFS $2:$2;getline;};A[FNR]=$2;if(e>=4){print FNR-6,A[FNR-6],A[FNR-5],W};e=W=""}' OFS=,   Input_file

Output will be as follows.
Code:
2,bird,horse,one,two,three,four
11,one,two,one,two,three,four

EDIT: Adding a multi-line (non-one-liner) form of the solution too.
Code:
awk 'BEGIN{
       num=split("one,two,three,four", A, ",");
       for(i=1;i<=num;i++){
         B[A[i]]=i
       }
     }
     {
       while(($2 in B) && ++e == B[$2]){
         A[FNR]=$2;
         W=W?W OFS $2:$2;
         getline;
       };
       A[FNR]=$2;
       if(e>=4){
         print FNR-6,A[FNR-6],A[FNR-5],W
       };
       e=W=""
     }
    ' OFS=,   Input_file

So it takes care of the rule that the strings one,two,three,four must come consecutively, and if fewer than all 4 of them appear in a row, nothing is printed. Please do let us know how it goes and if this helps you.
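As an alternative sketch that avoids getline, every line can be treated as the possible last line of a run: keep the last L+2 rows and, on each line, compare the newest L values of column 2 with the list. Because the line that breaks a partial run is still examined by the main loop, a run that restarts right after a partial one (for example one,two,one,two,three,four) is still found, and every match is printed. find_all is a hypothetical name; the sketch prints column 1 of the first leading row, which in the sample above happens to equal the line number.

```shell
# find_all (hypothetical): $1 is the comma-separated list to find, in order,
# on consecutive lines of column 2; every match is printed.
find_all() {
    pats=$1; shift
    awk -v pats="$pats" '
    BEGIN { L = split(pats, M, ","); OFS = "," }
    {
        k[NR] = $1; v[NR] = $2
        delete k[NR - L - 2]; delete v[NR - L - 2]     # keep only the last L+2 rows
        if (NR < L + 2) next                   # need two leading rows plus L matched rows
        for (i = 1; i <= L; i++)
            if (v[NR - L + i] != M[i]) next    # newest L rows do not spell the list
        s = NR - L - 1                         # first of the two leading rows
        out = k[s] OFS v[s] OFS v[s + 1]
        for (i = 1; i <= L; i++) out = out OFS M[i]
        print out
    }
    ' "$@"
}
```

Run as `find_all one,two,three,four Input_file`, it prints `2,bird,horse,one,two,three,four` and `11,one,two,one,two,three,four` for the Input_file above. The cost is L comparisons per line, which is negligible for a four-string list.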
EDIT2: Improving the above code by no longer storing into array A inside the while loop.
Code:
awk 'BEGIN{num=split("one,two,three,four", A,",");for(i=1;i<=num;i++){B[A[i]]=i}} {A[++q]=$2;while(($2 in B) && ++e == B[$2]){;W=W?W OFS $2:$2;getline;};if(e>=4){print FNR-6,A[q-2],A[q-1],W};e=W=""}' OFS=,   Input_file
####OR a multi-line form of the same solution:
awk 'BEGIN{
       num=split("one,two,three,four", A, ",");
       for(i=1;i<=num;i++){
         B[A[i]]=i
       }
     }
     {
       # column 2 of each line read by the main loop (lines read by getline are not stored)
       A[++q]=$2;
       while(($2 in B) && ++e == B[$2]){
         W=W?W OFS $2:$2;
         getline;
       };
       if(e>=4){
         print FNR-6,A[q-2],A[q-1],W
       };
       e=W=""
     }
    ' OFS=,   Input_file

Thanks,
R. Singh

Last edited by RavinderSingh13; 09-23-2016 at 06:30 AM. Reason: Adding a non-one liner form of solution too now.