Extract sequences of bytes from binary for different blocks


 
# 15  
Old 08-15-2013
You mentioned Ubuntu and Windows so a _bash_ script is not of much use here. However Python is platform independent. The problem is that I am not sure whether Ubuntu has a Python install. A default Windows install certainly does not have a Python installation.

From a Ubuntu terminal enter "python" and see what comes up...

If you have got it, post the version that is installed on your Ubuntu setup.

You will have to install the same version onto your Windows machine.

If neither has Python installed, then I suggest you install the latest stable Python release, which is version 3.3.2, I think...

Python 3.3.2 Release

I will generate a simple starter piece of code in the next couple of days and assume that Ubuntu has at least version 3.0.x, when I can get enough free time. I am not usually anywhere near a computer during the working day...
# 16  
Old 08-15-2013
Hello wisecracker,

Thanks for the help.

I've installed Python 3.3.2 on my Windows machine, but when I tried some
simple code to get familiar with it, even just getting the current path, I got a syntax error, and I'm not sure why.

I've used:
Code:
import os 
os.getcwd()

# 17  
Old 08-16-2013
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* memcmp */

#define err(x) {printf("\nError: %s... Exiting...\n", x); exit(1);}

static unsigned char pat1[] = {0x99, 0x11, 0x45, 0x27};
static unsigned char pat2[] = {0x73, 0x49};
static unsigned char pat3[] = {0xff, 0x34};
static unsigned char intrim_pat1[][2] = { {0x03, 0x80}, {0x03, 0x81}, {0x03, 0x83}, {0x03, 0x86}, {0x03, 0x87} };
static unsigned char end[] = {0xff, 0x33};

/* Prints bytes ptr[0]..ptr[len] inclusive, i.e. len+1 bytes, as hex. */
void print_data(const unsigned char *ptr, int len)
{
        int i;
        for(i=0;i<=len;i++)
                printf("%02x ", ptr[i]);
        printf("\n");
        return;
}

int main(int argc, char **argv)
{
        if(argc < 2)
                err("File name missing");

        char found = 0, more = 0;
        unsigned char buf[32];
        unsigned char *ptr = buf;
        int pos = 0, i;
        int arr_size = (sizeof(intrim_pat1)/2);

        FILE *fp = fopen(argv[1], "rb");
        if(!fp) err("Unable to open the file");

        while(2 == fread(ptr, sizeof(char), 2, fp)){
                pos = ftell(fp);

                //check for end of file pattern
                if(found && !memcmp(buf, end, 2)){
                        found=0; //start over or stop??
                        continue;
                }

                //check for 0xff 0x34
                if(found && !(memcmp(buf, pat3, 2))){
                        more = 1;
                        continue;
                }

                if(found && more){
                        for(i=0; i< arr_size; i++){
                                if(!memcmp(buf, intrim_pat1[i], 2)){
                                        if(15 != fread(ptr+2, sizeof(char), 15, fp))
                                                err("Insufficient data");
                                        print_data(ptr, 16);
                                        more=0;
                                        continue;// start with the next byte
                                }
                        }
                }

                if(buf[0] == 0x32){
                        if(18 != fread(ptr+2, sizeof(char), 18, fp))
                                err("Insufficient data");
                        if(memcmp(buf+4, pat1, 4) && memcmp(buf+12, pat2, 2)){
                                fseek(fp, pos, SEEK_SET);
                        }else{
                                found = 1; //found the starting of the block with data
                                print_data(ptr, 19);
                        }
                        continue;
                }
                pos--;
                if(fseek(fp, pos, SEEK_SET))
                        err("Error in seeking");
        }

        return 0;
}

--ahamed

Last edited by ahamed101; 08-16-2013 at 03:24 AM..
# 18  
Old 08-16-2013
Hello ahamed,

Thank you for your great help!

I've tested your new code and it prints the sequence "03 80 ...", but it does not print the
other sequences after the 03 80, I mean the sequences that begin with 81, 83, 86 or 87. I would like to extract
all the 17-byte sequences after 0x03 that begin with 0x80, 0x81, 0x83, 0x84, 0x86 or 0x87, if they
are present, and print all the sequences for each block on the same line, if that is not too complicated.

In summary:
- The byte 0x03 marks the beginning of a certain kind of data.
- If 0x03 is present after 0xFF 0x34, it can be immediately followed by any of the sequences that begin with
0x80, 0x81, 0x83, 0x84, 0x86 or 0x87, because not all of these sequences are always present.

Sometimes all of the sequences that begin with 0x80, 0x81, 0x83, 0x84, 0x86 or 0x87 are present, and sometimes only 3,
2 or just one of them.

So, after the 0xFF 0x34, several cases can occur. Some examples:
0x03 0x80.... 0x86...
or
0x03 0x83.... 0x84... 0x87
or
0x03 0x87
or
0x03 0x81.... 0x87

Maybe you could explain the logic of your code and the functions used, for example "memcmp", so that
I can modify it, add new rules if I need to extract something else, change the printing order,
or print the bytes without spaces, with commas separating the different sequences.

Thanks in advance for your time and help again.

Regards
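
On the memcmp question above: memcmp(a, b, n) from <string.h> compares the first n bytes of two buffers and returns 0 when they are identical, which is why the posted code writes !memcmp(...) to mean "these bytes match". A minimal sketch of that idea follows; matches_at is a hypothetical helper name used only for illustration and is not part of the posted program.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the pattern pat (pat_len bytes) occurs in
 * buf at offset off, 0 otherwise. buf_len is the number of valid bytes in buf. */
int matches_at(const unsigned char *buf, size_t buf_len, size_t off,
               const unsigned char *pat, size_t pat_len)
{
        if (off > buf_len || pat_len > buf_len - off)
                return 0;       /* the pattern would run past the end of buf */
        /* memcmp returns 0 when the two byte ranges are equal */
        return memcmp(buf + off, pat, pat_len) == 0;
}
```

For example, with data = {0x00, 0xff, 0x34} and pat = {0xff, 0x34}, matches_at(data, 3, 1, pat, 2) is 1 and matches_at(data, 3, 0, pat, 2) is 0.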
# 19  
Old 08-17-2013
Well, you said 0x80, 0x81 etc. will be preceded by 0x03, and I don't see that pattern. Only 0x80 is preceded by 0x03, and hence only it is printed.

--ahamed
# 20  
Old 08-17-2013
Hello ahamed,

Sorry, maybe I didn't explain myself very well.

After FF 34, the next byte, if present, is 0x03, and 0x03 can be followed by 0x80 or 0x81, etc. That is why I wrote 0x8Z, where Z = 0, 1, 3, 4, 6 or 7.

But regardless of which byte appears after 0x03, the byte 0x03 appears only once, to mark the beginning of this sub-block of sequences.

So, if I name the sequences as follows:
Z1=80 B1 B2 ... B16
Z2=81 B1 B2 ... B16
Z3=83 B1 B2 ... B16
.
.
Z6=87 B1 B2 ... B16

Then, if 0x03 is present, the "sub-block" could contain:
0x03 Z1 Z2 Z3
or
0x03 Z2 Z3 Z6
or
0x03 Z1 Z2 Z3 Z4 Z5 Z6
or only one sequence like
0x03 Zx

But I would like to extract all the sequences in the sub-block, regardless of whether
it has all the sequences Z1 to Z6 or only one sequence Zx.

I hope it is not too complicated and that you can help me.

Thanks in advance for all the help.
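
The layout described in this post (a single 0x03 marker followed by one or more 17-byte sequences, each starting with a 0x8Z tag) can be sketched as a loop over an in-memory buffer. This is only a sketch under the fixed-length assumption stated in this post. SEQ_LEN, is_seq_tag and find_sequences are hypothetical names, and the sketch works on a buffer rather than on a FILE*.

```c
#include <stddef.h>

#define SEQ_LEN 17      /* tag byte 0x8Z plus B1..B16, as described in the post */

/* Returns 1 if tag is one of 0x80, 0x81, 0x83, 0x84, 0x86, 0x87. */
int is_seq_tag(unsigned char tag)
{
        static const unsigned char tags[] = {0x80, 0x81, 0x83, 0x84, 0x86, 0x87};
        size_t i;

        for (i = 0; i < sizeof tags; i++)
                if (tags[i] == tag)
                        return 1;
        return 0;
}

/* Hypothetical helper: buf[0] is expected to be the 0x03 marker. Stores the
 * offset of each sequence found in offsets[] (at most max entries) and
 * returns how many sequences were found. */
size_t find_sequences(const unsigned char *buf, size_t len,
                      size_t *offsets, size_t max)
{
        size_t pos = 1, n = 0;

        if (len == 0 || buf[0] != 0x03)
                return 0;
        /* keep going while a full 17-byte sequence with a known tag still fits */
        while (n < max && len - pos >= SEQ_LEN && is_seq_tag(buf[pos])) {
                offsets[n++] = pos;
                pos += SEQ_LEN;
        }
        return n;
}
```

Because the loop only checks the tag at each expected position, it handles a sub-block with one sequence or with all six in the same way, and it stops at the first byte that is not a known tag.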
# 21  
Old 08-18-2013
Can you please show me the bytes you need from the attached image?
In between I see a 0x01 which disrupts the 16 bytes.

--ahamed

---------- Post updated 08-18-13 at 01:39 PM ---------- Previous update was 08-17-13 at 08:39 PM ----------

OK, I just figured it out: the byte after the 0x8Z is actually the size of the data.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* memcmp */

#define err(x) {printf("\nError: %s... Exiting...\n", x); exit(1);}

static unsigned char pat1[] = {0x99, 0x11, 0x45, 0x27};
static unsigned char pat2[] = {0x73, 0x49};
static unsigned char pat3[] = {0xff, 0x34};
/* 0x84 added because the request lists 0x80, 0x81, 0x83, 0x84, 0x86 and 0x87 */
static unsigned char intrim_pat1[][2] = { {0x03, 0x80}, {0x03, 0x81}, {0x03, 0x83}, {0x03, 0x84}, {0x03, 0x86}, {0x03, 0x87} };
static unsigned char end[] = {0xff, 0x33};

/* Prints bytes ptr[0]..ptr[len] inclusive, i.e. len+1 bytes, as hex. */
void print_data(const unsigned char *ptr, int len)
{
        int i;
        for(i=0;i<=len;i++)
                printf("%02x ", ptr[i]);
        printf("\n");
        return;
}

void get_len_and_print(FILE *fp, unsigned char *ptr)
{
        int len = 0;

        //only buf[0] is populated at this stage
        if(1 != fread(ptr+1, sizeof(char), 1, fp))
                err("Insufficient data");
        len = *(ptr+1);
        if(len != fread(ptr+2, sizeof(char), len, fp))
                err("Insufficient data");
        print_data(ptr, len+1);
        return;
}

int main(int argc, char **argv)
{

        if(argc < 2)
                err("File name missing");

        char found = 0, more = 0, again = 0;
        unsigned char buf[260]; /* 3 header bytes + up to 255 data bytes given by the length byte */
        unsigned char *ptr = buf;
        int pos = 0, i, len;
        int arr_size = (sizeof(intrim_pat1)/2);

        FILE *fp = fopen(argv[1], "rb");
        if(!fp) err("Unable to open the file");

        while(2 == fread(ptr, sizeof(char), 2, fp)){
                pos = ftell(fp);

                //check for end of file pattern
                if(found && !memcmp(buf, end, 2)){
                        found=0; //start over or stop??
                        continue;
                }

                //check for 0xff 0x34
                if(found && !(memcmp(buf, pat3, 2))){
                        more = 1;
                        continue;
                }

                if(found && more){
                        for(i=0; i < arr_size; i++){
                                // We got the intrim pattern.
                                if(!memcmp(buf, intrim_pat1[i], 2)){
                                        again=1;
                                        more=0;
                                        break;
                                }
                        }
                        if(again) {
                                // Now read the next 1 byte which is actually the size
                                if(1 != fread(ptr+2, sizeof(char), 1, fp))
                                        err("Insufficient data");
                                len = buf[2];
                                if(len != fread(ptr+3, sizeof(char), len, fp))
                                        err("Insufficient data");
                                print_data(ptr, len+2);

                                // Now check for the remaining pattern
                                // Assuming it will be in ascending order
                                // Break even if we dont find the very next pattern
                                // Also, reset the fp. We will flush all the data in buffer
                                // till now, since we have printed them already
                                pos = ftell(fp);

                                if(1 != fread(ptr, sizeof(char), 1, fp))
                                        err("Insufficient data");

                                while(i < arr_size){
                                        if( buf[0] == intrim_pat1[i][1]){
                                                get_len_and_print(fp, ptr);

                                                if(1 != fread(ptr, sizeof(char), 1, fp))
                                                        err("Insufficient data");
                                                pos = ftell(fp);
                                        }
                                        i++;
                                }
                                again=0;
                        }
                }
                if(buf[0] == 0x32){
                        if(18 != fread(ptr+2, sizeof(char), 18, fp))
                                err("Insufficient data");
                        if(memcmp(buf+4, pat1, 4) && memcmp(buf+12, pat2, 2)){
                                fseek(fp, pos, SEEK_SET);
                        }else{
                                found = 1; //found the starting of the block with data
                                print_data(ptr, 19);
                        }
                        continue;
                }
                pos--;
                if(fseek(fp, pos, SEEK_SET))
                        err("Error in seeking");
        }
        return 0;
}

You may want to test this extensively!
And I think I have given you enough information to proceed further.
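
For the earlier request to print the bytes without spaces, with commas between the different sequences, here is a minimal sketch. It assumes the layout worked out above: a 0x03 marker, then sequences made of a 0x8Z tag, one length byte, and that many data bytes. is_tlv_tag and format_subblock are hypothetical names, and the sketch formats an in-memory buffer into a string instead of reading from the file.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if b is one of the tags named in the thread. */
static int is_tlv_tag(unsigned char b)
{
        return b == 0x80 || b == 0x81 || b == 0x83 ||
               b == 0x84 || b == 0x86 || b == 0x87;
}

/* Hypothetical helper: buf[0] must be the 0x03 marker; each sequence is a
 * tag, one length byte, then that many data bytes. Writes the sequences as
 * hex without spaces, separated by commas, into out, and returns the number
 * of sequences written. Stops at the first byte that is not a known tag, or
 * when a sequence is truncated or would not fit in out. */
size_t format_subblock(const unsigned char *buf, size_t len,
                       char *out, size_t out_size)
{
        size_t pos = 1, used = 0, n = 0, i, seq_len;

        if (out_size == 0)
                return 0;
        out[0] = '\0';
        if (len < 1 || buf[0] != 0x03)
                return 0;

        while (len - pos >= 2 && is_tlv_tag(buf[pos])) {
                seq_len = 2 + (size_t)buf[pos + 1];     /* tag + length + data */
                if (seq_len > len - pos)
                        break;                          /* truncated input */
                /* two hex digits per byte, optional comma, terminating NUL */
                if (used + 2 * seq_len + (n ? 1 : 0) + 1 > out_size)
                        break;
                if (n)
                        out[used++] = ',';
                for (i = 0; i < seq_len; i++)
                        used += (size_t)sprintf(out + used, "%02x", buf[pos + i]);
                pos += seq_len;
                n++;
        }
        return n;
}
```

With the input bytes 03 80 02 aa bb 87 01 11 ff 33, this writes "8002aabb,870111" and returns 2.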

--ahamed

Last edited by ahamed101; 08-18-2013 at 05:46 PM..