Extract sequences of bytes from binary for different blocks

# 29  
Old 08-21-2013
Hello ahamed,

Thank you very much for your help!

The output is displayed much better now, with all the data related to one block on a single line.

To print the bytes without spaces, I changed
Code:
printf("%02x ", ptr[i]);

to
Code:
printf("%02x", ptr[i]);

Now I'm trying to separate each sequence with "|", in this way.
For the first byte sequences extracted:
Code:
1 byte|3 bytes|8 bytes|8 bytes

Example:

32|000001|991145278934550f|73494549232fffff

And for the other sequences, which begin with 0x80, 0x81, etc., like this:
Code:
1 byte|1 byte|1 byte|1 byte|4 bytes|8 bytes|1 byte

Example:

80|0f|01|02|00000030|7349526905ffffff|00

For 0x83 ... there would be one more byte:
Code:
1 byte|1 byte|1 byte|1 byte|4 bytes|8 bytes|1 byte|1 byte
Example:
83|10|01|0c|0000009f|7349526905ffffff|01|01

I hope you can help me with this output format.

Thanks so much
# 30  
Old 08-21-2013
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     // for memcmp()

#define err(x) {printf("\nError: %s... Exiting...\n", x); exit(1);}

static unsigned char start = 0x32;
static unsigned char pat1[] = {0x99, 0x11, 0x45, 0x27};
static unsigned char pat2[] = {0x73, 0x49};
static unsigned char pat3[] = {0xff, 0x34};
static unsigned char intrim_pat1[][2] = { {0x03, 0x80}, {0x03, 0x81}, {0x03, 0x83}, {0x03, 0x86}, {0x03, 0x87} };
static unsigned char end[] = {0xff, 0x33};

typedef enum {
        MAIN_BLOCK,
        SUB_BLOCK
} block;

void print_bytes(const unsigned char *ptr, int len)
{
        int i;
        for(i=0;i<len;i++)
                printf("%02x", ptr[i]);
        printf("|");
        return;

}
// Print one block as "|"-separated fields; the field widths depend on the block type.
void print_data(const unsigned char *ptr, int len, block bl)
{
        if(MAIN_BLOCK == bl){
                print_bytes(ptr, 1);
                print_bytes(ptr+1, 3);
                print_bytes(ptr+4, 8);
                print_bytes(ptr+12, 8);
        } else {
                print_bytes(ptr, 1);
                print_bytes(ptr+1, 1);
                print_bytes(ptr+2, 1);
                print_bytes(ptr+3, 1);
                print_bytes(ptr+4, 4);
                print_bytes(ptr+8, 8);
                print_bytes(ptr+16, 1);
                if(*ptr == intrim_pat1[2][1]){ // 0x83 blocks carry one extra byte
                        print_bytes(ptr+17, 1);
                }
        }
        return;
}

void get_len_and_print(FILE *fp, unsigned char *ptr)
{
        int len = 0;

        //only buf[0] is populated at this stage
        if(1 != fread(ptr+1, sizeof(char), 1, fp))
                err("Insufficient data");
        len = *(ptr+1);
        if(len != fread(ptr+2, sizeof(char), len, fp))
                err("Insufficient data");
        print_data(ptr, len+1, SUB_BLOCK);
        return;
}

int main(int argc, char **argv)
{

        if(argc < 2)
                err("File name missing");

        char found = 0, more = 0, again = 0;
        unsigned char buf[32];
        unsigned char *ptr = buf;
        int pos = 0, i, len;
        int arr_size = (sizeof(intrim_pat1)/2);

        FILE *fp = fopen(argv[1], "rb");
        if(!fp) err("Unable to open the file");

        while(2 == fread(ptr, sizeof(char), 2, fp)){
                pos = ftell(fp);

                //check for end of file pattern
                if(found && !memcmp(buf, end, 2)){
                        found=0; //start over or stop??
                        continue;
                }

                //check for 0xff 0x34
                if(found && !(memcmp(buf, pat3, 2))){
                        more = 1;
                        continue;
                }

                if(found && more){
                        for(i=0; i < arr_size; i++){
                                // We got the intrim pattern.
                                if(!memcmp(buf, intrim_pat1[i], 2)){
                                        again=1;
                                        more=0;
                                        break;
                                }
                        }
                        if(again) {
                                // Now read the next 1 byte which is actually the size
                                if(1 != fread(ptr+2, sizeof(char), 1, fp))
                                        err("Insufficient data");
                                len = buf[2];
                                if(len != fread(ptr+3, sizeof(char), len, fp))
                                        err("Insufficient data");
                                print_data(ptr+1, len+2, SUB_BLOCK);

                                // Now check for the remaining pattern
                                // Assuming it will be in ascending order
                                // Break even if we dont find the very next pattern
                                // Also, reset the fp. We will flush all the data in buffer
                                // till now, since we have printed them already
                                pos = ftell(fp);

                                if(1 != fread(ptr, sizeof(char), 1, fp))
                                        err("Insufficient data");

                                while(i < arr_size){
                                        if( buf[0] == intrim_pat1[i][1]){
                                                get_len_and_print(fp, ptr);

                                                if(1 != fread(ptr, sizeof(char), 1, fp))
                                                        err("Insufficient data");
                                                pos = ftell(fp);
                                        }
                                        i++;
                                }
                                again=0;
                        }
                }

                if(buf[0] == start){
                        if(18 != fread(ptr+2, sizeof(char), 18, fp))
                                err("Insufficient data");
                        if(memcmp(buf+4, pat1, 4) && memcmp(buf+12, pat2, 2)){
                                fseek(fp, pos, SEEK_SET);
                        }else{
                                found = 1; //found the starting of the block with data
                                printf("\n");
                                print_data(ptr, 19, MAIN_BLOCK);
                        }
                        continue;
                }
                pos--;
                if(fseek(fp, pos, SEEK_SET))
                        err("Error in seeking");
        }

        return 0;
}

--ahamed

Last edited by ahamed101; 08-21-2013 at 04:40 AM.. Reason: Correction in print_data
# 31  
Old 08-21-2013
Hi Ophiuchus...

I have no idea what you are doing.
Did you save the files with the filenames I used inside the C:\Windows\Temp directory?

The attached image is what you should get from Windows; I am on Vista ATM...

Do you know ANY Python at all?
[Attached image]
# 32  
Old 08-22-2013
Hello ahamed,

Just great! It works very nicely. I've been adding some functions to print in decimal, copying the logic from your print_bytes function.

After a lot of trying and investigating I've learned some C and have been able to modify the code successfully. I still have some other sequences to extract, but I'm studying your code closely and I hope to be able to replicate how you do it. If I run into issues, I hope you can give me a hand one more time. :)

Hello wisecracker,

Thanks for your help and time. I have installed Python 3.3.2 and I have the files in the same folder in Windows, but I'm not sure why I don't get the same output. I'll keep trying because I'm interested in learning.

Thanks again

Regards
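
The decimal printing mentioned above is not shown in the thread. One possible sketch, assuming each field is an unsigned big-endian integer of at most 8 bytes (both are assumptions, not confirmed anywhere in the thread); format_decimal is a hypothetical name:

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Hypothetical helper (not in the original program): treat `len` bytes as
 * one unsigned big-endian integer and write it in decimal, followed by "|",
 * into `out`. Returns the snprintf() result, or -1 if `len` is not 0..8. */
static int format_decimal(char *out, size_t outsize,
                          const unsigned char *ptr, int len)
{
    uint64_t value = 0;
    int i;

    if (len < 0 || len > 8)
        return -1;                      /* would not fit in 64 bits */
    for (i = 0; i < len; i++)
        value = (value << 8) | ptr[i];  /* most significant byte first */
    return snprintf(out, outsize, "%" PRIu64 "|", value);
}
```

For example, the 4-byte field 00 00 00 30 would be formatted as 48|.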

---------- Post updated at 02:38 AM ---------- Previous update was at 02:30 AM ----------

Hello again ahamed,

One question:

The program works fine with small files, but when I tested it with a 20 MB binary I got this error:
Code:
$ ./extract5 input > output
Segmentation fault (`core' generated)

And the file "extract5.exe.stackdump" was created containing this:
Code:
Exception: STATUS_ACCESS_VIOLATION at eip=004012CA
eax=00790001 ebx=0000003F ecx=80000038 edx=005B0102 esi=0000002E edi=00000061
ebp=0028ABB8 esp=0028ABA0 program=C:\Scripts\extract5.exe, pid 11184, thread main
cs=0023 ds=002B es=002B fs=0053 gs=002B ss=002B
Stack trace:
Frame     Function  Args
0028ABB8  004012CA (00790001, 00000001, 000DEC59, 00000000)
0028ABD8  00401494 (00790001, 005B0102, 00000001, 80010288)
0028AC38  00401837 (00000002, 0028AC5C, 80010100, 61007F58)
0028ACF8  61007FB5 (00000000, 0028CD84, 61007120, 00000000)
End of stack trace

Why did this happen? Is it possible to fix it so that the program can process files bigger than 2 GB?

Thanks in advance.
# 33  
Old 08-22-2013
I tested with a 1.8 GB file on Ubuntu and there was no problem.

A quick search suggests the error you are getting may have something to do with Cygwin.

--ahamed
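
On the 2 GB question: in the program from post #30, pos is an int that receives the result of ftell(), which returns a long. On a 32-bit build, where long is typically 32 bits, offsets beyond 2 GB cannot be represented that way. A minimal sketch, assuming a POSIX environment where fseeko() and ftello() are available; step_back_one is a hypothetical helper, not part of the original program. This would not by itself explain a crash on a 20 MB file.

```c
#define _FILE_OFFSET_BITS 64   /* ask for a 64-bit off_t on 32-bit glibc systems */
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: move the read position back by one byte, as the main
 * loop does with `pos--; fseek(fp, pos, SEEK_SET)`, but using off_t offsets.
 * Returns 0 on success, -1 if already at the start or on error. */
static int step_back_one(FILE *fp)
{
    off_t pos = ftello(fp);

    if (pos <= 0)
        return -1;
    return fseeko(fp, pos - 1, SEEK_SET);
}
```

The other ftell() and fseek() calls in main would need the same treatment, with pos declared as off_t.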
# 34  
Old 08-22-2013
The code below could cause problems if len is ever bigger than 29: the data is written starting at buf[3], so a larger value overflows the 32-byte buf[] array:

Code:
if(len != fread(ptr+3, sizeof(char), len, fp))

len should be range-tested before the read.
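
A minimal sketch of that range test, assuming the 32-byte buf from post #30. read_payload is a hypothetical helper, not part of the original program; it rejects a length that would run past the end of the buffer before fread() is called.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not in the original program): read `len` bytes into
 * buf starting at `offset`, but only if they fit in a buffer of `bufsize`
 * bytes. Returns 0 on success, -1 if the length is out of range or the
 * file ends early. */
static int read_payload(FILE *fp, unsigned char *buf, size_t bufsize,
                        size_t offset, size_t len)
{
    if (offset > bufsize || len > bufsize - offset)
        return -1;                      /* would overflow buf[] */
    if (fread(buf + offset, 1, len, fp) != len)
        return -1;                      /* insufficient data */
    return 0;
}
```

With unsigned char buf[32], the read in main could become read_payload(fp, buf, sizeof buf, 3, buf[2]), which accepts at most 29 bytes. The read in get_len_and_print starts at offset 2, so it would accept at most 30.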
# 35  
Old 08-22-2013
Hello ahamed,

I've tested the program with the attached 20 MB file as input, both on Cygwin and compiled natively on Windows and run from the DOS prompt.

On Cygwin I got the segmentation fault. Compiled on Windows, the program processes data but stops suddenly: it outputs 2550 lines, and the last line repeats the sub-block values many times. When it stops, the output file is 231 KB.

One more question: is it possible to print the values as soon as the program analyzes them? It seems to store the data to be printed in a buffer and print it at the end, which could be a problem for a big file.

I hope you can see why this happens.

PS: For the attached bin20MB, the line:
Code:
static unsigned char pat1[] = {0x99, 0x11, 0x45, 0x27};

should be changed to:
Code:
static unsigned char pat1[] = {0x99, 0x11, 0x45};

Thanks so much again!
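
On printing values as they are analyzed: when stdout is redirected to a file, the C library normally buffers it fully, so text reaches the file only when the buffer fills, when fflush() is called, or at normal exit. If the program crashes, whatever is still buffered may be lost. A minimal sketch, assuming it is acceptable to flush after every field; print_bytes_to is a hypothetical variant of print_bytes from post #30, not part of the original program.

```c
#include <stdio.h>

/* Hypothetical variant of print_bytes that writes to any stream.
 * A non-zero `flush` pushes the buffered text out immediately. */
static void print_bytes_to(FILE *out, const unsigned char *ptr, int len, int flush)
{
    int i;

    for (i = 0; i < len; i++)
        fprintf(out, "%02x", ptr[i]);
    fputc('|', out);
    if (flush)
        fflush(out);
}
```

Calling fflush(stdout) once per block instead of once per field would cost fewer writes. Another option is setvbuf(stdout, NULL, _IONBF, 0) as the first operation on stdout in main.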