Extract sequences of bytes from a binary file for different blocks


 
# 8  
08-14-2013
Hello ahamed101,

Attached is the sample file that corresponds to the hexdump I put in my first post. It has 2 blocks.

P.S.: I had to add a "txt" extension to be able to upload the file.

Thanks in advance for the help.
# 9  
08-14-2013
Yes, I know a little Python, but you must at least make an attempt yourself.
What have you tried so far?

What scripting, shell or otherwise, have you done?

Why does your code not work?

What error reports do you get?

Which shell are you using?

What HW and OS is this running on?

I have already given you one pointer, and here is a new one, which I created yesterday after my post #3 in this thread:

https://www.unix.com/unix-dummies-que...on-thread.html

Last but not least, I am sure you posted about this here some weeks ago!

EDIT:

How about dumping the binary file as a hex string array, searching the array for a pattern, finding the end point you want, and then converting the result back to a binary file?
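The hex-string idea can be sketched in C as follows. This is a minimal sketch that assumes the data is already in memory; `to_hex`, `find_hex` and `from_hex` are hypothetical helper names. The one subtle point is alignment: each byte becomes two hex characters, so a text match at an odd character offset straddles two bytes and is not a real match.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Convert len bytes to a lowercase hex string (2 chars per byte).
   Returns NULL if allocation fails. Caller frees the result. */
char *to_hex(const unsigned char *data, size_t len)
{
    char *hex = malloc(2 * len + 1);
    if (!hex)
        return NULL;
    for (size_t i = 0; i < len; i++)
        sprintf(hex + 2 * i, "%02x", data[i]);
    hex[2 * len] = '\0';
    return hex;
}

/* Search the hex string for a hex pattern such as "99114527".
   Only matches at even character offsets are byte-aligned.
   Returns the byte offset of the first aligned match, or -1. */
long find_hex(const char *hex, const char *pattern)
{
    const char *p = hex;
    while ((p = strstr(p, pattern)) != NULL) {
        size_t off = (size_t)(p - hex);
        if (off % 2 == 0)
            return (long)(off / 2);
        p++;  /* misaligned match: keep looking */
    }
    return -1;
}

/* Convert a hex string back to bytes; returns the number of bytes written. */
size_t from_hex(const char *hex, unsigned char *out)
{
    size_t n = strlen(hex) / 2;
    for (size_t i = 0; i < n; i++) {
        unsigned int byte;
        sscanf(hex + 2 * i, "%2x", &byte);
        out[i] = (unsigned char)byte;
    }
    return n;
}
```

For example, on a buffer that starts with the header bytes `32 00 00 01 99 11 45 27`, `find_hex(hex, "99114527")` returns byte offset 4.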

Last edited by wisecracker; 08-14-2013 at 06:24 AM. Reason: Added an addendum.
# 10  
08-15-2013
Hello,

I've helped others in this forum several times before, and I know that whoever wants to help, helps. There is no forum rule saying you must make an attempt before posting, because sometimes people just don't know where to begin. I'm asking for help here (not a complete solution) because I don't know how to do this in Python or any other language.

My only idea is to extract the byte sequences by searching with regular expressions, because the sequences are not always in the same position, but I don't know which language would be easiest, fastest or best, or how to begin.

I'm using Ubuntu or Windows. I'm asking for help and suggestions in bash if possible, or in Perl, Python, C or any other language that can handle binary files and extract the byte sequences I mentioned.

If someone knows how to do it in any language, maybe they could give some examples that I can follow and then continue on my own.

I posted about this before, but now I'm asking about a different approach: I'm still looking for a way to extract the information by reading the binary file directly, without converting it to text.
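Reading the binary directly without a text conversion is straightforward in C: load the bytes with `fread` and compare them with `memcmp`. For fixed markers like 0x32 or FF 34, a plain byte comparison may be enough, without regular expressions. This is a minimal sketch; `find_bytes` and `read_file` are hypothetical helpers (glibc's `memmem` does a similar search, but it is not part of standard C, so a portable loop is shown).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the offset of the first occurrence of pat (patlen bytes)
   in data (len bytes) at or after offset start, or -1 if absent. */
long find_bytes(const unsigned char *data, size_t len, size_t start,
                const unsigned char *pat, size_t patlen)
{
    if (patlen == 0 || patlen > len)
        return -1;
    for (size_t i = start; i + patlen <= len; i++)
        if (memcmp(data + i, pat, patlen) == 0)
            return (long)i;
    return -1;
}

/* Read a whole file into memory; *len receives its size.
   Returns NULL on error. Caller frees the buffer. */
unsigned char *read_file(const char *path, size_t *len)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf && fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);          /* short read: treat as an error */
        buf = NULL;
    }
    fclose(fp);
    if (buf)
        *len = (size_t)size;
    return buf;
}
```

Searching for `{0xff, 0x34}` with `find_bytes(buf, n, 0, pat, 2)` returns the offset of the first FF 34; passing that offset plus 2 as `start` continues the search after it.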

Thanks in advance for any help.

Last edited by Ophiuchus; 08-15-2013 at 01:58 AM.
# 11  
08-15-2013
Is the number of bytes between the start (0x32) and end (0xff 0x33) constant?
Also, what's the significance of the blocks? Can the requested data be present outside a block, in which case you are not interested in it?
Are there multiple instances of the required data within a block?

I can provide a solution in C. I am not good at Python.

--ahamed

Last edited by ahamed101; 08-15-2013 at 04:39 AM.
# 12  
08-15-2013
Hello ahamed,

The number of bytes between the start (0x32) and end (0xff 0x33) is not constant.

The requested data is only inside each block:

I'm interested in:
1- The colored sequence immediately after the beginning of each block, i.e., after each 0x32 marked in red in the attached image. This sequence always occurs exactly once in each block.

2- Some of the sequences after FF 34 in each block. These sequences do not always occur, but when they do, they occur only once per block. For example, in the sample file, the sequences after FF 34 appear only in block #2.

But maybe for now you can help with the sequences in item "1". After that, I'll try to reuse your logic for the sequences in item "2", or ask again so I can complete the 2nd item.

Thanks in advance for any help.

Regards
[Attached image: binary-sequences.jpg]
# 13  
08-15-2013
Well, nothing marks the end of each block. 0xff 0x33 represents the end of the file, is that right?

The following code extracts the data you want, but there is no check for the end of a block, because I am still confused about that. Maybe a larger file with the expected output would clarify it.

Code:
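#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* memcmp */

/* Print an error message to stderr and exit. */
#define err(x) do { fprintf(stderr, "\nError: %s... Exiting...\n", x); exit(1); } while (0)

#define REC_LEN 20    /* 0x32 + correlative (3) + series (8) + model (8) */

static const unsigned char pat1[] = {0x99, 0x11, 0x45, 0x27}; /* Product Series prefix */
static const unsigned char pat2[] = {0x73, 0x49};             /* Product Model prefix */

int main(int argc, char **argv)
{
        if (argc < 2)
                err("File name missing");

        FILE *fp = fopen(argv[1], "rb");
        if (!fp)
                err("Unable to open the file");

        unsigned char buf[REC_LEN];
        int c;

        /* Scan one byte at a time. fgetc returns EOF at the end of the file,
           so no stale buffer is processed (unlike a while(!feof()) loop). */
        while ((c = fgetc(fp)) != EOF) {
                if (c != 0x32)
                        continue;

                long pos = ftell(fp);   /* position just after the 0x32 */
                buf[0] = (unsigned char)c;

                /* Read the remaining 19 bytes of the candidate record. */
                if (fread(buf + 1, 1, REC_LEN - 1, fp) != REC_LEN - 1)
                        break;          /* not enough bytes left for a record */

                /* BOTH prefixes must match; otherwise rewind and keep scanning. */
                if (memcmp(buf + 4, pat1, sizeof pat1) != 0 ||
                    memcmp(buf + 12, pat2, sizeof pat2) != 0) {
                        fseek(fp, pos, SEEK_SET);
                        continue;
                }

                for (int i = 0; i < REC_LEN; i++)
                        printf("%02x ", buf[i]);
                printf("\n");
        }

        fclose(fp);
        return 0;
}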

Code:
user@Imperfecto_:~$ gcc extract.c -o extract 
user@Imperfecto_:~$ ./extract binfile.txt 
32 00 00 01 99 11 45 27 89 34 55 0f 73 49 45 49 23 2f ff ff 
32 00 00 02 99 11 45 27 89 34 55 1f 73 49 45 54 76 8f ff ff 
user@Imperfecto_:~$

--ahamed

---------- Post updated at 03:16 AM ---------- Previous update was at 03:13 AM ----------

Or is it that once we encounter 0xff 0x33, we should stop?

--ahamed

Last edited by ahamed101; 08-15-2013 at 08:05 AM.
# 14  
08-15-2013
Hello ahamed,

Thank you for your help! I'll definitely start with your code.

And yes, FF 33 is the end of the file. After the 33 come some bytes that represent the date and time, which are not of interest. 0x33 is ISO-coded, so in ASCII it is the digit 3.
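Given that FF 33 ends the data, one way to honor it is to locate the terminator first and scan only the bytes before it, so the trailing date and time bytes are never examined. This is a minimal sketch with a hypothetical helper `data_end`, and it assumes the first FF 33 pair in the file is the terminator; if FF 33 can also occur inside a block, that assumption breaks and the terminator would have to be found from the block structure instead.

```c
#include <stddef.h>

/* Return the number of bytes that belong to blocks: everything up to and
   including the last block's closing FF, i.e. the offset of the 0x33 in
   the first FF 33 pair. Returns len if no terminator is found.
   ASSUMPTION: FF 33 never occurs inside a block. */
size_t data_end(const unsigned char *data, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++)
        if (data[i] == 0xff && data[i + 1] == 0x33)
            return i + 1;
    return len;
}
```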

For more detail, below is the main structure I described in my first post:

Code:
1- Each block begins with hex 32 (1 byte) and ends with FF. The FF of the last block is followed by 33.
2- The next sequence to extract is the correlative number (3 bytes) --> i.e., 1, 2, 3...N
3- The next sequence to extract is the Product Series (8 bytes) --> The first 4 bytes are always "99 11 45 27"
4- The next sequence to extract is the Product Model (8 bytes) --> The first 2 bytes are always "73 49"
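The structure above maps onto a fixed 20-byte header, so each field sits at a known offset from the 0x32: correlative at 1 to 3, series at 4 to 11, model at 12 to 19. This is a minimal sketch; `struct header` and `parse_header` are hypothetical names, and reading the correlative as big-endian is an assumption based on the sample output in post #13 (`32 00 00 01`, `32 00 00 02`).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for the fixed 20-byte block header:
   0x32 | correlative (3) | series (8) | model (8) */
struct header {
    unsigned long correlative;
    unsigned char series[8];
    unsigned char model[8];
};

/* Parse a header at p (at least 20 bytes must be available).
   Returns 1 on success, 0 if the marker bytes do not match.
   ASSUMPTION: the correlative is big-endian (00 00 01 -> 1). */
int parse_header(const unsigned char *p, struct header *h)
{
    static const unsigned char series_prefix[] = {0x99, 0x11, 0x45, 0x27};
    static const unsigned char model_prefix[] = {0x73, 0x49};

    if (p[0] != 0x32 ||
        memcmp(p + 4, series_prefix, sizeof series_prefix) != 0 ||
        memcmp(p + 12, model_prefix, sizeof model_prefix) != 0)
        return 0;

    h->correlative = ((unsigned long)p[1] << 16) |
                     ((unsigned long)p[2] << 8) | p[3];
    memcpy(h->series, p + 4, 8);
    memcpy(h->model, p + 12, 8);
    return 1;
}
```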

Thank you for your help, ahamed.

---------- Post updated at 03:05 PM ---------- Previous update was at 01:07 PM ----------

Hello again ahamed,

It works nicely!

Now, for each block, I'm trying to extract (if present) the bytes after FF 34 that begin with 0x03, followed by 0x80, 0x81, 0x83, 0x86 or 0x87, plus 16 more bytes, as shown in the image attached to my previous post.

I've added a new line as follows:
Code:
static unsigned char pat3[] = {0x03, 0x80}; /* 0x8 alone would mean 0x08; the second byte actually varies: 0x80, 0x81, 0x83, 0x86 or 0x87 */

But how do I include it in the "if" statement and extract those bytes only when 0x03 0x8Z (where Z can be 0, 1, 3, 6 or 7) appears after an occurrence of 0xFF 0x34?
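Because the second byte varies, a single `memcmp` pattern does not fit well; one option is to test the 0x03 with `==` and the second byte against the allowed set. This is a minimal sketch with a hypothetical `find_item2`; it assumes the caller passes only the bytes of one block, and that the 0x03 0x8Z sequence may start anywhere after the first FF 34 in that block (not necessarily immediately after it).

```c
#include <stddef.h>

#define ITEM2_LEN 18  /* 0x03, 0x8Z, then 16 more bytes */

/* 1 if b is one of the allowed second bytes 0x80, 0x81, 0x83, 0x86, 0x87. */
static int allowed_second(unsigned char b)
{
    switch (b) {
    case 0x80: case 0x81: case 0x83: case 0x86: case 0x87:
        return 1;
    default:
        return 0;
    }
}

/* Search one block for FF 34 followed (anywhere later in the block) by
   03 8Z with 16 more bytes available. Returns the offset of the 0x03,
   or -1 if there is no such sequence. */
long find_item2(const unsigned char *blk, size_t len)
{
    size_t i = 0;

    /* 1. Locate the first FF 34 in the block. */
    while (i + 1 < len && !(blk[i] == 0xff && blk[i + 1] == 0x34))
        i++;
    if (i + 1 >= len)
        return -1;

    /* 2. After it, look for 03 8Z with room for all 18 bytes. */
    for (i += 2; i + ITEM2_LEN <= len; i++)
        if (blk[i] == 0x03 && allowed_second(blk[i + 1]))
            return (long)i;
    return -1;
}
```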

For each block, I'd like to have one line in the output file.
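Putting the pieces together, one line per block could look like the sketch below. It assumes the whole file is in memory, that a block starts at each 0x32 whose series (99 11 45 27) and model (73 49) prefixes match, and that a block runs until the next such header or the end of the buffer. The caller can pass as `len` the offset of the FF 33 terminator so the trailing date and time bytes are ignored. `format_blocks` is a hypothetical name; it writes to a `FILE *`, so the same code can print to `stdout` or to a file opened with `fopen`.

```c
#include <stdio.h>
#include <string.h>

#define HDR_LEN 20    /* 0x32 + correlative (3) + series (8) + model (8) */
#define ITEM2_LEN 18  /* 0x03, 0x8Z, then 16 more bytes */

/* 1 if a valid block header starts at data[i]. */
static int is_header(const unsigned char *data, size_t len, size_t i)
{
    static const unsigned char series[] = {0x99, 0x11, 0x45, 0x27};
    static const unsigned char model[] = {0x73, 0x49};
    return i + HDR_LEN <= len && data[i] == 0x32 &&
           memcmp(data + i + 4, series, sizeof series) == 0 &&
           memcmp(data + i + 12, model, sizeof model) == 0;
}

static void put_hex(FILE *out, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        fprintf(out, "%02x ", p[i]);
}

/* Write one line per block: the 20 header bytes, then the item-2 bytes
   (03 8Z + 16 bytes, found after FF 34) if the block contains them.
   Returns the number of blocks written. */
int format_blocks(const unsigned char *data, size_t len, FILE *out)
{
    int blocks = 0;
    size_t i = 0;

    while (i < len) {
        if (!is_header(data, len, i)) { i++; continue; }

        /* The block ends where the next header starts (or at len). */
        size_t end = i + HDR_LEN;
        while (end < len && !is_header(data, len, end))
            end++;

        put_hex(out, data + i, HDR_LEN);

        /* Item 2: first FF 34 after the header, then 03 8Z + 16 bytes. */
        size_t j = i + HDR_LEN;
        while (j + 1 < end && !(data[j] == 0xff && data[j + 1] == 0x34))
            j++;
        for (j += 2; j + ITEM2_LEN <= end; j++) {
            unsigned char z = data[j + 1];
            if (data[j] == 0x03 && (z == 0x80 || z == 0x81 || z == 0x83 ||
                                    z == 0x86 || z == 0x87)) {
                put_hex(out, data + j, ITEM2_LEN);
                break;
            }
        }
        fputc('\n', out);
        blocks++;
        i = end;
    }
    return blocks;
}
```

A `main` would read the file into a buffer (for example with `fread`) and call `format_blocks(buf, n, stdout)`, redirecting the output to a file from the shell.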

Thanks in advance again.