Extract sequences of bytes from binary for different blocks


 
# 43  
Old 08-25-2013
Yes, but I don't know why it is not printing each of the 14 bytes separated by "|".
Only the first 2 bytes after 0x0e are printed separately.
# 44  
Old 08-26-2013
That is because you never asked for that. Check your post #29, sub block sequences are printed based on that.

--ahamed

---------- Post updated at 08:37 PM ---------- Previous update was at 11:06 AM ----------

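This removes the f's. A byte equal to 0xff is skipped, and if a single byte has a trailing f (low nibble 0xf) at the end of a field or just before an 0xff byte, that nibble is masked.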

Code:
#define DONT_PRINT1 0xff
#define DONT_PRINT2 0x0f
#define MASK 0x0f
#define INV_MASK 0xf0

void print_bytes(const unsigned char *ptr, int len, block bl)
{
        int i;
        unsigned char op;
        for(i=0;i<len;i++){
                if(MAIN_BLOCK == bl){
                        op = ptr[i] & MASK;

                        if(ptr[i] == DONT_PRINT1)
                                continue;

                        if(  ((i == (len-1)) && !(op ^ DONT_PRINT2)) 
                          || (!(op ^ DONT_PRINT2) && (ptr[i+1] && (ptr[i+1] == DONT_PRINT1))) ){

                                op = ptr[i] & INV_MASK;
                                if(op){
                                        printf("%x", op);
                                }
                                continue;
                        }
                }
                printf("%02x", ptr[i]);
        }
        printf("|");
        return;

}

void print_data(const unsigned char *ptr, int len, block bl)
{
        if(MAIN_BLOCK == bl){
                print_bytes(ptr, 1, bl);
                print_bytes(ptr+1, 3, bl);
                print_bytes(ptr+4, 8, bl);
                print_bytes(ptr+12, 8, bl);
        } else {
                print_bytes(ptr, 1, bl);
                print_bytes(ptr+1, 1, bl);
                print_bytes(ptr+2, 1, bl);
                print_bytes(ptr+3, 1, bl);
                print_bytes(ptr+4, 4, bl);
                print_bytes(ptr+8, 8, bl);
                print_bytes(ptr+16, 1, bl);
                if(*ptr == intrim_pat1[2][1]){
                        print_bytes(ptr+17, 1, bl);
                }
        }
        return;
}
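
The filtering rule above can also be tried in isolation. The following is only a minimal sketch, not the code from this thread: format_field is a hypothetical helper that writes into a caller-supplied buffer instead of calling printf, and it assumes that the wanted output for a byte such as 0x5f is the single digit 5 (the high nibble shifted down), which is a guess at the intent.

```c
#include <stdio.h>
#include <stddef.h>

#define FILLER_BYTE   0xff  /* whole byte is padding */
#define FILLER_NIBBLE 0x0f  /* low nibble is padding */

/*
 * Hypothetical helper: writes the hex digits of ptr[0..len-1] into out,
 * skipping 0xff bytes and dropping a trailing 'f' nibble when the byte is
 * the last one of the field or is followed by 0xff.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if out is too small.
 */
int format_field(const unsigned char *ptr, size_t len, char *out, size_t cap)
{
    size_t i, used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    for (i = 0; i < len; i++) {
        int last = (i == len - 1);
        int n;

        if (ptr[i] == FILLER_BYTE)
            continue;

        if ((ptr[i] & 0x0f) == FILLER_NIBBLE &&
            (last || ptr[i + 1] == FILLER_BYTE)) {
            /* Assumption: keep only the high nibble, as one hex digit. */
            n = snprintf(out + used, cap - used, "%x", (unsigned)(ptr[i] >> 4));
        } else {
            n = snprintf(out + used, cap - used, "%02x", (unsigned)ptr[i]);
        }
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

With the four bytes 12 34 5f ff as input, the sketch fills the buffer with 12345.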

--ahamed
# 45  
Old 08-26-2013
Hello ahamed,

That really helped! It works wonderfully. I've learned many things about C from your code and from the logic and methods used; I knew almost nothing about C before.

I think the code could be a little smaller: since pat2 has a fixed number of bytes and always comes after the 8 bytes of pat1, maybe both could be
joined into a single pattern. It doesn't matter; I didn't explain that very well at the beginning.

I was able to add/modify some things in your code in order to remove the f's in the sub-block too and to print the last 14 bytes separately.
Code:
typedef enum {
        MAIN_BLOCK,
        SUB_BLOCK,
        SUB_BLOCK1
} block;

void print_bytes(const unsigned char *ptr, int len, block bl)
{
        int i;
        unsigned char op;
        for(i=0;i<len;i++){
                if((MAIN_BLOCK == bl) || (SUB_BLOCK == bl)){
                        op = ptr[i] & MASK;

                        if(ptr[i] == DONT_PRINT1)
                                continue;

                        if(  ((i == (len-1)) && !(op ^ DONT_PRINT2)) 
                          || (!(op ^ DONT_PRINT2) && (ptr[i+1] && (ptr[i+1] == DONT_PRINT1))) ){

                                op = ptr[i] & INV_MASK;
                                if(op){
                                        printf("%x", op);
                                }
                                continue;
                        }
                        printf("%02x", ptr[i]);
                }
                else if(SUB_BLOCK1 == bl) {
                        printf("%02x", ptr[i]);
                }
        }
        printf("|");
        return;
}
void print_bytes_each(const unsigned char *ptr, int len)
{
        int i;
        for(i=0;i<len;i++)
                printf("%02x|", ptr[i]);
        return;
}
void print_data(const unsigned char *ptr, int len, block bl)
{
        if(MAIN_BLOCK == bl){
                print_bytes(ptr, 1, bl);
                print_bytes(ptr+1, 3, bl);
                print_bytes(ptr+4, 8, bl);
                print_bytes(ptr+12, 8, bl);
        } else if(SUB_BLOCK == bl){
                print_bytes(ptr, 1, bl);
                print_bytes(ptr+1, 1, bl);
                print_bytes(ptr+2, 1, bl);
                print_bytes(ptr+3, 1, bl);
                print_bytes(ptr+4, 4, bl);
                print_bytes(ptr+8, 8, bl);
                print_bytes(ptr+16, 1, bl);
                if(*ptr == intrim_pat1[2][1]){
                        print_bytes(ptr+17, 1, bl);
                }
        } else if(SUB_BLOCK1 == bl) {
                print_bytes(ptr, 1, bl);
                print_bytes_each(ptr+2, 14);
        }
        return;
}
void get_len_and_print(FILE *fp, unsigned char *ptr)
{
        int len = 0;

        //only buf[0] is populated at this stage
        if(1 != fread(ptr+1, sizeof(char), 1, fp))
                err("Insufficient data");
        len = *(ptr+1);
        
        if(len !=14){
                if(len != fread(ptr+2, sizeof(char), len, fp))
                   err("Insufficient data");
                print_data(ptr, len+1, SUB_BLOCK);
        }
        else if(len == 14){
                if(len != fread(ptr+2, sizeof(char), len, fp))
                       err("Insufficient data");
                print_data(ptr, len, SUB_BLOCK1);            
        }        
        return;
}
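
The length-prefixed read done in get_len_and_print can be written with an explicit bounds check on the buffer. This is only a minimal sketch under assumptions: read_sub_block and its cap parameter are hypothetical names, and the size of the real buffer used in the program is not shown in this thread.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: buf[0] already holds the tag byte. Reads one length
 * byte into buf[1], then that many payload bytes into buf[2..].
 * Returns the payload length, or -1 on a short read or if the record would
 * not fit in cap bytes.
 */
int read_sub_block(FILE *fp, unsigned char *buf, size_t cap)
{
    size_t len;

    if (cap < 2 || fread(buf + 1, 1, 1, fp) != 1)
        return -1;
    len = buf[1];

    /* tag byte + length byte + payload must fit in the buffer */
    if (len + 2 > cap)
        return -1;
    if (fread(buf + 2, 1, len, fp) != len)
        return -1;
    return (int)len;
}
```

The caller can then choose SUB_BLOCK or SUB_BLOCK1 from the returned length, as the code above does with len == 14.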

One question I have is: can pat1 and pat2 be controlled by the user from outside the code? Right now they are hard-coded. I'm thinking of
something like this: when the program runs, it asks for pat1 and pat2 (or better, for pat1 only).
Example:
"Introduce pat1"
and after the pattern 991145 is entered, the code would run.

Many thanks for all your great help during these days ahamed!

Best regards
# 46  
Old 08-27-2013
You can use the command line arguments to initialize the pattern array.
Here is an example:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define err(x) {printf("Error : %s...\n", x); exit(1);}

int main(int argc, char **argv)
{

        if(argc <= 1){
                err("Insufficient arguments");
        }

        unsigned char pat[32];
        int count = 0, i;

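        while(argv[count+1] != NULL){
                /* pat[] holds at most 32 bytes */
                if(count >= (int)sizeof(pat))
                        err("Too many pattern bytes")
                if(strlen(argv[count+1]) > 2)
                        err("Invalid data provided")
                /* sscanf returns 1 on success, 0 or EOF on failure */
                if(1 != sscanf(argv[count+1], "%2hhx", &pat[count]))
                        err("Invalid data provided")
                count++;
        }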

        printf("\n");
        for(i=0; i<count; i++){
                printf("0x%02x ", pat[i]);
        }
        printf("\n");

        return 0;
}

Code:
-bash-3.2$ gcc run.c -o run
-bash-3.2$ ./run 99 11 45

0x99 0x11 0x45
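
If the pattern should be entered as one string such as 991145 rather than as separate bytes, it can be split into pairs of hex digits. The following is a minimal sketch; parse_hex_pattern and hex_value are hypothetical helpers that are not part of the program in this thread.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/*
 * Hypothetical helper: converts a string such as "991145" into bytes.
 * Returns the number of bytes stored, or -1 if the string is empty, has an
 * odd number of digits, contains a non-hex character, or needs more than
 * cap bytes.
 */
int parse_hex_pattern(const char *s, unsigned char *out, size_t cap)
{
    size_t n = strlen(s), i;

    if (n == 0 || n % 2 != 0 || n / 2 > cap)
        return -1;
    for (i = 0; i < n; i += 2) {
        int hi = (unsigned char)s[i];
        int lo = (unsigned char)s[i + 1];

        if (!isxdigit(hi) || !isxdigit(lo))
            return -1;
        out[i / 2] = (unsigned char)(hex_value(hi) * 16 + hex_value(lo));
    }
    return (int)(n / 2);
}
```

The string can come from argv[1], or from a line read with fgets after printing a prompt (with the trailing newline removed first).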

--ahamed
# 47  
Old 08-31-2013
Hello ahamed again,

Thanks for the help with the last question.

Now I hope you can help me figure out what is causing this issue. I'm attaching the final code I'm using, with some functions added to your original code.

I'm testing with a 2GB binary file, and when the output file reaches 581,525 lines and 38 MB, the processing simply stops and the error below appears:
Code:
Aborted (`core' generated)

and the generated stackdump file says:
Code:
Stack trace:
Frame     Function  Args

Many thanks in advance.
# 48  
Old 08-31-2013
If you have a file approaching 2GB in size, the result of ftell() will overflow the pos variable; you are best off defining pos as a long variable.

Try again after you replace this:

Code:
        int pos = 0, i, len;

with this:

Code:
        long pos = 0L;
        int i, len;
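
On platforms where long is itself 32 bits wide, ftell() still cannot represent offsets of 2GB or more. POSIX offers ftello(), which returns an off_t instead. The following is a minimal sketch assuming a POSIX system; current_offset is a hypothetical helper, and whether this applies to the system used in this thread is not known.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes ftello() */
#define _FILE_OFFSET_BITS 64     /* ask glibc for a 64-bit off_t */

#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns the current offset of fp as a long long,
 * or -1 on error.
 */
long long current_offset(FILE *fp)
{
    off_t pos = ftello(fp);

    if (pos == (off_t)-1)
        return -1;
    return (long long)pos;
}
```

The two #define lines have to come before any #include. A value returned this way is printed with %lld.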

# 49  
Old 08-31-2013
Hello Chubler_XL!

Thanks for stepping in.

I've tried the modification you suggested, but exactly the same thing happens. It stops exactly when
the size reaches 39,296 KB.

The error is the same.