how to get data from hex file using SED or AWK based on pattern sign


 
# 1  
Old 10-11-2011

I have a binary (hex) file I need to parse to get some data which are encoded this way:

.* b4 . . . 01 12 .* af .* 83 L1 x1 x2 xL 84 L2 y1 y2 yL

In other words, there is a stream of hexadecimal bytes (separated by spaces in my example for readability). I need to get the value stored in bytes x1-xL (where L1 is the number of 'x' data bytes) and the value stored in bytes y1-yL (where L2 is the number of 'y' data bytes).

Desired data can be identified by this sequence of bytes:
.* b4 . . . 01 12 .* af .* 83
where '.*' means any number of bytes, '. . .' means exactly 3 bytes, and 'b4, 01, 12, af, 83' is a unique ordered sequence that identifies the 83 L1 x1 - xL 84 L2 y1 - yL data container (the 'y' identifier byte '84' must follow immediately after the x1-xL data).

It is somewhat complicated to explain, but I hope it is clear enough. Can anybody help with a script whose output is the 'X' and 'Y' values from the file?

Note: the required data may occur multiple times in the file with different values of 'X' and 'Y', but each occurrence of 'X' and 'Y' is preceded by the identifying sequence .* b4 . . . 01 12 .* af .* 83 as stated above.
# 2  
Old 10-11-2011
Quote:
Originally Posted by sameucho
I have a binary (hex) file
I was able to understand your data's description, but its format is unclear to me. Is it a binary file or a text file with hex-encoded data?

To be as clear as possible, why don't you post the output of hexdump -C datafile. If it's a large amount of data, just include a couple of samples of portions that match the bytes you need help extracting. This way, we can be sure we understand each other.

Regards,
Alister
# 3  
Old 10-12-2011
Hi Alister,

it is a binary file. The Hexdump output of the file is the following:

Code:

00000000  10 00 00 00 5a 01 00 00  00 00 a8 61 00 00 00 00  |....Z......a....|
00000010  00 02 fc 00 dd 01 01 00  01 00 d7 b4 81 d4 80 01  |................|
00000020  12 83 08 32 01 01 00 00  00 00 f1 84 08 21 43 65  |...2.........!Ce|
00000030  87 09 21 13 f0 a5 06 80  04 87 f7 4a 37 86 03 00  |..!........J7...|
00000040  01 0b 87 01 02 88 02 00  01 89 02 00 01 8a 04 03  |................|
00000050  00 00 00 ab 06 80 04 4a  35 05 01 8c 0c 77 77 77  |.......J5....www|
00000060  2e 74 65 73 74 30 2e 64  65 8d 02 f1 21 ae 08 a0  |.test0.de...!...|
00000070  06 80 04 00 00 00 02 af  36 30 34 81 0d 00 00 00  |........604.....|
00000080  00 00 00 00 00 00 00 00  00 00 82 0d 02 1d 82 1f  |................|
00000090  72 96 87 87 43 fb ff ff  00 83 02 28 00 84 02 28  |r...C......(...(|
000000a0  00 85 01 02 86 09 11 10  11 15 55 45 2b 02 00 90  |..........UE+...|
000000b0  09 11 10 11 15 55 41 2b  02 00 91 01 04 93 01 00  |.....UA+........|
....
....

In fact this is an ASN.1-encoded record (starting in this case at offset 0x1B), which an external decoder displays like this:

Code:

B4 81 D4 (212d)
.  80 01 (1d): 12 
.  83 08 (8d): 32 01 01 00 00 00 00 F1 
.  84 08 (8d): 21 43 65 87 09 21 13 F0 
.  A5 06 (6d)
.  80 04 (4d): 87 F7 4A 37 
.  86 03 (3d): 00 01 0B 
.  87 01 (1d): 02 
.  88 02 (2d): 00 01 
.  89 02 (2d): 00 01 
.  8A 04 (4d): 02 00 00 00 
.  AB 06 (6d)
.  80 04 (4d): 4A 35 05 01 
.  8C 0C (12d): 77 77 77 2E 74 65 73 74 30 2E 64 65 
.  8D 02 (2d): F1 21 
.  AE 08 (8d)
.  A0 06 (6d)
.  80 04 (4d): 00 00 00 01 
.  AF 36 (54d)
.     30 34 (52d)
.        81 0D (13d): 00 00 00 00 00 00 00 00 00 00 00 00 00 
.        82 0D (13d): 02 1D 82 1F 72 96 87 87 43 FB FF FF 00 
.        83 02 (2d): 28 00 
.        84 02 (2d): 28 00 
.        85 01 (1d): 02 
.        86 09 (9d): 11 10 11 15 54 18 2B 02 00 
.  90 09 (9d): 11 10 11 15 54 14 2B 02 00 
.  91 01 (1d): 04 
.  93 01 (1d): 00 
...
...

I need to get the values under the tags [B4 [80 01 12] ... [AF ... [83] X [84] Y]],
where B4 [80 01 12] identifies the correct [AF] tag. A universal solution would require writing a full ASN.1 decoder, but because I need just those two values, I simplified the task supposing there will not be another occurrence of such a combination of 'sign' bytes before the desired values which are to be collected.
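
For reference, the `81 D4` that follows `B4` in the decoder output is a long-form BER length: when the first length byte is above 0x80, its low seven bits give the number of length bytes that follow (here one byte, D4 = 212). The following is only a rough sketch of decoding a single tag/length header in awk, not a full ASN.1 decoder. The function name `ber_header` is made up for illustration, and it assumes single-byte tags, definite lengths, and input already converted to one decimal byte value per line.

```shell
# ber_header (hypothetical helper): read decimal byte values, one per line,
# on stdin and print the tag and decoded length of the first TLV header.
# Assumes a single-byte tag and a definite (not indefinite) length.
ber_header() {
    awk '
        NR == 1 { tag = $0; next }                        # first byte is the tag
        NR == 2 {
            if ($0 < 128) { len = $0; nlen = 0 }          # short form: the byte is the length
            else          { len = 0;  nlen = $0 - 128 }   # long form: nlen length bytes follow
        }
        NR > 2 && NR <= 2 + nlen { len = len * 256 + $0 } # accumulate big-endian length
        NR >= 2 + nlen { printf "tag=%02X len=%d\n", tag, len; exit }
    '
}

printf '180\n129\n212\n' | ber_header    # prints: tag=B4 len=212
```

This also shows why the three "any" bytes after b4 in the simplified pattern work for this sample: two of them are the long-form length and the third is the 80 tag.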

Last edited by sameucho; 10-12-2011 at 05:36 AM.. Reason: Editorial changes for better readability
# 4  
Old 10-12-2011
Since we don't know exactly what platform you're running on, my proposal endeavours to restrict itself to ubiquitous POSIX functionality. Also, it makes the same assumptions you've made. Specifically:

Quote:
Originally Posted by sameucho
I simplified the task supposing there will not be another occurrence of such a combination of 'sign' bytes before the desired values which are to be collected.
I did not test the following code, but I did my best to mind the details. If it doesn't work, please post any error messages, how the behavior deviates from what's expected, and which operating system(s) this needs to run on. Also, if the following code is insufficient, it would help to have a sample of the binary data to test against (upload it somewhere and link us). I'm feeling a bit lazy today and I'm not interested in creating my own mock data (although I suppose I could reverse the hexdump with AWK if I were feeling industrious).
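
For anyone who does want to rebuild binary test data from a posted dump, here is a minimal sketch of reversing `hexdump -C` output with AWK and the shell's printf. The function name `unhexdump` is made up for illustration. It assumes every line has an offset followed by up to 16 two-digit hex bytes, it does not re-expand `*` lines that stand for repeated data, and it is meant for small samples only, since the whole file passes through one printf format string.

```shell
# unhexdump (hypothetical helper): turn "hexdump -C" text on stdin back
# into binary on stdout. Lines without hex bytes (such as "*" or the final
# offset-only line) produce no output.
unhexdump() {
    printf "$(awk '
        # Convert a hex string such as "b4" to its numeric value.
        function hex(h,    n, k) {
            h = tolower(h)
            for (k = 1; k <= length(h); k++)
                n = n * 16 + index("0123456789abcdef", substr(h, k, 1)) - 1
            return n
        }
        {
            # Field 1 is the offset; fields 2..17 hold the byte values.
            # Stop at the first field that is not a hex pair (the ASCII column).
            for (i = 2; i <= 17 && $i ~ /^[0-9A-Fa-f][0-9A-Fa-f]$/; i++)
                printf "\\%03o", hex($i)    # emit an octal escape for printf
        }
    ')"
}
```

A typical call would be `unhexdump < dump.txt > binfile`, where both file names are placeholders. AWK only generates octal escapes here and the shell's printf writes the actual bytes, which sidesteps any locale-dependent behaviour of AWK's own `%c`.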


Code:
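od -An -v -tu1 binfile | tr -s ' \t' '\n\n' | awk '
    # Input: one unsigned decimal byte value per line.
    NR==1 && length==0   { getline }                              # drop leading blank line
    $0==180              { i=1; getline; getline; getline }       # b4, then move 3 bytes ahead
    i==1 && $0==128      { ++i; getline }                         # 80
    i==2 && $0==1        { ++i; getline }                         # 01
    i==3 && $0==18       { ++i; getline }                         # 12 (hex) is 18 in decimal
    i==4 && $0==175      { ++i; getline }                         # af
    i==5 && $0==131      { ++i; getline; pr_bytes(); getline }    # 83 L1 x1..xL
    i==6 && $0==132      { getline; pr_bytes(); printf("%s", s) } # 84 L2 y1..yL
    i!=4 && i!=5         { i=0; s="" }                            # reset unless scanning for af or 83

    # Read the length from $0, then append that many bytes to s as hex.
    function pr_bytes() {
        j=$0
        while (j--) {
                getline
                s=s sprintf("%.2X%s", $0, (j ? OFS : ORS))
        }
    }
'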


Since AWK is not required to support hexadecimal constants or numeric strings, od dumps byte values in base 10. tr is used to replace all spaces and tabs with newlines. AWK then reads one line at a time, with each line either containing one byte value in decimal or nothing at all.
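
To see what AWK actually receives, a few bytes can be pushed through the same front end. This is just an illustrative sketch: the function name `dump_bytes` and the three sample bytes are made up. Note that it uses `-tu1` (unsigned decimal), because `-td1` is signed and prints bytes above 0x7f, such as b4, as negative numbers; `-v` keeps od from collapsing repeated output lines into `*`.

```shell
# dump_bytes (hypothetical helper): print each byte of stdin as an unsigned
# decimal number, one value per line. The output may start with an empty
# line, caused by the leading blanks in od output.
dump_bytes() {
    # -An: no offset column, -v: no "*" abbreviation, -tu1: unsigned bytes
    od -An -v -tu1 | tr -s ' \t' '\n\n'
}

# Three sample bytes: b4 01 12
printf '\264\001\022' | dump_bytes
```

On implementations that pad od output with leading blanks, this prints an empty line first, then 180, 1 and 18 on separate lines.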

The AWK script:
* Discard a leading blank line if present (a by-product of leading whitespace in od output).
* i keeps track of which state is sought.
* pr_bytes() reads the value of the current byte and reads that many subsequent bytes. The bytes are stored in s as a space-delimited string terminated by a newline.
* If at any point a byte value does not match what's expected, the line will fall through to the bottom rule, where i and s are reset (except while scanning for af or 83, when i is 4 or 5).
* The output is two lines of text per record. Line 1 corresponds to what you've referred to as X, line 2 to Y. Each line is a space-delimited sequence of hexadecimal byte values.
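
To try the approach without real data, a tiny mock record can be generated with printf. This is a hypothetical sample, not a valid ASN.1 record (the outer b4 length is not honoured); it only follows the sign-byte pattern from the first post, with a decoy 83 placed before the af container. The function name `make_sample` and all byte values are made up for illustration.

```shell
# make_sample (hypothetical helper): write a small mock record to stdout.
make_sample() {
    printf '\000\000'                    # 00 00              leading junk
    printf '\264\201\324\200\001\022'    # b4 81 d4 80 01 12  record identifier
    printf '\203\001\252'                # 83 01 aa           decoy 83 outside af
    printf '\257\012\060\010'            # af 0a 30 08        container headers
    printf '\203\002\050\000'            # 83 02 28 00        X = 28 00
    printf '\204\002\051\001'            # 84 02 29 01        Y = 29 01
}

make_sample | od -An -v -tx1             # show the generated bytes in hex
```

After `make_sample > binfile`, an extractor that reads unsigned byte values and compares the sign bytes in decimal (0x12 is 18) should print `28 00` for X and `29 01` for Y.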

Regards,
Alister

Last edited by alister; 10-12-2011 at 04:45 PM.. Reason: Added missing getline and corrected conditional