Converting parts of a string to "Hex"


 
# 8  
09-14-2012
Quote:
Originally Posted by alister
... Been there myself many times. Unfortunately, your solution won't work reliably. The timing of hexdump's output and awk's output isn't in any way guaranteed.
...
Admitted. I noticed that myself in certain circumstances. Still, I wanted to publish my "elaborate" construct, if only as a basis for discussion. I sorely missed the typecast feature that programming languages offer when working in sed, awk, and bash; apparently they are too smart in their variable handling.
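The missing "typecast" does have a rough counterpart in POSIX sh: when a numeric argument to printf begins with a quote, printf uses the numeric value of the character that follows it. A minimal sketch, assuming ASCII, single-byte characters (forced here with LC_ALL=C); the helper name char_to_hex is hypothetical:

```shell
#!/bin/sh
# Sketch: emulate a char-to-integer "typecast" with printf's leading-quote rule.
# Assumption: single-byte ASCII characters, so force the C locale.
LC_ALL=C
export LC_ALL

# char_to_hex (hypothetical helper): print one character as a \xHH escape.
char_to_hex() {
    # "'X" tells printf to use the codeset value of X as the number.
    printf '\\x%02x' "'$1"
}

for c in A 9 U ,; do
    char_to_hex "$c"
done
printf '\n'
```

This converts one character per call, so it is only practical for short strings; it cannot represent a null byte, and it is only meant for ASCII characters.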
# 9  
09-14-2012
Here's my attempt at a POSIX approach. It gives me the same result as the Perl solution that I posted yesterday. It depends on the od command generating whitespace-delimited, two-hex-digit numbers.
Code:
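# awk fragment that finds " ABC+<digits>+" and sets RSTART/RLENGTH
match='match($0, / ABC\+[0-9]+\+/)'

# Pass 1: write the bytes following each header to hextemp as \xHH escapes
# (-v keeps od from folding repeated 16-byte lines into a "*" line)
awk '{'"$match"'; print substr($0, RSTART+RLENGTH)}' file |
while IFS= read -r line; do
        printf '%s' "$line" | od -An -v -tx1 | paste -s - |
            sed 's/[[:blank:]]//g; s/../\\x&/g'
done > hextemp

# Pass 2: keep <front> and the header, splice in the first nbytes escapes
# (4 characters each), then append whatever follows the data bytes
awk '{'"
        $match"'
        pre=substr($0, 1, RSTART+RLENGTH-1)
        nbytes=substr($0, RSTART+5, RLENGTH-6)
        post=substr($0, RSTART+RLENGTH+nbytes)
        getline hexstr < hextemp
        hexstr=substr(hexstr, 1, 4*nbytes)
        printf "%s%s%s%s", pre, hexstr, post, ORS
}' hextemp=hextemp file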

Regards,
Alister
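The offsets in the second awk program follow from the regex: the match starts at the space before "ABC", " ABC+" is 5 characters, and the match is those 5 characters plus the digits plus the closing "+". A small worked check, using a made-up sample line whose ten data bytes are printable stand-ins:

```shell
#!/bin/sh
# Worked check of the substr() offsets; the sample line is hypothetical.
line='xxxxxx ABC+10+0123456789+DEF xxxx'

printf '%s\n' "$line" | awk '{
    match($0, / ABC\+[0-9]+\+/)              # RSTART is the space before ABC
    nbytes = substr($0, RSTART+5, RLENGTH-6)  # skip 5 chars, drop the final +
    pre  = substr($0, 1, RSTART+RLENGTH-1)    # front plus header
    post = substr($0, RSTART+RLENGTH+nbytes)  # tail after nbytes data bytes
    print RSTART, RLENGTH, nbytes
    print pre "|" post
}'
```

Here RSTART is 7, RLENGTH is 8 (" ABC+10+"), and nbytes is 10, so the second line printed is `xxxxxx ABC+10+|+DEF xxxx`.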
# 10  
09-15-2012
Hi Hans,
You say that the # characters are unprintable characters, but the output you say you want for your input ($astring = "xxxxxx ABC+10+\x39\x55\x12\x84\xA7\x9F\x2C\xB1\xFF\x12+DEF xxxx") shows that some of these bytes represent printable characters (assuming you're using a codeset with ASCII underpinnings): the hexadecimal escape codes \x39, \x55, and \x2C are the characters '9', 'U', and ',', respectively. This isn't necessarily bad, but none of the scripts presented here so far will work correctly if one of the bytes represented by a "#" is a newline character, and these scripts may fail if the "x"s or "#"s contain a sequence that matches the form "ABC+<digits>+". And, as has already been stated, there is nothing we can do for you in a shell script if any of the bytes represented by a "#" is a null byte ('\x00').
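Both points are easy to check from the command line. A short sketch, assuming an ASCII-based locale; the sample strings are made up for illustration:

```shell
#!/bin/sh
# 1. Some "unprintable" bytes are ordinary characters:
#    od shows that '9', 'U' and ',' are the bytes 39, 55 and 2c.
printf '9U,' | od -An -v -tx1

# 2. A newline among the data bytes splits the record for any
#    line-oriented tool (read, awk, sed): the 3 data bytes of ABC+3+
#    ("a", newline, "b") end up on two separate lines.
printf 'xx ABC+3+a\nb+DEF\n' | awk '{ print NR ": " $0 }'
```

The second command prints `1: xx ABC+3+a` and `2: b+DEF`, so a script that handles one line at a time never sees the header and all of its data bytes together.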

As long as you can guarantee that there won't be any null bytes in the string except for the terminating null byte at the end of every string, and that exactly one substring of the form "ABC+<digits>+" will appear in each string, the following script does what you have requested:
Code:
#!/bin/ksh
### Functions:
# Usage: hexit bytes
# Convert the string ("bytes") into printable hexadecimal escape sequences
# corresponding to the values of the bytes in the string.  This function will
# not work correctly if a null byte appears in the string other than as the
# string terminator.  It will correctly handle newline characters in the bytes
# operand.
hexit() {
        # -v keeps od from folding repeated 16-byte lines into a "*" line
        printf "%s" "$1" | od -An -v -tx1 | while read x
        do      set -- $x
                while [ $# -gt 0 ]
                do      printf '\\x%s' "$1"
                        shift
                done
        done
}

### Main program:
# Usage: hexstring string...
# This utility will process each string operand, which must be of the form:
#       <front><hex-head><hex-bytes><tail>
# where <front> is any sequence of zero or more printable characters not
#               containing any substring that matches the format specified for
#               <hex-head>.
#       <hex-head> is composed of three parts in sequence:
#               <hex-head-start><count><hex-head-end>
#       where   <hex-head-start> is the characters "ABC+",
#               <count> is one or more characters from the current locale's
#                       digit character class that will be interpreted as a
#                       decimal digit string specifying the number of bytes
#                       included in <hex-bytes> (see below), and
#               <hex-head-end> is a "+" character.
#       <hex-bytes> is a string of <count> bytes.  These bytes can contain any
#               value except the null byte as long as no substring of these
#               bytes constitutes a string that can be interpreted as a valid
#               <hex-head> string either by itself or when combined with the
#               following <tail>.
#       <tail>  is zero or more printable characters not containing any
#               substring that matches the format specified above for
#               <hex-head>.
# When processing is complete, a string will be written to stdout containing
# <front> (unchanged), <hex-head> (unchanged), <hex-bytes> (converted to the
# four-character hexadecimal escape sequence representing each byte in
# <hex-bytes>), and <tail> (unchanged).
#
# Example: (Assuming this script is invoked by a recent ksh running on a system
# with the ASCII codeset underlying the current locale):
#       hexstring $'start ABC+5+a\tb\nc+end'
# would produce the following output:
#       start ABC+5+\x61\x09\x62\x0a\x63+end
ec=0    # Exit code (0 unless an error is detected)
while [ $# -gt 0 ]
do
        # Extract the <count> field.
        count=$(expr "$1" : ".*ABC+\([0-9]\{1,\}\)+")
        if [ -z "$count" ]
        then
                printf "%s: \"ABC+<digits>+\" not found in \"%s\"\n" \
                        "$(basename "$0")" "$1" >&2
                shift
                ec=1
                continue
        fi
        # Calculate the offset to the start of <hex-bytes>.
        off=$(expr "$1" : ".*ABC+[0-9]\{1,\}+")
        # Print <front> and <hex-head>
        printf "%s" "${1:0:off}"
        # Print <hex-bytes> as hexadecimal escape sequences.
        hexit "${1:off:count}"
        # And, print <tail>
        printf "%s\n" "${1:off + count}"
        shift
done
exit $ec

I realize this is a long script, but it is mostly comments. Note that some features used in the above script (for example, the ${1:off:count} substring expansions) are only available in versions of ksh newer than the November 16, 1988 release (ksh88), and some of the od utility's options used here weren't defined by the standards until 1992.
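If a ksh93-class shell isn't available, the same transformation can be sketched in plain POSIX sh by letting dd cut the string by byte offsets instead of using ${1:off:count}. This is a sketch under the same assumptions as the script above (one "ABC+<digits>+" per operand, no null bytes); hexstring_posix is a hypothetical name, and LC_ALL=C is set so that expr measures offsets in bytes, the same unit dd uses:

```shell
#!/bin/sh
# Sketch: POSIX sh version without ksh93 ${var:offset:length}.
LC_ALL=C
export LC_ALL

hexstring_posix() {
    # The leading "X" keeps expr from mistaking the operand for an operator.
    count=$(expr "X$1" : 'X.*ABC+\([0-9]\{1,\}\)+')
    [ -n "$count" ] || return 1
    off=$(expr "X$1" : 'X.*ABC+[0-9]\{1,\}+')
    off=$((off - 1))                      # do not count the "X"

    # <front><hex-head>, unchanged
    printf '%s' "$1" | dd bs=1 count="$off" 2>/dev/null
    # <hex-bytes> as \xHH escapes (-v: no "*" lines for repeated data)
    printf '%s' "$1" | dd bs=1 skip="$off" count="$count" 2>/dev/null |
        od -An -v -tx1 | while read -r line; do
            for b in $line; do printf '\\x%s' "$b"; done
        done
    # <tail>, unchanged, then a newline
    printf '%s' "$1" | dd bs=1 skip=$((off + count)) 2>/dev/null
    printf '\n'
}

hexstring_posix 'start ABC+3+a,U+end'
```

The example call should print `start ABC+3+\x61\x2c\x55+end`. Running dd three times per operand is slow, but it keeps every offset in bytes, which matters once the data contains bytes above 0x7f.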

Presumably, you have a source that creates strings containing binary data, so I won't worry about that here. It is easy to create strings like this with $'...' in recent versions of ksh, in a C or C++ program, or with the printf utility and hex escape sequences (but I assume that if you're creating hex escape sequences to generate these strings, you don't need to convert them back to hex).
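For test data, octal escapes in the printf format string (\ddd) work with any conforming printf, whereas \xHH in the format is an extension of the ksh and bash builtins. A sketch that rebuilds Hans's sample with the ten bytes 39 55 12 84 a7 9f 2c b1 ff 12 written in octal (sample is a hypothetical function name), then dumps those bytes back to confirm them:

```shell
#!/bin/sh
# sample (hypothetical): emit the example record using octal escapes only.
sample() {
    printf 'xxxxxx ABC+10+\071\125\022\204\247\237\054\261\377\022+DEF xxxx\n'
}

# "xxxxxx ABC+10+" is 14 bytes, so the data bytes start at offset 14.
sample | dd bs=1 skip=14 count=10 2>/dev/null | od -An -v -tx1
```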

10 More Discussions You Might Find Interesting

1. Windows & DOS: Issues & Discussions

Convert "hex" foldername to "ascii"

So, I have a folder, containing subdirs like this: 52334d50 52365245 524b4450 524f3350 52533950 52535050 52555550 now I want to go ahead and rename all those folder: hex -> ascii (8 Replies)
Discussion started by: pasc

2. Shell Programming and Scripting

Delete all log files older than 10 day and whose first string of the first line is "MSH" or "<?xml"

Dear Ladies & Gents, I have a requirement to delete all the log files in /var/log/test directory that are older than 10 days and their first line begin with "MSH" or "<?xml" or "FHS". I've put together the following BASH script, but it's erroring out: for filename in $(find /var/log/test... (2 Replies)
Discussion started by: Hiroshi

3. UNIX for Dummies Questions & Answers

Extracting Parts of String "#" vs "%"

Hello, I have a question regarding extracting parts of a string and the meaning of # and % in the syntax. I created an example below. # filename=/first/second/third/fourth # # echo $filename /first/second/third/fourth # # echo "${filename##*/}" fourth # # echo "${filename%/*}"... (3 Replies)
Discussion started by: shah9250

4. Shell Programming and Scripting

grep with "[" and "]" and "dot" within the search string

Hello. Following recommendations for one of my threads, this is working perfectly : #!/bin/bash CNT=$( grep -c -e "some text 1" -e "some text 2" -e "some text 3" "/tmp/log_file.txt" ) Now I need a grep success for some thing like : #!/bin/bash CNT=$( grep -c -e "some text_1... (4 Replies)
Discussion started by: jcdole

5. Shell Programming and Scripting

tcsh - understanding difference between "echo string" and "echo string > /dev/stdout"

I came across an unexpected behavior with redirections in tcsh. I know, csh is not best for redirections, but I'd like to understand what is happening here. I have the following script (called out_to_streams.csh): #!/bin/tcsh -f echo Redirected to STDOUT > /dev/stdout echo Redirected to... (2 Replies)
Discussion started by: marcink

6. Shell Programming and Scripting

how to use "cut" or "awk" or "sed" to remove a string

logs: "/home/abc/public_html/index.php" "/home/abc/public_html/index.php" "/home/xyz/public_html/index.php" "/home/xyz/public_html/index.php" "/home/xyz/public_html/index.php" how to use "cut" or "awk" or "sed" to get the following result: abc abc xyz xyz xyz (8 Replies)
Discussion started by: timmywong

7. Shell Programming and Scripting

Using sed to find text between a "string " and character ","

Hello everyone. Sorry, I have to add another sed question. I am searching a log file and need only the first 2 occurrences of text which comes after (note the space) "string " and before a ",". I have tried sed -n 's/.*string \(*\),.*/\1/p' file with some, but limited success. This gives out all... (10 Replies)
Discussion started by: haggismn

8. Shell Programming and Scripting

read parts of binary files by "ranges"

I read the "cat" manpages, but I could not find how to tell it something like "read file XY.BIN from byte 1000 to byte 5000". Can somebody please point me in the right direction? cat would be the ideal tool for my purpose, the way it behaves, but I miss this ranges option. Thanks for any input. (2 Replies)
Discussion started by: scarfake

9. Shell Programming and Scripting

How to apply a "tolower" AWK to a few parts of a document

Hi people, I have a nice problem to solve: in a text page I must change all the "*.php" occurrences to the respective lowercase. Example: ... <tr><td> <form action="outputEstrazione.php" method="get"> <table cellspacing='0,5' bgcolor='#000000'><tr><td> <font size='2'... (5 Replies)
Discussion started by: marconet85

10. Shell Programming and Scripting

input string="3MMTQSZ348GGMZRQWMJM4SD6M";output string="3MMTQ-SZ348-GGMZR-QWMJM-4SD6

input string="3MMTQSZ348GGMZRQWMJM4SD6M" output string="3MMTQ-SZ348-GGMZR-QWMJM-4SD6M" using linux shell script (4 Replies)
Discussion started by: pankajd