Help search and replace hex values only in specific files


 
# 1  
Old 06-14-2011

Code:
perl -pi -e 's/\x00/\x0d\x0a/g' `grep -l $'[\x00]GS' filelist` 

This isn't working: it's not pulling the files that contain the regex. Please help me rewrite this.

Ideally this would work on 9K of the 20K files in the directory. I've tried the loop below, but I don't know enough about awk to make it work.

Code:
for each del in `cat delimiters`
do
  perl -pi -e 's/\x$del/\x0d\x0a/g' `grep -l $'[\x$del]GS' filelist`
done
cat delimiters
15
1c
1e
1f

GOAL
Substitute the value at position 106 with a newline throughout the entire file (preferably only when the value at pos 107=G), and preferably only for files that have a value at position 107 and whose line begins with ISA.

ISSUE
- my grep statement looking for specific hex values isn't working
- I'd like to identify and perform the same action on files that share the same value at position 106
- some characters are troublesome in a perl one liner or a grep like \ or - or other hex values
- the file contains other non-ascii or non-printable characters throughout the file that should not be substituted
This value at position 106
- changes from file to file
- is consistent within a particular file
- is non-word, usually non-ascii or non-printable (e.g. hex value 15, 1c, 1e, 1f, 21, 27, 2a, 2e, 3c, 3d, 3e, 3f, 40, 5c, 5e, 60, 7d, 7e, b8, be, c4, 00)
- may not be at position 106 in other files and should not be substituted in files that do not have this value at pos 106

Code:
INPUT:  This is what the input file looks like.  The value at position 106 is Hex x00 

ISA`00`FTL DATA  `00`FTL DATA  `ZZ`BBBB MFG       `ZZ`FFF  MFG       `110612`1931`U`00401`000001527`0`P`> GS`FA`BBBB MFG`FFF  MFG`110612`1931`940`X`002040 ST`997`000001184 AK1`PO`214 AK2`830`000007903 AK5`A AK9`A`1`1`1 SE`6`000001184 GE`1`940 IEA`1`000001527-
 
or
 
The value at position 106 is Hex x27.  Hex x60 is throughout the file and shouldn't be substituted. 

ISA`00`FTL DATA  `00`FTL DATA  `ZZ`B715 MFG       `ZZ`FTL  MFG       `110612`1931`U`00401`000001527`0`P`>'GS`FA`B715 MFG`FTL  MFG`110612`1931`940`X`002040'ST`997`000001184'AK1`PO`214'AK2`830`000007903'AK5`A'AK9`A`1`1`1'SE`6`000001184'GE`1`940'IEA`1`000001527-
 
OUTPUT:  This is what the output should look like.

ISA`00`FTL DATA  `00`FTL DATA  `ZZ`BBBB MFG       `ZZ`FFF  MFG       `110612`1931`U`00401`000001527`0`P`>
GS`FA`BBBB MFG`FFF  MFG`110612`1931`940`X`002040
ST`997`000001184
AK1`PO`214
AK2`830`000007903
AK5`A
AK9`A`1`1`1
SE`6`000001184
GE`1`940
IEA`1`000001527-

# 2  
Old 06-14-2011
Sounds like a fun project :)

Let's break it up:
1. convert to hex
2. store character 106 and 107
3. in hex format replace all char106's to '0a' if char107 is 'G'
4. convert hex back to ascii

Here, try this:

Code:
#!/bin/bash

while read -r file ; do            #loop through all input files
    od -v -t x1 "$file" > hexfile  #dump the content in hex 1-bytes; -v stops od collapsing repeated lines into '*'
    awk 'NR==7{print $11 " " $12; exit}' hexfile |  #get char 106 and 107
    while read -r c106 c107 ; do   #and store them in variables
      if [ "$c107" = 47 ] ; then   #char107 is 'G' (hex 47); substitute
         sed -i "s/$c106/0a/g" hexfile #replace all in hexfile
      fi
    done #end while
    awk '{$1=""; gsub(/ /,"")}1' hexfile | perl -ne '$l=length($_)-1; print pack("H".$l,$_);' > "${file}.replaced"
done < filelist

'od -t x1' dumps the contents of the (ascii) input as 1-byte hexadecimal values; check 'man od' for details, or try it out on the command line.
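Why line 7, fields 11 and 12: od prints 16 bytes per line after an offset column, so line 7 covers bytes 97 to 112, and with the offset sitting in $1, byte 106 lands in $11 and byte 107 in $12. Here is a minimal sketch that checks this on a throwaway file. The sample contents and the 0x1c delimiter are made up for the illustration, and -v is added here so od never collapses repeated lines into '*'.

```shell
#!/bin/bash
# Build a sample: 105 filler bytes, then delimiter 0x1c at position 106, then "GS".
tmp=$(mktemp -d)
printf '%105s' '' | tr ' ' 'A' > "$tmp/sample.bin"
printf '\034GS' >> "$tmp/sample.bin"

# Method 1: line 7, fields 11 and 12 of the od dump (what the script does).
from_od=$(od -v -t x1 "$tmp/sample.bin" | awk 'NR==7{print $11 " " $12; exit}')

# Method 2: skip the first 105 bytes and dump the next two directly.
from_dd=$(dd if="$tmp/sample.bin" bs=1 skip=105 count=2 2>/dev/null |
          od -An -v -t x1 | tr -s ' ' | sed 's/^ //; s/ $//')

echo "od line 7: $from_od"
echo "dd direct: $from_dd"
rm -rf "$tmp"
```

Both lines should show the same pair of hex codes, the delimiter followed by 47 for 'G'.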
The second awk command drops the first column, which is just the offset that 'od' prints on each line, and also removes all spaces. The Perl command then determines the length of the hex string on each line and uses 'pack' to convert it back to ascii. (This is basically 'pack(H32,$_)', except for the last line, which may be shorter; there pack(H32,$_) would pad the output with \0. Determining the length of the string avoids that mess at the end.)
Output into .replaced so that you don't mess up your originals and can compare.
# 3  
Old 06-16-2011
Mirni!!

This worked like a charm! It solved the hex issue and created the newlines really well.

I'm not familiar with awk ... I tried to change up the if statement, but it wouldn't work:
Code:
while read c106 c107 c108 ; do  #and store them in variables
      if [ $c107 = 47 && $c108 = 53 ] ; then #char107 is 'G' and #char108 is 'S' sub
         sed -i "s/$c106/0a/g" hexfile #replace all in hexfile
      fi
done

Ideally I'd also like to add a check that c109 is non-alphanumeric ([^[:alnum:]]).

How would I do that?

# 4  
Old 06-16-2011
Glad it worked.
c109 is always alphanumeric, because it's the hex code of the character (the whole file 'hexfile' that awk filters is made of hex codes).

So you wanna get char109 from the original file and test for alnum:
Code:
while read -r file ; do
    c109=`awk 'NR==1{print substr($0,109,1); exit}' "$file"` #get char 109 from the first line of current file
    if [[ "$c109" = [^[:alnum:]] ]] ; then #only go on if c109 is a single non-alnum character
       od -v -t x1 "$file" > hexfile       #-v stops od collapsing repeated lines into '*'
       awk 'NR==7{print $11 " " $12 " " $13; exit}' hexfile |
       while read -r c106 c107 c108; do
          if [ "$c107" = 47 ] && [ "$c108" = 53 ] ; then #'G' is hex 47, 'S' is hex 53
             sed -i "s/$c106/0a/g" hexfile
          fi
       done #end while
       awk '{$1=""; gsub(/ /,"")}1' hexfile | perl -ne '$l=length($_)-1; print pack("H".$l,$_);' > "${file}.replaced"
    fi  #end c109 condition
done < filelist


This will only work correctly if the original file has no newline characters before position 109, since the awk command that grabs c109 operates only on the first line (NR==1).
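If earlier newlines are a concern, one option is to read byte 109 by offset rather than by line. A rough sketch follows; 'is_candidate' is a hypothetical helper name, not something from the script above. The test is done on the hex code of the byte, so unprintable bytes such as NUL are handled too, and "alphanumeric" here means ASCII letters and digits only.

```shell
#!/bin/bash
# is_candidate FILE: succeed if byte 109 exists and is not an ASCII letter or digit.
is_candidate() {
    local hex
    # Skip 108 bytes, read 1, and print it as two lowercase hex digits.
    hex=$(dd if="$1" bs=1 skip=108 count=1 2>/dev/null |
          od -An -v -t x1 | tr -d ' \n')
    [ -n "$hex" ] || return 1   # file is shorter than 109 bytes
    case "$hex" in
        # 30-39 are 0-9, 41-5a are A-Z, 61-7a are a-z
        3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]) return 1 ;;
        *) return 0 ;;
    esac
}
```

In the loop it could stand in for the awk substr test, along the lines of: if is_candidate "$file" ; then ... fi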
# 5  
Old 06-16-2011
Thanks so much mirni ... I really appreciate it.

Now that I've seen your code, this solves more than one issue for me. If c109 is present then yes ... all the code needs to be executed.

This saves time because the field will be empty for files that don't require this code.

Thanks!

Brackets! But of course ...

---------- Post updated at 06:37 PM ---------- Previous update was at 06:28 PM ----------

Can you help me figure out the next part of my script? How can I use awk to print a new line for every occurrence of AK2 (plus the other variables)? I can show you the awk I've already started; it isn't quite right: it only gives me one line with the first AK2 ($2) and skips the others.

Code:
INPUT looks like this:
ISA~00~          ~00~          ~ZZ~RRRR           ~ZZ~FFF FIAC       ~110611~2215~U~00301~000002391~0~P~>
GS~FA~RRRR MFG~FFFXMFG~110611~2215~1847~X~002000
ST~997~1751
AK1~PO~970
AK2~830~000031588 #I want to print the other variables for every occurrence of AK2
AK5~A
AK2~830~000031589
AK5~A
AK2~830~000031590
AK5~A
AK2~830~000031607
AK5~A
AK9~A~186~186~186
SE~376~1751
GE~1~1847
IEA~1~000002391
 
I'd like the OUTPUT to look this:
RRRR           ,FFF FIAC       ,000002391,FA,RRRR MFG,1847,PO,970,830,000031588,,,,,,A,,A,186,186,186,,
RRRR           ,FFF FIAC       ,000002391,FA,RRRR MFG,1847,PO,970,830,000031589,,,,,,A,,A,186,186,186,,
RRRR           ,FFF FIAC       ,000002391,FA,RRRR MFG,1847,PO,970,830,000031590,,,,,,A,,A,186,186,186,,
RRRR           ,FFF FIAC       ,000002391,FA,RRRR MFG,1847,PO,970,830,000031607,,,,,,A,,A,186,186,186,,

# 6  
Old 06-17-2011
Quote:
I want to print the other variables
I'm not sure I understand what you mean there... or what exactly is the logic of extraction, but let me try:

Your input seems to have fields separated by '~'. awk does the processing line-by-line, so you can tell it to print 3rd field of each line starting with 'AK2' like this:
Code:
awk -F~ '/^AK2/{print $3}' input

Now to put it together ad-hoc, something like this could be done by combining more pattern-action rules:
Code:
$ awk -F~ '
  /^ISA/{out=$7","$9","$14}
  /^GS/{out=out","$2","$3","$7}
  /^AK1/{out=out","$2","$3}
  /^AK2/{print out","$2","$3}
'  input
RRRR           ,FFF FIAC       ,000002391,FA,RRRR MFG,1847,PO,970,830,000031588
RRRR           ,FFF FIAC       ,000002391,FA,RRRR MFG,1847,PO,970,830,000031589
RRRR           ,FFF FIAC       ,000002391,FA,RRRR MFG,1847,PO,970,830,000031590
RRRR           ,FFF FIAC       ,000002391,FA,RRRR MFG,1847,PO,970,830,000031607

It works like this:
/regexPattern/{action to do when line matches regexPattern}
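A small illustration of that form: the sketch below feeds a few made-up segments to awk through a here-document. Every rule is tested against every input line, rules whose pattern does not match are skipped for that line, and variables set by one rule stay available to later ones. The sample values are invented for the demo.

```shell
#!/bin/bash
# Each rule is /pattern/{action}; awk tries every rule on every input line.
result=$(awk -F~ '
  /^AK1/ { group = $2 }               # remember the group code from AK1
  /^AK2/ { n++; print group "," $3 }  # one output line per AK2 segment
  END    { print "AK2 segments: " n } # runs once, after the last line
' <<'EOF'
AK1~PO~970
AK2~830~000000001
AK5~A
AK2~830~000000002
AK5~A
EOF
)
echo "$result"
```

The AK5 lines match none of the patterns, so they produce no output; the two AK2 lines each print once, using the value saved when AK1 was seen.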
# 7  
Old 06-28-2011
Thanks for your time and solution mirni ... I really appreciate both!

I apologize for not explaining the logic more clearly, and yet you understood exactly what I meant.

Both of your solutions worked really well.

Thanks again