Split binary file every occurrence of a group of characters


 
# 1  
Old 03-29-2013

Hello, I am new to scripts, code, bash, the terminal, etc.
I apologize if this is very scattered; I frankly don't have any idea where to begin, and I have had trouble sleeping lately.

I have several 2GB files I wish to split.
This Code
Code:
00 00 01 BA ** ** ** ** ** ** ** ** C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20

recurs several times in each file. Sadly, the occurrences are not evenly spaced; otherwise
Code:
split -a 6 -b 2k

would work just fine for what I am doing.
Of the first 12 bytes, bytes 5 through 12 are not fixed: they change in sequential order, and they must be included at the start of each output file.

The files should start at
Code:
00 00 01 BA etc

and end before the next occurrence of
Code:
00 00 01 BA etc

Code:
00 00 01 BA

occurs every 2048 bytes, but on its own it is not the part I'm looking for.
Only the bytes after the first 12 distinguish the headers I want to split on.

I have been searching for code to split the file at every occurrence. After trying and failing with many different tricks from around the internet, I have decided to ask for help.

Back Story to problem:
I have a 500GB HD that had a 40GB HFS+ partition that I was using to transfer files from an old Dell PC to my Mac. I needed to reformat it; I no longer remember why.
Somehow the partition map got messed up, and the format hit the 450GB HFS partition instead, which had all my home videos on it.
I had just got the disks to burn them all to DVD to keep in a lock box. 20 years of video.
I found an application called Disk Drill and used it to recover all the data.
I am not sure if everything was found, but it seems like it may have reconstructed the important stuff.

My problem with resolution:
Well, the videos are all jumbled up: bits and pieces of MPEG-2 videos are mixed together.
I figured out where the files need to be split with the help of an app called Hex Fiend.
It turns out there is a type of timecode in the MPEG-2 files' binary data. Using that, I was able to correct 2 video files. It took me a week.

Last edited by PatrickE; 03-29-2013 at 05:55 PM.. Reason: I realized I repeated things unnecessarily.
# 2  
Old 03-29-2013
Quote:
Originally Posted by PatrickE
I have been searching for a code to split the file at every occurrence of
Code:
00 00 01 BA ** ** ** ** ** ** ** ** C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20

* = Random digit.
So the asterisks can be any hex digit. The bytes in blue (the run after the asterisks, C3 F8 through C0 20) must match exactly. What about the first four bytes in your sequences (00 00 01 BA, highlighted in red)? Do they need to match exactly or can they be anything as well?

Also, what happens with this entire sequence when the file is split? Is it discarded and not included in either file? If not, on which side of the split should it appear? Should the matching bytes be placed at the end of one file or the beginning of the next?

Might there be more than one occurrence of this sequence in a file, requiring that a single file be split into more than 2 files?

Do you have any preference with regard to the new file names, e.g. movie.mpeg might be split into movie.mpeg.1 movie.mpeg.2?

Regards,
Alister

Last edited by alister; 03-29-2013 at 05:47 PM..
# 3  
Old 03-29-2013
Correct, the asterisks can be any hex digit, and the blue must be exact. The red is hard to say; those bytes also never change.

---------- Post updated at 04:01 PM ---------- Previous update was at 04:00 PM ----------

The matching bytes need to be at the beginning of the files.

---------- Post updated at 04:04 PM ---------- Previous update was at 04:01 PM ----------

Ah, yes, there are several hundred occurrences, and each needs to be in a separate file.
As for the filename, I would like it to include the original filename. Wait, no, the filename can be anything; it doesn't even have to be in order. That's a different project. LOL, sorry.

---------- Post updated at 04:10 PM ---------- Previous update was at 04:04 PM ----------

Okay after the files are split I would use
Code:
for i in *; do mv -- "$i" "$(xxd -ps -c 55 -l 55 "$i")$i"; done

to add the start hex code to the filename and then "better finder rename" to remove the parts of the hex that are unnecessary to sorting the files.

Original File name Movie.mpg
After that I would have 01c0-0e17-44E2DD28FC010189-Movie.mpg
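As a rough sketch of the renaming idea, the key could also be read directly from the 8 variable bytes (offsets 4 through 11, the asterisks in the pattern) instead of dumping 55 bytes and trimming afterwards. The function names `header_key` and `prefix_with_key` are made up for illustration, and the output is lowercase hex because that is what `od` prints.

```shell
#!/bin/sh
# header_key FILE: print the 8 bytes at offsets 4..11 as lowercase hex.
# -j4 skips the fixed 00 00 01 BA start code, -N8 reads the variable bytes.
header_key() {
    printf '%s\n' "$(od -An -v -tx1 -j4 -N8 < "$1" | tr -d ' \n')"
}

# prefix_with_key FILE: rename FILE to KEY-FILE in the same directory.
# (Hypothetical naming scheme; adjust to whatever sorts best for you.)
prefix_with_key() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    mv -- "$1" "$dir/$(header_key "$1")-$base"
}
```

Calling `prefix_with_key Movie.mpg` on a piece whose header starts 00 00 01 BA 44 00 04 56 DD 0D 01 89 would rename it to `44000456dd0d0189-Movie.mpg`.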

Those first 8 digits help me sort the correct files into folders.
For example, there will turn out to be about 4 separate files, sometimes more, that were mixed into one video.
The remaining digits put the video file back in order.
After that's done, each folder contains all the correct pieces of one video file, in order.
I then cd to that folder and
Code:
(ls | xargs cat) >

into one video file.
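A possible alternative for the joining step, as a sketch only: `ls | xargs cat` can trip over filenames containing spaces or quotes, whereas a shell glob hands each name over intact and already sorted. The function name `join_pieces` is hypothetical, and the output file is assumed to live outside the folder being joined so that it is not read back into itself.

```shell
#!/bin/sh
# join_pieces DIR OUTFILE: concatenate every regular file in DIR, in the
# shell's sorted glob order, into OUTFILE (keep OUTFILE outside DIR).
join_pieces() {
    dir=$1
    out=$2
    : > "$out"                      # start with an empty output file
    for piece in "$dir"/*; do
        if [ -f "$piece" ]; then    # skip subfolders and unmatched globs
            cat "$piece" >> "$out"
        fi
    done
}
```

For example, `join_pieces 01e0-0e17 ../01e0-0e17.mpg` would write the pieces of that folder, in name order, to a single file one level up.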

Last edited by PatrickE; 03-29-2013 at 06:27 PM..
# 4  
Old 03-29-2013
Can you please give me an example? I got confused by a couple of the statements you made.
A small sample input and the expected output should do.

--ahamed
# 5  
Old 03-29-2013
ahamed, I will try my best.

Here's a copy from one of my finished projects.
Infidel Movie.mpg
00 00 01 BA 44 00 04 00 04 01 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 00 00 00 00 00 01 E0 40 00 00 00 47 40 00 00 D9 ~ 39KB of Random data ~ 00 00 01 BA 44 00 04 56 DD 0D 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 13 00 00 00 00 01 E0 40 00 00 00 D9 E1 00 01 89 ~ 43KB of Random data ~ 00 00 01 BA 44 00 0C 9D D4 01 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 28 00 00 00 00 01 E0 40 00 00 01 89 D6 00 02 39 ~ 68KB of Random data ~ 00 00 01 BA 44 00 16 1D 7C 01 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 49 00 00 00 00 01 E0 40 00 00 02 39 CB 00 02 E9 ~ 63KB of Random data ~ 00 00 01 BA 44 00 1F 9D 24 01 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 68 00 00 00 00 01 E0 40 00 00 02 E9 C0 00 03 99 ~ 70KB of Random data ~



01e0-0e17f-440004000401.mpg
00 00 01 BA 44 00 04 00 04 01 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 00 00 00 00 00 01 E0 40 00 00 00 47 40 00 00 D9 ~ 39KB of Random data
~ Split ~
01e0-0e17f-44000456DD0D.mpg
00 00 01 BA 44 00 04 56 DD 0D 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 13 00 00 00 00 01 E0 40 00 00 00 D9 E1 00 01 89 ~ 43KB of Random data
~ Split ~
01e0-0e17f-44000C9DD401.mpg
00 00 01 BA 44 00 0C 9D D4 01 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 28 00 00 00 00 01 E0 40 00 00 01 89 D6 00 02 39 ~ 68KB of Random data ~
~ Split ~
01e0-0e17f-4400161D7C01.mpg
00 00 01 BA 44 00 16 1D 7C 01 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 49 00 00 00 00 01 E0 40 00 00 02 39 CB 00 02 E9 ~ 63KB of Random data ~
~ Split ~
01e0-0e17f-44001F9D2401.mpg
00 00 01 BA 44 00 1F 9D 24 01 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 68 00 00 00 00 01 E0 40 00 00 02 E9 C0 00 03 99 ~ 70KB of Random data ~
~ Split ~

I hope this helps. If not, I welcome suggestions on how I can explain it better.
# 6  
Old 03-29-2013
So, you need to split the file based on 00 00 01 BA ** ** ** ** ** ** ** ** C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20, where the first 4 pairs of hex digits may or may not be the same, the next 8 pairs (i.e. **) can be anything, and the remaining pairs must match, right?

--ahamed

---------- Post updated at 03:52 PM ---------- Previous update was at 03:49 PM ----------

and the spacing may not be the same, right?

--ahamed
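One way the matching described above might be sketched with standard tools: dump the file as one hex byte per line with `od`, then slide a 32-byte window over it in `awk`, treating `.` as "any byte" for the eight asterisk positions. The names `PATTERN` and `find_headers` are invented for this sketch. It is byte-exact, including when the variable bytes happen to be newlines or NULs, but it will be slow on multi-gigabyte files.

```shell
#!/bin/sh
# Lowercase hex bytes; a "." matches any byte (the asterisks in the thread).
PATTERN='00 00 01 ba . . . . . . . . c3 f8 00 00 01 bb 00 12 80 c4 e1 00 e1 7f b9 e0 e8 b8 c0 20'

# find_headers FILE [PATTERN]: print the 0-based byte offset of each match.
find_headers() {
    od -An -v -tx1 < "$1" | tr -s ' ' '\n' |
    awk -v spec="${2:-$PATTERN}" '
        BEGIN { n = split(spec, pat, " ") }
        $0 == "" { next }                 # tr leaves an empty first line
        {
            win[count % n] = $0           # ring buffer of the last n bytes
            count++
            if (count < n) next
            start = count - n             # offset of the oldest byte held
            for (k = 1; k <= n; k++)
                if (pat[k] != "." && win[(start + k - 1) % n] != pat[k]) break
            if (k > n) printf "%.0f\n", start
        }'
}
```

Running `find_headers Movie.mpg` prints one offset per line. Because the pattern is an optional second argument, any fixed position that turns out to vary can be loosened by replacing it with a dot.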
# 7  
Old 03-29-2013
I realized that example is mostly in order, whereas the real files are more mixed up at some points.
Here, let me try this.

File.mpg
00 00 01 BA 44 00 04 00 04 01 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 00 00 00 00 00 01 E0 40 00 00 00 47 40 00 00 D9 ~ 39KB of random data ~ 00 00 01 BA 44 00 04 56 DD 0D 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 13 00 00 00 00 01 E0 40 00 00 00 D9 E1 00 01 89 ~ 43KB of random data ~ 00 00 01 BA 44 BF EF 12 1C B1 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 0D 39 C8 00 00 00 00 01 C0 40 00 0B FF 81 EB 0C 00 89 ~ 584KB of random data ~ 00 00 01 BA 44 BF FF D9 CC 05 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 0D 3A E5 00 00 00 00 01 C0 40 00 0C 00 89 DA 0C 01 91 ~ 580KB of random data ~ 00 00 01 BA 46 97 97 06 26 1D 00 AF CB F8 00 00 01 BB 00 12 80 57 E5 04 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 E0 07 D4 80 C1 0D 31 A5 E9 E9 FB 11 A5 E9 A3 99 1E 60 E8 00 00 01 B3 ~ 205KB of random data ~ 00 00 01 BA 44 00 0F B4 7E 39 02 3B 4B F8 00 00 01 BB 00 12 81 1D A5 06 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 E0 07 D4 80 C1 0D 31 00 05 79 79 11 00 05 33 17 1E 60 E8 00 00 01 B3 ~ 791KB of random data ~

Split on content

01e0-0e17-440004000401.mpg
00 00 01 BA 44 00 04 00 04 01 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 00 00 00 00 00 01 E0 40 00 00 00 47 40 00 00 D9 ~ 39KB of random data ~
~ Split ~
01e0-0e17-44000456DD0D.mpg
00 00 01 BA 44 00 04 56 DD 0D 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 00 00 13 00 00 00 00 01 E0 40 00 00 00 D9 E1 00 01 89 ~ 43KB of random data ~
~ Split ~
01c0-0e17-44BFEF121CB1.mpg
00 00 01 BA 44 BF EF 12 1C B1 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 0D 39 C8 00 00 00 00 01 C0 40 00 0B FF 81 EB 0C 00 89 ~ 584KB of random data ~
~ Split ~
01c0-0e17-44BFFFD9CC05.mpg
00 00 01 BA 44 BF FF D9 CC 05 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 BF 03 D4 00 00 0D 3A E5 00 00 00 00 01 C0 40 00 0C 00 89 DA 0C 01 91 ~ 580KB of random data ~
~ Split ~
8057-4e17-46979706261D.mpg
00 00 01 BA 46 97 97 06 26 1D 00 AF CB F8 00 00 01 BB 00 12 80 57 E5 04 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 E0 07 D4 80 C1 0D 31 A5 E9 E9 FB 11 A5 E9 A3 99 1E 60 E8 00 00 01 B3 ~ 205KB of random data ~
~ Split ~
811d-6e17-44000FB47E39.mpg
00 00 01 BA 44 00 0F B4 7E 39 02 3B 4B F8 00 00 01 BB 00 12 81 1D A5 06 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 E0 07 D4 80 C1 0D 31 00 05 79 79 11 00 05 33 17 1E 60 E8 00 00 01 B3 ~ 791KB of random data ~
~ Split ~


After that I combine them and get.
01e0-0e17.mpg
01c0-0e17.mpg
8057-4e17.mpg
811d-6e17.mpg

---------- Post updated at 05:56 PM ---------- Previous update was at 05:56 PM ----------

Correct.

---------- Post updated at 05:58 PM ---------- Previous update was at 05:56 PM ----------

Yes that is correct. ahamed
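With the requirements confirmed, the cutting step itself could look roughly like the sketch below, assuming the byte offset of every matching header is already known (from a hex editor or any search tool). Each piece starts at one offset and ends just before the next, so the matching bytes sit at the beginning of each file; anything before the first offset is left out. The function name `split_at` and the numbered-suffix naming (in the spirit of movie.mpeg.1, movie.mpeg.2) are choices made for this sketch, and `head -c` is assumed to be available, as it is on GNU and macOS.

```shell
#!/bin/sh
# split_at FILE OFFSET...: write FILE.0001, FILE.0002, ... where piece N runs
# from the Nth offset (0-based) up to, but not including, the next offset.
# The last piece runs to the end of FILE. Offsets must be in ascending order.
split_at() {
    file=$1
    shift
    size=$(($(wc -c < "$file")))    # arithmetic strips wc's padding spaces
    i=0
    while [ "$#" -gt 0 ]; do
        start=$1
        shift
        if [ "$#" -gt 0 ]; then end=$1; else end=$size; fi
        i=$((i + 1))
        out=$(printf '%s.%04d' "$file" "$i")
        # tail -c +N is 1-based, so add 1 to the 0-based start offset
        tail -c "+$((start + 1))" "$file" | head -c "$((end - start))" > "$out"
    done
}
```

For instance, `split_at File.mpg 0 40000 84000` (offsets invented for illustration) would produce three pieces. Since `tail` re-reads the input for every piece, this is only practical as a starting point for files of this size.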
 