Split binary file every occurrence of a group of characters


 
# 8  
03-29-2013
One more thing: is this one file a continuous 2GB stream of data?

--ahamed
# 9  
03-29-2013
More than once, it is asserted that the following sequence is a constant:
Code:
C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20

but at least two examples violate that assertion:

Quote:
Originally Posted by PatrickE
~ Split ~
8057-4e17-46979706261D.mpg
00 00 01 BA 46 97 97 06 26 1D 00 AF CB F8 00 00 01 BB 00 12 80 57 E5 04 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 E0 07 D4 80 C1 0D 31 A5 E9 E9 FB 11 A5 E9 A3 99 1E 60 E8 00 00 01 B3 ~ 205KB of random data ~
~ Split ~
811d-6e17-44000FB47E39.mpg
00 00 01 BA 44 00 0F B4 7E 39 02 3B 4B F8 00 00 01 BB 00 12 81 1D A5 06 E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 E0 07 D4 80 C1 0D 31 00 05 79 79 11 00 05 33 17 1E 60 E8 00 00 01 B3 ~ 791KB of random data ~
Is this new data sample faulty or is the original problem statement incorrect?

Regards,
Alister
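A quick way to check this is to search the two quoted samples for the asserted constant. The minimal Python sketch below copies only the hex bytes shown in the quote (the "~ random data ~" tails are not included) and tests for both the full 20-byte constant and its last eight bytes, `E1 7F B9 E0 E8 B8 C0 20`. The names `CONSTANT`, `TAIL` and `SAMPLES` are illustrative only.

```python
# Bytes copied from the quoted samples; bytes.fromhex() ignores the spaces.
CONSTANT = bytes.fromhex("C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20")
TAIL = bytes.fromhex("E1 7F B9 E0 E8 B8 C0 20")  # last 8 bytes of CONSTANT

SAMPLES = {
    "8057-4e17-46979706261D.mpg": bytes.fromhex(
        "00 00 01 BA 46 97 97 06 26 1D 00 AF CB F8 00 00 01 BB 00 12 80 57 E5 04"
        " E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 E0 07 D4 80 C1 0D 31"
        " A5 E9 E9 FB 11 A5 E9 A3 99 1E 60 E8 00 00 01 B3"
    ),
    "811d-6e17-44000FB47E39.mpg": bytes.fromhex(
        "00 00 01 BA 44 00 0F B4 7E 39 02 3B 4B F8 00 00 01 BB 00 12 81 1D A5 06"
        " E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00 01 E0 07 D4 80 C1 0D 31"
        " 00 05 79 79 11 00 05 33 17 1E 60 E8 00 00 01 B3"
    ),
}

if __name__ == "__main__":
    for name, data in SAMPLES.items():
        # `in` on bytes objects is a substring search.
        print(f"{name}: full constant={CONSTANT in data}, tail={TAIL in data}")
```

Both samples report `full constant=False, tail=True`: the bytes right after `BB 00 12` differ from file to file, while the later run is shared.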
# 10  
03-29-2013
Wow, I hadn't noticed that yet. I figured that once I worked out how to split based on content, I could adjust it for cases where the sequence varies. In my example I may have made the sequence a bit longer than it should have been. Let me see.

I've noticed that the length of the shared sequence varies with the file's bit rate.


00 00 01 BA ** ** ** ** ** ** ** ** ** ** F8 00 00 01 BB 00 12 ** ** ** ** E1 7F B9 E0 E8 B8 C0 20 BD E0 3A BF E0 02 00 00

That might be more accurate.
As long as the first group, a few bytes from the second group and some from the third occur the same way, it should still be accurate.

I can check a few more files.
Those samples were from 3 different files.
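One way to use a pattern with `**` wildcards like the one above is to translate it into a bytes regular expression in which each `**` matches exactly one arbitrary byte, then cut the data at the start of every match. The Python sketch below does that; `compile_pattern`, `split_at_pattern` and `PATTERN` are hypothetical names. In both quoted samples exactly nine bytes sit between `BA` and `F8`, so the sketch's pattern uses nine `**` tokens there; it is worth counting the wildcards in any pattern against a real file before trusting the split.

```python
import re

def compile_pattern(spec: str) -> "re.Pattern[bytes]":
    """Translate a hex spec such as '00 00 01 BA ** F8' into a bytes regex.

    Every '**' token matches exactly one arbitrary byte; every other
    token is a literal byte written in hex.
    """
    parts = []
    for token in spec.split():
        if token == "**":
            parts.append(b".")
        else:
            parts.append(re.escape(bytes.fromhex(token)))
    # DOTALL lets '.' match 0x0A too; in binary data a "newline" is just a byte.
    return re.compile(b"".join(parts), re.DOTALL)

def split_at_pattern(data: bytes, pattern: "re.Pattern[bytes]") -> list[bytes]:
    """Cut data so that every piece, except possibly the first, starts with a match."""
    starts = [m.start() for m in pattern.finditer(data)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep whatever precedes the first match
    bounds = starts + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

PATTERN = compile_pattern(
    "00 00 01 BA" + " **" * 9
    + " F8 00 00 01 BB 00 12 ** ** ** ** E1 7F B9 E0 E8 B8 C0 20"
    + " BD E0 3A BF E0 02 00 00"
)
```

Usage would be along the lines of `split_at_pattern(open("input.mpg", "rb").read(), PATTERN)` (hypothetical file name). That reads the whole file into memory, which is fine for trying the pattern on a cut-down sample but heavy for a 2GB file.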

---------- Post updated at 06:55 PM ---------- Previous update was at 06:44 PM ----------

It seems the constant might change, but there are parts that I can tell don't change and only occur within this sequence.
If adjusting the pattern for each file isn't too complicated, I can fix that when needed.

Sorry about the inaccuracies, Alister.


ahamed: each file is a different size. I have about 200 files ranging from 1.5 GB to 2.3 GB and 400 ranging from 100 MB to 500 MB. There were about 130 videos before the corruption.
None of the split outputs will be consistently the same size either.
# 11  
03-29-2013
If 00 00 01 BA may also change, why not make that ** as well? If you are sure it will not change, let us know.

--ahamed
# 12  
03-29-2013
From searching the files, the main long constants only repeat at those sequences (I think that's the right word). Runs of at least 10 of these bytes together only repeat within the main sequence; runs of fewer than 4 I have found repeated in other parts of the file.
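To test claims like this on a real file, it helps to count every position where a given byte run occurs, including overlapping positions (`bytes.count` only counts non-overlapping ones). A small Python sketch; `count_occurrences` is a hypothetical helper, and the demo bytes are the start of one of the pack rows posted in this thread.

```python
def count_occurrences(data: bytes, needle: bytes) -> int:
    """Count every position where needle starts in data (overlaps included)."""
    if not needle:
        raise ValueError("needle must not be empty")
    count, start = 0, 0
    while (pos := data.find(needle, start)) != -1:
        count += 1
        start = pos + 1  # advance one byte so overlapping matches are found
    return count

if __name__ == "__main__":
    sample = bytes.fromhex("00 00 01 BA 44 B7 25 15 75 11 01 89 C3 F8 00 00 01 BB")
    print(count_occurrences(sample, bytes.fromhex("00 00 01")))  # prints 2
```

On a real file, `data` would be the file's contents and `needle` the candidate run, for example `bytes.fromhex("E1 7F B9 E0 E8 B8 C0 20")`.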


ahamed: from what I have found, that is always the same. However, 00 00 01 BA repeats every 2048 bytes.

---------- Post updated at 07:05 PM ---------- Previous update was at 07:02 PM ----------

As for the 2048, here's a list from one file:

Code:
00 00 01 BA 44 B7 25 15 75 11 01 89 C3 F8 00 00 01 BB 00 12 80 C4 E1 00 E1 7F B9 E0 E8 B8 C0 20
00 00 01 BA 44 B7 25 1A 05 BD 01 89 C3 F8 00 00 01 E0 07 EC 81 C0 0A 31 2D CB 70 1B 11 2D CB 4C
00 00 01 BA 44 B7 25 1E 9C 11 01 89 C3 F8 00 00 01 E0 07 EC 81 00 00 FB 54 F9 26 0C 67 00 AE BF
00 00 01 BA 44 B7 25 23 2C BD 01 89 C3 F8 00 00 01 BD 07 EC 81 80 05 21 2D C9 7D 01 81 02 01 E4
00 00 01 BA 44 B7 25 27 BD 69 01 89 C3 F8 00 00 01 E0 07 EC 81 00 00 E4 88 DA 77 AF 75 DD B6 D3
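If the 2048 means bytes, which is what the 2048-byte packs of a DVD-style MPEG-2 program stream would suggest for 720x480 material, the file could be walked pack by pack instead of searched byte by byte. The rows above show two kinds of pack: the first is followed by `00 00 01 BB` (a system header), the others by `00 00 01 E0` or `00 00 01 BD` (packet data). The Python sketch below assumes packs are exactly 2048 bytes and aligned to the start of the file, and reports the offsets of packs whose header is followed by `00 00 01 BB` as candidate split points. It reads the MPEG-2 pack header as 14 bytes plus a stuffing length held in the low three bits of byte 13. Misaligned or damaged packs are simply skipped here, which real recovery would need to handle; `system_header_packs` is a hypothetical name.

```python
PACK_SIZE = 2048                      # assumed fixed pack size, in bytes
PACK_START = b"\x00\x00\x01\xba"      # pack_start_code
SYSTEM_HEADER = b"\x00\x00\x01\xbb"   # system_header_start_code

def system_header_packs(data: bytes, pack_size: int = PACK_SIZE) -> list[int]:
    """Return offsets of aligned packs whose header is followed by 00 00 01 BB."""
    hits = []
    for off in range(0, len(data) - 13, pack_size):
        if data[off:off + 4] != PACK_START:
            continue  # not a pack boundary (misaligned or damaged); skipped
        # MPEG-2 pack header: 14 bytes, then pack_stuffing_length bytes,
        # where pack_stuffing_length is the low 3 bits of byte 13.
        header_len = 14 + (data[off + 13] & 0x07)
        if data[off + header_len:off + header_len + 4] == SYSTEM_HEADER:
            hits.append(off)
    return hits
```

In the rows above, byte 13 is `F8`, so the stuffing length is 0 and the next start code begins at byte 14.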

---------- Post updated at 07:13 PM ---------- Previous update was at 07:05 PM ----------

Hmm, maybe my original plan to split into 2K chunks might actually work. The problem is that it takes up twice the space.

Last edited by Scrutinizer; 03-30-2013 at 05:46 AM.. Reason: code tags
# 13  
03-29-2013
Since it's a video file, I suppose this is a continuous stream of data, right? I mean, without line breaks. Even if we manage to get a working script, I think it's going to take a long time to process a single 2GB file. How long does the split command take?

--ahamed
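On the processing-time question: a scan like this does not need the whole 2GB file in memory. A minimal Python sketch, assuming a fixed-length pattern (each `**` wildcard still stands for exactly one byte) compiled as a bytes regex: it reads the file in large blocks and carries the last `pattern_len - 1` bytes over to the next block, so a match that straddles a block boundary is still found exactly once. `iter_match_offsets` and the 16 MiB default block size are illustrative choices, not tuned values.

```python
import io
import re
from typing import BinaryIO, Iterator

def iter_match_offsets(stream: BinaryIO, pattern: "re.Pattern[bytes]",
                       pattern_len: int,
                       chunk_size: int = 16 * 1024 * 1024) -> Iterator[int]:
    """Yield the file offset of every match of a fixed-length pattern."""
    base = 0    # file offset of buf[0]
    buf = b""
    while chunk := stream.read(chunk_size):
        buf += chunk
        for m in pattern.finditer(buf):
            yield base + m.start()
        # A match that continues into the next chunk must start within the
        # last pattern_len - 1 bytes; everything before that is fully scanned.
        keep = min(len(buf), pattern_len - 1)
        base += len(buf) - keep
        buf = buf[len(buf) - keep:]

if __name__ == "__main__":
    start_code = re.compile(re.escape(b"\x00\x00\x01\xba"))
    data = b"xx" + b"\x00\x00\x01\xba" + b"y" * 5 + b"\x00\x00\x01\xba"
    # A tiny chunk size forces matches to straddle chunk boundaries.
    print(list(iter_match_offsets(io.BytesIO(data), start_code, 4, chunk_size=3)))  # [2, 11]
```

On a real file the stream would come from `open(path, "rb")`; the yielded offsets could then drive `seek`/`read` calls that copy each segment out, so the file is only read once.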
# 14  
03-29-2013
When I first did tests, splitting a 2GB file into 2K pieces took maybe 10 minutes, 30 at most. That wasn't too bad. The renaming part took the most time.
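Since renaming was the slow part, one option is to give the pieces sortable names as they are written. GNU coreutils `split` can do this itself with numeric suffixes (`-d`) of a fixed length (`-a`), e.g. `split -b 2048 -d -a 8 input.mpg part_`. The Python sketch below shows the same idea, assuming 2048-byte pieces named with zero-padded sequence numbers so that a plain sort puts them back in file order; `fixed_chunks` and `write_chunks` are hypothetical helpers.

```python
from pathlib import Path
from typing import BinaryIO, Iterator

def fixed_chunks(stream: BinaryIO, chunk_size: int = 2048, digits: int = 8,
                 suffix: str = ".bin") -> Iterator[tuple[str, bytes]]:
    """Yield (name, piece) pairs; names are zero-padded sequence numbers."""
    index = 0
    while piece := stream.read(chunk_size):
        yield f"{index:0{digits}d}{suffix}", piece
        index += 1

def write_chunks(src: Path, out_dir: Path, chunk_size: int = 2048) -> int:
    """Split src into out_dir (one folder per input file); return the piece count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with src.open("rb") as f:
        for name, piece in fixed_chunks(f, chunk_size):
            (out_dir / name).write_bytes(piece)
            count += 1
    return count
```

Because the names already encode the order, concatenating the pieces in sorted name order rebuilds the original byte stream.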

Once the video is recombined, yes, it is a continuous stream of data. From what I can tell, the corrupted files contain about 3 minutes of one video, a few minutes of another, then back to the first, going back and forth through several videos. I had to play them in VLC to find out what was in them.

Three types of MPEG encoding were used on these:
16:9, around 5 Mbps, 720x480
4:3, around 12 Mbps, 720x480
4:3, around 1 Mbps, 480x480

In one of the files I fixed, before the fix the video played a frame from one video, then the next frame from the next video, back and forth for a minute.
It was so mixed up that QuickTime froze every time I opened it.

---------- Post updated at 08:58 PM ---------- Previous update was at 08:55 PM ----------

In the ones I fixed there were no line breaks (I think that's correct) and no other noticeable data corruption.

---------- Post updated at 09:08 PM ---------- Previous update was at 08:58 PM ----------

I noticed that some of the recovered pieces show up twice, in different files. I confirmed this with MD5 and VLC. How weird. So there are also duplicates, just not within the same file.
That's no problem: since I split each file into its own folder, I should be able to use dupeGuru to find the same pieces in other folders.
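A duplicate finder should handle this, but the same check is also easy to script: hash every piece and group the paths that share a digest. A minimal Python sketch using MD5 from `hashlib` (MD5 is what was used above to confirm matches); `md5_of` and `find_duplicates` are hypothetical helpers, and the sketch walks every file below one root folder.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def md5_of(path: Path, block_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in blocks to bound memory use."""
    h = hashlib.md5()
    with path.open("rb") as f:
        while block := f.read(block_size):
            h.update(block)
    return h.hexdigest()

def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map each digest shared by two or more files below root to those files."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[md5_of(path)].append(path)
    return {digest: paths for digest, paths in by_hash.items() if len(paths) > 1}
```

Each returned group lists pieces with identical content, typically from different per-file folders.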

---------- Post updated at 09:10 PM ---------- Previous update was at 09:08 PM ----------

I am finding talking about this useful. I should have done this last week.

---------- Post updated at 09:16 PM ---------- Previous update was at 09:10 PM ----------

The main problem with the 2K option is that each piece ends up taking 4K of space, and I don't know why.
So 2GB becomes 4GB.
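A likely explanation, assuming the disk uses 4 KiB allocation blocks (a common default): a file always occupies a whole number of blocks, so each 2 KiB piece still takes one full 4 KiB block and half of every block is wasted. The arithmetic as a small Python sketch; `allocated_size` and `disk_usage` are hypothetical helpers, and `os.stat(path).st_blocks * 512` is one way to see the real allocation on Unix-like systems.

```python
import os

def allocated_size(file_size: int, block_size: int = 4096) -> int:
    """Bytes a file of file_size occupies when space is allocated in whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size

def disk_usage(path: str) -> int:
    """Allocation reported by the OS (st_blocks counts 512-byte units; Unix only)."""
    return os.stat(path).st_blocks * 512

if __name__ == "__main__":
    total = 2 * 1024**3  # a 2 GiB video
    for piece in (2048, 4096):
        pieces = total // piece
        print(piece, pieces * allocated_size(piece) / 1024**3, "GiB on disk")
```

Under that assumption, 2048-byte pieces come to 4 GiB on disk while 4096-byte pieces stay at 2 GiB, at the cost of coarser split points.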

Last edited by PatrickE; 03-29-2013 at 11:03 PM.. Reason: I left out the word before lol
 