What Cinderella did - sorting the bytes


 
# 1  
Old 12-01-2007

Hi,

I corrupted a gzipped tarball with tar's verbose output:

tar -cvzf /dev/stdout /home | split -d -b 4000m - backupPART

I partitioned the hard disk and installed Fedora 8. My 6 GB '/home' directory is gone. All I have is a messy tarball.

I hope I can do Cinderella's job: separating the good bytes (the gzipped data) from the bad bytes (tar's verbose listing of the copied files' full path names, each starting with /home/az and ending with <LINEFEED>).

I'm a newbie, and I don't think the textutils are appropriate. Then I found tr and dd.

I need a little help turning this into a working command or script. My idea is:

read next byte from backup.mess
check if byte.mess == '/' then
check if next 7 bytes == 'home/az' then delete all bytes until <LINEFEED>


But I don't know how to script that. I'll be glad if you give me a hint. Thank you.

Last edited by fedusr; 12-01-2007 at 01:00 PM..
# 2  
Old 12-01-2007
You should be OK. I create tarballs almost like that fairly frequently. (The f and /dev/stdout were basically a no-op; by default, stdout is where the output goes.) The v sends the listing to stderr, so it should have been displayed while the tar command was in progress. Meanwhile, the archive was going to stdout and should have been fine. If this were not the case, every tar archive ever created with a v would have the same problem, and the solution, if any, would be well known. If no solution had been found, tar would have been rewritten decades ago to ignore the v during a c.
# 3  
Old 12-01-2007
Hi.

Using tar benignly to obtain a table of contents should tell you if the tar file is broken ... cheers, drl
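
A minimal sketch of such a check, assuming GNU tar and that the split parts are passed in the right order. The helper name tarball_lists_ok is hypothetical:

```shell
# Hypothetical helper: ask tar for the table of contents only, extracting nothing.
# Usage: tarball_lists_ok backupPART00 backupPART01 ...
tarball_lists_ok() {
    # Concatenate the parts in order and let tar read the stream from stdin.
    # The exit status is tar's: zero only if the whole archive could be listed.
    cat -- "$@" | tar -tzf - > /dev/null 2>&1
}
```

Something like `tarball_lists_ok backupPART* && echo fine || echo broken` then gives a yes/no answer without writing any files.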
# 4  
Old 12-01-2007
I'm using:
# tar --version
tar (GNU tar) 1.17
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.


and I can replicate that error:

# tar -cvzf /dev/stdout /etc | split -d -b 10m - etc.tgz.messy
tar: Entferne führende „/“ von Elementnamen
tar: Entferne führende „/“ von Zielen harter Verknüpfungen
# head -2 etc.tgz.messy00
/etc/
/etc/redhat-lsb/

# tar -czf /dev/stdout /etc | split -d -b 10m - etc.tgz.clean
tar: Entferne führende „/“ von Elementnamen
tar: Entferne führende „/“ von Zielen harter Verknüpfungen
# xxd etc.tgz.clean00 |head -2
0000000: 1f8b 0800 ac84 5147 0003 ec5c 7b73 dbb6 ......QG...\{s..
0000010: 96ef bfd6 a740 e4ec c876 2459 a41e 769c .....@...v$Y..v.

#

It seems tar is using <stdout> instead of <stderr> for its verbose output. Tar's own messages (Entferne is German for 'remove') go to <stderr> and were not copied into the split (messed-up) tarball.
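
A quick way to test a part for this, as a minimal sketch: a gzip stream starts with the bytes 1f 8b, as the xxd output above shows for the clean file. The helper name is hypothetical, and it assumes head -c and od are available:

```shell
# Hypothetical helper: succeed if the file begins with the gzip magic bytes 1f 8b.
# Usage: starts_with_gzip_magic etc.tgz.clean00
starts_with_gzip_magic() {
    # od prints the first two bytes as hex (e.g. " 1f 8b"); tr strips blanks.
    magic=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

This only looks at the first two bytes, so it says nothing about listing text mixed in further down the file.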

Is there a utility for easily removing the file listings from the corrupted gzipped tarball?

Thank you.

Last edited by fedusr; 12-01-2007 at 01:20 PM..
# 5  
Old 12-01-2007
Quote:
Originally Posted by drl
Hi.

Using tar benignly to obtain a table of contents should tell you if the tar file is broken ... cheers, drl
Yes, it's corrupt (and the only backup I have):

# tar -xf backup
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Error exit delayed from previous errors
#

Last edited by fedusr; 12-01-2007 at 01:01 PM..
# 6  
Old 12-01-2007
I found this in the info page for gnu tar:
Quote:
Verbose output appears on the standard output except when an archive is being written to the standard output, as with ‘tar --create --file=- --verbose’ (‘tar cfv -’, or even ‘tar cv’—if the installer let standard output be the default archive). In that case tar writes verbose output to the standard error stream.
So leaving off the f option and the /dev/stdout would still have sent the archive to the pipeline and the listing to the terminal. So yes, you do have a garbled archive. I don't see an obvious way to script your proposed solution. But worse, I see a couple of potential problems with it...

Problem one: "/home" probably appears somewhere in the good part of the archive and your solution would then drop bytes from the archive.

Problem two: The output of the listing was probably buffered because it was going to a non-tty, so blocks of listing may be interspersed with archive data, and you may have lines split between blocks. So "/home/this/that" might not have fit in the buffer. So "/home/th" was put in and the buffer was written. Then "is/that{lf}" was placed in the buffer. But meanwhile output buffers of the archive were being written.

Sorry for the bad news, but I doubt that the archive can be salvaged.
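
For future backups, the rule quoted from the info page suggests a safer form. A minimal sketch, assuming GNU tar and GNU split; the helper name and the separate listing file are hypothetical:

```shell
# Hypothetical helper: back up a directory into split, gzipped parts.
# Usage: backup_split /home backupPART listing.txt
backup_split() {
    src=$1 prefix=$2 listing=$3
    # "-f -" tells tar that the archive itself goes to standard output, so
    # the verbose listing is written to standard error, saved here in a file.
    tar -czvf - "$src" 2> "$listing" | split -d -b 4000m - "$prefix"
}
```

The difference from the original command is only "-" in place of /dev/stdout: tar recognises "-" as standard output and keeps the listing away from it.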
# 7  
Old 12-01-2007
Info is nice; I didn't know about Info :-)

Quote:
Originally Posted by Perderabo
...
Problem one: "/home" probably appears somewhere in the good part of the archive and your solution would then drop bytes from the archive.
Actually, I applied gzip (the 'z' option). I have a gzipped archive salted at random with file listings. A gzipped archive is like any other gzipped file. Tar's listings didn't pass through the gzipper, I think. I assume the gzipped (tar) data is in good order, but tar's plain-text listings are spread randomly through the gzipped file.


Quote:
Originally Posted by Perderabo
...
Problem two: The output of the listing was probably buffered because it was going to a non-tty, so blocks of listing may be interspersed and you may have lines split between blocks. So "/home/this/that" might not have fit in the buffer. So "/home/th" was put in and the buffer was written. Then "is/that{lf}" is placed in the buffer. But meanwhile output buffers of the archive are being written.
Yes, I understand. The tar listing entries may be cut apart, so '/home/az/...[a-Z]...<LF>' can't be used as a search string.

Quote:
Originally Posted by Perderabo
..., but I doubt that the archive can be salvaged.
On the other hand, the problem reduces to identifying the bad bytes, which consist of the file names tar instilled into my messy backup file. I don't see why that should not be possible in principle. I don't know the gzip algorithm or the gzip file format. Maybe there are checksums, byte value ranges, or something similar that would be useful for recovery.
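
On the checksum question: a gzip member ends with a CRC-32 and the length of the uncompressed data, and gzip -t checks both. A minimal sketch of using that to test a candidate cleanup, assuming the parts are passed in order; the helper name is hypothetical:

```shell
# Hypothetical helper: succeed only if the concatenated parts form an intact
# gzip stream (decompresses cleanly and the trailing CRC-32 and length match).
# Usage: gzip_stream_ok backupPART00 backupPART01
gzip_stream_ok() {
    cat -- "$@" | gzip -t 2> /dev/null
}
```

The check can only say whether a cleaned-up file is intact as a whole. It does not point to where the stray bytes are.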

However, are there any compression recovery kits available (based on zlib)? I've found 'gzrecover', but it crashes on my file :-)

Thank you.