ConCATenating binaries but excluding last bytes from each file


 
# 1  
Old 12-02-2013

Hi there, shameful Linux newbie here.

I was wondering if you could help with my problem...
I have plenty of files I'd like to concatenate. I know the basics of the cat command, but that isn't enough for what I need: excluding the last xx bytes of each file before assembling them, since there is some redundancy.


The interesting part is that the number of bytes to remove is determined by the filenames themselves, i.e.:


file 1 is named something like 0-54548
file 2 is named something like 54475-648459
file 3 is named 648345-1269494
etc


So for file 1 I would have to drop everything from byte 54475 to the end before joining file 2, etc.
It would be easier if it were always the exact same number of bytes to remove; alas, there are some variations, so it must be calculated from the filenames.


I feel it should be possible to script this in just a few lines within a loop, but my Unix knowledge is far too rudimentary for now ^^


I also read about the dd command, which could probably help to generate the files without the unwanted part. But I'm a bit clueless about extracting character strings from filenames, turning them into numeric values, doing the maths, and then using dd or some other command so I can assemble the resulting files...
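
As a minimal sketch of the filename arithmetic asked about here: POSIX parameter expansion can split a name at the "-", and shell arithmetic can do the subtraction. The helper names `range_start`, `range_end` and `keep_bytes` are made up for illustration, and the sketch assumes names consist only of digits, a single "-", and digits (with no leading zeros).

```shell
#!/bin/sh
# Hypothetical helpers; names must look like START-END (digits only).
range_start() { printf '%s\n' "${1%%-*}"; }   # text before the first "-"
range_end()   { printf '%s\n' "${1##*-}"; }   # text after the last "-"

# Bytes of file $1 that are NOT repeated at the start of the next file $2:
# everything from this file's start up to the next file's start.
keep_bytes() {
    echo $(( $(range_start "$2") - $(range_start "$1") ))
}

keep_bytes 0-54548 54475-648459          # prints 54475
keep_bytes 54475-648459 648345-1269494   # prints 593870
```

Only the starting offsets are needed for this calculation, so it does not matter whether the ending number in a name is meant to be inclusive.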
# 2  
Old 12-02-2013
Telling us that filenames are something like x-y where x and y are strings of digits isn't enough for us to figure out how to determine which file is file1, file2, ...

Give us details.
  1. How do you know which files are to be combined and in what order?
  2. Are the names of the files just two strings of digits separated by a single minus sign, or are there other characters in the name that need to be skipped over (or used to determine the order in which they're to be concatenated)?
  3. How do you determine the output file name (or do you just want the output to be written to stdout)?
  4. Can we choose which shell we want to use, or have you decided that only one shell should be used? If only one, which one?
# 3  
Old 12-02-2013
OK, so:
1 - The "strings of digits" are byte positions in the final file I want to recreate, so what I call files "1", "2", etc. are in exactly the same order as the digit strings in the filenames.
If one file is called 0-15000, another 14400-30000, another 29800-40000, and a last one 39750-55421, then I want to create a file from byte 0 to byte 55421 minus the redundant bytes (14400-15000 are at the end of 0-15000 and the beginning of 14400-30000, and so on...).

2 - Yes ("just two strings of digits separated by a single minus sign").

3 - Let's call it 0-<highest digit>, or "final" if that's too complicated; I don't mind ^^

4 - I'm afraid I'm too much of a newbie to even know what the different shells are. All I know is that when I type echo $SHELL in my Xfce4-terminal on Xubuntu 13.10, I get "/bin/bash", if that's of any help...
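
One way the layout described in point 1 could be reassembled is to keep, from each file, only the bytes that come before the next file's starting offset, and then append the last file whole. The following is only a sketch under some assumptions: the function name `join_ranges` is hypothetical, the names are strictly digits-digits, each file really starts at the offset in its name, and `head` supports `-c` (GNU and BSD versions do).

```shell
#!/bin/sh
# join_ranges OUT: concatenate START-END files from the current directory
# into OUT, dropping the overlapping tail of each file.
# OUT must not itself look like START-END, or it would be picked up as input.
join_ranges() {
    out=$1
    : > "$out"                  # create or empty the output file
    prev=
    # sort -n orders the names by their leading number (the start offset)
    for f in $(printf '%s\n' [0-9]*-*[0-9] | sort -n)
    do
        [ -f "$f" ] || continue # skips the literal pattern if nothing matched
        if [ -n "$prev" ]
        then    # keep only the bytes before the next file's start
                head -c "$(( ${f%%-*} - ${prev%%-*} ))" "$prev" >> "$out"
        fi
        prev=$f
    done
    # the last file has no successor, so it is copied whole
    if [ -n "$prev" ]
    then    cat "$prev" >> "$out"
    fi
}
```

Run as `join_ranges final` in the directory holding 0-15000, 14400-30000, 29800-40000 and 39750-55421, this would keep 14400, 15400 and 9950 bytes from the first three files and all of the last one.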

# 4  
Old 12-03-2013
This isn't highly efficient: it copies the redundant bytes to the output file multiple times, doesn't try to align block sizes to disk block boundaries, and uses small input block sizes if you have a large file that starts at a small (but non-zero) offset. But it seems to do what you want. You can make it considerably more complex to verify that the ranges of bytes specified by the input files don't leave any holes in the output file and to avoid copying duplicated data more than once. The checks for names that contain a "-" but are not just a string of digits followed by a "-" followed by a string of digits could be simplified with ksh- and bash-specific constructs, but the constructs used here should be portable to any shell that handles basic POSIX shell parameter expansion requirements correctly.

This is too simplistic to work if you want to process a file with a starting offset that is close to your process' maximum available address space. (In other words it probably won't work for terabyte sized files where the starting offset in one or more of your input files is relatively large.) But, it should give you a starting point for a more advanced script:
Code:
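#!/bin/ksh
IAm=${0##*/}
of="final.$$"
> "$of"   # create zero-length temporary output file
maxe=0    # highest ending offset seen so far
for i in *-*
do      printf "Looking at \"%s\"\n" "$i"
        b=${i%%-*}      # starting offset: text before the first "-"
        e=${i##*-}      # ending offset: text after the last "-"
        # Skip names that are not exactly digits, "-", digits.
        if [ -z "$b" ] || [ -z "$e" ] || [ "$i" != "$b-$e" ] ||
                [ "$b" != "${b#*[!0-9]}" ] || [ "$e" != "${e#*[!0-9]}" ]
        then    continue
        fi
        if [ "$e" -gt "$maxe" ]
        then    maxe=$e
        fi
        # For a non-zero start, use one output block of $b bytes and seek past it.
        if [ "$b" -gt 0 ]
        then    seek="ibs=102400 obs=$b seek=1"
        else    seek='bs=10240'
        fi
        echo starting dd if=$i of=$of $seek conv=notrunc
        # $seek is deliberately unquoted so it splits into separate dd operands.
        dd if="$i" of="$of" $seek conv=notrunc
done
if [ "$maxe" -gt 0 ]
then    printf "Creating 0-%d\n" "$maxe"
        mv "$of" "0-$maxe"
        exit
fi
rm "$of"
printf "%s: No input files found; no output file created.\n" "$IAm" >&2
exit 1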

I use the Korn shell, but this script will work with any POSIX conforming shell without changing anything other than the first line in the script to specify your shell.
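
The hole check mentioned above could look roughly like the sketch below. It is not part of the script; the function name `check_ranges` is hypothetical, and it assumes names are strictly digits-digits with no leading zeros. Coverage is computed from each file's real size (via `wc -c`) rather than from the ending number in its name, so it does not depend on whether that number is inclusive.

```shell
#!/bin/sh
# check_ranges: report holes between START-END files in the current
# directory. Returns 0 if the files cover a contiguous range from byte 0.
check_ranges() {
    covered=0       # first byte offset not yet covered by any file
    status=0
    # sort -n orders the names by their leading number (the start offset)
    for f in $(printf '%s\n' [0-9]*-*[0-9] | sort -n)
    do
        [ -f "$f" ] || continue
        start=${f%%-*}
        if [ "$start" -gt "$covered" ]
        then    printf 'hole: bytes %d to %d are missing\n' \
                        "$covered" "$((start - 1))"
                status=1
        fi
        # this file covers offsets start .. start+size-1
        end=$(( start + $(wc -c < "$f") ))
        if [ "$end" -gt "$covered" ]
        then    covered=$end
        fi
    done
    return $status
}
```

Calling `check_ranges` before running the script, and stopping if it returns non-zero, would catch a missing piece before any output is written.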
# 5  
Old 12-03-2013
Fantastic, works perfectly on the first try, thanks a lot!
 