Conversion from ASCII to binary for physical simulation code in C/C++


 
# 1  
Old 03-23-2011

Good evening, everybody

A good math friend told me that it should be possible to shrink the numerical data I produce with a physical simulation code I wrote for my PhD.

It usually writes at least 100 GB to complete a simulation, which seems to be too much. There are quotas to respect, and I have been told that it would be possible to use "binary" data instead of ASCII data.

Here are the inputs of the problem.
- I calculate and produce the data using a C/C++ simulation code.
- I do the post-processing using bash and awk.
- I make plots using GNUplot 4.4 [splot works great with pm3d now. Have a look at the "Not so frequently asked question" website to see what GNUplot can do]

I found how to use binary mode in Gawk and GNUplot, but the main point is missing: which conversion should I make to decrease the volume of the data in a lossless way? The data are all numerical, so I would convert double-precision numbers coded in ASCII (visible with the "more" command) into "binary" (which I assume is a misnomer, because converting a data file into binary using the "od" command just multiplies the size of the initial file... :-( ).

Example. Each line of my generated main data file is:
3000 -3.9e-13 -4.24661e-05 0 299.964 300 1.50018e+16 1.50005e+16 0 00 1 0 0

What conversion do you recommend to minimize the space needed?

Writing this post, I feel that I should convert double-precision numbers coded as characters in an ASCII file into double-precision data coded as numbers only. How can I do that? Is the "od" command sufficient?

Glad of any help,

Cheers from France,
Thibault

Last edited by Cybertib; 03-23-2011 at 07:18 PM.. Reason: added example
# 2  
Old 03-23-2011
You could compress the data? You can get 4:1 compression on text easily, and don't have to store the decompressed data on disk to use it. This will cause some more CPU usage though.

Code:
$ program_that_spews_gigs_of_data | gzip > data.gz
# Tell the gnuplot script to process "/dev/stdin" or "/proc/self/fd/0" instead
# of a filename
$ gunzip < data.gz | gnuplot file.script

Binary data could be smaller yet, but telling gnuplot how to use it, while possible, may be difficult.

Code:
$ gnuplot
> help plot
...
Subtopics available for plot:
    acsplines         axes              bezier            binary
    csplines          cumulative        datafile          errorbars
    errorlines        every             example           frequency
    index             iteration         kdensity          matrix
    parametric        ranges            sbezier           smooth
    special-filenames style             thru              title
    unique            using             with

Subtopic of plot: binary

 The `binary` keyword allows a data file to be binary as opposed to ASCII.
 There are two formats for binary--matrix binary and general binary.  Matrix
 binary is a fixed format in which data appears in a 2D array with an extra
 row and column for coordinate values.  General binary is a flexible format
 for which details about the file must be given at the command line.

 See `binary matrix` or `binary general` for more details.

Subtopics available for plot binary:
    general           matrix

Subtopic of plot binary: matrix

 Gnuplot can read matrix binary files by use of the option `binary` appearing
 without keyword qualifications unique to general binary, i.e., `array`,
 `record`, `format`, or `filetype`.  Other general binary keywords for
 translation should also apply to matrix binary.  (See `binary general` for
 more details.)

 In previous versions, `gnuplot` dynamically detected binary data files.  It
 is now necessary to specify the keyword `binary` directly after the filename.

 Single precision floats are stored in a binary file as follows:

       <N+1>  <y0>   <y1>   <y2>  ...  <yN>
        <x0> <z0,0> <z0,1> <z0,2> ... <z0,N>
        <x1> <z1,0> <z1,1> <z1,2> ... <z1,N>
         :      :      :      :   ...    :
...

As for how to write a binary value in C? Easy as pie. You just write it.
Code:
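{
        double close_enough=3.14;
        // "wb": binary mode, so no newline translation on any platform
        FILE *fout=fopen("filename", "wb");
        // fwrite(pointer, size of one item, number of items, stream)
        fwrite(&close_enough, sizeof(close_enough), 1, fout);
        fclose(fout);
}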

# 3  
Old 03-23-2011
Thank you for your answer.

Code:
double close_enough=3.14;
FILE *fout=fopen("filename", "wb");
fwrite(&close_enough, sizeof(close_enough), 1, fout);
fclose(fout);

This is almost what I used to produce the 4 times larger file.
I used fwrite and a buffer instead (to collect all the data to put in a line), as follows:

Code:
sprintf(buffer, "%lg\t%lg\t%lg\t%lg\t%lg\t%lg\t%lg\t%lg\t%lg\t%lg\t%lg\t%lg\t%lg\t%lg\n",
        fluence, t, x, z, Te(i,j),
        Ti(i,j), eDensity(i,j), hDensity(i,j), Intensity(i), intensity(i,j),
        H(i,j), phase(i,j), angleIncidence, angleRefracted(i,j));
fwrite(buffer, 1, sizeof(buffer), MainOutput);

Do you think this is the same as what you proposed?
# 4  
Old 03-23-2011
No, they're completely different... The variables in a C program start as binary, and one of sprintf's jobs is to convert binary into ASCII. You're converting binary numbers into an ASCII string then writing the ASCII string to file.

My example doesn't convert -- it writes the variable directly, as binary. You could read them in as ASCII with fgets and sscanf, then just write them back out raw as binary.
# 5  
Old 03-23-2011
Ok then, that was a dummy question, and not an expert one...
Thank you for everything!

---------- Post updated at 12:17 AM ---------- Previous update was at 12:14 AM ----------

Quote:
Originally Posted by Corona688
No, they're completely different... The variables in a C program start as binary, and one of sprintf's jobs is to convert binary into ASCII. You're converting binary numbers into an ASCII string then writing the ASCII string to file.

My example doesn't convert -- it writes the variable directly, as binary. You could read them in as ASCII with fgets and sscanf, then just write them back out raw as binary.
What about converting the old files into binary?

The "od" command seemed good, but after several tests, the conversion doesn't compress the data as expected. Your gzip idea is interesting for staying under quotas, but the data needs to be extracted before being processed by gnuplot in ASCII form. Using binary might be much faster, according to the gnuplot documentation.

Idea?

Thank you again

Thibault
# 6  
Old 03-23-2011
Quote:
Originally Posted by Cybertib
What about converting the old files into binary?
Like I said: read them in with fgets and sscanf, write them back out as binary with fwrite(). To make an example that works I'll need to see what your data looks like.
Quote:
The "od" command seemed good, but after several tests, the conversion doesn't compress the data as expected.
No doubt: It does the precise opposite, dumping binary files in a variety of ASCII forms.
Quote:
Your gzip idea is interesting for staying under quotas, but the data needs to be extracted before being processed by gnuplot in ASCII form.

Using binary might be much faster, according to the gnuplot documentation.
Could be. Also means that if you make a mistake in your C program, you've !$@^ed up 100 gigs of data faster than you ever could before.

It's just occurred to me that doing it in double-precision is probably pointless anyway; your ASCII files were written with %lg, which keeps only six significant digits, and that is about all a single-precision float holds.

The gnuplot "matrix" format is out. It stores everything as single-precision floating point numbers, even the array size in the first value of the file, and a float only represents integers exactly up to 2^24 (about 16.7 million), so past that point the stored size can end up slightly too small or too large.
Quote:
Idea?
Their documentation in this area seems especially impenetrable. I'm working on something.

---------- Post updated at 06:17 PM ---------- Previous update was at 06:07 PM ----------

Code:
// Generate a raw binary file for gnuplot to work with.
// compile with -lm

#include <stdio.h>
#include <math.h>

int main(void)
{
        int n, points=100;
        FILE *fp=fopen("sin.bin", "wb");

        for(n=0; n<points; n++)
        {
                float v[3]= { (2*3.14159*n)/(points-1) };
                v[1]=sin(v[0]);         v[2]=cos(v[0]);
                // writes three floats in a row.  x, sin(x), cos(x)
                // fwrite(pointer, size of one item, number of items, stream)
                fwrite(v, sizeof(float), 3, fp);
        }
        fclose(fp);
        return(0);
}

Code:
$ gcc graph.c -lm
$ ./a.out
$ ls -l sin.bin
-rw-r--r-- 1 monttyle monttyle 1200 Mar 23 18:15 sin.bin
$ gnuplot
> plot "sin.bin" binary format='%f%f%f' using 1:2;
(pops up a picture of a sine wave)
>

How to cram that into a surface plot or whatever I'm not sure but it's something to work from.

For doubles, use %lf.
# 7  
Old 03-24-2011
Hello again!
Thanks for your sin.bin example. It saved me a lot of time on this part!

I then succeeded in using GNUplot with the binary data I generate with my simulation code, using

Code:
double buffer[]={x, y, Z(i,j)};
// the third argument is a count of items, not bytes: 3 doubles here
fwrite(buffer, sizeof(double), sizeof(buffer)/sizeof(buffer[0]), Mesh);

which successfully:
- produces smaller, binary data
- and lets me make plots with gnuplot using

Code:
gnuplot> plot "Mesh0.dat" binary format='%3lf' using 2:3 w d

But now, one last point: how do I select my data using awk when the data is binary?
With ASCII data, I had to write this line in gnuplot:
Code:
gnuplot> plot "< awk '{if($1==0) print }' Mesh0.dat" using 2:3 with dots

But what about filtering binary data with awk?

After a few hours (!!) of wandering around the gawk guide on gnu.org, I found some tricks for doing the conversions:
- invoke gawk with --non-decimal-data
- use the following script: gnu.org/software/gawk/manual/gawk.html#Ordinal-Functions or #Bitwise-Functions [I have not posted 5 posts yet, so I can't send the full link, sorry]

Quite hard! However, everybody uses binary data in good simulation codes... How do they manage data filtering?

Once this works, I think I will turn to perl to avoid such a mess.
PS: we can run any program from gnuplot using the "<" operator in the plot command, so if nothing is possible with binary in awk, we can add a step with "od" or perl. This stuff is new to me!

Thanks a lot,
Thibault

---------- Post updated at 07:32 PM ---------- Previous update was at 07:27 PM ----------

Other methods here

how to read binary data file? in Awk
How do you use binary conversion in python/bash/awk