How to replicate Ruby´s binary file reading with Java?


 
# 1  
Old 11-20-2014

Hello everyone,

Maybe some expert could help me.

I have a working Ruby script, shown below, that reads a big binary file (more than 2 GB). The chunks of data I want to analyze
are separated by the byte sequence FF47 within the file. So the Ruby script sets the "line separator" to FF47 ($/="\xff\x47")
in order to read the file "line by line" and avoid loading the entire file into memory.

The program works great, and now I'm trying to port this algorithm to Java. I've seen built-in ways in Java to read small binary files,
but I don't know how to set the sequence FF47 as the line separator.

How can I do this?
Code:
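#!/usr/bin/env ruby -E BINARY
# -*- encoding: utf-8 -*-

# Use the byte sequence FF 47 as the record ("line") separator.
BEGIN { $/ = "\xff\x47".force_encoding("BINARY") }

# Read one record at a time so the whole file is never loaded into memory.
IO.foreach(ARGV[0]) { |l|
  # Hex string of the record; the trailing separator is included.
  current_line = l.unpack('H*')[0]
  ### Process each line stored in variable "current_line" as desired ###
  ### ...
  ### ...
} if File.exist?(ARGV[0])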

Thanks for any help.

Regards
# 2  
Old 11-21-2014
Quote:
Originally Posted by Ophiuchus
but I don't know how to set as line separator the sequence FF47.
I am no expert in Java, but I don't think this is possible. You probably have to do it yourself, like in good old C: there you open a file with fopen() and use fseek(), fread() and ftell() to find what you are searching for. Those functions belong to the C standard library; Java does not have them under these names, but java.io.RandomAccessFile offers counterparts (seek(), read(), getFilePointer()), so the same approach should carry over.

I hope this helps.

bakunin
# 3  
Old 11-21-2014
AFAIK there's no way to use the stdio-based family of library calls (fopen(), etc.) and have them treat the binary sequence "FF47" as a "line" separator.

Even if you could set your locale environment variables to use a character set that has "FF47" as a 16-bit newline character (if one even exists), the fact that it's a binary file could break things: the "newline" character might not always fall on a 16-bit boundary.

The only way to do what the OP asked is to read the file as a binary file and search for the "FF47" bytes. And hope that the file wasn't written in a way that's endian-dependent. Especially when using Java on a little-endian machine (x86, most ARM OSes), as Java's data streams read and write multi-byte values in network byte order (big endian) for portability.
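
A minimal sketch of that search in Java, assuming the separator is exactly the two bytes FF 47 and that each chunk should keep its trailing separator, as Ruby's IO.foreach does. The class name DelimitedChunkReader and the method nextChunk() are made up for illustration; only the java.io calls are real API. Because it compares one byte at a time, byte order does not matter for the search itself.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: splits a byte stream into chunks that end with FF 47. */
class DelimitedChunkReader {
    private final InputStream in;

    DelimitedChunkReader(InputStream in) {
        this.in = in;
    }

    /**
     * Returns the next chunk, including the trailing FF 47 if one was found,
     * or null once the stream is exhausted.
     */
    byte[] nextChunk() throws IOException {
        ByteArrayOutputStream chunk = new ByteArrayOutputStream();
        int prev = -1; // previous byte of the current chunk, -1 if none yet
        int b;
        while ((b = in.read()) != -1) {
            chunk.write(b);
            if (prev == 0xFF && b == 0x47) {
                return chunk.toByteArray(); // separator seen: chunk is complete
            }
            prev = b;
        }
        // End of stream: return the unterminated tail, if there is one.
        return chunk.size() > 0 ? chunk.toByteArray() : null;
    }

    public static void main(String[] args) throws IOException {
        // The buffered stream keeps the per-byte read() calls cheap;
        // only one chunk is held in memory at a time.
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]), 1 << 16)) {
            DelimitedChunkReader reader = new DelimitedChunkReader(in);
            long count = 0;
            byte[] chunk;
            while ((chunk = reader.nextChunk()) != null) {
                count++; // process the chunk here
            }
            System.out.println(count + " chunks");
        }
    }
}
```

Run it with the file name as the only argument. A single chunk must still fit in memory, so this only helps if the separator really occurs regularly in the file.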
# 4  
Old 11-21-2014
Quote:
Originally Posted by achenle
AFAIK there's no way to use the stdio-based family of library calls (fopen(), etc.) and have them treat the binary sequence "FF47" as a "line" separator.
You do not treat them as a "line separator", but simply search for the sequence and then read what comes after it. The stdio functions don't have "line separators" because there is no such thing as a "line" which could be separated. Sorry for not mentioning that explicitly, I thought it was obvious.

bakunin

# 5  
Old 11-21-2014
Actually, there are two stdio-based calls that process input line-by-line - gets() and fgets().

And yes, the only way to do what the OP wants in Java is to search through the data looking for the binary separator sequence.
# 6  
Old 11-21-2014
Hello bakunin and achenle,

Thanks for your answers. Reading a binary file line by line in C with gets() or fgets(), as you said, sounds like a great option. But the "lines" or chunks are separated by FF47, and in my original Ruby code I process the chunks very well with regular expressions, so I'm afraid I cannot use C for this task: I think it doesn't support Perl-style regular expressions, though I'm not sure.

Regards
# 7  
Old 11-21-2014
The C language doesn't have the regular expressions that Perl uses, but on POSIX systems the C library provides regular expressions of the same kind that sed and awk use, through regcomp() and regexec() in <regex.h>, so look up the man pages for those functions.
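
For the Java route the thread started with, regular expressions come with the standard library (java.util.regex), so the Ruby approach of matching against the hex string of each chunk carries over. A rough sketch follows; the class ChunkRegexDemo, the helpers toHex() and firstField(), the pattern and the sample bytes are all invented for illustration, since the thread does not say what the chunks contain.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChunkRegexDemo {
    /** Lower-case hex string of the bytes, like Ruby's unpack('H*')[0]. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16)); // high nibble
            sb.append(Character.forDigit(b & 0xF, 16));        // low nibble
        }
        return sb.toString();
    }

    /** Hypothetical pattern: the two bytes that follow a leading 01 byte. */
    static final Pattern EXAMPLE = Pattern.compile("^01([0-9a-f]{4})");

    /** Returns the captured hex digits, or null if the chunk does not match. */
    static String firstField(byte[] chunk) {
        Matcher m = EXAMPLE.matcher(toHex(chunk));
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up chunk ending in the FF 47 separator.
        byte[] chunk = {0x01, (byte) 0xAB, (byte) 0xCD, (byte) 0xFF, 0x47};
        System.out.println(toHex(chunk));      // prints 01abcdff47
        System.out.println(firstField(chunk)); // prints abcd
    }
}
```

Matching on the hex string doubles the memory used per chunk, but it keeps the patterns readable and close to what the Ruby script already does.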