Text File with Binary Values processing


 
# 1  
03-27-2017

Hello all,
I have a text file containing millions of lines. Below is an example:

Code:
{tx:be} head -50 file.txt 
Instr1: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

Instr1: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 

Instr1: 000000000000000000000000001100001010000000010011101001111000000000100010100111111110010000000000000000000000000000000000000001 

Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 

Instr1: 000000000000000000000000001100001010000000010010101001111000000000100010100111111111000000000000000000000000000000000000000001 

Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 

Instr1: 000000000000000000000000001100001010000000000000101000011000000000100010101000000000010000000000000000000000000000000000000001 

Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 

Instr1: 000000110110000000000000000100001010000000010011100101000000000000100010101000000001110000000000000000000000000000000000000001 

Instr1: 000000000100000000000000000001111110000000000000000000010000000000100010101000000010110000000000000010001001111011011100000111 

Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 

Instr1: 000000110111000111110000100100001010000000011100100101000000000000100010011110110111000000000000000000000000000000000000000001 

Instr1: 000001010110000000000100000100000000101001011101100110100000000000100010011110111000000000000000000000000000000000000000000001 

Instr1: 000001010110000000000011000100000000100101011101100110100000000000100010011110111001000000000000000000000000000000000000000001 

Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 

Instr1: 000000110111000111101110100100001010000000011100100101000000000000100010011110111010100000000000000000000000000000000000000001 

Instr1: 000000110111000111101110100100001010000000011100100101000000000000100010011110111010100000000000000000000000000000000000000001 

Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 

Instr1: 000000100110000000000001011100001110000000010011100100010000000000100010100111111110110000000000000000000000000000000000000001 

Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 

Instr1: 000000000000000000000000001100001010000000000000101000011000000000100010101000000000010000000000000000000000000000000000000001 

Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 

Instr1: 000000110110000000000000000100001010000000010011100101000000000000100010101000000001110000000000000000000000000000000000000001 

Instr1: 000000000100000000000000000001111110000000000000000000010000000000100010101000000010110000000000000010001001111011011100000111

There are empty lines, which I remove using "sed '/^$/d' file.txt"
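The sed `d` command takes an address, so `/^$/` deletes only lines that are completely empty. If some of the "empty" lines in a large file carry stray blanks or a carriage return (an assumption; the sample does not show this), a POSIX bracket expression catches those too. A minimal sketch, with a hypothetical wrapper name:

```shell
# Delete lines that are empty or contain only whitespace.
# [[:space:]] is a POSIX character class (space, tab, carriage return, ...),
# so lines holding only stray blanks or a DOS line ending are removed as well.
drop_blank_lines() {  # usage: drop_blank_lines [file...]
    sed '/^[[:space:]]*$/d' "$@"
}

# Example: drop_blank_lines file.txt > file.noblank.txt
```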

Now the problem: I want to count the unique values in the binary field. Here is what I want:
in the binary values, I want to find how many times each unique value in field [57:50] occurs (MSB -> bit 125, LSB -> bit 0). There are 126 bits in total on each line.
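The bit numbering here runs right to left (LSB = bit 0 is the last digit, MSB = bit 125 the first), while `cut`, `sort -k` and awk's `substr()` count characters left to right starting at 1. For a string of width W, bit b sits at column W - b, so bits [57:50] of a 126-bit string are columns 69-76. A tiny sketch of that arithmetic (the helper name `bit_to_col` is hypothetical):

```shell
# Convert a bit number (LSB = 0, counted from the right) into a 1-based
# character column, for a binary string of the given width.
bit_to_col() {  # usage: bit_to_col BIT WIDTH
    echo $(( $2 - $1 ))
}

# Bits [57:50] of a 126-bit string:
start=$(bit_to_col 57 126)   # column of bit 57 (the field's MSB)
end=$(bit_to_col 50 126)     # column of bit 50 (the field's LSB)
echo "columns $start-$end"   # prints "columns 69-76"
```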
I sorted the file using sort:
Code:
sort -k2.50,2.57 file.txt

output:
{tx:be} tail -50 file.txt 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000011111111110000000000100010100000001101110000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000100111111110000000000100010100000100101100000000000000000000000000000000000000001 
Instr1: 000000000010000000000000000010000010100111000100111111110000000000100010100000100101100000000000000000000000000000000000000001 
Instr1: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
Instr1: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

As you can see, the file is sorted on the field I am interested in. Now I am not sure how to count the number of occurrences of each unique value in that field.
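A note on the sort key, based on the POSIX field rules for `sort -k`: without `-b`, a field includes the blanks that precede it, so in `-k2.50,2.57` character 1 of field 2 is the space after `Instr1:` and the key actually covers binary digits 49-56. Adding `-b` makes the offsets count from the first digit. A minimal sketch (the wrapper name `sort_on_digits` is hypothetical):

```shell
# Sort on binary digits 50-57 (1-based, counted from the left) of field 2.
# Without -b, field 2 would begin with the blank after "Instr1:", shifting
# the key one character to the left (digits 49-56).
sort_on_digits() {  # usage: sort_on_digits [file...]
    sort -b -k2.50,2.57 "$@"
}
```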

I tried the uniq command, but it doesn't help:
Code:
uniq -c -f1 -s75 -w69 file.txt

Output: (truncated)
2751026 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 
     23 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 
     23 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000001 
     23 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000001 
     24 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000011000000000000000000000000000000000000000001 
     24 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000 
     22 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000101000000000000000000000000000000000000000001 
     19 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000110000000000000000000000000000000000000000001 
     17 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000111000000000000000000000000000000000000000000 
     18 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000001 
     18 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000001001000000000000000000000000000000000000000001 
     17 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000001010000000000000000000000000000000000000000001 
     14 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000001011000000000000000000000000000000000000000001 
      8 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000001100000000000000000000000000000000000000000001 
     11 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000001101000000000000000000000000000000000000000001 
      6 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000001110000000000000000000000000000000000000000001 
      5 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000001111000000000000000000000000000000000000000001 
      1 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000 
      2 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000001 
      4 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000010001000000000000000000000000000000000000000001 
      4 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000010010000000000000000000000000000000000000000001 
      4 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000010011000000000000000000000000000000000000000001 
      4 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000010100000000000000000000000000000000000000000001 
      3 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000010101000000000000000000000000000000000000000001 
     11 Instr1: 000000000100000000000000000000000000000000000000000000001000000000100001110001111111000000000000000010000111001000010010000001 
     12 Instr1: 0000000001000000000000000000000000000000000000000000000010

The output I am looking for is something like this (the values here are made up):
Code:
2000 Instr1[or any suitable text]: '00000000'
150 Instr1: '10001100'
120 Instr1: '00100000'
and so on

I think the 'uniq' command should be able to do this, but I am open to anything.

Thanks in advance.
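Because `uniq` only collapses adjacent identical lines, one way to get this output with `uniq -c` is to extract the 8 digits first, then sort and count them. A sketch, assuming the start column is 1-based within the binary string (field 2); the function name `count_values` is hypothetical:

```shell
# Count how often each 8-digit value starting at column FROM of the binary
# string (field 2) occurs. Blank lines have no field 2 and are skipped.
count_values() {  # usage: count_values FROM [file...]
    from=$1; shift
    awk -v from="$from" 'NF >= 2 { print substr($2, from, 8) }' "$@" |
        sort | uniq -c | sort -k1,1nr   # most frequent value first
}

# Example: count_values 50 file.txt
```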
# 2  
03-27-2017
How about
Code:
awk '/^ *$/ {next} {C[substr ($0, 50, 8)]++} END {for (c in C) printf "%4d Instr1: %s\n", C[c], c}' file
   3 Instr1: 00111001
   2 Instr1: 10111011
   2 Instr1: xxxxxxxx
  11 Instr1: 00000000
   2 Instr1: 00000001
   1 Instr1: 00100101
   4 Instr1: 00100111

You may want to change $0 to $2 if you need to count character positions only within the string of binary digits. Pipe the result through sort to taste...

Last edited by RudiC; 03-27-2017 at 04:52 PM..
This User Gave Thanks to RudiC For This Post:
# 3  
03-27-2017
If I understand your file contents (from your two examples) and assuming that the 1st field in your input is not always Instr1:, you might want to try this slight modification to RudiC's suggestion:
Code:
#!/bin/ksh
file=${1:-file.txt}
awk '
!/^ *$/{c[$1 OFS substr($2, 50, 8)]++
}
END {	for(v in c)
		printf("%12d %s\n", c[v], v)
}' "$file" | sort -k1,1nr -k2,3

If file.txt contains your 1st sample (unsorted, with blank lines) and file2.txt contains your 2nd sample (sorted, with no blank lines), the above script, when invoked without operands, produces the output:
Code:
           9 Instr1: 00000000
           5 Instr1: 00101000
           2 Instr1: 00000010
           2 Instr1: 00110100
           2 Instr1: 01000011
           2 Instr1: 01001111
           2 Instr1: xxxxxxxx
           1 Instr1: 00100010

and, if it is invoked with the operand file2.txt, produces the output:
Code:
          48 Instr1: 11111110
           2 Instr1: xxxxxxxx

This was written and tested using a Korn shell, but will work with any shell that uses Bourne shell syntax and understands the parameter expansions required by the POSIX standards (such as bash, dash, ksh, and zsh).

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
This User Gave Thanks to Don Cragun For This Post:
# 4  
03-28-2017
Is there any way to count from the LSB (which starts at the right-hand side of the binary string in this case)?
# 5
03-28-2017
Yes, of course. Simple arithmetic. Any idea from your side?
# 6
03-28-2017
Quote:
Originally Posted by RudiC
Yes, of course. Simple arithmetic. Any idea from your side?
I assume a simple C[-1] would start from the end of the line.
# 7
03-28-2017
In the shell, yes. In awk, sorry, no. Use the length function and subtract the target position.
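Following the length suggestion: awk's `substr()` counts from 1 at the left and has no negative offsets, but bit b of field 2 is at column `length($2) - b`, so bits [HI:LO] start at the column for HI and run for HI - LO + 1 characters. A sketch, assuming the binary string is field 2 (the function name `count_bits` is hypothetical). For comparison, bash does accept a negative offset in parameter expansion, e.g. `${bits: -58:8}` for the same eight digits of a 126-digit string held in a variable named `bits`.

```shell
# Count the values of bits [HI:LO], where bit 0 (LSB) is the last digit of
# field 2. Using $2 rather than $0 keeps the trailing blank seen on some
# sample lines out of the length calculation.
count_bits() {  # usage: count_bits HI LO [file...]
    hi=$1 lo=$2; shift 2
    awk -v hi="$hi" -v lo="$lo" '
        NF >= 2 {
            start = length($2) - hi            # 1-based column of bit HI
            c[substr($2, start, hi - lo + 1)]++
        }
        END { for (v in c) printf "%d %s\n", c[v], v }
    ' "$@" | sort -k1,1nr                      # most frequent value first
}

# Example: count_bits 57 50 file.txt
```

Because the start column is computed per line, this keeps working even if line widths vary, as long as bit 0 is always the last digit.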