bash: read file line by line (lines have '\0') - not full line has read???


 
# 1  
Old 02-23-2010

I am using a while loop to read a file.
The file has lines containing null-terminated strings (words, actually).
What I get from that read is just the first word, up to the '\0'!
I need the whole string up to the newline (LF, 10#10, 16#A).

What am I doing wrong?
Code:
   #make file 'grb' with '\0's :
--0223-112518:~/develop/src> printf \
> "hello\0 word\0 done\0\n"\
> "this\0 next\0 line\0\n"\
> "last\0 ln\n"\
> >grb
--0223-112602:~/develop/src>
  # now reading it line by line:
--0223-112603:~/develop/src> while IFS= read ln; do echo "$ln"; done < grb
hello
this
last
  # - only first words are printed ?!?!
--0223-112714:~/develop/src> cat grb
hello word done
this next line
last ln
--0223-112738:~/develop/src>

How do I get the whole line inside the loop?

Thanks!
# 2  
Old 02-23-2010
I'm not sure if bash can handle null bytes (usually they don't belong to text files).
As a quick fix I would use another tool for parsing files containing null bytes.

By the way, some shells (I tried with zsh and pdksh using read -r) seem to handle it.
# 3  
Old 02-23-2010
Can you use the tr command to trim out those characters?

Code:
tr -d '\0' < textfile > newfile

# 4  
Old 02-23-2010
Quote:
Originally Posted by radoulov
I'm not sure if bash can handle null bytes (usually they don't belong to text files).
As a quick fix I would use another tool for parsing files containing null bytes.

By the way, some shells (I tried with zsh and pdksh using read -r) seem to handle it.
I have checked it here (Solaris), and -r does not do the trick.

Quote:
Originally Posted by joeyg
Can you use the tr command to trim out those characters?
Yes, I can, but that feels like giving up on the problem :)
For now I do it in Perl, which is a pretty useful way, but I'd like to know how to handle such a situation in bash.
What I do not like about 'tr' is the need to create another file. Also, deleting (-d) is not useful, as I need to read positioned fields, but replacing the nulls with spaces works.
Code:
 > cat grb|tr '\0' ' '| while IFS= read ln; do echo $ln; done;
hello word done
this next line
last ln
 >

I am not sure: with the 'cat ..' approach it is, again, done on the whole file, isn't it?
(And, it seems to me, there is some glitch in bash-2.05 when a pipe is processed by while; I ran into it about half a year ago. It seems to be something with assigning variables...)
So that is another reason why I do not like the 'tr..' solution.
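
A minimal sketch of the space-replacement idea that needs neither a second file nor a pipe into the loop. It assumes a bash with process substitution and array support; the temporary file and the names `count` and `lines` are only for illustration and are not from the posts above.

```shell
#!/usr/bin/env bash
# Recreate a sample file like 'grb' (hypothetical temp file for this demo).
tmp=$(mktemp)
printf 'hello\0 word\0 done\0\nthis\0 next\0 line\0\nlast\0 ln\n' > "$tmp"

count=0
lines=()
# The loop runs in the current shell because its input comes from a
# redirection, not from a pipe; tr runs inside the process substitution.
while IFS= read -r ln; do
  lines+=("$ln")            # nulls are now spaces, so field positions are kept
  count=$((count + 1))
done < <(tr '\0' ' ' < "$tmp")

printf '%s\n' "${lines[@]}"
echo "lines read: $count"   # still set here, unlike with 'cat | while'
rm -f "$tmp"
```

Because every null becomes exactly one space, fixed-position fields stay where they were, and `count` and `lines` survive after the loop.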
# 5  
Old 04-14-2010
I realise this thread is over a month old, but I'll add my input even if it's no longer useful to the original poster, but just for others browsing. I realise the following code is far from elegant (ugly would be a good word), but it "works" (namely, it allows you to read full lines including nulls into a string for processing, one line at a time, without losing the nulls, using only Bash builtins).

Bash treats strings the same way C does (null-terminated), so a shell variable can never hold a true null. The following code reads one null-delimited chunk at a time with "read -d ''" and appends the chunks to a buffer until a chunk contains a newline, which marks the end of a line. It optionally adds an escaped null (\0) back into the string between the chunks.

Generally it is far less painful to use something like Perl for this, but if you really are stuck in Bash and need a solution without external tools, maybe this will help.

Code:
printf \
  "hello\0word\0done\0\n"\
  "this\0next\0line\0\n"\
  "last\0ln\n"\
  > grb
buffer=""
xtra=""
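## "|| [[ -n $ln ]]" also processes a final chunk that ends at EOF instead of at a null
while IFS= read -r -d '' ln || [[ -n $ln ]]; do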
  buffer+="$xtra"
  ## If you wish to re-include the nulls as \0, which will work
  ## when you output with "printf", do this
  if [[ -n "$buffer" ]]; then
    buffer+="\0"
    ## otherwise
    #buffer+=" "
  fi
  buffer+="${ln%%$'\n'*}"
  xtra="${ln#*$'\n'}"
  if [[ "${ln/$'\n'}" != "$ln" ]]; then
    ## USE "$buffer" HERE HOWEVER YOU WISH
    printf '%b\n' "$buffer" ## ..for example (%b expands the \0 escapes; a stray % in the data is harmless)
    ## ...TILL HERE
    buffer=""
  else
    xtra=""
  fi
done <grb
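
A variation on the same builtin-only idea, sketched under the assumption that the fields themselves are what you want rather than one escaped string. The names `fields`, `summary` and `emit_line` are hypothetical, and the '|' join is only there to make the result visible.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)   # hypothetical sample file for this demo
printf 'hello\0word\0done\0\nthis\0next\0line\0\nlast\0ln\n' > "$tmp"

fields=()    # fields of the line currently being assembled
summary=()   # one "count:joined fields" entry per completed line

emit_line() {
  local IFS='|'                              # join character for "${fields[*]}"
  summary+=("${#fields[@]}:${fields[*]}")
  fields=()
}

# Each read returns one null-delimited chunk; "|| [[ -n $chunk ]]" keeps
# a final chunk that is terminated by EOF instead of by a null.
while IFS= read -r -d '' chunk || [[ -n $chunk ]]; do
  # A newline inside a chunk ends the current line.
  while [[ $chunk == *$'\n'* ]]; do
    fields+=("${chunk%%$'\n'*}")
    emit_line
    chunk=${chunk#*$'\n'}
  done
  fields+=("$chunk")
done < "$tmp"

# Emit a last line that had no trailing newline, if there is one.
if (( ${#fields[@]} > 1 )) || [[ -n ${fields[0]} ]]; then
  emit_line
fi

printf '%s\n' "${summary[@]}"
rm -f "$tmp"
```

A line that ends with a null right before the newline yields an empty last field, which is why the first two sample lines report four fields.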

Quote:
(And, it seems to me, there is some glitch in bash-2.05 when a pipe is processed by while; I ran into it about half a year ago. It seems to be something with assigning variables...)
So that is another reason why I do not like the 'tr..' solution.
When bash runs a pipeline, each command in it (including a while loop that reads from the pipe) runs in a subshell, so any variables you assign inside the loop disappear once the pipeline finishes. It's not a "glitch", it's a feature. Using shell redirection avoids this. For example:

Code:
blob=""
while read temp; do
  blob+="$temp"
done < filename
echo "$blob"

will work, but:

Code:
blob=""
cat filename | while read temp; do
  blob+="$temp"
done
echo "$blob"

will not.
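
For scripts where redirection is awkward, newer bash versions offer another option. The sketch below assumes bash 4.2 or later and a non-interactive script (job control off), where `shopt -s lastpipe` runs the last command of a pipeline in the current shell. The variable names `lost` and `kept` are only illustrative.

```shell
#!/usr/bin/env bash
# Default behaviour: the while loop is in a subshell, so 'lost' stays empty.
lost=""
printf 'a\nb\nc\n' | while IFS= read -r temp; do
  lost+="$temp"
done

# With lastpipe, the last pipeline element runs in the current shell.
shopt -s lastpipe
kept=""
printf 'a\nb\nc\n' | while IFS= read -r temp; do
  kept+="$temp"
done

echo "without lastpipe: '$lost'"
echo "with lastpipe:    '$kept'"
```

In an interactive shell job control is normally on, so lastpipe has no effect there; redirection remains the more portable choice.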
# 6  
Old 04-14-2010
Very nice, rowanthorpe. I try to use bash builtins so that I spawn as few subshells and processes with fds as possible; I am always on some server.

The pipe vs. redirection question is one I've been trying to figure out too. One of the best ways I've found to handle it is to use exec manually on fds. Consider this dos2unix clone and its alternate way of determining input. (N6=/dev/null is a personal preference.)
Code:
dos2unixx ()
{
    [[ $# -eq 0 ]] && exec tr -d '\015\032' || [[ ! -f "$1" ]] && echo "Not found: $1" && return;
    for f in "$@";
    do
        [[ ! -f "$f" ]] && continue;
        tr -d '\015\032' < "$f" > "$f.t" && cmp "$f" "$f.t" > $N6 && rm -f "$f.t" || ( touch -r "$f" "$f.t" && mv "$f" "$f.b" && mv "$f.t" "$f" && rm -f "$f.b" ) >&$N6;
    done
}
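
The general idea of using exec on file descriptors by hand can be sketched like this. It is an illustration rather than part of the function above; descriptor 3, the temporary file and the variable `total` are arbitrary choices for the demo.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)   # hypothetical input file
printf 'one\ntwo\nthree\n' > "$tmp"

exec 3< "$tmp"            # open the file on descriptor 3 in the current shell
total=""
while IFS= read -r -u 3 line; do
  total+="$line,"         # the loop runs in the current shell; stdin stays free
done
exec 3<&-                 # close descriptor 3 again

echo "$total"
rm -f "$tmp"
```

Since no pipe is involved, `total` is still set after the loop, and standard input remains available for prompts inside the loop.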



And strangely enough, earlier today I was doing some work on my own builtin-only 'more' command. Basically I wanted a cat pager; this does pretty well, but I've only had it a day.

Code:
shmore ()
{
    local l L M="$(echo; tput setab 4 && tput setaf 7 || echo -en "\e[34;01m")   --- SH More ---   $(tput sgr0 || echo -e "\e[m")";
    L=1;
    while read l; do
        echo "${l}";
        ((L++));
        [ "$L" == "${LINES:-80}" ] && {
            L=1;
            read -p"$M" -u1
        };
    done
}


Finally, here's the shcat I use. If you run $ cat file | shcat | head, you get an error from the pipe issue you talk about. However, you can work around it with an $ exec 2>&1 in the correct place.
Code:
shcat ()
{
    local l f e IFS="";
    e=0;
    if [ $# -eq 0 ]; then
        while read -r l; do
            echo "${l}";
        done;
    else
        for f in "$@";
        do
            if [ -r "${f}" ]; then
                while read -r l; do
                    echo "${l}";
                done < "${f}";
            else
                 < "${f}";
                e=1;
            fi;
        done;
        return $e;
    fi
}


Also, these 2 aliases I have created over time that work very well for stuff like this.

Code:
cata='exec 2>&1 cat -A'
cate='exec 2>&1 cat -v | sed s/\\^\\[/\\\\033/g'



---------- Post updated at 02:13 AM ---------- Previous update was at 02:08 AM ----------

Code:
seq -s`echo -ne \012` --format=%03g 0 128
seq -s`echo -ne \\012` --format=%03g 0 128
seq -s`echo -ne "\\012"` --format=%03g 0 128
seq -s`echo -ne "\\011"` --format=%03g 0 128
seq -s`echo -ne "\\010"` --format=%03g 0 128
seq -s`echo -ne "\\009"` --format=%03g 0 128
seq -s`echo -ne "\\002"` --format=%03g 0 128
seq -s`echo -ne "\\02"` --format=%03g 0 128
seq -s`echo -ne "\\2"` --format=%03g 0 128
seq -s`tput cols`` --format=%03g 0 128
seq -s`tput cols` --format=%03g 0 128
seq -s`tput sgr` --format=%03g 0 128
seq -s`tput eol` --format=%03g 0 128
seq -s`tput erase` --format=%03g 0 128
seq -s`tput bs` --format=%03g 0 128
seq -s`tput kbs` --format=%03g 0 128
seq -s'`tput kbs` ' --format=%03g 0 128
seq -s"`tput kbs` " --format=%03g 0 128
seq -s" `tput kbs` " --format=%03g 0 128
seq -s" `tput kbs`" --format=%03g 0 128
seq -s" `tput kbs` " --format=%03g 0 128


That is part of my shell session from earlier today. I was actually trying to make the separator a null. It might be useful to know there are several ways to output nulls.
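
A short sketch of a few ways to emit nulls in bash, and of one reason a null separator cannot be passed through backticks. The variable names are illustrative, and the `tr` post-processing is a swapped-in alternative to `seq -s`, not something from the session above.

```shell
#!/usr/bin/env bash
# Both builtins can write a real null byte; count bytes to confirm.
n_printf=$(( $(printf '\0' | wc -c) ))
n_echo=$(( $(echo -ne '\0' | wc -c) ))
n_mixed=$(( $(printf 'a\0b' | wc -c) ))

# Command substitution drops null bytes (newer bash versions also warn),
# so a null can never reach seq as an argument this way.
{ captured=$(printf 'a\0b'); } 2>/dev/null

# Converting the separator after the fact does work.
n_nuls=$(( $(seq 0 3 | tr '\n' '\0' | tr -cd '\0' | wc -c) ))

echo "printf: $n_printf byte, echo: $n_echo byte, mixed: $n_mixed bytes"
echo "captured: '$captured' (${#captured} characters)"
echo "nulls after tr: $n_nuls"
```

More generally, command-line arguments are C strings, so no program can receive an argument that contains a null.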


Code:
aa_print_ascii_chart ()
{
    local i;
    for i in `seq ${1:-0} ${2:-256}`;
    do
        echo -e "\\0$(( $i/64*100 + $i%64/8*10 + $i%8 ))";
    done
}

# 7  
Old 04-14-2010
They are great scripts AskApache! I will read them in more depth when I get online later. At a glance at your shcat script, and the mention of the broken pipe problem, I remembered a thread over at the gnulib-bug-mailinglist, particularly this bit:
Quote:
> The second is that the echo builtin in bash-3.2 displays a message on
> a write error, instead of letting the exit status communicate the error.
> When the shell receives SIGPIPE and handles it without exiting, writes
> to that pipe return -1/EPIPE, and the echo builtin reports the error. In
> earlier versions, you wouldn't have seen the message.
>
> The bash 3.2 "printf" builtin doesn't have this problem though.

Aha! So it's really a bug in the 'echo' built-in, and using 'printf' is a
work-around.
I tried running your shcat with the echo's replaced with printf's as below:
Code:
shcat ()
{
   local l f e IFS="";
   e=0;
   if [ $# -eq 0 ]; then
       while read -r l; do
           printf '%s\n' "${l}"
       done;
   else
       for f in "$@";
       do
           if [ -r "${f}" ]; then
               while read -r l; do
                   printf '%s\n' "${l}"
               done < "${f}";
           else
                < "${f}";
               e=1;
           fi;
       done;
       return $e;
   fi
}

but on my shell it still had the error... Strangely, when I used the non-builtin printf like so:
Code:
shcat ()
{
   local l f e IFS="";
   e=0;
   if [ $# -eq 0 ]; then
       while read -r l; do
           /bin/printf '%s\n' "${l}"
       done;
   else
       for f in "$@";
       do
           if [ -r "${f}" ]; then
               while read -r l; do
                   /bin/printf '%s\n' "${l}"
               done < "${f}";
           else
                < "${f}";
               e=1;
           fi;
       done;
       return $e;
   fi
}

it worked perfectly (but ridiculously slowly...).
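
One detail worth keeping in mind when swapping echo for printf in a loop like shcat: the line is safest passed as an argument to a fixed format string. The sketch below uses a made-up sample line to show how data placed in the format position gets reinterpreted.

```shell
#!/usr/bin/env bash
line='100% done\tok'   # sample data containing a % and a backslash sequence

# Data used as the format: "% d" and "\t" are interpreted by printf.
as_format=$(printf "${line}\n" 2>/dev/null)

# Data passed as an argument: printed exactly as read.
as_arg=$(printf '%s\n' "$line")

printf 'as format:   %s\n' "$as_format"
printf 'as argument: %s\n' "$as_arg"
```

Only the second form reproduces the input unchanged, which matters for a cat replacement.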
