Find Unread Files

 
# 8  
Old 01-12-2014
here u go.

Code:
lastfilename=`cat $HOME/lastfilename.txt`

find * -newer $lastfilename > $HOME/listoffilestoprocess

while read line
do
    ETL_PROCESS.sh $line
    echo $line > $HOME/lastfilename.txt
done < $HOME/listoffilestoprocess

rm -rf $HOME/listoffilestoprocess

# 9  
Old 01-16-2014
Quote:
Originally Posted by Fracker
here u go.

Code:
lastfilename=`cat $HOME/lastfilename.txt`

find * -newer $lastfilename > $HOME/listoffilestoprocess

while read line
do
    ETL_PROCESS.sh $line
    echo $line > $HOME/lastfilename.txt
done < $HOME/listoffilestoprocess

rm -rf $HOME/listoffilestoprocess

This is a good first cut, but there are a couple of problems here:
  1. If there are enough files in the directory, the expansion of * may overflow ARG_MAX limits on your system.
  2. The list returned by find will not be sorted by timestamp, so there is no guarantee that the last file processed by this script will be the newest file. If it isn't, the next time you run the script some files will be processed again.
I think the following script will get around those problems:
Code:
#!/bin/ksh
lastfile="$HOME/lastfilename.txt"
if [ -f "$lastfile" ]
then    read -r newest < "$lastfile"
else    newest=""
fi
ls -rt|( 
        if [ -n "$newest" ]
        then    # lastfile was not empty.  Skip over files older than the file
                # named in lastfile.
                while read -r file
                do      if [ "$file" = "$newest" ]
                        then    break
                        fi
                done
        fi
        # Process all files newer than the one previously listed in last file
        # (or all files in the directory if lastfile didn't exist or was empty).
        while read -r file
        do      # Process newer files in order from oldest to newest...
                ETL_PROCESS.sh "$file"
                # The script should abort here if ETL_PROCESS.sh failed...
                # Record the last file processed.
                printf "%s\n" "$file" > "$lastfile"
        done
)

But if someone edits the most recently processed file after more files have been added to the directory, this script (and the original script) will skip any files that arrived between the last run of the script and the time of that edit. If that is a concern, the following may be a safer approach:
Code:
#!/bin/ksh
processed="$HOME/processed.txt"
# If the list of already processed files does not exist, create an empty list.
if [ ! -f "$processed" ] 
then    touch "$processed"
fi
ls -rt | grep -vF -f "$processed" | while read -r file
# Process all files that haven't already been processed...
do      # Process newer files in order from oldest to newest...
        ETL_PROCESS.sh "$file"
        # This script should skip the next step if ETL_PROCESS.sh failed.
        # Add current file to the list of processed files.
        printf "%s\n" "$file" >> "$processed"
done

It keeps a list of files processed and skips any file in that list when the script is run again later. It doesn't care about timestamps other than the fact that it will hand ETL_PROCESS.sh unprocessed files in order from the oldest to the newest.

Note, however, that this script can fail if the name of one file in the directory is a substring of another file's name: grep -F matches a pattern anywhere in the line, so once the shorter name is in the list of processed files, the longer name will be skipped too. You haven't given us any indication of how files are named, so if this is a concern the grep command in the pipeline in this script would have to be adjusted to account for the actual filenames you'll be using. And, of course, the list of processed files should be pruned when old files are removed from the directory.
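
As a rough sketch of one such adjustment, assuming filenames never contain newlines: adding -x to the grep options makes each entry in the processed list match only a whole line, so a name like a no longer hides ab. The function name list_unprocessed and its two arguments are made up for this illustration and are not part of the scripts above.

```sh
#!/bin/sh
# list_unprocessed DIR PROCESSED_LIST   (hypothetical helper)
# Prints the names in DIR, oldest first, that are not already listed
# in PROCESSED_LIST.  Assumes filenames contain no newline characters
# and that PROCESSED_LIST is kept outside DIR.
list_unprocessed() {
    dir=$1
    processed=$2
    # Create an empty list if none exists yet.
    [ -f "$processed" ] || : > "$processed"
    # -F: fixed strings, -x: match whole lines only, -v: invert the match.
    # grep exits with 1 when it selects no lines; treat that as success.
    ( cd "$dir" && ls -rt ) | grep -vxF -f "$processed" || [ "$?" -eq 1 ]
}
```

It could be used in place of the ls -rt | grep -vF pipeline, for example: list_unprocessed /directory/with/files "$HOME/processed.txt" | while read -r file ...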

Assuming that ETL_PROCESS.sh provides some indication that it successfully processed a file, all of these scripts should verify that a file was processed successfully before continuing with later files. The first two scripts should exit and not process any newer files until the problem is fixed or some files may never be processed. The last script above only needs to avoid adding the failed file to the list of processed files (unless ETL_PROCESS.sh has to process input files in the order in which they were received).
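
A minimal sketch of that check for the first two scripts, assuming the processing command signals failure with a non-zero exit status (nothing in this thread confirms how ETL_PROCESS.sh reports errors). The function name process_in_order and its arguments are hypothetical:

```sh
#!/bin/sh
# process_in_order PROCESSOR LASTFILE   (hypothetical helper)
# Reads filenames from standard input, oldest first, and runs
# PROCESSOR on each one.  Stops at the first failure so that no
# newer file is recorded as processed ahead of a failed one.
process_in_order() {
    processor=$1
    lastfile=$2
    while read -r file
    do
        # </dev/null keeps the processor from consuming the name list.
        if ! "$processor" "$file" </dev/null
        then
            printf 'processing failed for %s; stopping\n' "$file" >&2
            return 1
        fi
        # Record the last file that was processed successfully.
        printf '%s\n' "$file" > "$lastfile"
    done
}
```

For the last script, the same test would wrap only the printf that appends to the processed list, and the loop would carry on with the next file instead of returning.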

Both of these scripts were written and tested using ksh, but there is nothing here that is ksh-specific; they should work in any shell that recognizes basic POSIX shell syntax (such as bash and ksh).

Hope this helps...
# 10  
Old 02-05-2014
Don C. provided, IMO, the best answer. The approach below requires no extra files. It also works when the read program has issues and fails. It keeps the filenames unchanged. The files are not deleted after 15 days; you have to script that as well.
I assume you use ksh; the #!/bin/ksh line runs the script with ksh as the shell.

Code:
#!/bin/ksh
# "read_process" is your code or shell script to "read" the file
#    hopefully read_process returns failure when it fails
cd /directory/with/files || exit 1   # give up if the directory is missing
ls | while read -r fname   # get the name of every file in the directory
do
   if [ -s "$fname" ] ; then    # file has data in it?  it is not empty?
      read_process "$fname"   # not empty: run read_process
      if [ $? -eq 0 ] ; then      # read_process ran ok?
         : > "$fname"             # read_process worked: make the file zero length (empty)
      fi
   fi
done

The cleanup script below should be run once a week or maybe every day, as you decide. Do not change the 16 to a 15 or you will have problems. I'm not going into why fully, but find's days are not calendar dates: they are the number of 24-hour periods (86400 seconds) in the past. I assume you want an email and have email set up on your UNIX box.

Code:
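#!/bin/ksh
# assume that a file could keep failing on the read_process, so we keep it
# remove empty (already processed) files older than 16 days
find /directory_path_to_files -type f -mtime +16 -size 0 -exec rm {} \;
# list old files that still hold data; -size +0 matches any non-empty file
# (-size +1 would miss files of 512 bytes or less)
find /directory_path_to_files -type f -mtime +16 -size +0 > t.lis
if [ -s t.lis ] ; then
 uuencode t.lis t.txt | /usr/bin/mailx -s 'you missed processing some files' you@yourcompany.com
fi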
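
To illustrate the point about days versus 24-hour periods, here is a small arithmetic sketch. It follows the POSIX description of -mtime: the file's age in seconds is divided by 86400, the remainder is discarded, and +N matches only when the result is greater than N. The helper names age_periods and matches_mtime_plus are invented for this example.

```sh
#!/bin/sh
# Illustrative helpers (hypothetical names) mirroring how POSIX find
# evaluates -mtime: age in seconds divided by 86400, remainder discarded.

# age_periods NOW MTIME : both arguments are seconds since the epoch
age_periods() {
    echo $(( ($1 - $2) / 86400 ))
}

# matches_mtime_plus NOW MTIME N : succeeds when "-mtime +N" would match
matches_mtime_plus() {
    [ "$(age_periods "$1" "$2")" -gt "$3" ]
}

now=1468799     # one second short of 17 full days after an mtime of 0
if matches_mtime_plus "$now" 0 16
then echo "matched"
else echo "not matched yet"
fi
```

A file that is 16 days and 23 hours old still counts as 16 periods, so -mtime +16 only starts matching once the file is at least 17 full days old.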


Last edited by Neo; 03-19-2014 at 08:08 AM.. Reason: change name.