Checking file consistencies


 
# 1  
Old 10-19-2011

Hi All,

I am stuck with a problem. I have two directories, each holding a very large number of files (200000+). My program crashed partway through processing them and left some inconsistent files behind. Re-running the script from scratch is out of the question, as processing takes a long time.

I need to know which files are inconsistent and which files are missing from the new directory.

Here's the scenario:
1. I have one directory named main_directory, which holds the main files; these are error free. They are the files my script was reading and processing.

2. After processing, my script wrote its output files to another directory named "scores". This is where the inconsistencies may exist.

My files in main_directory look like this:

1.wcor
Code:
1234
43232
9483
2345
9484

All the files in main_directory have names like 1.wcor, 2.wcor, 5.wcor etc. I have the complete list of these files in another file, file_list.txt, which I populated with this command:
Code:
ls -1 *.wcor > file_list.txt
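
With 200000+ names, expanding `*.wcor` on the command line of an external program such as `ls` can fail with "Argument list too long" on some systems. A minimal sketch that sidesteps this by expanding the glob in a shell loop and printing with the `printf` builtin. The helper name `list_wcor` is made up for illustration and is not part of the original post:

```shell
# list_wcor DIR: print the name of every *.wcor file in DIR, one per line.
# (hypothetical helper, assumed for illustration)
list_wcor() {
  # Run in a subshell so the caller's working directory is left untouched.
  (
    cd "$1" || exit 1
    # The glob is expanded by the shell itself, so no argument list
    # is ever passed to an external command.
    for f in *.wcor; do
      # When nothing matches, the pattern stays literal; skip it.
      [ -e "$f" ] && printf '%s\n' "$f"
    done
    exit 0
  )
}
```

Something like `list_wcor main_directory > file_list.txt` would then produce the same kind of list as the `ls` command.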

The scores directory holds the processed files, which have the same names apart from the extension. For example, 1.wcor above has 1.sco in scores, and 2.wcor in main_directory has 2.sco in scores. Kindly note that the files are not numbered continuously: 2.wcor may be followed by 5.wcor (with no 3.wcor or 4.wcor in between), and the same goes for the scores directory.
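
The name mapping described above (N.wcor in main_directory, N.sco in scores) lends itself to a mechanical check for missing files. A rough sketch, assuming file_list.txt holds one .wcor name per line; `list_missing` is a hypothetical helper name:

```shell
# list_missing LIST SCORES_DIR
# For every NAME.wcor listed in LIST, print it when SCORES_DIR/NAME.sco
# does not exist. (hypothetical helper, assumed for illustration)
list_missing() {
  while IFS= read -r name; do
    base=${name%.wcor}                       # strip the extension: 1.wcor -> 1
    [ -f "$2/$base.sco" ] || printf '%s\n' "$name"
  done < "$1"
}
```

Because the loop only follows the names in the list, gaps in the numbering (no 3.wcor or 4.wcor) do not matter. A call might look like `list_missing file_list.txt scores > missing.txt`.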

My corresponding 1.sco looks like this:
Code:
3232 5443333 5454 3232 54343

The consistency check compares the number of lines in 1.wcor with the number of spaces in 1.sco. If they match, the file is consistent. This applies to every pair of files in main_directory and scores.

My task is to print all files that are missing or inconsistent. "Missing" means files that exist in main_directory but not in the scores directory. Since my script opens its output files in write mode, I do not need to delete the inconsistent files; they will simply be overwritten.

This is just so that my program can read only those files and complete the operation for all files.
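
The check described above could be sketched as a small shell function. This assumes one particular reading of the rule: the sample 1.wcor has 5 lines and the sample 1.sco has 5 values separated by 4 spaces, so the sketch treats a pair as consistent when the number of spaces plus one equals the number of lines. If the real .sco files end with a trailing space, the `+ 1` should be dropped. `is_consistent` is a hypothetical name:

```shell
# is_consistent WCOR_FILE SCO_FILE
# Succeeds (exit status 0) when spaces-in-SCO + 1 equals lines-in-WCOR.
# (hypothetical helper; the "+ 1" is an assumption, see the text above)
is_consistent() {
  lines=$(wc -l < "$1")                  # newline count of the .wcor file
  spaces=$(tr -cd ' ' < "$2" | wc -c)    # keep only spaces, then count them
  [ "$((spaces + 1))" -eq "$((lines))" ]
}
```

Note that `wc -l` counts newline characters, so a .wcor file whose last line lacks a trailing newline would be reported as one line short.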

I am using Linux with bash and I have tried some solutions but to no avail:

One of them being:

Code:
find /path/to/main_directory -name '*' -type f | \
while read file
do
      wc -c "$file"  | read size dummy
      echo "`basename $file` $size"
done > realfiles

find / -type ! -name '/path/to/main_directory/*' |\
while read file
do
       wc -c "$file"  | read size dummy
       echo "`basename $file` $size  $file"
done > score_files

# create a file badfiles that is a list of all the failures
awk '{
        FILENAME=="realfiles" {
                key[$1 $2]++
        }
        FILENAME=="shadowfiles" {
                if( !key[$1 $2]) { print $3 }
        }
      }'   main_directory score_files > badfiles

# 2  
Old 10-19-2011
See if this helps you:
Code:
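#!/usr/bin/ksh
typeset -i mChars1
typeset -i mChars2
# Start with empty report files; entries are appended below.
: > No_Match.txt
: > Not_Found.txt
# Looping over the glob avoids passing 200000+ names to ls.
for mFName in main_directory/*.wcor; do
  mBase=$(basename "${mFName}" '.wcor')
  if [[ -f "scores/${mBase}.sco" ]]; then
    # Spaces in the .sco file, plus one, compared with lines in the .wcor file.
    mChars1=$(( $(tr -cd ' ' < "scores/${mBase}.sco" | wc -c) + 1 ))
    mChars2=$(wc -l < "${mFName}")
    if [[ ${mChars1} -ne ${mChars2} ]]; then
      echo "File <${mBase}> does not match." >> No_Match.txt
    fi
  else
    echo "Not found <${mBase}>" >> Not_Found.txt
  fi
done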
