print out missing files in a sequence
# 1  
Old 03-30-2012

Hello all,

I have several directories with a sequence of files like this
Code:
IM-0001-0001.dcm
IM-0001-0002.dcm
IM-0001-0003.dcm
IM-0001-0004.dcm
IM-0001-0005.dcm

I would like to print out the name of the file that is missing.

I currently have the following inefficient way of doing this, and I am wondering whether you could suggest a better way that also works across multiple directories.

Code:
ls -1 *.dcm | awk -F"-" '{print $3}' > ori.txt

Code:
[]$ cat ori.txt 
0001.dcm
0002.dcm
0004.dcm
0005.dcm

Then I create another list with all the files that are supposed to be there:

Code:
[]$ cat main.txt 
0001.dcm
0002.dcm
0003.dcm
0004.dcm
0005.dcm

Code:
[]$ diff ori.txt main.txt 
2a3
> 0003.dcm

It would be good if I could display the full name of the missing file.

Thanks,

Moderator's Comments:
Please use code tags!

Last edited by Scrutinizer; 03-31-2012 at 04:29 AM..
# 2  
Old 03-30-2012
The trouble with detecting holes in sequences is, how do you detect a hole at the beginning, or the end? Unless you really do know what files are supposed to be there, you're going to be reduced to guessing in some situations no matter what.

Will there ever be more than one sequence in this folder, or just the one?
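
If the expected range really is known up front, the check reduces to a loop over the numbers that should exist. The following is only a minimal sketch of that idea: `list_missing` and its four arguments are made-up names for illustration, and the `%04d` format assumes a four-digit, zero-padded counter like the one in the sample file names.

```shell
# Sketch: report missing members of a sequence whose range is known.
# list_missing, prefix, first, last and ext are illustrative names,
# not something taken from the thread.
list_missing() {
    prefix=$1 first=$2 last=$3 ext=$4
    n=$first
    while [ "$n" -le "$last" ]; do
        # %04d assumes a four-digit, zero-padded counter
        name=$(printf '%s%04d%s' "$prefix" "$n" "$ext")
        # Print the full name of every expected file that does not exist
        [ -e "$name" ] || printf '%s\n' "$name"
        n=$((n + 1))
    done
}
```

Run inside the directory as `list_missing IM-0001- 1 5 .dcm`; pass the bounds as plain integers (1, not 0001) so the shell arithmetic does not read them as octal. Because the range is given explicitly, holes at the beginning or the end are reported as well.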
# 3  
Old 03-30-2012
This can detect some kinds of sequences. It assumes any name ending in digits plus an extension is part of a sequence, and tells different sequences apart by the string before the last run of digits together with the extension. It doesn't need the files in sorted order.

Code:
$ cat missing.awk

X=match($0, /[0-9]+\.[^.]*$/) {
        Y=match($0, /\.[^.]*$/);
        PFIX=substr($0, 1, X-1); # IM-0001-  (awk strings start at position 1; a start of 0 drops a character in some awks)
        EXT=substr($0, Y);        # .dcm
        VAL=substr($0, X, Y-X); # 0003

        # To check if the number of digits is changing.
        DIGITS[PFIX,EXT,length(VAL)]++;

        # The +0 forces a numeric comparison, not an alphabetic one, so "01" < "2".
        # The "in" test keeps a genuine minimum or maximum of 0 from looking unset.
        if(!((PFIX,EXT) in SMIN) || (SMIN[PFIX,EXT]>(VAL+0))) SMIN[PFIX,EXT]=VAL+0;
        if(!((PFIX,EXT) in SMAX) || (SMAX[PFIX,EXT]<(VAL+0))) SMAX[PFIX,EXT]=VAL+0;
        F[PFIX,EXT,VAL]=1;
}

END {
        for(X in SMAX)
        {

                split(X, A, SUBSEP);
                PFIX=A[1];      EXT=A[2];

                DC=0;
                DMAX=0;
                for(Z in DIGITS)
                {
                        split(Z, A, SUBSEP);
                        if((A[1] != PFIX) || (A[2] != EXT)) continue;
                        if(A[3] > DMAX) DMAX=A[3];
                        DC++;
                }

                if(DC == 1)     CMDSTR="%0" DMAX "d"
                else            CMDSTR="%d"

                for(N=SMIN[X]+0; N<=(SMAX[X]+0); N++)
                {
                        VAL=sprintf(CMDSTR, N);
                        if(!F[PFIX,EXT,VAL])
                                print "Missing", PFIX VAL EXT;
                }
        }
}

$ touch IM-0001-{0001..0005}.dcm file-{8..15}.dat
$ rm IM-0001-0003.dcm file-9.dat file-11.dat
$ ls | awk -f missing.awk
Missing file-9.dat
Missing file-11.dat
Missing IM-0001-0003.dcm

$
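
To see how the script carves up each name, the same two `match()` calls can be tried on a single file name. This is just an illustrative sketch: `split_name` is a hypothetical helper, not part of missing.awk, and it prints the prefix, counter and extension separated by spaces.

```shell
# Sketch: show how one file name is split into prefix, counter and extension.
# split_name is an illustrative helper, not part of missing.awk.
split_name() {
    printf '%s\n' "$1" | awk '
    {
        X = match($0, /[0-9]+\.[^.]*$/)   # start of the last run of digits
        if (!X) { print "no counter found"; next }
        Y = match($0, /\.[^.]*$/)         # start of the extension
        print substr($0, 1, X - 1), substr($0, X, Y - X), substr($0, Y)
    }'
}

split_name IM-0001-0003.dcm    # prints: IM-0001- 0003 .dcm
```

The prefix and extension together act as the key that separates one sequence from another, which is why `file-9.dat` and `IM-0001-0003.dcm` are tracked independently in the run above.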

This User Gave Thanks to Corona688 For This Post:
# 4  
Old 03-31-2012
Alternatively try this less general approach:
Code:
printf "%s\n" *.dcm | awk -F'[-.]' '$3>p+1{for(i=p+1;i<$3;i++){s=$0; sub($3"."$4,sprintf("%04d",i)"."$4,s); print s}}{p=$3}'

This assumes that all files have a fixed-length, zero-padded counter in the third field, an extension in the fourth field, and that all fields (and field separators) other than the third are identical. The zero-padded, fixed-length counter also ensures that the wildcard expands in the right (numeric) order.
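
Since the original question asked about several directories, the same awk program could be wrapped in a loop over subdirectories. What follows is only a sketch under the assumptions listed above: `check_dirs` and its optional top-level directory argument are made-up names, and each directory is assumed to hold a single sequence.

```shell
# Sketch: run the gap check in every subdirectory of a top-level directory.
# check_dirs and its argument are illustrative names, not from the thread.
check_dirs() {
    for d in "${1:-.}"/*/; do
        [ -d "$d" ] || continue
        (
            cd "$d" || exit
            # Skip directories that contain no .dcm files at all
            set -- *.dcm
            [ -e "$1" ] || exit 0
            printf '%s\n' "$@" |
            awk -F'[-.]' -v dir="$d" '
                $3 > p + 1 {
                    for (i = p + 1; i < $3; i++) {
                        s = $0
                        sub($3 "." $4, sprintf("%04d", i) "." $4, s)
                        print dir s    # directory plus the missing name
                    }
                }
                { p = $3 }'
        )
    done
}
```

Called as `check_dirs /path/to/top` (or with no argument for the current directory), it prints each missing name prefixed with its directory. Like the one-liner, it counts from 1, so a sequence that starts later is reported as having a hole at the beginning, and a hole at the end cannot be detected.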

Last edited by Scrutinizer; 03-31-2012 at 05:31 AM..
This User Gave Thanks to Scrutinizer For This Post:
# 5  
Old 04-02-2012
Thanks a lot for your help, guys.

Scrutinizer: It works great. I will use code tags from next time on.