How to find a missing file sequence using shell scripting?


 
# 1  
Old 12-28-2014
Hey guys,

I want to process the files below with bash so that I can find the missing file names:

Code:
PP01674520141228X.gz
PP01674620141228X.gz
PP01674820141228X.gz
PP01674920141228X.gz
PP01675420141228X.gz
PP01675520141228X.gz
PP01676020141228X.gz
.
.
.
.
.
.
.
.
.
.
PP13999920141228X.gz

PP -> fixed
01-10 -> varies from 01 to 10
6745 -> sequence number: any 4 digits, but consecutive
20141228 -> today's date
X -> can be A, B or C
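
The field layout above can be checked with bash substring expansion. This is only a sketch: `split_name` is a hypothetical helper, and the offsets assume every name has exactly the shape described (2 + 2 + 4 + 8 + 1 characters plus `.gz`).

```shell
#!/usr/bin/env bash
# split_name is a hypothetical helper: print the five fields of one file name.
split_name() {
    local f=${1%.gz}          # drop the extension
    local fixed=${f:0:2}      # "PP"
    local group=${f:2:2}      # 01..10
    local seq=${f:4:4}        # 4-digit sequence number
    local date=${f:8:8}       # YYYYMMDD
    local letter=${f:16:1}    # A, B or C
    printf '%s %s %s %s %s\n' "$fixed" "$group" "$seq" "$date" "$letter"
}

split_name PP01674520141228A.gz    # prints: PP 01 6745 20141228 A
```

Bash offsets start at 0, while awk's substr() starts at 1, so `${f:4:4}` here picks the same characters as `substr($0, 5, 4)` in awk.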

I want the output to be like:

Code:
Missing files are:

PP01674720141228A.gz
PP01675120141228A.gz
PP01675220141228A.gz
PP01675320141228A.gz
PP01675620141228A.gz
PP01675720141228A.gz
PP01675820141228A.gz
PP01675920141228A.gz

I wrote this script, but it does not print the missing files:

Code:
ls /home/tanuj/Desktop/Tanuj1/  -l PP*A* | awk -F" " '{ print substr($9,1,4)" "substr($9,5,4)" "substr($9,9,9) }' | sort -n
nawk '
NR==1
{
name=substr($0,1,4);
seq=substr($0,5,4);
next
}
{
name1=substr($0,1,4);
seq1=substr($0,5,4); 
if(name == name1)
    {
    for(i=seq+1;i<seq1;i++)
    {print name""i}
    }
    name=name1;seq=seq1;    
    } '

Kindly help. Thanks in advance.

Last edited by Scrutinizer; 12-28-2014 at 12:49 PM.. Reason: to update; MOD - Code tags
# 2  
Old 12-28-2014
Please use code tags as required by forum rules!

Did you consider searching these fora? Very similar requests have been asked; see table at the bottom.
Some questions:
- How can "PP13999920141228X.gz" exist when char 3 & 4 can be 01 - 10 only?
- Where in the output is the sequence no. 50?
- Why has the "X" in the original file names turned to an "A" in the output sample?
# 3  
Old 12-28-2014
Dear RudiC,
My replies are below:
- How can "PP13999920141228X.gz" exist when char 3 & 4 can be 01 - 10 only?
[tanuj] PP13999920141228X.gz is a typo; it should be PP10999920141228X.gz.
- Where in the output is the sequence no. 50?
[tanuj] That was also a typo. The output should be:
Code:
PP01674720141228A.gz
PP01675020141228A.gz
PP01675120141228A.gz
PP01675220141228A.gz
PP01675320141228A.gz
PP01675620141228A.gz
PP01675720141228A.gz
PP01675820141228A.gz
PP01675920141228A.gz
.
.
..... and so on

- Why has the "X" in the original file names turned to an "A" in the output sample?
[tanuj] X can be any of 'A', 'B' or 'C'; I only need the 'A' files to be displayed in the output.

Last edited by Scrutinizer; 12-28-2014 at 12:50 PM.. Reason: CODE tags
# 4  
Old 12-28-2014
So - should any of the "B" or "C" be turned into an "A"? Or should only "A" files be considered?

---------- Post updated at 18:27 ---------- Previous update was at 17:54 ----------

I'm not sure your ls /home/tanuj/Desktop/Tanuj1/ -l PP*A* will do what you want - it will list the contents of the entire /home/.../Tanuj1/ directory and all files matching PP*A* (e.g. PP_foo_A_bar.tmp) in your current working directory in long format.
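
To illustrate the point about where the pattern gets expanded, here is a small sketch that puts the directory inside the pattern and prints bare file names only. `list_a_files` is a hypothetical helper, and the scratch directory merely stands in for the real one.

```shell
#!/usr/bin/env bash
# list_a_files is a hypothetical helper: print the bare names of the "A"
# files in the directory given as $1.
list_a_files() {
    local f
    # The directory is part of the pattern, so the shell expands it there
    # and not in the current working directory.
    for f in "$1"/PP*A.gz; do
        [ -e "$f" ] || continue      # the pattern matched nothing
        printf '%s\n' "${f##*/}"     # strip the leading path
    done
}

# Demo in a scratch directory (a stand-in for the real one).
demo=$(mktemp -d)
touch "$demo/PP01674520141228A.gz" "$demo/PP01674620141228B.gz" "$demo/notes.txt"
list_a_files "$demo"
rm -r "$demo"
```

Printing bare names matters for anything that cuts the name by character position, because a leading path would shift every offset.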

However, to get to your desired output, and hoping only today's files are in there, and also that you accept awk although you specified bash in post 1, try

Code:
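ls /home/tanuj/Desktop/Tanuj1/PP*A.gz |
awk -F/ 'NR==1          {seq = substr ($NF, 5, 4) + 0    # first sequence number seen
                         FN1 = substr ($NF, 1, 4)        # prefix, e.g. "PP01"
                         FN2 = substr ($NF, 9)           # date, letter and extension
#                        sub (/X\./, "A.", FN2)      # can be dropped as ls will select only "A" files
                        }
                        {# ls prints the full path here, so work on $NF, the bare file name
                         for (; seq < substr ($NF, 5, 4) + 0; seq++) printf "%s%04d%s\n", FN1, seq, FN2
                         seq++
                        }
        '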
PP01674720141228A.gz
PP01675020141228A.gz
PP01675120141228A.gz
PP01675220141228A.gz
PP01675320141228A.gz
PP01675620141228A.gz
PP01675720141228A.gz
PP01675820141228A.gz
PP01675920141228A.gz

# 5  
Old 12-28-2014
Would this work? It's a little more bash-ish:
Code:
\ls /home/tanuj/Desktop/Tanuj1 | grep -e '^PP.*A\.gz$' | sort -V
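
A pipeline like this only lists the "A" files. A pure-bash sketch of the gap detection that could be fed by it might look as follows. `find_gaps` is a hypothetical helper; it assumes sorted names of the fixed shape from post 1 on stdin, and it only reports gaps between two files that share the same prefix and the same date/letter part.

```shell
#!/usr/bin/env bash
# find_gaps is a hypothetical helper: read sorted file names on stdin and
# print the names missing between consecutive sequence numbers.
find_gaps() {
    local name head tail seq i
    local prev_head="" prev_tail="" prev_seq=0
    while IFS= read -r name; do
        head=${name:0:4}            # "PP" plus the 2-digit group
        tail=${name:8}              # date, letter and extension
        seq=$((10#${name:4:4}))     # force base 10 so 0089 is not read as octal
        if [[ $head == "$prev_head" && $tail == "$prev_tail" ]]; then
            for ((i = prev_seq + 1; i < seq; i++)); do
                printf '%s%04d%s\n' "$head" "$i" "$tail"
            done
        fi
        prev_head=$head prev_tail=$tail prev_seq=$seq
    done
}

printf '%s\n' PP01674520141228A.gz PP01674620141228A.gz PP01674820141228A.gz | find_gaps
```

With the three sample names this prints PP01674720141228A.gz. Resetting the comparison whenever the prefix changes keeps a jump from one group (say 01) to the next (02) from being reported as a gap.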

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

To check the missing file based on sequence number.

Hi All, I have a requirement that i need to list only the missing sequences with a unix script. For Example: Input: FILE_001.txt FILE_002.txt FILE_005.txt FILE_006.txt FILE_008.txt FILE_009.txt FILE_010.txt FILE_014.txt Output: FILE_003.txt FILE_004.txt FILE_007.txt FILE_011.txt... (5 Replies)
Discussion started by: Arun1992
5 Replies

2. Shell Programming and Scripting

Find the missing sequence

Dear all i am having file with max 24 entries. i want to find which sequence is missing file is like this df00231587.dat df01231587.dat df03231587.dat df05231587.dat . . . df23231587.dat the changing seq is 00-23,so i would like to find out which seq is missing like in above... (13 Replies)
Discussion started by: sagar_1986
13 Replies

3. Shell Programming and Scripting

Identifying Missing File Sequence

Hi, I have a file which contains few columns and the first column has the file names, and I would like to identify the missing file sequence number form the file and would copy to another file. My files has data in below format. APKRISPSIN320131231201319_0983,1,54,125,... (5 Replies)
Discussion started by: rramkrishnas
5 Replies

4. Shell Programming and Scripting

Find missing sequence

Hi, I need to find out the missing sequence from a list. However the issue is there is not a fixed start and end, it depends on the generation of files. For eg, it might start with 4000 and end with 9000. Based on this, I need a script which greps the start and end sequence from the... (3 Replies)
Discussion started by: danish0909
3 Replies

5. UNIX for Advanced & Expert Users

Checking missing data's sequence (shell script | UNIX command)

Dear All members, i have some trouble here, i want to ask your help. The case is: I have some data, it's like: -ABCD1234 -ABCD1235 -ABCD1237 -BCDE1111 -BCDE1112 -BCDE1114 there is some missing data's sequence (the format is: ABCD = name 1234 = sequence). I want to print the... (2 Replies)
Discussion started by: septian.tri
2 Replies

6. Programming

find the missing sequence in hash perl

Dear Perl's Users, Could anyone help me how to solve my problem. I have data with details below. TTY NAME SEQUENCES U-0 UNIX 0 U-1 UNIX 1 U-2 UNIX 2 <-- From 2 jump to 5 U-5 UNIX 5 U-6 UNIX 6 <-- From 6 jump to 20 U-20 ... (2 Replies)
Discussion started by: askari
2 Replies

7. Shell Programming and Scripting

How to insert a sequence number column inside a pipe delimited csv file using shell scripting?

Hi All, I need a shell script which could insert a sequence number column inside a dat file(pipe delimited). I have the dat file similar to the one as shown below.. |A|B|C||D|E |F|G|H||I|J |K|L|M||N|O |P|Q|R||S|T As shown above, the column 4 is currently blank and i need to insert sequence... (5 Replies)
Discussion started by: nithins007
5 Replies

8. Shell Programming and Scripting

Shell scripting for this sequence

KINDLY HELP ME FOR SHELL SCRIPTING FOR THIS TASK. My input file consists of thousands of sequence in this format. The given input file consists of four sequences which are starting with ‘>’ symbol (each sequence shown in different colour for easy understanding). I have to use a command at $... (3 Replies)
Discussion started by: kswapnadevi
3 Replies

9. Shell Programming and Scripting

Shell scripting for this sequence to compare

I have two input files (given below) and to compare each line of the File1 with each line of File2 starts with '>sample1'. If a match occurs and that matched line in the File2 contains another line or sequence of lines starting with "Chr" they have to be displayed in output file with that sample.... (4 Replies)
Discussion started by: hravisankar
4 Replies

10. Shell Programming and Scripting

Shell scripting : Help Me for this sequence

I have two input files (given below) and to compare each line of the File1 with each line of File2 starts with '>sample1'. If a match occurs and that matched line in the File2 contains another line or sequence of lines starting with "Chr" they have to be displayed in output file with that sample.... (9 Replies)
Discussion started by: hravisankar
9 Replies