Count the number of files to delete doesn't match

11-19-2016


Good evening, I need your help, please.

I need to delete certain files from before October 1, 2016, so I need to know how many files I am going to delete. For instance:

Code:

ls -lrt file_20160*.lis!wc -l

But using grep -c on another file called bplist, which contains the list of all backed-up files, doesn't give a matching count:

Code:

grep -c file_20160 bplist.txt

The first query gives me 568 files and the second query returns 1120 records.

So before deleting, I have to make sure I have the right number of files to delete. What am I doing wrong?

After matching the number of files to delete, I need to create a new file by using grep on bplist.txt to classify the files to delete, this way:

Code:

for file in $(ls -lrt file_20160*.lis!awk '{print $9}')
do
grep $file bplist.txt >>filetodelete.txt
done

Is there a better and faster way to do that?

I'd appreciate your help in advance.

Last edited by Don Cragun; 11-19-2016 at 11:01 PM..

alexcol


11-19-2016


Moderator's Comments:

To keep the forums high quality for all users, please take the time to format your posts correctly.

Please take the time while your account is in read-only mode to review the tutorial below that explains how to correctly use CODE tags. We know that you have seen this tutorial many times before, but if you continue to post without correctly marking sample input, sample output, and code segments with CODE tags, you may be permanently banned from this site...

First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)

Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.

Third, be careful when you cut-and-paste; edit any odd characters and make sure all links are working properly.

Thank You.

The UNIX and Linux Forums

Don Cragun


11-20-2016


First: Note that elements of a pipeline are separated by pipe symbols (|); not exclamation points (!). So the code you showed us in post #1 in this thread can't possibly produce the output you described.

Second: We have absolutely no idea what the format is for the data in bplist (or bplist.txt, depending on which part of your post we are to believe). We have absolutely no idea what the format is for the filenames (or pathnames) being processed.

Third: You have not explained why you need to count files to be removed instead of just identifying files to be removed and removing them.

Fourth: You have not given us any indication whether there are duplicates in one or both of your lists, whether files in one list are different than files in the other list, nor if there is any indication that there is a problem with the contents of either list (other than that the line counts are different).

Fifth: Why use the complicated:

Code:

for file in $(ls -lrt file_20160*.lis!awk '{print $9}')

which involves creating a subshell and invoking two utilities and can fail miserably if there are any whitespace characters in any of your filenames, when:

Code:

for file in file_20160*.lis

would be MUCH faster and, if you properly quoted the expansion (i.e., "$file") in your for loop, suffers none of the problems possible in your current loop.
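A minimal sketch of that difference, assuming a scratch directory and made-up file names (one of them contains a space). The variable names are only for illustration:

```shell
# Hypothetical demo in a scratch directory; the file names are made up.
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch file_20160101.lis "file_20160102 copy.lis" file_20160103.lis

# Expanding the pattern directly: one iteration per file, spaces preserved.
glob_count=0
for file in file_20160*.lis
do
  if [ -f "$file" ]; then
    glob_count=$((glob_count + 1))
  fi
done

# Looping over an unquoted command substitution: the output of ls is split
# on whitespace, so the name with a space becomes two separate words.
split_count=0
for word in $(ls file_20160*.lis)
do
  split_count=$((split_count + 1))
done

echo "glob loop: $glob_count, ls loop: $split_count"

cd "$start_dir" || exit 1
rm -rf "$demo_dir"
```

With these three files the glob loop sees 3 names while the ls loop sees 4 words, so any count or grep based on the second loop would be off.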

Don Cragun


11-20-2016


Further to Don's remarks, if you use file name expansion with the for loop, a first attempt might look like this:

Code:

for file in file_20160*.lis
do
  grep "$file" bplist.txt >>filetodelete.txt
done

It is important to test for the case where no files match the pattern; otherwise you end up with a variable that contains the literal string file_20160*.lis, which grep would then interpret as a regular expression, like so:

Code:

grep file_20160*.lis bplist.txt >>filetodelete.txt

which would then select any lines containing "file_2016" followed by zero or more zeroes, any single character, and "lis", and those entries would end up in the list of files to delete.

Now probably those files do not exist in your case, but it is best to avoid a possible loophole altogether by testing whether the file exists and by using string matching instead of regex matching, with grep's -F option. To avoid partial file name matches (where the pattern or string that grep is looking for is a subset of the filename), another important option is -x, which forces whole-line matches. A third thing is to avoid the possibility that file names starting with a - sign are interpreted as option flags by grep. One way to prevent this is the -- flag. Because your file pattern starts with file, that will not be an issue here, but it is good practice to do it anyway, so that if you ever change the pattern so that it starts with an *, things will not break.
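To see what those options change, here is a small sketch against a made-up list file. The names in the list are hypothetical and chosen only to trigger each case:

```shell
# Hypothetical list with one name per line.
list=$(mktemp)
printf '%s\n' \
  'file_20160101.lis' \
  'file_2016Xlis' \
  'old_file_20160101.lis.bak' \
  '-old.lis' > "$list"

# The unexpanded pattern used as a regular expression: "0*" is zero or more
# zeroes and "." is any character, so it selects file_2016Xlis and nothing else.
regex_hits=$(grep -c -- 'file_20160*.lis' "$list")

# Fixed string without -x: also counts the line that merely contains the name.
partial_hits=$(grep -cF -- 'file_20160101.lis' "$list")

# Fixed string, whole line: only the exact name is counted.
exact_hits=$(grep -cFx -- 'file_20160101.lis' "$list")

# A name starting with "-" is taken as a pattern, not as options, after "--".
dash_hits=$(grep -cFx -- '-old.lis' "$list")

echo "regex: $regex_hits, partial: $partial_hits, exact: $exact_hits, dash: $dash_hits"
rm -f "$list"
```

Note that the regular expression here does not even match file_20160101.lis, which is one more reason not to let an unexpanded pattern reach grep.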

So then it becomes:

Code:

for file in file_20160*.lis
do
  if [ -f "$file" ]; then 
    grep -Fx -- "$file" bplist.txt >>filetodelete.txt
  fi
done

The last thing here is that you are appending to the file, probably out of necessity, since otherwise the file would be overwritten on every loop iteration. An alternative is to redirect the output of the loop itself, so the file is opened only once and you do not have to delete it before running the loop:

Code:

for file in file_20160*.lis
do
  if [ -f "$file" ]; then
    grep -Fx -- "$file" bplist.txt
  fi
done > filetodelete.txt

One last thing. This is still an expensive way to do it because an external program in a subshell is used to perform the operations for every iteration in the for loop, which is resource intensive.

An alternative would be to use a pipe (|) and grep's - operand for stdin, which most greps (but not all) will honor, together with the file option -f:

Code:

ls file_20160*.lis | grep -Fxf -  -- bplist.txt > filetodelete.txt

if there are not too many files in the directory.

Or use the more robust:

Code:

for file in file_20160*.lis
do
  if [ -f "$file" ]; then
    printf "%s\n" "$file"
  fi
done |
grep -Fxf -  -- bplist.txt > filetodelete.txt

Since the - operand for stdin is not universally supported in grep, another way would be to use process substitution ( <( ... ) ), which is available in modern bash, ksh93, or zsh:

Code:

grep -Fxf <(ls file_20160*.lis) -- bplist.txt > filetodelete.txt

if there are not too many files in the directory.

Or again the more robust variety:

Code:

grep -Fxf <(
  for file in file_20160*.lis
  do
    if [ -f "$file" ]; then
      printf "%s\n" "$file"
    fi
  done
) -- bplist.txt > filetodelete.txt
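Where neither the - operand nor process substitution is available, a temporary file holding the names is one more option. The sketch below is only an illustration: it assumes a scratch directory, made-up file names, and a bplist.txt with one bare file name per line, and names.txt is a hypothetical helper file. It also prints both counts, since that was the original question:

```shell
# Scratch directory with made-up names; bplist.txt here is a hypothetical
# list holding one bare file name per line.
start_dir=$PWD
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1
touch file_20160101.lis file_20160102.lis file_20160103.lis
printf '%s\n' file_20160101.lis file_20160102.lis file_20170101.lis > bplist.txt

# Write the names of the existing files to a temporary file, one per line.
for file in file_20160*.lis
do
  if [ -f "$file" ]; then
    printf '%s\n' "$file"
  fi
done > names.txt

# Use the temporary file as the list of fixed strings to look for.
grep -Fx -f names.txt -- bplist.txt > filetodelete.txt

# Compare how many files exist with how many were found in the list.
files_on_disk=$(wc -l < names.txt | tr -d ' ')
lines_matched=$(wc -l < filetodelete.txt | tr -d ' ')
echo "files on disk: $files_on_disk, found in list: $lines_matched"

cd "$start_dir" || exit 1
rm -rf "$work_dir"
```

In this made-up case three files exist but only two appear in the list, so the two counts differ by one.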

Scrutinizer


11-26-2016


OK, I will take your recommendations into account. Thank you very much, every one of you, for the options you gave me to achieve what I want.

First, you were right: the bplist file had duplicated lines, which is why the number of records didn't match. To remove the duplicates I typed:

Code:

sort List-prosclbt00c-xpbatch-01112016_Q607965.txt|uniq > List-prosclbt00c-xpbatch-01112016_Q607965_nd.txt
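As a side note, sort -u gives the same result as sort piped into uniq, and uniq -d shows which lines were duplicated. A small sketch on a made-up list (the names are hypothetical):

```shell
# Hypothetical list with two duplicated entries.
list=$(mktemp)
printf '%s\n' b.lis a.lis b.lis c.lis a.lis > "$list"

# Total lines, lines after removing duplicates, and number of duplicated names.
total=$(wc -l < "$list" | tr -d ' ')
unique=$(sort -u "$list" | wc -l | tr -d ' ')
dupes=$(sort "$list" | uniq -d | wc -l | tr -d ' ')

# Show the duplicated names themselves.
sort "$list" | uniq -d

echo "total: $total, unique: $unique, duplicated names: $dupes"
rm -f "$list"
```

Comparing the total with the unique count is a quick way to check whether duplicates explain a mismatch like 568 versus 1120.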

This file has this format:

Code:

-r   1452 Nov 01 10:10 /produccion/explotacion/xpbatch/SHELL_PLAN_FAMILIA-SEA_MOVIMIENTO/logs/LogMonitoreoSeaMovimiento20150601_220000.log
-r--r--r-- xpbatch   explotaci          40 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/LICENSE
-r--r--r-- xpbatch   explotaci          40 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/LICENSE
-r--r--r-- xpbatch   explotaci          46 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/README
-r--r--r-- xpbatch   explotaci         113 Nov 01 10:10 /produccion/explotacion/xpbatch/local.cshrc
-r--r--r-- xpbatch   explotaci         159 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/README.html
-r--r--r-- xpbatch   explotaci         580 Nov 01 10:10 /produccion/explotacion/xpbatch/local.profile
-r--r--r-- xpbatch   explotaci         607 Nov 01 10:10 /produccion/explotacion/xpbatch/local.login
-r--r--r-- xpbatch   explotaci         632 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/lib/cmm/GRAY.pf
-r--r--r-- xpbatch   explotaci         955 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/Welcome.html
-r--r--r-- xpbatch   explotaci        1044 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/lib/cmm/LINEAR_RGB.pf
-r--r--r-- xpbatch   explotaci        2856 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/lib/management/jmxremote.password.template
-r--r--r-- xpbatch   explotaci        3144 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/lib/cmm/sRGB.pf
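One thing worth noting about this format: each line carries permissions, owner, size, and date in front of a full path, so a whole-line match (-x) on a bare file name cannot succeed. A rough sketch of one way around that, assuming the path is the last field and contains no whitespace, and assuming a POSIX awk; the two sample lines are copied from the listing above:

```shell
# Two lines in the same layout as the listing.
list=$(mktemp)
cat > "$list" <<'EOF'
-r--r--r-- xpbatch   explotaci         113 Nov 01 10:10 /produccion/explotacion/xpbatch/local.cshrc
-r--r--r-- xpbatch   explotaci         580 Nov 01 10:10 /produccion/explotacion/xpbatch/local.profile
EOF

# A whole-line match on the bare name finds nothing (grep -c prints 0).
whole_line_hits=$(grep -cFx -- 'local.cshrc' "$list" || true)

# Reduce each line to the last component of its last field, then match.
# Assumption: the path is the last field and has no spaces in it.
base_hits=$(awk '{ n = split($NF, parts, "/"); print parts[n] }' "$list" |
  grep -cFx -- 'local.cshrc')

echo "whole line: $whole_line_hits, base name: $base_hits"
rm -f "$list"
```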

Now I wanted to search for some files in bplist, but it seems these grep options are not supported (Sun operating system); it yields errors, i.e.:

Code:

for Archivo in log_HistoricoRecargas??062016*.log*
do
  if [ -f "$Archivo" ]; then
    grep  -Fx -- "$Archivo" List-prosclbt00c-xpbatch-01112016_Q607965_nd.txt
  fi
done > Archivotodelete.txt

grep: illegal option -- F
grep: illegal option -- x
Usage: grep -hblcnsviw pattern file . . .
grep: illegal option -- F

So I removed the grep options and ran the script, but it didn't work because it didn't find any records:

Code:

for Archivo in log_HistoricoRecargas??062016*.log*
do
  if [ -f "$Archivo" ]; then
    grep  "$Archivo" List-prosclbt00c-xpbatch-01112016_Q607965_nd.txt
  fi
done > Archivotodelete.txt

Code:

SCEL /SCEL/logs1/xpbatch #ls -lrt Archivotodelete.txt
-rw-r--r--   1 xpbatch  explotacion       0 Nov 25 21:47 Archivotodelete.txt

That's because grep is literally looking for the ?? and * characters in the pattern.

But listing shows that those files really do exist:

Code:

ls log_HistoricoRecargas??062016*.log*|wc -l
24220

I appreciate your help in advance.

alexcol


11-26-2016


On Solaris/SunOS systems use /usr/xpg4/bin/grep -Fx or fgrep -x instead of grep -Fx.
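A sketch of how a script could pick the right grep on either kind of system. The GREP variable and the sample list are assumptions for illustration; only the /usr/xpg4/bin/grep path comes from the advice above:

```shell
# Prefer the XPG4 grep where it exists (as on Solaris/SunOS); otherwise
# assume the grep found on PATH understands -F and -x.
if [ -x /usr/xpg4/bin/grep ]; then
  GREP=/usr/xpg4/bin/grep
else
  GREP=grep
fi

# Hypothetical list to try it on.
list=$(mktemp)
printf '%s\n' 'log_a.log' 'log_a.log.gz' > "$list"

# Count whole-line, fixed-string matches with whichever grep was chosen.
exact=$("$GREP" -cFx -- 'log_a.log' "$list")
echo "exact matches: $exact"
rm -f "$list"
```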

Don Cragun


11-27-2016


Thanks, but grep and fgrep don't interpret special characters like * and ??.
For example, this search doesn't list anything:

Code:

fgrep -x log_HistoricoRecargas??062016*.log*  List-prosclbt00c-logs1-01112016_Q607965_sindupli.txt

I appreciate your help in advance.

---------- Post updated 11-27-16 at 01:39 AM ---------- Previous update was 11-26-16 at 09:35 PM ----------

Or should I use egrep, which supports wildcard patterns, instead?

Do Solaris/SunOS systems support egrep?

Thanks for your support in advance.
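For reference, ?? and * are shell file name patterns, not something grep, fgrep, or egrep understand. Left unquoted, the shell expands them into matching file names before the command runs; quoted, fgrep looks for the literal characters. A regular expression needs the pattern translated: ? becomes ., * becomes .*, and a literal dot becomes \. (escaped). A minimal sketch on a made-up list, where the three file names are hypothetical:

```shell
# Hypothetical list entries in the same spirit as the real file names.
list=$(mktemp)
cat > "$list" <<'EOF'
/SCEL/logs1/xpbatch/log_HistoricoRecargas01062016_1.log
/SCEL/logs1/xpbatch/log_HistoricoRecargas15062016_2.log.gz
/SCEL/logs1/xpbatch/log_HistoricoRecargas15072016_2.log
EOF

# Shell pattern:      log_HistoricoRecargas??062016*.log*
# Regular expression: log_HistoricoRecargas..062016.*\.log
regex_hits=$(grep -c 'log_HistoricoRecargas..062016.*\.log' "$list")

# As a fixed string, the wildcard characters are searched for literally,
# so nothing matches (grep -c prints 0).
fixed_hits=$(grep -cF 'log_HistoricoRecargas??062016*.log*' "$list" || true)

echo "regex: $regex_hits, fixed string: $fixed_hits"
rm -f "$list"
```

Here the regular expression counts the two June entries, while the fixed-string search counts none.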

alexcol

