To keep the forums high quality for all users, please take the time to format your posts correctly.
Please take the time while your account is in read-only mode to review the tutorial below that explains how to correctly use CODE tags. We know that you have seen this tutorial many times before, but if you continue to post without correctly marking sample input, sample output, and code segments with CODE tags, you may be permanently banned from this site...
First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)
Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.
Third, be careful when you cut-and-paste: edit any odd characters and make sure all links are working properly.
First: Note that elements of a pipeline are separated by pipe symbols (|), not exclamation points (!). So the code you showed us in post #1 in this thread can't possibly produce the output you described.
Second: We have absolutely no idea what the format is for the data in bplist (or bplist.txt, depending on which part of your post we are to believe). We have absolutely no idea what the format is for the filenames (or pathnames) being processed.
Third: You have not explained why you need to count files to be removed instead of just identifying files to be removed and removing them.
Fourth: You have not given us any indication whether there are duplicates in one or both of your lists, whether files in one list are different than files in the other list, nor if there is any indication that there is a problem with the contents of either list (other than that the line counts are different).
Fifth: Why use the complicated:
which involves creating a subshell and invoking two utilities and can fail miserably if there are any whitespace characters in any of your filenames, when:
would be MUCH faster and, if you properly quoted the expansion (i.e., "$file") in your for loop, suffers none of the problems possible in your current loop.
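The two snippets being compared are not preserved above. As a hedged illustration only, the sketch below assumes the complicated form fed the loop from a command substitution running ls and grep, and contrasts it with plain file name expansion. The directory and file names are made up for the demo.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, for illustration only.
demo=$(mktemp -d) && cd "$demo" || exit 1
touch 'file_20160101.lis' 'file_20160102 copy.lis'

# Fragile: the command substitution forks a subshell, runs ls and grep,
# and its unquoted result is split on whitespace.
fragile=0
for file in $(ls | grep '^file_20160.*\.lis$'); do
    fragile=$((fragile + 1))
done

# Robust: the shell expands the pattern itself; no extra processes are started.
robust=0
for file in file_20160*.lis; do
    [ -e "$file" ] || continue    # the pattern matched nothing
    robust=$((robust + 1))
done

echo "fragile loop iterations: $fragile"
echo "robust loop iterations:  $robust"
```

With one of the two names containing a space, the first loop runs three times and the second runs twice.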
Further to Don's remarks, if you are using file name expansion with the for loop,
The first attempt might look like this:
It is important to test for the case when zero files fit the pattern; otherwise you end up with a variable that contains the literal string file_20160*.lis, which grep would then treat as a regular expression, since that is what grep expects, like so:
which would then delete from the file any names that contain "file_2016" followed by zero or more zeroes, any single character, and "lis".
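A small sketch of that pitfall, assuming a POSIX shell with default globbing behaviour and a made-up list file named bplist.txt:

```shell
#!/bin/sh
# Work in an empty scratch directory so that the pattern matches no files.
demo=$(mktemp -d) && cd "$demo" || exit 1

for file in file_20160*.lis; do
    literal=$file             # with no match, this is the pattern itself
done

# Hypothetical list; none of these entries is a real file here.
printf '%s\n' 'file_2016_lis' 'file_201600Xlis' 'other.lis' > bplist.txt

# Treated as a regular expression, the leftover pattern matches two entries.
regex_hits=$(grep -c -- "$literal" bplist.txt)
# Treated as a fixed string (-F), it matches none.
# grep -c exits non-zero when the count is 0, hence the "|| true".
fixed_hits=$(grep -cF -- "$literal" bplist.txt || true)

echo "loop variable: $literal"
echo "regex matches: $regex_hits, fixed-string matches: $fixed_hits"
```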
Those files probably do not exist in your case, but it is best to close this loophole altogether by testing whether the file exists and by using string matching instead of regex matching, with grep's -F option. To avoid partial file name matches (where the string that grep is looking for is only part of a name in the list), another important option is -x, which forces whole-line matches. A third thing would be to avoid the possibility that file names that start with a - sign are interpreted as options to grep. One way to prevent this is to use the -- marker, which ends option processing. Because your file pattern starts with file, that will not be an issue here, but it is good practice to do it anyway, so that if you ever change the pattern so that it starts with an *, this will not break things.
So then it becomes:
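The command from the original post is not preserved here. A minimal sketch that combines the existence test with -F, -x and -- might look like the following; the list name bplist.txt, the output name matched.txt and the sample entries are assumptions, and the loop simply collects the list entries that correspond to existing files.

```shell
#!/bin/sh
demo=$(mktemp -d) && cd "$demo" || exit 1
touch file_20160101.lis file_20160102.lis            # hypothetical files
printf '%s\n' file_20160101.lis file_20160101.lis.bak \
    file_20160102.lis > bplist.txt                   # hypothetical list

rm -f matched.txt                  # needed because the loop appends
for file in file_20160*.lis; do
    [ -e "$file" ] || continue     # skip the literal pattern if nothing matched
    # -F: fixed string, -x: whole-line match, --: end of options
    grep -Fx -- "$file" bplist.txt >> matched.txt
done
cat matched.txt
```

Because of -x, the .bak entry is not picked up even though it contains the first file name.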
Another thing here is that you are appending to the file, probably out of necessity, because otherwise the file would be overwritten on every iteration of the loop. An alternative would be to redirect the output of the loop itself, so that the file is opened only once and you do not have to delete it before running the loop:
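As a sketch of redirecting the whole loop (the file and list names are again assumptions), note how a leftover output file from an earlier run needs no separate cleanup:

```shell
#!/bin/sh
demo=$(mktemp -d) && cd "$demo" || exit 1
touch file_20160101.lis file_20160102.lis            # hypothetical files
printf '%s\n' file_20160101.lis file_20160102.lis > bplist.txt
echo 'stale entry from an earlier run' > matched.txt

for file in file_20160*.lis; do
    [ -e "$file" ] || continue
    grep -Fx -- "$file" bplist.txt
done > matched.txt     # opened and truncated once, before the first iteration
cat matched.txt
```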
One last thing. This is still an expensive way to do it because an external program in a subshell is used to perform the operations for every iteration in the for loop, which is resource intensive.
An alternative would be to use a pipe (|) and the - argument for stdin, which most greps (but not all) will honor, together with grep's file option -f
if there are not too many files in the directory.
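A hedged sketch of the piped form, assuming a grep that accepts "-f -" and the same made-up list file as before. The first command keeps the list entries that name existing files; adding -v inverts the selection and drops them instead.

```shell
#!/bin/sh
demo=$(mktemp -d) && cd "$demo" || exit 1
touch file_20160101.lis file_20160102.lis            # hypothetical files
printf '%s\n' file_20160101.lis file_20160103.lis \
    file_20160102.lis.old > bplist.txt               # hypothetical list

# printf emits one name per line; grep reads them from standard input
# ("-f -") as fixed-string (-F), whole-line (-x) patterns.
listed=$(printf '%s\n' file_20160*.lis | grep -Fxf - bplist.txt)

# Same patterns with -v: the entries that do NOT name an existing file.
unlisted=$(printf '%s\n' file_20160*.lis | grep -vFxf - bplist.txt)

echo "listed:   $listed"
echo "unlisted: $unlisted"
```

Only one grep process runs here, regardless of how many files match the pattern.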
Or use the more robust:
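The "more robust" command is not preserved here. One possible reading, offered purely as an assumption, is to let a loop print the names one at a time (so the existence test still applies) and pipe the loop's output into a single grep:

```shell
#!/bin/sh
demo=$(mktemp -d) && cd "$demo" || exit 1
touch file_20160101.lis file_20160102.lis            # hypothetical files
printf '%s\n' file_20160101.lis file_20160103.lis > bplist.txt   # hypothetical list

listed=$(
    for file in file_20160*.lis; do
        # Print only names that really exist, one per line.
        [ -e "$file" ] && printf '%s\n' "$file"
    done | grep -Fxf - bplist.txt
)
echo "$listed"
```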
Since the - argument for stdin is not universally supported in grep, another way would be to use process substitution ( <( ... ) ), which is available in modern bash, ksh93 or zsh:
if there are not too many files in the directory.
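A sketch of the process substitution form, assuming bash (it will not run under a plain POSIX sh) and the same made-up file and list names:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d) && cd "$demo" || exit 1
touch file_20160101.lis file_20160102.lis            # hypothetical files
printf '%s\n' file_20160101.lis file_20160103.lis > bplist.txt   # hypothetical list

# <( ... ) expands to a path (such as /dev/fd/63) that grep opens like
# an ordinary pattern file, so no "-f -" support is required.
listed=$(grep -Fxf <(printf '%s\n' file_20160*.lis) bplist.txt)
echo "$listed"
```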
Hello,
I have been working on Awk/sed one liner which counts the number of occurrences of '|' in pipe separated lines of file and delete the line from files if count exceeds "17".
i.e need to get records having exact 17 pipe separated fields(no more or less)
currently i have below :
awk... (1 Reply)
Hi!
I just want to count number of files in a directory, and write to new text file, with number of files and their name
output should look like this,,
assume that below one is a new file created by script
Number of files in directory = 25
1. a.txt
2. abc.txt
3. asd.dat... (20 Replies)
Input:
some random text SELECT TABLE1 some more random text
some random text SELECT TABLE2 some more random text
some random text SELECT TABLE3 some more random text
some random text SELECT TABLE1 some more random text
Output:
'SELECT TABLE1' 2
'SELECT TABLE2' 1
'SELECT TABLE3' 1
I... (5 Replies)
I have a file containing about 5 million rows, in the file there are some records which has extra delimiter at random position. (we dont know the positions), now we have to Count the delimeter from each row and if the count of delimeter is not matching then I want to delete those rows from the... (5 Replies)
Hi,
Please let me know how to find out number of files in a directory excluding existing files..The existing file format will be unknown..each time..
Thanks (3 Replies)
Hello all,
I always found help for my problems using the search option, but this time my request is too specific. I have two files that I want to compare. File1 is the index and File2 contains the data:
File1:
chr1 protein_coding exon 500 600 . + . gene_id "20532";... (0 Replies)
Awk Array doesnt match for substring
nawk -F"," 'FNR==NR{a=$2 OFS $3;next} a{print $1,$2,a}' OFS="," file1 file2
I want cluster3 in file1 to match with cluster3int in file2
output getting:
Output required:
Help is appreciated (8 Replies)
hi ,
i have one file ,i need to search particular word from this file and if content is matched then echo MATCHED else NOT MATCHED
file contains : mr x planned to score 75% in exam but end up with 74%.
word to be searched id 75%
please help me out .
waiting for reply
thanks in advance (2 Replies)
I am writing the below script to do a grep and count number of occurances between two tab delimited files.
I am trying to achieve..
1) Extract column 2 and column 3 from the S.txt file. Put it in a temp pattern file
2) Grep and count column 2 in D.txt file
3) Compare the counts between... (19 Replies)