Identify file pattern, take count of pattern, then act


 
# 1  
Old 05-13-2015

Guys -
Need your ideas on a section of code to finish something up. To make a long story short, I'm parsing a print output file that goes to pre-printed forms. I'm intercepting it, parsing it, formatting it, cutting it up into individual pages, grabbing the text I want in zones, building an .fdf (declares the zones), then populating (pdftk).

All of this is working fine.... In the end, I separate docs that have multiple pages (from those that don't) into a separate directory where I want to slam them together..in order. Single .pdf...for these multipages all with common "ID" name.

No problem getting the concatenation to work manually - I need your help with the code at the end of this bash script.

I've got a directory ($DIRECTORY) that has files that look like this:
Code:
368363.pdf
368363-2.pdf
368363-3.pdf
368373.pdf
368373-2.pdf
368373-3.pdf
368389.pdf
368389-2.pdf

The beauty is, this directory will ONLY contain files that have other files matching its first six characters as I move through line by line. And it WILL be in order with the first file being 'filename.pdf' and everything following...by line...being 'filename-2(++).pdf'.

What I want to do is simple...read the directory, take in a file, one at a time, store all its counterparts with the -'X'.pdf into a variable, then slam them all together with pdftk before I get to the next line (i.e. pdftk $ALL $pdform cat output $linem, with the m to indicate multipage since pdftk is a pain about using the same input and output names). I can put in a line at the end to move it all and clean up.

Something like:

Code:
for f in $DIRECTORY; do
         FILE=$(echo ${f##*/})         #Cut out the path leaving filename.pdf
        CATEM=$(ls -1f|grep $FILE|sort|uniq|tr '\r\n' ' ') # Gives me each file, separated by space (rather than line) to input to pdftk as a variable...

I'm lost after this. I don't think my plan is going to work because the original file will contain EVERYTHING and won't be re-read as I 'do stuff'....

Need some fresh eyes/ideas on this...this script is HUGE and does a ton of processing and I think I'm just getting tired of looking at it! Lol...

Thanks, fellas.
# 2  
Old 05-13-2015
You can skip further processing for files that are "counterparts", that is, files whose name contains "-" (or any other check which you think would be safe).

Something like..

Code:
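for f in "$DIRECTORY"/*.pdf; do
  if printf '%s\n' "${f##*/}" | grep -q -e '-'; then
    continue # pick the next file if it's not the main file
  else
    FILE=${f##*/}      # filename.pdf without the path
    BASE=${FILE%.pdf}  # the ID, without the extension
    # main file first, then its -N counterparts, separated by spaces
    # (breaks if any name contains whitespace)
    CATEM="$f $(ls "$DIRECTORY/$BASE"-*.pdf | tr '\n' ' ')"
    # any other stuff
  fi
done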

# 3  
Old 05-13-2015
With extended pattern matching, available in recent bash, you could try
Code:
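shopt -s extglob                # enable the !(...) pattern
for f in !(*-*).pdf
   do CATEM=$(ls "${f%.*}"*)    # the main file plus all its counterparts
      echo $CATEM               # unquoted on purpose: joins the names with spaces
   done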
368363-2.pdf 368363-3.pdf 368363.pdf
368373-2.pdf 368373-3.pdf 368373.pdf
368389-2.pdf 368389.pdf

If the order is relevant, you'll need another step.
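
One possible shape for that extra step, as a rough sketch only: print the main file first, then sort the counterparts numerically on the number after the dash. The helper name ordered_pages is made up for illustration, and the sketch assumes it runs inside the directory and that names look like ID.pdf / ID-N.pdf with no whitespace and no other dashes.

```shell
#!/bin/bash
# ordered_pages is a hypothetical helper, not part of the original suggestion.
# Prints the main file first, then ID-2.pdf, ID-3.pdf, ... in numeric order.
# Assumes bare names of the form ID.pdf / ID-N.pdf (no whitespace, no other dashes).
ordered_pages() {
    local main=$1 base=${1%.pdf}
    printf '%s\n' "$main"
    # Split on "-" and sort numerically on the second field ("2.pdf", "10.pdf", ...),
    # so that -10 lands after -9 instead of after -1.
    ls -- "$base"-*.pdf 2>/dev/null | sort -t- -k2,2n
}
```

Inside the loop, CATEM=$(ordered_pages "$f") would then replace the plain ls call.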

Last edited by RudiC; 05-13-2015 at 05:33 AM..
# 4  
Old 05-13-2015
Quote:
Originally Posted by ampsys
Code:
for f in $DIRECTORY; do
         FILE=$(echo ${f##*/})         #Cut out the path leaving filename.pdf
        CATEM=$(ls -1f|grep $FILE|sort|uniq|tr '\r\n' ' ') # Gives me each file, separated by space (rather than line) to input to pdftk as a variable...

The above code snippet doesn't even come close to doing what the comments indicate it should do. The for loop will execute once with f set to the directory named by $DIRECTORY. It will set FILE to the last component of that directory's pathname, not to the name of a file in that directory.

Note also that with filenames like 123456.pdf and 123456-2.pdf, ls or anything else that sorts its output alphabetically will sort 123456-2.pdf before 123456.pdf since - sorts before . in ASCII and in the C Locale sort order.
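
A throwaway check of that ordering, forcing the C locale so the result does not depend on your environment:

```shell
# "-" is 0x2D and "." is 0x2E in ASCII, so the -2 name sorts first.
sorted=$(printf '%s\n' 123456.pdf 123456-2.pdf | LC_ALL=C sort)
printf '%s\n' "$sorted"
# prints:
# 123456-2.pdf
# 123456.pdf
```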

There is no end to your sample for loop, so I don't know what you're planning to do after you get the list of files. And, I'm not sure whether you are trying to create a list of filenames (with $DIRECTORY stripped off) that can be used in the directory where the files are located, or a list of pathnames (including $DIRECTORY) so the list can be used in another directory. It is also not clear whether $DIRECTORY is an absolute pathname for that directory or a relative pathname for that directory. The following code will only work if $DIRECTORY expands to an absolute pathname for a directory.

Rather than creating a list of files as a scalar variable, with bash or ksh it would be much easier to do this with an array (especially if your directory structure or filenames might ever contain any whitespace characters). Perhaps the following will give you something you can build on to get what you want:
Code:
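#!/bin/bash
shopt -s nullglob		# a pattern with no matches expands to nothing
DIRECTORY="$PWD/dir"		# must be an absolute pathname
cd "$DIRECTORY" || exit 1
# Main files have exactly six characters before .pdf
for pdf in ??????.pdf
do	base=${pdf%.pdf}
	# Main file first, then its -N counterparts, one array element per file
	FILES=("$pdf" "$base-"*.pdf)
	PATHS=("$DIRECTORY/$pdf" "$DIRECTORY/$base-"*.pdf)
	echo "FILES (${#FILES[@]} elements):"
	printf '\t"%s"\n' "${FILES[@]}"
	echo "PATHS (${#PATHS[@]} elements):"
	printf '\t"%s"\n' "${PATHS[@]}"
	echo
done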

This loop runs in the directory specified by $DIRECTORY and creates a list of your desired filenames and a list of pathnames for those filenames. I assume you'll want one of those and can delete the code for the one you don't want. I also assume that you'll want to replace one of those printf commands with a pdftk command, but I don't see how $ALL, $linem, or $pdform from your description relate to the list of filenames or pathnames you want to use; so I'm leaving that as an exercise for the reader.
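
For anyone attempting that exercise, here is one guess at what the pdftk step could look like with the FILES array. It is only a sketch under assumptions: merge_group and the PDFTK override variable are made-up names, the output name (the ID plus "m", following the "m for multipage" idea in the question) is a guess, and $pdform is left out because its role isn't clear from the description.

```shell
#!/bin/bash
# merge_group is a hypothetical helper; PDFTK lets another command be swapped in.
# Usage: merge_group OUTPUT MAIN.pdf [PAGE.pdf ...]
merge_group() {
    local out=$1
    shift
    # The output name has to differ from every input name.
    "${PDFTK:-pdftk}" "$@" cat output "$out"
}

# Inside the loop above, after FILES has been filled in:
#   merge_group "${base}m.pdf" "${FILES[@]}"
```

Since "${base}m.pdf" has seven characters before .pdf and no dash, it is not picked up by either the ??????.pdf or the "$base-"*.pdf pattern if the script is run again. Note that the "$base-"*.pdf glob expands in pattern order, so -10 would come before -2 for documents with ten or more pages.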

In a directory that contains the files:
Code:
1 2 3 -2.pdf
1 2 3 -3.pdf
1 2 3 -4.pdf
1 2 3 .pdf
368363-2.pdf
368363-3.pdf
368363.pdf
368389-2.pdf
368389.pdf

it produces the output:
Code:
FILES (4 elements):
	"1 2 3 .pdf"
	"1 2 3 -2.pdf"
	"1 2 3 -3.pdf"
	"1 2 3 -4.pdf"
PATHS (4 elements):
	"/Users/dwc/test/unix.com/shell/Identify_file_pattern,.../dir/1 2 3 .pdf"
	"/Users/dwc/test/unix.com/shell/Identify_file_pattern,.../dir/1 2 3 -2.pdf"
	"/Users/dwc/test/unix.com/shell/Identify_file_pattern,.../dir/1 2 3 -3.pdf"
	"/Users/dwc/test/unix.com/shell/Identify_file_pattern,.../dir/1 2 3 -4.pdf"

FILES (3 elements):
	"368363.pdf"
	"368363-2.pdf"
	"368363-3.pdf"
PATHS (3 elements):
	"/Users/dwc/test/unix.com/shell/Identify_file_pattern,.../dir/368363.pdf"
	"/Users/dwc/test/unix.com/shell/Identify_file_pattern,.../dir/368363-2.pdf"
	"/Users/dwc/test/unix.com/shell/Identify_file_pattern,.../dir/368363-3.pdf"

FILES (2 elements):
	"368389.pdf"
	"368389-2.pdf"
PATHS (2 elements):
	"/Users/dwc/test/unix.com/shell/Identify_file_pattern,.../dir/368389.pdf"
	"/Users/dwc/test/unix.com/shell/Identify_file_pattern,.../dir/368389-2.pdf"
