Perl: batch replace a portion of text in files


 
# 1
11-03-2014

Hi all,

What I would like to do is batch-change the code below in every PDF in a given directory (each PDF is uncompressed so that it can be easily edited).
An example of the JavaScript code:

if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_216', 15259]); } catch(e) { console.println(e); } };

The portion pp_216 can vary; the number 216 is just an example. After pp_ there are two possibilities. (1) An ordinary (Arabic) number, e.g. 1, 2, 216, etc. In this case I would like to subtract 16 from the number. For example, for pp_216 the result should be:

this.zoomType = zoomtype.pref;
this.pageNum = 200;



(2) A Roman numeral:

if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_v', 15259]); } catch(e) { console.println(e); } };

In this case I would like to convert the Roman numeral to Arabic and then subtract 2 from it.

For example, for pp_v the result should be:

this.zoomType = zoomtype.pref;
this.pageNum = 3;
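The two rules above can be written down as a small Perl function. This is only a sketch: `roman_to_int` and `new_page` are hypothetical names, the Roman conversion is hand-rolled (instead of using a CPAN module) so the example has no dependencies, and it does not check numeral order (e.g. `IIII` is accepted as 4).

```perl
use strict;
use warnings;

# Roman numeral (either case) -> integer, scanning left to right:
# a numeral smaller than the one after it is subtracted (IV = 4),
# otherwise it is added. Returns undef for an empty string or any
# character other than I, V, X, L, C, D, M.
sub roman_to_int {
    my ($s) = @_;
    my %val = (I => 1, V => 5, X => 10, L => 50, C => 100, D => 500, M => 1000);
    my @digits = split //, uc $s;
    return undef if !@digits || grep { !exists $val{$_} } @digits;
    my $total = 0;
    for my $i (0 .. $#digits) {
        my $cur  = $val{ $digits[$i] };
        my $next = $i < $#digits ? $val{ $digits[$i + 1] } : 0;
        $total += $cur < $next ? -$cur : $cur;
    }
    return $total;
}

# Hypothetical helper: new page number for the token that follows "pp_".
# Arabic numbers lose 16; Roman numerals are converted and lose 2.
sub new_page {
    my ($token) = @_;
    return $token - 16 if $token =~ /^\d+$/;
    my $n = roman_to_int($token);
    return defined $n ? $n - 2 : undef;
}

print new_page('216'), "\n";   # 200
print new_page('v'), "\n";     # 3
```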


At first I tried bash, but it does not seem to support what I need. Perl can handle Roman numerals (via a CPAN module) and regexes. Before I was told that using a regex for this would not be a good idea, I had come up with this piece of code:

use warnings; use strict;
my $path = '.';
my @array = `find -P $path -type f -name '*.pdf'`;
foreach my $p (@array){
    chomp($p);
    open(my $source, '<', $p) or die "Cannot open $p: $!";
    while(my $line = <$source>){
        if($line =~ /(?<=pp_)\d+(?:'\d+)?/){
            # ...
        }
    }
}

but it is possibly buggy and supports only Arabic (0-9) numbers.
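The lookbehind `(?<=pp_)` from the snippet above extends naturally to both kinds of page number with an alternation. A minimal sketch: `page_tokens` is a hypothetical name, and the trailing `\b` is an added assumption that rejects identifiers such as `pp_divider`, which merely start with Roman-numeral letters.

```perl
use strict;
use warnings;

# Hypothetical helper: return every page token that follows "pp_" in a line.
# The lookbehind anchors the match without consuming "pp_"; the alternation
# accepts Arabic digits or Roman numeral letters (case-insensitive via /i);
# \b requires the token to end at a word boundary.
sub page_tokens {
    my ($line) = @_;
    return $line =~ /(?<=pp_)(\d+|[MDCLXVI]+)\b/gi;
}

my $line = q{postMessage(['newPage', 'pp_216', 15259]); postMessage(['newPage', 'pp_v', 15259]);};
print join(',', page_tokens($line)), "\n";   # 216,v
```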

All the code above is also in the attached .txt file, since I could not get the formatting right. The attachments also include a sample 'unpacked' PDF with two entries edited by me (lines 6242 and 6246) to show what I am looking for.

Last edited by menteith; 11-04-2014 at 08:04 AM. Reason: formatting; added unpacked pdf
# 2
11-03-2014
I'm not sure whether an uncompressed PDF document is plain text. If it is, you could use awk for this problem:

Code:
awk '
BEGIN {
    rn = "MDCLXVI"                       # Roman numerals, descending order
    split("1000 500 100 50 10 5 1", v)   # value of each numeral
}
# Roman numeral (either case) -> integer; returns 0 if s is not a numeral
function roman_val(s,   val, c, d, p, q) {
    if (s !~ "^[" rn tolower(rn) "]+$") return 0
    c = split(toupper(s), d, "")         # one element per character (gawk/mawk extension)
    val = v[index(rn, d[c])]
    while (--c) {
        p = index(rn, d[c])
        q = index(rn, d[c+1])
        val += (p > q) ? -v[p] : v[p]     # smaller numeral before a larger one is subtracted
    }
    return val
}
# Arabic page numbers: pp_216 -> pp_200
/pp_[0-9]+/ {
    x = $0
    n = ""                               # reset the output for each line
    while (match(x, "pp_[0-9]+")) {
        pg = substr(x, RSTART+3, RLENGTH-3) - 16
        n = n substr(x, 1, RSTART-1) "pp_" pg
        x = substr(x, RSTART+RLENGTH)
    }
    $0 = n x
}
# Roman page numbers: pp_v -> pp_3
$0 ~ "pp_[" rn tolower(rn) "]+" {
    while (match($0, "pp_[" rn tolower(rn) "]+")) {
        pg = roman_val(substr($0, RSTART+3, RLENGTH-3)) - 2
        $0 = substr($0, 1, RSTART-1) "pp_" pg substr($0, RSTART+RLENGTH)
    }
} 1' your_uncompressed.pdf
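For comparison, the awk `match()`/`substr()` loop that rebuilds each line corresponds to a single `s///ge` in Perl: `/e` evaluates the replacement as code, and `/g` continues scanning after each replacement, so a renumbered page is never matched again. A minimal sketch for the Arabic case only; `renumber_arabic` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: subtract 16 from every "pp_<digits>" page number.
# /e evaluates the replacement as Perl code; /g resumes scanning after
# each replacement, so the inserted text is not re-examined.
sub renumber_arabic {
    my ($line) = @_;
    $line =~ s/\bpp_(\d+)\b/'pp_' . ($1 - 16)/ge;
    return $line;
}

print renumber_arabic(q{['newPage', 'pp_216', 15259] ['newPage', 'pp_17', 1]}), "\n";
# ['newPage', 'pp_200', 15259] ['newPage', 'pp_1', 1]
```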

# 3
11-04-2014

Quote:
Originally Posted by Chubler_XL
I'm unsure if a uncompressed pdf document is plain text.
I think you are right. An unpacked PDF file looks like a plain-text file, but it probably still contains binary data. However, the tool I used to unpack it (pdftk) claims that the unpacked file can be edited in a simple text editor:

Quote:
Uncompress PDF page streams for editing the PDF in a text editor (e.g., vim, emacs)
pdftk doc.pdf output doc.unc.pdf uncompress
When I run your script, it prints some binary characters and crashes. I also thought of using awk, but as far as I know it only works on plain-text files.

Could you have a look at the PDF I attached to my first post?
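Whether or not the unpacked file counts as text, Perl can process it as raw bytes, so binary stream data passes through untouched. A minimal sketch, assuming the whole file fits in memory; `rewrite_file` is a hypothetical name, only Arabic page numbers are handled, and no backup copy is kept.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite one file in place, treating it as raw bytes.
# The :raw layer disables CRLF translation and character decoding, so any
# binary stream data in the PDF is written back byte for byte.
sub rewrite_file {
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die "Cannot open $path: $!";
    my $data = do { local $/; <$in> };    # slurp the whole file
    close($in);

    # s///g returns the number of replacements (false if none, hence || 0)
    my $count = $data =~ s/\bpp_(\d+)\b/'pp_' . ($1 - 16)/ge || 0;

    open(my $out, '>:raw', $path) or die "Cannot write $path: $!";
    print {$out} $data;
    close($out) or die "Cannot close $path: $!";
    return $count;                        # number of replacements made
}
```

Because the substitution runs on a byte string, NUL bytes or invalid UTF-8 sequences elsewhere in the file do not affect the match.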
# 4
11-04-2014
How about this Perl solution using the Roman module from CPAN:

Code:
use warnings;
use strict;

use Roman;

my @array = `find -P . -type f -name '*.pdf'`;

foreach my $p (@array){

    chomp($p);
    open(my $source, '<', $p) or die "Cannot open file $p: $!";
    open(my $dest, '>', $p . ".new") or die "Cannot open output file $p.new: $!";
    binmode($source);
    binmode($dest);

    while(my $line = <$source>){
        while (my ($page) = $line =~ /pp_(\d+)/) {
            my $newpage = $page-16;
            $line =~ s/pp_$page/zUNIQz_$newpage/;
        }
        while (my ($rdigit) = $line =~ /pp_([MDCLXVI]+)/i) {
           my $newpage = arabic($rdigit)-2;
           $line =~ s/pp_$rdigit/zUNIQz_$newpage/;
        }
        $line =~ s/zUNIQz_/pp_/g;
        print $dest $line;
    }
    close($source);
    close($dest);
    rename "$p" => "$p.bak" or
          die "can't rename $p to $p.bak";
    rename "$p.new" => "$p" or
          die "can't rename $p.new to $p";
}
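Shelling out to `find` and splitting its output on newlines depends on an external command and breaks on file names that contain a newline. The core File::Find module can build the same list from inside Perl. A minimal sketch; `pdf_files` is a hypothetical name, and the symlink check is an assumption meant to mirror `find -P ... -type f`.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: list regular files ending in ".pdf" below $dir,
# similar to `find -P $dir -type f -name '*.pdf'` but without a subshell.
sub pdf_files {
    my ($dir) = @_;
    my @found;
    find(
        sub {
            # $_ is the bare file name; File::Find has chdir'ed into its
            # directory. Symlinks are skipped, as with find -P -type f.
            push @found, $File::Find::name
                if !-l $_ && -f $_ && /\.pdf\z/;
        },
        $dir
    );
    return sort @found;
}
```

It would replace the backtick line in the script above, e.g. `foreach my $p (pdf_files('.')) { ... }`; the names come back without trailing newlines, so the `chomp` is no longer needed.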
