Perl Text Manipulation


 
# 1  
03-17-2010

I need help with a project I'm working on. I believe Perl would be the best way to handle the string manipulation; however, I've barely used Perl and I'm used to Bash scripting. Also, this project is in a Windows environment, so I can use Perl, but unfortunately I don't have any shell-based utilities available.

I have a batch job that runs tiffinfo on several TIFF images, and I get the following output:

Code:
TYPE:
filename.tif:
TIFF Directory at offset 0x10eb4
  Image Width: 1728 
  Image Length: 2376
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane

This is just for one TIFF image. If I had four TIFF images, for example, my output would be the following:

Code:
TYPE:
filename.tif:
TIFF Directory at offset 0x10eb4
  Image Width: 1728 
  Image Length: 2376
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
TYPE:
filename2.tif:
TIFF Directory at offset 0x4aac
  Image Width: 1728 
  Image Length: 2376
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
TYPE:
filename3.tif:
TIFF Directory at offset 0x8
  Subfile Type: (0 = 0x0)
  Image Width: 124 
  Image Length: 124
  Resolution: 31, 31 pixels/inch
  Bits/Sample: 8
  Compression Scheme: None
  Photometric Interpretation: RGB color
  Software: "¼"
  Samples/Pixel: 3
  Rows/Strip: 55
  Planar Configuration: single image plane
TYPE:
filename4.tif:
TIFF Directory at offset 0x8
  Subfile Type: (0 = 0x0)
  Image Width: 1419 
  Image Length: 1001
  Resolution: 300, 300 pixels/inch
  Bits/Sample: 8
  Compression Scheme: LZW
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Rows/Strip: 1
  Planar Configuration: single image plane
  Photoshop Data: <present>, 410 bytes
  Predictor: horizontal differencing 2 (0x2)

What I want is some Perl code that manipulates each "block" of text in a specific way. I'd call a "block" each section that starts at a TYPE: line and runs up to the next TYPE: line. Here's how I need each block formatted:

The values of Compression Scheme:, Resolution:, Photometric Interpretation:, Image Width:, and Image Length: need to be appended, separated by colons, right after the filename (the line after TYPE:), and those five lines removed from the block. For example, the final output for a single block would be:

Code:
TYPE:
filename.tif:CCITT Group 4:200, 200 pixels/inch:min-is-white:1728:2376:
TIFF Directory at offset 0x10eb4
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
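The answer below keeps the five source lines inside each block, while the example above removes them. Here is a minimal Perl sketch that follows the example, assuming the whole tiffinfo output is small enough to read into one string and that every block begins with a TYPE: line followed by the filename line. `split_blocks` and `format_block` are hypothetical helper names, and leaving an empty slot for a missing field is an assumption, since the question doesn't say what should happen in that case.

```perl
use strict;
use warnings;

# Fields moved onto the filename line, in the order of the example output.
my @FIELDS   = ('Compression Scheme', 'Resolution', 'Photometric Interpretation',
                'Image Width', 'Image Length');
my %IS_FIELD = map { $_ => 1 } @FIELDS;

# Split the whole tiffinfo output into blocks. The zero-width lookahead
# splits just before each "TYPE:" line, so every block keeps its TYPE: line.
sub split_blocks {
    my ($text) = @_;
    return grep { /\S/ } split /^(?=TYPE:)/m, $text;
}

# Reformat one block: drop the five field lines and append their values,
# colon-separated, to the filename line (the line right after TYPE:).
# A field missing from the block (assumption) leaves an empty slot.
sub format_block {
    my ($block) = @_;
    my (%value, @kept);
    for my $line (split /\r?\n/, $block) {
        # "  Key: value" -> ("Key", "value"), trimming surrounding whitespace
        my ($key, $val) = $line =~ /^\s*([^:]+?):\s*(.*?)\s*$/;
        if (defined $key && $IS_FIELD{$key}) {
            $value{$key} = $val;    # keep the value, drop the line
        } else {
            push @kept, $line;
        }
    }
    my $suffix = '';
    $suffix .= ($value{$_} // '') . ':' for @FIELDS;
    $kept[1] .= $suffix if @kept > 1;
    return join("\n", @kept) . "\n";
}

# Demo on the first block of the question's output; to process a real file,
# read it into $text instead (e.g. local $/; my $text = <$fh>;).
my $text = <<'END';
TYPE:
filename.tif:
TIFF Directory at offset 0x10eb4
  Image Width: 1728
  Image Length: 2376
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
END

print format_block($_) for split_blocks($text);
```

Run as shown, it prints the eleven lines of the example output above. With the four-block output, each block is reformatted the same way, and lines other than the five fields (Subfile Type, Photoshop Data, Predictor, and so on) are left in place.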

If there are any Perl gurus out there, I'd like to discuss how to get this done. Please let me know; any help is appreciated.

---------- Post updated at 08:33 AM ---------- Previous update was at 08:33 AM ----------

Not sure why the text looks so bad; I used BBCode, formatted it correctly, and double- and triple-checked the open/close tags.
# 2  
03-17-2010
Maybe something like this?

Code:
C:\>
C:\> REM your data file is saved as "tiff_output.txt"
C:\> REM I don't show it here in order to save space
C:\> REM display the contents of the Perl program
C:\>
C:\> type tiff_process.pl
#!perl
use strict;
use warnings;

# pass the filename as first argument; set the variable $infile to it
die "Usage: perl tiff_process.pl tiff_output.txt\n" unless @ARGV;
my $infile = $ARGV[0];
# open the file (3-argument open, lexical filehandle)
open (my $in, '<', $infile) or die "Can't open $infile: $!";

my @line;                       # lines of the current record
my ($cs, $r, $pi, $iw, $il);    # values collected from the current record

# print the buffered record after appending the additional information
# after the filename, i.e. the 2nd element
sub flush_record {
  return unless @line;
  my $suffix = '';
  $suffix .= (defined $_ ? $_ : '') . ':' foreach ($cs, $r, $pi, $iw, $il);
  $line[1] .= $suffix;
  foreach (@line) {print $_,"\n"}
  # Important - reset the array and the values, so a field missing from
  # the next record is not filled in with this record's value
  @line = ();
  ($cs, $r, $pi, $iw, $il) = ();
}

# loop through it
while (<$in>) {
  chomp;
  # the record-delimiter "TYPE:" starts a new record, so print the previous one
  flush_record() if /^TYPE:/;
  # check for compression scheme, resolution, photometric interpretation,
  # image width and image length, and set the variables $cs, $r, $pi, $iw, $il
  if    (/^\s*Compression Scheme:\s*(.*?)\s*$/)         {$cs = $1}
  elsif (/^\s*Resolution:\s*(.*?)\s*$/)                 {$r  = $1}
  elsif (/^\s*Photometric Interpretation:\s*(.*?)\s*$/) {$pi = $1}
  elsif (/^\s*Image Width:\s*(.*?)\s*$/)                {$iw = $1}
  elsif (/^\s*Image Length:\s*(.*?)\s*$/)               {$il = $1}
  # add the current line to the array @line
  push @line, $_;
}
# print the last record; no "TYPE:" line follows it, so flush it here
# (otherwise its final line would be lost)
flush_record();
# clean up after we're done
close ($in) or die "Can't close $infile: $!";
C:\>
C:\>
C:\> REM execute the Perl program
C:\>
C:\> perl tiff_process.pl tiff_output.txt
TYPE:
filename.tif:CCITT Group 4:200, 200 pixels/inch:min-is-white:1728:2376:
TIFF Directory at offset 0x10eb4
  Image Width: 1728
  Image Length: 2376
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
TYPE:
filename2.tif:CCITT Group 4:200, 200 pixels/inch:min-is-white:1728:2376:
TIFF Directory at offset 0x4aac
  Image Width: 1728
  Image Length: 2376
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
TYPE:
filename3.tif:None:31, 31 pixels/inch:RGB color:124:124:
TIFF Directory at offset 0x8
  Subfile Type: (0 = 0x0)
  Image Width: 124
  Image Length: 124
  Resolution: 31, 31 pixels/inch
  Bits/Sample: 8
  Compression Scheme: None
  Photometric Interpretation: RGB color
  Software: "╝"
  Samples/Pixel: 3
  Rows/Strip: 55
  Planar Configuration: single image plane
TYPE:
filename4.tif:LZW:300, 300 pixels/inch:RGB color:1419:1001:
TIFF Directory at offset 0x8
  Subfile Type: (0 = 0x0)
  Image Width: 1419
  Image Length: 1001
  Resolution: 300, 300 pixels/inch
  Bits/Sample: 8
  Compression Scheme: LZW
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Rows/Strip: 1
  Planar Configuration: single image plane
  Photoshop Data: <present>, 410 bytes
  Predictor: horizontal differencing 2 (0x2)
C:\>
C:\>

HTH,
tyler_durden

Last edited by durden_tyler; 03-17-2010 at 04:17 PM..