The UNIX and Linux Forums  

  #1 (permalink)  
Old 02-07-2009
parshant_bvcoe parshant_bvcoe is offline
Registered User
  
 

Join Date: Dec 2008
Posts: 23
Truncate the content within alt attribute to first 250 characters.

I have an XML file that contains an image tag as follows:

<image><img src="wstc_0007_0007_0_img0001.jpg" width="351" height="450" alt="This is the cover page. Brazil &#x2022; Japan &#x2022; Korea &#x2022; Mexico &#x2022; Singapore &#x2022; Spain" type="photograph" orient="portrait"/></image>

Now I want to write a script that checks whether the alt attribute contains more than 250 characters and, if it does, truncates the attribute value to its first 250 characters.

It would be really nice if anybody could show me how to do this.
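
The counting half of the question can be sketched in plain sh. This is only an illustration: `alt_length` is a hypothetical helper name, and it assumes the alt value is double-quoted, sits on a single line, and is the first `alt="` on that line. Note that `${#value}` counts characters in the current locale (bytes in some shells), and an entity such as &#x2022; counts as eight characters, not one.

```shell
#!/bin/sh
# alt_length LINE: print the length of the first alt="..." value in LINE.
# Hypothetical helper; assumes a double-quoted attribute on a single line.
alt_length() {
  rest=${1#*alt=\"}                          # drop everything through alt="
  [ "$rest" = "$1" ] && { echo 0; return; }  # no alt attribute on this line
  value=${rest%%\"*}                         # keep text up to the closing quote
  echo "${#value}"
}

sample='<img src="a.jpg" alt="This is the cover page." type="photograph"/>'
alt_length "$sample"
```

A caller could then compare the printed number against 250 with `[ "$(alt_length "$line")" -gt 250 ]` before deciding whether to truncate.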
  #2 (permalink)  
Old 02-07-2009
Corona688 Corona688 is offline
Registered User
  
 

Join Date: Aug 2005
Location: Saskatchewan
Posts: 1,960
You know, if the cut lands in the middle of an &#x2022; or some such entity, the result could be invalid XML.
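
One way to guard against that, as a rough sketch: cut to the limit first, then drop a trailing "&..." fragment that has no closing ";". The function name `truncate_safe` is made up for this example, and it assumes single-byte (ASCII) text, since some `cut -c` implementations count bytes rather than characters.

```shell
#!/bin/sh
# truncate_safe TEXT [MAX]: cut TEXT to at most MAX characters (default 250),
# then remove a trailing partial entity (an "&" with no ";" after it).
# Hypothetical helper; assumes single-byte text.
truncate_safe() {
  text=$1 max=${2:-250}
  [ "${#text}" -le "$max" ] && { printf '%s\n' "$text"; return; }
  cut_text=$(printf '%s' "$text" | cut -c1-"$max")
  last=${cut_text##*&}               # text after the last "&", if there is one
  if [ "$last" != "$cut_text" ]; then
    case $last in
      *";"*) ;;                      # the last entity is complete; keep it
      *) cut_text=${cut_text%&*} ;;  # drop the partial entity
    esac
  fi
  printf '%s\n' "$cut_text"
}
```

For example, `truncate_safe 'Brazil &#x2022; Japan' 10` cuts at "Brazil &#x" and then backs off to "Brazil " (the trailing space is left in place), while a limit of 17 keeps the complete entity.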
  #3 (permalink)  
Old 02-07-2009
cfajohnson cfajohnson is offline Forum Advisor  
Shell programmer, author
  
 

Join Date: Mar 2007
Location: Toronto, Canada
Posts: 2,361

This shortens the alt attribute one item at a time, dropping trailing " &..." items until it fits, rather than cutting at a fixed character count:

Code:
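altcheck() #@ USAGE: altcheck [max]
{ #@ Shorten the alt attribute in $line to at most $max characters
  max=${1:-250}
  right=${line#*alt=\"}     # text after the opening alt="
  left=${line%"$right"}     # text up to and including alt="
  alt=${right%%\"*}         # the attribute value itself
  right=${right#"$alt"}     # closing quote and the rest of the line
  while [ "${#alt}" -gt "$max" ]
  do
    prev=$alt
    alt=${alt% &*}          # drop the last " &..." item
    [ "$alt" = "$prev" ] && break   # nothing left to drop; avoid an endless loop
  done
  line=$left$alt$right
}

# FILE must be set to the path of the XML file before this loop runs
while IFS= read -r line
do
  case $line in
    *"<img"*alt=\"*) altcheck ;;
  esac
  printf "%s\n" "$line"
done < "$FILE"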
  #4 (permalink)  
Old 02-10-2009
parshant_bvcoe parshant_bvcoe is offline
Registered User
  
 

Join Date: Dec 2008
Posts: 23
Hi Johnson!

Thanks for your help! But the above script is not working. Please help! Thanks.
  #5 (permalink)  
Old 02-10-2009
joeyg joeyg is offline Forum Staff  
modérateur
  
 

Join Date: Dec 2007
Location: Home of 17-time world champion Boston Celtics
Posts: 1,311
Another way to look at this

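I duplicated some of the text inside the alt attribute so I could show it being trimmed, and then trimmed it at 150 characters.
Note that it does not necessarily break at a clean boundary, and it does not handle the closing quotation mark ("), which is lost when the value is cut. The 150 characters are also counted from the start of alt=", so those five characters are part of the count.
However, this logic does trim that field.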

Code:
> cat file164
<image><img src="wstc_0007_0007_0_img0001.jpg" width="351" height="450" alt="This is the cover page. Brazil &#x2022; Japan &#x2022; Korea &#x2022; Mexico &#x2022; Singapore &#x2022; Spain" type="photograph" orient="portrait"/></image>
<image><img src="wstc_0007_0007_0_img0001.jpg" width="351" height="450" alt="This is the cover page. Brazil &#x2022; Japan &#x2022; Korea &#x2022; Mexico &#x2022; Singapore &#x2022; Spain Brazil &#x2022; Japan &#x2022; Korea &#x2022; Mexico &#x2022; Singapore &#x2022; Spain" type="photograph" orient="portrait"/></image>

> sed "s/alt/~alt/g" file164 | sed "s/type/~type/g" | awk -F"~" '{print $1,substr($2,1,150),$3}'
<image><img src="wstc_0007_0007_0_img0001.jpg" width="351" height="450"  alt="This is the cover page. Brazil &#x2022; Japan &#x2022; Korea &#x2022; Mexico &#x2022; Singapore &#x2022; Spain"  type="photograph" orient="portrait"/></image>
<image><img src="wstc_0007_0007_0_img0001.jpg" width="351" height="450"  alt="This is the cover page. Brazil &#x2022; Japan &#x2022; Korea &#x2022; Mexico &#x2022; Singapore &#x2022; Spain Brazil &#x2022; Japan &#x2022; Kor type="photograph" orient="portrait"/></image>
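
A variant of the same idea that keeps the closing quote might look like the sketch below. It is an illustration, not a tested drop-in: `trim_alt` is a hypothetical name, it handles only the first `alt="..."` on each line, and it still cuts at a fixed character count, so it can split an entity such as &#x2022;.

```shell
#!/bin/sh
# trim_alt [MAX]: read lines on stdin and shorten the first alt="..." value
# on each line to MAX characters (default 250), keeping both quotes and the
# rest of the line intact. Hypothetical helper.
trim_alt() {
  awk -v max="${1:-250}" '
    match($0, /alt="[^"]*"/) {
      # The match spans alt="...": 5 characters of prefix plus 1 closing quote
      value = substr($0, RSTART + 5, RLENGTH - 6)
      if (length(value) > max)
        $0 = substr($0, 1, RSTART + 4) substr(value, 1, max) substr($0, RSTART + RLENGTH - 1)
    }
    { print }
  '
}
```

It would be run as `trim_alt 150 < file164`. Unlike the sed and awk pipeline above, it counts only the characters of the attribute value and does not depend on a type attribute following alt.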
  #6 (permalink)  
Old 02-11-2009
cfajohnson cfajohnson is offline Forum Advisor  
Shell programmer, author
  
 

Join Date: Mar 2007
Location: Toronto, Canada
Posts: 2,361
Quote:
Originally Posted by parshant_bvcoe
But above script is not working.

What does "not working" mean? What actually happens?

Where does the script fail?

Are there any error messages?