.PDF and .TXT to .XML. Is it possible?
Post 302509416 by Chubler_XL on Wednesday 30th of March 2011 06:19:21 PM
dajon and yinyuemi, the actual contents of the text file are required between <text> and </text>, and the base64 encoding of the PDF file between <pdfcontent> and </pdfcontent>.

Corona688, CDATA escaping will need to be done, because the text file may contain "<" and "&", and these are illegal in XML character data unless they are escaped.
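
As a rough illustration of the escaping step (not taken from the thread): inside a CDATA section "<" and "&" need no escaping, and the only sequence that must be handled is "]]>", which would end the section early. The sketch below uses a hypothetical helper, cdata_wrap, that splits each "]]>" across two CDATA sections so the text is preserved exactly, whereas inserting a space, as the script in this post does, changes the text slightly.

```shell
# Hypothetical helper (not from the thread): wrap stdin in CDATA sections.
cdata_wrap() {
    printf '<![CDATA['
    # Rewrite each "]]>" as "]]" + "]]>" + "<![CDATA[" + ">": the current
    # section closes after "]]" and a new one opens before ">", so a parser
    # reassembles the original "]]>" without any added characters.
    sed 's/]]>/]]]]><![CDATA[>/g'
    printf ']]>'
}

# Demo input containing "<", "&" and a literal "]]>"
printf 'if a < b && c[d[0]]> 1\n' | cdata_wrap
```

An XML parser should concatenate the adjacent CDATA sections, so the consumer sees the text unchanged.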

optik77, I wonder if it would be better to base64 encode the text file too?
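
If the text were base64 encoded as suggested, a sketch might look like the following. The helper name text_element_b64 and the encoding="base64" attribute are assumptions made for illustration; the thread defines neither. The trade-off is that no CDATA handling is needed, but the text grows by roughly a third and is no longer readable inside the XML.

```shell
# Hypothetical helper: emit a <text> element whose body is base64.
# The encoding="base64" attribute is an illustrative convention only.
text_element_b64() {
    printf '<text encoding="base64">\n'
    openssl base64 < "$1"
    printf '</text>\n'
}

# Demo on a throwaway file holding characters that are unsafe in raw XML
tmp=$(mktemp)
printf 'a < b && c ]]> d\n' > "$tmp"
text_element_b64 "$tmp"

# Dropping the first and last line leaves the payload; decoding it
# should print the original text again.
text_element_b64 "$tmp" | sed '1d;$d' | openssl base64 -d
rm -f "$tmp"
```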

Code:
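for pdffile in *.pdf
do
   # Derive the companion file names by stripping the .pdf suffix
   base=${pdffile%.pdf}
   txtfile=$base.txt
   xmlfile=$base.xml
   # Write the XML only when both the PDF and its text file exist
   if [ -f "$pdffile" ] && [ -f "$txtfile" ]
   then
      {
        printf '<file>\n<text><![CDATA['
        # Break up any "]]>" in the text so it cannot end the CDATA section early
        sed 's/]]>/] ]>/g' "$txtfile"
        printf ']]></text>\n<pdfcontent>'
        openssl base64 < "$pdffile"
        echo "</pdfcontent>"
        echo "</file>"
      } > "$xmlfile"
   fi
done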
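
To check that a generated file still holds the original PDF, the payload can be pulled back out and decoded. This is only a sketch: extract_pdf is a hypothetical helper, and it assumes the layout produced by the loop above, where <pdfcontent> and </pdfcontent> each begin a line and no line of the text section starts with <pdfcontent>.

```shell
# Hypothetical helper: decode the <pdfcontent> payload of file $1 into file $2.
extract_pdf() {
    # Keep the lines from <pdfcontent> through </pdfcontent>, strip the
    # opening tag from the first line, drop the closing-tag line, then decode.
    sed -n '/^<pdfcontent>/,/^<\/pdfcontent>/p' "$1" |
        sed -e 's/^<pdfcontent>//' -e '/^<\/pdfcontent>/d' |
        openssl base64 -d > "$2"
}

# Demo: wrap a small stand-in "PDF" the same way the loop does, then compare
dir=$(mktemp -d)
printf '%%PDF-1.4 stand-in bytes\n' > "$dir/a.pdf"
{
    printf '<file>\n<text><![CDATA[sample text]]></text>\n<pdfcontent>'
    openssl base64 < "$dir/a.pdf"
    echo "</pdfcontent>"
    echo "</file>"
} > "$dir/a.xml"
extract_pdf "$dir/a.xml" "$dir/a.out.pdf"
cmp "$dir/a.pdf" "$dir/a.out.pdf" && echo "PDF round-trips intact"
rm -rf "$dir"
```

A real consumer would normally use an XML parser rather than sed; the line-based extraction here only works because the script controls the layout.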


Last edited by Chubler_XL; 03-30-2011 at 09:44 PM.
 

base64(n)               Text encoding & decoding binary data               base64(n)

NAME
       base64 - base64-encode/decode binary data

SYNOPSIS
       package require Tcl 8
       package require base64 ?2.4?

       ::base64::encode ?-maxlen maxlen? ?-wrapchar wrapchar? string
       ::base64::decode string

DESCRIPTION
       This package provides procedures to encode binary data into base64 and back.

       ::base64::encode ?-maxlen maxlen? ?-wrapchar wrapchar? string
              Base64 encodes the given binary string and returns the encoded result. Inserts the character wrapchar every maxlen characters of output. wrapchar defaults to newline. maxlen defaults to 60.

              Note well: If your string is not simple ascii you should fix the string encoding before doing base64 encoding. See the examples.

              The command will throw an error for negative values of maxlen, or if maxlen is not an integer number.

       ::base64::decode string
              Base64 decodes the given string and returns the binary data. The decoder ignores whitespace in the string.

EXAMPLES
       % base64::encode "Hello, world"
       SGVsbG8sIHdvcmxk

       % base64::encode [string repeat xyz 20]
       eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6
       eHl6eHl6eHl6
       % base64::encode -wrapchar "" [string repeat xyz 20]
       eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6eHl6

       # NOTE: base64 encodes BINARY strings.
       % set chemical [encoding convertto utf-8 "C\u2088H\u2081\u2080N\u2084O\u2082"]
       % set encoded [base64::encode $chemical]
       Q+KCiEjigoHigoBO4oKET+KCgg==
       % set caffeine [encoding convertfrom utf-8 [base64::decode $encoded]]

BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category base64 of the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for either package and/or documentation.

KEYWORDS
       base64, encoding

COPYRIGHT
       Copyright (c) 2000, Eric Melski
       Copyright (c) 2001, Miguel Sofer

base64                                    2.4                                    base64(n)