Shell Programming and Scripting: .PDF and .TXT to .XML. Is it possible? Post 302509477 by kurumi on Wednesday 30th of March 2011 11:21:02 PM
03-31-2011
Quote:
Originally Posted by Chubler_XL
kurumi, how does that generate the base64 encode of the pdf file?
Well, I missed that out, didn't I?

To generate the base64 encoding:

Code:
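require 'base64'

# XML template: the first %s receives the text, the second the base64-encoded PDF
xml = <<EOF
<file>
<text>%s</text>
<pdfcontent>%s</pdfcontent>
</file>
EOF

Dir["*.txt"].each do |file|
    filename = file.sub(/\.txt$/, "")
    pdf      = filename + ".pdf"
    xmlfile  = filename + ".xml"
    # only build an XML file when a matching .pdf exists
    # (File.exists? is deprecated and was removed in Ruby 3.2; File.exist? works everywhere)
    if File.exist?(pdf)
        # read the PDF in binary mode so no bytes are altered before encoding
        b4 = Base64.encode64(File.binread(pdf))
        # note: this inserts the .txt file's name; pass File.read(file) to embed its contents instead
        w = sprintf(xml, file, b4)
        # File.write opens, writes and closes the file, so no handle is leaked
        File.write(xmlfile, w)
    end
end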

---------- Post updated at 10:21 PM ---------- Previous update was at 10:18 PM ----------

Quote:
Originally Posted by yinyuemi
Good points, thanks Chubler_XL.
I have no idea about XML parsing, so I followed your code.
How about this? Please let me know as usual if there is any problem.

Code:
for filename in `ls -l |awk -F'[. ]' '/\.txt/||/\.pdf/{++a[$(NF-1)]}END{for(i in a) if(a[i]==2) print i}'`
do
echo -e "<file>\n<text><![CDATA[" `sed 's/]]>/] ]>/g' $filename.txt ` "]]></text>\n<pdfcontent>"  `openssl base64 -in $filename.pdf` "</pdfcontent>\n</file>\n"  >$filename.xml
done

One problem I see is listing the files with ls -l. A simple shell expansion (for example, for f in *.txt) will do; there is no need to parse the output of ls -l.
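A minimal sketch of the shell-expansion approach, assuming a POSIX sh with sed and a base64 command (GNU coreutils or BSD/macOS; openssl base64 -in "$pdf" from the quoted post would work as well). The function name txt_pdf_to_xml is hypothetical. It loops over the *.txt glob instead of parsing ls -l, quotes every file name, and rewrites any "]]>" in the text as "]]]]><![CDATA[>", which closes and reopens the CDATA section so the text is kept byte-for-byte, unlike the "] ]>" substitution in the quoted code.

```shell
# Hypothetical helper: for every NAME.txt that has a matching NAME.pdf in the
# current directory, write NAME.xml containing the text (in CDATA) and the
# base64-encoded PDF.
txt_pdf_to_xml() {
    for txt in *.txt; do
        [ -e "$txt" ] || continue          # glob matched nothing: skip the literal "*.txt"
        base=${txt%.txt}                   # strip the .txt suffix
        pdf=$base.pdf
        [ -f "$pdf" ] || continue          # only handle names that have both files
        {
            printf '<file>\n<text><![CDATA['
            # split "]]>" so it cannot end the CDATA section early
            sed 's/]]>/]]]]><![CDATA[>/g' "$txt"
            printf ']]></text>\n<pdfcontent>'
            base64 < "$pdf"                # assumption: a base64 command is on PATH
            printf '</pdfcontent>\n</file>\n'
        } > "$base.xml"
    done
}
```

Run it from the directory that holds the files, e.g. cd /path/to/files && txt_pdf_to_xml. A .txt file without a matching .pdf is simply skipped, which is what the awk count of 2 in the quoted code was trying to achieve.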
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.