How to convert a xls file to csv?
Post 302505876 by mirni on Thursday 17th of March 2011 10:05:19 PM
There is also a utility, 'xls2csv', that does the job nicely. It's part of the 'catdoc' package, which should be in the repositories of most popular distros.
On Red Hat variants:
Code:
sudo yum install catdoc
xls2csv myExcelSpreadsheet.xls > converted.csv
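
If there is a whole directory of spreadsheets rather than a single file, the same command can be wrapped in a loop. This is only a sketch: the function name convert_dir and the XLS2CSV override variable are made up for illustration, and it assumes xls2csv is on the PATH and writes the CSV to standard output, as in the command above.

```shell
#!/bin/sh
# convert_dir [DIR]: convert every .xls file in DIR (default: current
# directory) to a .csv file with the same base name.
# XLS2CSV is a hypothetical override hook; it defaults to the xls2csv
# binary from the catdoc package.
convert_dir() {
    dir=${1:-.}
    for src in "$dir"/*.xls; do
        [ -e "$src" ] || continue      # no match: the glob stays literal
        dst="${src%.xls}.csv"
        if "${XLS2CSV:-xls2csv}" "$src" > "$dst"; then
            echo "converted: $src -> $dst"
        else
            echo "failed: $src" >&2
            rm -f "$dst"               # do not leave a partial file behind
        fi
    done
}
```

Call it as `convert_dir /path/to/spreadsheets` after sourcing the file. Each output lands next to its source, so nothing is overwritten unless a .csv with the same base name already exists.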

 

catdoc(1)						      General Commands Manual							 catdoc(1)

NAME
       catdoc - reads MS-Word file and puts its content as plain text on standard output

SYNOPSIS
       catdoc [-vlu8btawxV] [-m number] [-s charset] [-d charset] [-f output-format] file

DESCRIPTION
       catdoc behaves much like cat(1), but it reads an MS-Word file and produces human-readable text on standard output. Optionally it can use latex(1) escape sequences for characters which have special meaning for LaTeX. It also makes some effort to recognize MS-Word tables, although it never tries to write correct headers for the LaTeX tabular environment. Additional output formats, such as HTML, can be easily defined.

       catdoc doesn't attempt to extract formatting information other than tables from an MS-Word document, so different output modes mainly mean that different characters are escaped and different ways are used to represent characters missing from the output charset. See CHARACTER SUBSTITUTION below.

       catdoc uses an internal unicode(7) representation of text, so it is able to convert texts when the charset in the source document doesn't match the charset on the target system. See CHARACTER SETS below.

       If no file names are supplied, catdoc processes its standard input unless it is a terminal. It is unlikely that somebody could type a Word document from the keyboard, so if catdoc is invoked without arguments and stdin is not redirected, it prints a brief usage message and exits. Processing of standard input (even among other files) can be forced by using a dash '-' as the file name.

       By default, catdoc wraps lines which are more than 72 chars long and separates paragraphs by blank lines. This behavior can be turned off by the -w switch. In wide mode catdoc prints each paragraph as one long line, suitable for import into word processors which perform word wrapping.

OPTIONS
       -a     shortcut for -f ascii. Produces ASCII text as output. Separates table columns with TAB.

       -b     process broken MS-Word file. Normally, catdoc checks if the first 8 bytes of the file are the Microsoft OLE signature. If so, it processes the file; otherwise it just copies it to stdout. This is intended for using catdoc as a filter for viewing all files with a .doc extension.

       -dcharset
              specifies destination charset name. The charset file has the format described in CHARACTER SETS below, should have a .txt extension and should reside in the catdoc library directory (${exec_prefix}/lib/catdoc). By default, the current locale charset is used if langinfo support is compiled in.

       -fformat
              specifies output format as described in CHARACTER SUBSTITUTION below. catdoc comes with two output formats, ascii and tex. You can add your own if you wish.

       -l     causes catdoc to list names of available charsets to stdout and exit successfully.

       -mnumber
              specifies right margin for text (default 72). -m 0 is equivalent to -w.

       -scharset
              specifies source charset (the one used in the Word document), if the Word document doesn't contain UTF-16 text. When reading RTF documents it is typically not necessary, because RTF documents contain an ansicpg specification. But that specification can be set wrong by Word (I've seen RTF documents in Russian where cp1252 was specified). In this case this option takes precedence over the charset specified in the document. The source_charset statement in the configuration file, however, has lower priority than the charset in the document.

       -t     shortcut for -f tex. Converts all printable chars which have special meaning for LaTeX(1) into appropriate control sequences. Separates table columns by &.

       -u     declares that the Word document contains UNICODE (UTF-16) representation of text (as some Word-97 documents do). If catdoc fails to correctly read a Word document with the default charset, try this option.

       -8     declares that the Word document is 8 bit, just in case catdoc recognizes the file format incorrectly.

       -w     disables word wrapping. By default catdoc output is split into lines not longer than 72 (or the number specified by the -m option) characters, and paragraphs are separated by a blank line. With this option each paragraph is one long line.

       -x     causes catdoc to output unknown UNICODE characters as \xNNNN instead of question marks.

       -v     causes catdoc to print some useless information about word document structure to stdout before the actual start of text.

       -V     outputs catdoc version.

CHARACTER SETS
       When processing an MS-Word file catdoc uses information about two character sets, typically different: input and output. They are stored in plain text files in the catdoc data directory. Each line of a character set file should contain two whitespace-separated hexadecimal numbers: the 8-bit code in the character set and the 16-bit Unicode code. Anything from a hash mark to the end of the line is ignored, as are blank lines.

       The catdoc distribution includes some of these character sets. Additional character set definitions, directly usable by catdoc, can be obtained from ftp.unicode.org. Charset files have a .txt suffix, which shouldn't be specified on the command line or in configuration files.

       Note that catdoc is distributed with Cyrillic charsets as default. If you are not Russian, you probably don't want that, and should reconfigure catdoc at compile time or in the runtime configuration file.

       When dealing with documents with charsets other than the default, remember that Microsoft never uses ISO charsets. While letters in, say, cp1252 are at the same positions as in ISO-8859-1, some punctuation signs would be lost if you specify ISO-8859-1 as the input charset. If you use cp1252, catdoc deals with those signs as described in CHARACTER SUBSTITUTION below.

CHARACTER SUBSTITUTION
       catdoc converts an MS-Word file into the following internal Unicode representation:

       1. Paragraphs are separated by the ASCII Line Feed symbol (0x000A).

       2. Table cells within a row are separated by the ASCII Field Separator symbol (0x001C).

       3. Table rows are separated by the ASCII Record Separator (0x001E).

       4. All printable characters, including whitespace, are represented with their respective UNICODE codes.

       This UNICODE representation is subsequently converted into 8-bit text in the target character set using the following four-step algorithm:

       1. The list of special characters is searched for the given Unicode character. If it is found, the appropriate multi-character sequence is output instead of the character.

       2. If there is an equivalent in the target character set, it is output.

       3. Otherwise, the replacement list is searched and, if there is a multi-character substitution for this UNICODE char, it is output.

       4. If all of the above fails, the "Unknown char" symbol (question mark) is output.

       The list of special characters and the list of substitutions are character set-independent, because special chars should be escaped regardless of their existence in the target character set (usually they are part of US-ASCII, and therefore exist in any character set), and the replacement list is searched only for those characters which are not found in the target character set.

       These lists are stored in the catdoc data directory in files with the format name as prefix. These files have the following format: each line is either a comment (starting with a hash mark) or contains a hexadecimal UNICODE value, separated by whitespace from the string which is substituted for it. If the string contains no whitespace it can be used as is; otherwise it should be enclosed in single or double quotes. Usual backslash sequences like '\n', '\t' can be used in these strings.

RUNTIME CONFIGURATION
       Upon startup catdoc reads its system-wide configuration file /etc/catdocrc and then the user-specific configuration file ${HOME}/.catdocrc. These files can contain the following directives:

       source_charset = charset-name
              Sets the default source charset, which is used if no -s option is specified. Consult the configuration of a nearby Windows workstation to find the one you need.

       target_charset = charset-name
              Sets the default output charset. You probably know which one you use.

       charset_path = directory-list
              Colon-separated list of directories which are searched for charset files. This allows you to install additional charsets in your home directory. If the first directory component of a path is ~, it is replaced by the contents of the HOME environment variable. On the MS-DOS platform, if a directory name starts with %s, it is replaced with the directory of the executable file. An empty element in the list (i.e. two consecutive colons) is considered the current directory.

       map_path = directory-list
              Colon-separated list of directories which are searched for the special character map and the replacement map. The same substitution rules as in charset_path are applied.

       format = format name
              Output format which is used by default. catdoc comes with two formats, ascii and tex, but nothing prevents you from writing your own format (a set of two map files: special character map and replacement map).

       unknown_char = character specification
              Sets the character to output instead of an unknown Unicode character (default '?'). The character specification can have one of two forms: a character enclosed in single quotes, or a hexadecimal code.

       use_locale = (yes|no)
              Enables or disables automatic selection of the output charset (default yes), based on system locale settings (if enabled at compile time). If automatic detection is enabled, then output charset settings in the configuration files (but not on the command line) are ignored, and the current system locale charset is used instead.

       There is no automatic choice of input charset based on locale language, because most modern Word files (since Word 97) are Unicode anyway.

BUGS
       Doesn't handle fast-saves properly. Prints footnotes as separate paragraphs at the end of the file, instead of producing correct LaTeX commands. Cannot distinguish between an empty table cell and the end of a table row.

SEE ALSO
       xls2csv(1), cat(1), strings(1), utf(4), unicode(7)

AUTHOR
       V.B.Wagner <vitus@45.free.net>

MS-Word reader						      Version 0.94.4							 catdoc(1)
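
The man page's description of -b mentions that catdoc looks at the first 8 bytes of a file for the Microsoft OLE signature before deciding whether to parse it or pass it through. The same check can be sketched in plain shell, which is handy for sorting real Word files from mislabeled ones before running a batch. The helper names is_ole and view_doc and the CATDOC override variable are made up for illustration; catdoc does this check internally and the sketch is not its actual code. The signature bytes D0 CF 11 E0 A1 B1 1A E1 are the standard OLE2 compound-document header.

```shell
#!/bin/sh
# is_ole FILE: succeed if FILE starts with the 8-byte OLE2 compound
# document signature D0 CF 11 E0 A1 B1 1A E1.
# (Hypothetical helper; catdoc performs its own check internally.)
is_ole() {
    # Dump the first 8 bytes as hex and strip the whitespace od adds.
    sig=$(od -An -tx1 -N 8 "$1" 2>/dev/null | tr -d '[:space:]')
    [ "$sig" = "d0cf11e0a1b11ae1" ]
}

# view_doc FILE: mimic the documented default behaviour from outside:
# run catdoc on OLE files, pass anything else through unchanged.
# CATDOC is a hypothetical override hook that defaults to catdoc.
view_doc() {
    if is_ole "$1"; then
        "${CATDOC:-catdoc}" "$1"
    else
        cat "$1"
    fi
}
```

For example, `for f in *.doc; do is_ole "$f" || echo "not OLE: $f"; done` lists files that carry a .doc extension but are not OLE containers.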