CSV_XS(3)             User Contributed Perl Documentation            CSV_XS(3)

NAME
       Text::CSV_XS - comma-separated values manipulation routines

SYNOPSIS
        use Text::CSV_XS;

        my @rows;
        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
        open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
        while (my $row = $csv->getline ($fh)) {
            $row->[2] =~ m/pattern/ or next; # 3rd field should match
            push @rows, $row;
            }
        close $fh;

        $csv->eol ("\r\n");
        open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
        $csv->print ($fh, $_) for @rows;
        close $fh or die "new.csv: $!";

DESCRIPTION
       Text::CSV_XS provides facilities for the composition and decomposition
       of comma-separated values. An instance of the Text::CSV_XS class will
       combine fields into a CSV string and parse a CSV string into fields.

       The module accepts either strings or files as input and supports the
       use of user-specified characters for delimiters, separators, and
       escapes.

   Embedded newlines
       Important Note: The default behavior is to accept only ASCII characters
       in the range from 0x20 (space) to 0x7E (tilde). This means that fields
       cannot contain newlines. If your data contains newlines embedded in
       fields, or characters above 0x7E (tilde), or binary data, you must set
       "binary => 1" in the call to "new". To cover the widest range of
       parsing options, you will always want to set binary.

       However, you still have to pass a complete record to the "parse"
       method, which is harder than the usual idiom suggests:

        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
        while (<>) {                # WRONG!
            $csv->parse ($_);
            my @fields = $csv->fields ();
            # ...
            }

       This will break, because the "while" loop reads physical lines without
       regard to quoting and may therefore hand "parse" a partial record.
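
       To see why the line-by-line loop fails, here is a minimal sketch. It
       assumes Text::CSV_XS is installed; the sample record in $data and the
       names @results and $row are illustrative only and are not part of the
       module. The record is read through an in-memory file handle so the
       example needs no file on disk.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Illustrative input: ONE record whose second field spans two physical lines.
my $data = qq{1,"line one\nline two",3\n};

my $csv = Text::CSV_XS->new ({ binary => 1 });

# Line-by-line reading cuts the record in the middle of the quoted field,
# so parse () is handed an incomplete record.
my @results;
for my $line (split /^/, $data) {
    push @results, $csv->parse ($line) ? "ok" : "failed";
    }

# getline () tracks the quoting itself and returns the complete record.
open my $fh, "<", \$data or die "in-memory handle: $!";
my $row = $csv->getline ($fh);
close $fh;

printf "fragment 1: %s\n", $results[0];
printf "fields read by getline: %d\n", scalar @$row;
```

       The first fragment ends inside an open quoted field, so "parse"
       rejects it, while "getline" keeps reading until the quote is closed
       and returns all three fields, the second one with its newline intact.
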
       If you need to support embedded newlines, the way to go is to not pass
       "eol" in the parser (it accepts "\n", "\r", and "\r\n" by default) and
       then read the records with "getline":

        my $csv = Text::CSV_XS->new ({ binary => 1 });
        open my $io, "<", $file or die "$file: $!";
        while (my $row = $csv->getline ($io)) {
            my @fields = @$row;
            # ...
            }

       The old(er) way of using global file handles is still supported:

        while (my $row = $csv->getline (*ARGV)) {
            # ...
            }

   Unicode
       Unicode is only tested to work with perl-5.8.2 and up.

       On parsing (both for "getline" and "parse"), if the source is marked
       as being UTF8, then all fields that are marked binary will also be
       marked UTF8.

       For complete control over encoding, please use Text::CSV::Encoded:

        use Text::CSV::Encoded;
        my $csv = Text::CSV::Encoded->new ({
            encoding_in  => "iso-8859-1", # the encoding comes into Perl
            encoding_out => "cp1252",     # the encoding comes out of Perl
            });

        $csv = Text::CSV::Encoded->new ({ encoding => "utf8" });
        # combine () and print () accept *literally* utf8 encoded data
        # parse () and getline () return *literally* utf8 encoded data

        $csv = Text::CSV::Encoded->new ({ encoding => undef }); # default
        # combine () and print () accept UTF8 marked data
        # parse () and getline () return UTF8 marked data

       On combining ("print" and "combine"), if any of the combining fields
       was marked UTF8, the resulting string will be marked UTF8. Note,
       however, that any field before the first UTF8-marked field that
       contains 8-bit characters and was not upgraded to UTF8 will remain
       bytes in the resulting string, causing errors. If you pass data of
       different encodings, or you don't know whether the encodings differ,
       force the fields to be upgraded before you pass them on:

        $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);

SPECIFICATION
       While no formal specification for CSV exists, RFC 4180 1) describes a
       common format and establishes "text/csv" as the MIME type registered
       with the IANA.

       Many informal documents exist that describe the CSV format.
       How To: The Comma Separated Value (CSV) File Format 2) provides an
       overview of the CSV format in the most widely used applications and
       explains how it can best be used and supported.

        1) http://tools.ietf.org/html/rfc4180
        2) http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

       The basic rules are as follows:

       CSV is a delimited data format that has fields/columns separated by the
       comma character and records/rows separated by newlines. Fields that
       contain a special character (comma, newline, or double quote) must be
       enclosed in double quotes. However, if a line contains a single entry
       that is the empty string, it may be enclosed in double quotes. If a
       field's value contains a double quote character, it is escaped by
       placing another double quote character next to it. The CSV file format
       does not require a specific character encoding, byte order, or line
       terminator format.

       o Each record is a single line ended by a line feed (ASCII/LF=0x0A) or
         a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A);
         however, line breaks may be embedded.

       o Fields are separated by commas.

       o Allowable characters within a CSV field include 0x09 (tab) and the
         inclusive range of 0x20 (space) through 0x7E (tilde). In binary mode
         all characters are accepted, at least in quoted fields.

       o A field within CSV must be surrounded by double-quotes to contain
         the separator character (comma).

       Though this is the most clear and restrictive definition, Text::CSV_XS
       is way more liberal than this, and allows extension:

       o Line termination by a single carriage return is accepted by default.

       o The separation-, quote-, and escape-characters can be any ASCII
         character in the range from 0x20 (space) to 0x7E (tilde). Characters
         outside this range may or may not work as expected.
         Multibyte characters, like U+060C (ARABIC COMMA), U+FF0C (FULLWIDTH
         COMMA), U+241B (SYMBOL FOR ESCAPE), U+2424 (SYMBOL FOR NEWLINE),
         U+FF02 (FULLWIDTH QUOTATION MARK), and U+201C (LEFT DOUBLE QUOTATION
         MARK) (to give some examples of what might look promising) are
         therefore not allowed.

         If you use perl-5.8.2 or higher, these three attributes are
         utf8-decoded, to increase the likelihood of success. This way U+00FE
         will be allowed as a quote character.

       o A field within CSV must be surrounded by double-quotes to contain an
         embedded double-quote, represented by a pair of consecutive
         double-quotes. In binary mode you may additionally use the sequence
         ""0" (a double quote followed by a zero) for representation of a
         NULL byte.

       o Several violations of the above specification may be allowed by
         passing options to the object creator.

FUNCTIONS
   version
       (Class method) Returns the current module version.

   new
       (Class method) Returns a new instance of Text::CSV_XS. The object's
       attributes are described by the (optional) hash ref "\%attr".

        my $csv = Text::CSV_XS->new ({ attributes ... });

       The following attributes are available:

       eol An end-of-line string to add to rows.

           When not passed in a parser instance, the default behavior is to
           accept "\n", "\r", and "\r\n", so it is probably safer to not
           specify "eol" at all. Passing "undef" or the empty string behaves
           the same.

           Common values for "eol" are "