CSV_XS(3)             User Contributed Perl Documentation            CSV_XS(3)

NAME
       Text::CSV_XS - comma-separated values manipulation routines

SYNOPSIS
        use Text::CSV_XS;

        my @rows;
        my $csv = Text::CSV_XS->new ({ binary => 1 }) or
            die "Cannot use CSV: ".Text::CSV_XS->error_diag ();
        open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
        while (my $row = $csv->getline ($fh)) {
            $row->[2] =~ m/pattern/ or next; # 3rd field should match
            push @rows, $row;
            }
        $csv->eof or $csv->error_diag ();
        close $fh;

        $csv->eol ("\r\n");
        open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
        $csv->print ($fh, $_) for @rows;
        close $fh or die "new.csv: $!";

DESCRIPTION
       Text::CSV_XS provides facilities for the composition and decomposition
       of comma-separated values. An instance of the Text::CSV_XS class can
       combine fields into a CSV string and parse a CSV string into fields.

       The module accepts either strings or files as input and can utilize
       any user-specified characters as delimiters, separators, and escapes,
       so it is perhaps better called ASV (anything separated values) rather
       than just CSV.

   Embedded newlines
       Important Note: The default behavior is to accept only ASCII
       characters. This means that fields can not contain newlines. If your
       data contains newlines embedded in fields, or characters above 0x7e
       (tilde), or binary data, you *must* set "binary => 1" in the call to
       "new ()". To cover the widest range of parsing options, you will
       always want to set binary.

       But you still have the problem that you have to pass a correct line
       to the "parse ()" method, which is more complicated than the usual
       way of reading input:

        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
        while (<>) {           #  WRONG!
            $csv->parse ($_);
            my @fields = $csv->fields ();
            }

       This will break, because "while (<>)" reads physical lines and does
       not care about quoting. A quoted field that contains a newline is
       split over two reads.
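       The failure mode can be illustrated without the module. The sketch
       below is a simplified, hypothetical reader written in core Perl.
       "read_record" is not part of Text::CSV_XS and is not how the module
       works internally. It assumes '"' is the quote character and that
       embedded quotes are doubled, so a record is complete only when it
       contains an even number of quote characters. Reading the same
       in-memory data with "<$fh>" yields three physical lines, while the
       quote-aware reader yields two records.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Text::CSV_XS): read one logical CSV
# record from $fh, joining physical lines while a quoted field is open.
# Assumes '"' is the quote character and embedded quotes are doubled.
sub read_record {
    my ($fh) = @_;
    my $record;
    while (defined (my $line = <$fh>)) {
        $record = defined $record ? $record . $line : $line;
        # An even number of quote characters means every quoted field
        # has been closed, so the record is complete.
        my $quotes = () = $record =~ m/"/g;
        return $record if $quotes % 2 == 0;
    }
    return $record;    # undef at EOF, or an unterminated last record
}

# Two records; the second field of the first record contains a newline.
my $data = qq{1,"two\nlines",3\n4,five,6\n};

# Naive approach: one physical line per read, as with while (<>).
open my $fh, "<", \$data or die "in-memory open: $!";
my @physical = <$fh>;
close $fh;

# Quote-aware approach: one logical record per read.
open $fh, "<", \$data or die "in-memory open: $!";
my @records;
while (defined (my $rec = read_record ($fh))) {
    push @records, $rec;
}
close $fh;

# prints "physical lines: 3, records: 2"
printf "physical lines: %d, records: %d\n", scalar @physical, scalar @records;
```

       Counting quote parity only works for well-formed input with a single,
       doubled quote character. The module's own "getline ()" handles the
       general case, including custom quote and escape characters.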
       If you need to support embedded newlines, the way to go is either

        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
        while (my $row = $csv->getline (*ARGV)) {
            my @fields = @$row;
            }

       or, more safely in perl 5.6 and up

        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
        open my $io, "<", $file or die "$file: $!";
        while (my $row = $csv->getline ($io)) {
            my @fields = @$row;
            }

   Unicode (UTF8)
       On parsing (both for "getline ()" and "parse ()"), if the source is
       marked as UTF8, then all fields that are marked binary will also be
       marked UTF8.

       On combining ("print ()" and "combine ()"), if any of the combining
       fields was marked UTF8, the resulting string will be marked UTF8.

       For complete control over encoding, please use Text::CSV::Encoded:

        use Text::CSV::Encoded;
        my $csv = Text::CSV::Encoded->new ({
            encoding_in  => "iso-8859-1", # the encoding comes into   Perl
            encoding_out => "cp1252",     # the encoding comes out of Perl
            });

        $csv = Text::CSV::Encoded->new ({ encoding  => "utf8" });
        # combine () and print () accept *literally* utf8 encoded data
        # parse () and getline () return *literally* utf8 encoded data

        $csv = Text::CSV::Encoded->new ({ encoding  => undef }); # default
        # combine () and print () accept UTF8 marked data
        # parse () and getline () return UTF8 marked data

SPECIFICATION
       While no formal specification for CSV exists, RFC 4180 1) describes a
       common format and establishes "text/csv" as the MIME type registered
       with the IANA.

       Many informal documents exist that describe the CSV format. How To:
       The Comma Separated Value (CSV) File Format 2) provides an overview of
       the CSV format in the most widely used applications and explains how
       it can best be used and supported.

        1) http://tools.ietf.org/html/rfc4180
        2) http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

       The basic rules are as follows:

       CSV is a delimited data format that has fields/columns separated by
       the comma character and records/rows separated by newlines.
       Fields that contain a special character (comma, newline, or double
       quote) must be enclosed in double quotes. However, if a line contains
       a single entry which is the empty string, it may be enclosed in double
       quotes. If a field's value contains a double quote character, it is
       escaped by placing another double quote character next to it. The CSV
       file format does not require a specific character encoding, byte
       order, or line terminator format.

       o   Each record is one line terminated by a line feed (ASCII/LF=0x0A)
           or a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A);
           however, line-breaks can be embedded.

       o   Fields are separated by commas.

       o   Allowable characters within a CSV field include 0x09 (tab) and the
           inclusive range of 0x20 (space) through 0x7E (tilde). In binary
           mode all characters are accepted, at least in quoted fields.

       o   A field within CSV must be surrounded by double-quotes to contain
           the separator character (comma).

       Though this is the most clear and restrictive definition, Text::CSV_XS
       is way more liberal than this, and allows extension:

       o   Line termination by a single carriage return is accepted by
           default.

       o   The separation-, quotation-, and escape- characters can be any
           ASCII character in the range from 0x20 (space) to 0x7E (tilde).
           Characters outside this range may or may not work as expected.
           Multibyte characters, like U+060c (ARABIC COMMA), U+FF0C
           (FULLWIDTH COMMA), U+241B (SYMBOL FOR ESCAPE), U+2424 (SYMBOL FOR
           NEWLINE), U+FF02 (FULLWIDTH QUOTATION MARK), and U+201C (LEFT
           DOUBLE QUOTATION MARK) (to give some examples of what might look
           promising) are therefore not allowed.

           If you use perl-5.8.2 or higher, these three attributes are
           utf8-decoded, to increase the likelihood of success. This way
           U+00FE will be allowed as a quote character.

       o   A field within CSV must be surrounded by double-quotes to contain
           an embedded double-quote, represented by a pair of consecutive
           double-quotes.
           In binary mode you may additionally use the sequence ""0" (a
           double quote followed by the digit zero) for representation of a
           NULL byte.

       o   Several violations of the above specification may be allowed by
           passing options to the object creator.

FUNCTIONS
       version ()
           (Class method) Returns the current module version.

       new (\%attr)
           (Class method) Returns a new instance of Text::CSV_XS. The
           object's attributes are described by the (optional) hash ref
           "\%attr". Currently the following attributes are available:

           eol An end-of-line string to add to rows. "undef" is replaced with
               an empty string. The default is "$\". Common values for "eol"
               are "