CSV with commas in field values, remove duplicates, cut columns
Hi
Description of input file I have:
-------------------------
1) CSV with double quotes around string fields.
2) Some string fields contain commas as part of the field value.
3) Contains duplicate lines.
4) Has 200 columns/fields.
5) File size is more than 10 GB.
Description of output file I need:
-------------------------------
1) Can be CSV or pipe-delimited.
2) Commas within field values must be preserved.
3) No duplicate lines.
4) Only the first 150 columns are needed.
Code I used till now:
-------------------
But with this code, commas within field values are treated as delimiters.
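One way to meet these requirements is to let a real CSV parser handle the quoting instead of splitting on commas. The following is a minimal Python sketch, not a tested solution for this exact file: the function name `dedupe_and_cut`, the pipe output delimiter, and the choice to compare rows after cutting them to 150 columns are all assumptions. It streams the input row by row and remembers only a 20-byte SHA-1 digest per distinct row, so memory grows with the number of distinct rows rather than with the file size.

```python
import csv
import hashlib


def dedupe_and_cut(src, dst, keep=150, out_delimiter="|"):
    """Copy CSV rows from src to dst, keeping the first `keep` fields
    and skipping rows whose kept fields were already written.

    src and dst are open text file objects (hypothetical helper, not
    taken from the original post).
    """
    reader = csv.reader(src)  # understands "quoted, fields" with commas
    writer = csv.writer(
        dst,
        delimiter=out_delimiter,
        quoting=csv.QUOTE_MINIMAL,  # quote only when a field needs it
        lineterminator="\n",
    )
    seen = set()
    for row in reader:
        cut = row[:keep]
        # Assumption: fields never contain the unit separator \x1f,
        # so joining on it gives an unambiguous key for the row.
        key = "\x1f".join(cut).encode("utf-8")
        digest = hashlib.sha1(key).digest()
        if digest in seen:
            continue  # duplicate of a row already written
        seen.add(digest)
        writer.writerow(cut)
```

To run it on files, open both with `newline=""` as the csv module documentation recommends, for example `with open("in.csv", newline="") as src, open("out.txt", "w", newline="") as dst: dedupe_and_cut(src, dst)`. The file names here are placeholders.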
TM::Serializable::CSV(3pm) User Contributed Perl Documentation TM::Serializable::CSV(3pm)
NAME
TM::Serializable::CSV - Topic Maps, trait for parsing (and later dumping) CSV stream
SYNOPSIS
# 1) bare bones
my $tm = .....; # get a map from somewhere (can be empty)
Class::Trait->apply ($tm, "TM::Serializable::CSV");
use Perl6::Slurp;
$tm->deserialize (slurp 'myugly.csv');
# 2) exploiting the timed sync in/out mechanism
my $tm = new TM::.... (url => 'file:myugly.csv'); # get a RESOURCEABLE map from somewhere
$tm->sync_in;
DESCRIPTION
This trait provides parsing and dumping from CSV formatted text streams.
INTERFACE
Methods
deserialize
$tm->deserialize ($text)
This method consumes the text string passed in and interprets it as CSV formatted information. What topic map information is generated
depends on the header line (the first line):
o If the header line contains a field called "association-type", then all rows will be interpreted as assertions. In that case the
remaining header fields (in that order) are interpreted as roles (role types). For all rows in the CSV stream, the position where
the "association-type" field was is ignored. The other fields (in that order) are affiliated with the corresponding roles.
Example:
association-type,location,bio-unit
is-born,gold-coast,rumsti
is-born,vienna,ramsti
Scoping cannot be controlled. Also all players and roles (obviously) are directly interpreted as identifiers. Subject identifiers
and locators are not (yet) implemented.
o If the header line contains a field called "id", then all further rows will be interpreted as topic characteristics, with each
topic on one line. The column in which the "id" header field appears will be interpreted as the toplet identifier.
All further columns will be interpreted according to the following:
o If the header column is named "name", the values will be used as topic names.
o Otherwise if the value looks like a URI, an occurrence with that URI value will be added to the topic.
o Otherwise an occurrence with a string value will be added to the topic.
Example:
name,id,location,homepage
"Rumsti",rumsti,gold-coast,http://rumsti.com
"Ramsti",ramsti,vienna,http://ramsti.com
serialize
$tm->serialize
[Since TM 1.53] This method serializes a fragment of a topic map into CSV. Which fragment can be controlled with the header line and
options (see constructor).
"header_line" (only for serialization)
This string contains a comma separated list (CSV parseable) of headings. If one of the headings is "association-type", then the
generated CSV content will contain associations only. Nothing else is implemented yet. The other headings control which roles (and
in which order) should be included in the CSV content. If a particular role type has more than one player, then all players are
included.
NOTE: As this is inconsistent, this will have to change.
"type" (only for serialization)
If existing, then this controls which association type is to be taken.
"baseuri" (only for serialization)
If existing and non-zero, the base URI of the map will remain in the identifiers. Otherwise it will be removed.
"specification"
If existing (and when selecting only associations), this specification will be interpreted in the sense of "asserts" (see TM).
Example:
$tm->serialize (header_line => 'association-type,location,bio-unit',
type => 'is-born',
baseuri => 0);
SEE ALSO
TM, TM::Serializable
AUTHOR INFORMATION
Copyright 2010 Robert Barta.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
http://www.perl.com/perl/misc/Artistic.html
perl v5.10.1 2012-06-05 TM::Serializable::CSV(3pm)