CSV with commas in field values, remove duplicates, cut columns
Hi
Description of input file I have:
-------------------------
1) CSV with string fields enclosed in double quotes.
2) Some string fields contain commas as part of the field value.
3) It contains duplicate lines.
4) It has 200 columns/fields.
5) The file size is more than 10GB.
Description of output file I need:
-------------------------------
1) Can be either CSV or pipe-delimited.
2) Commas within field values must be preserved.
3) No duplicate lines.
4) I need only the first 150 columns.
Code I used till now:
-------------------
But with this code, commas within field values are treated as delimiters.
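One possible approach to the requirements above, as a minimal sketch rather than a tested solution for a 10GB file: stream the file through Python's csv module, which honours double-quoted fields, keep the first 150 fields of each row, and skip rows whose digest has already been seen. The function name dedupe_and_trim and its parameters are made up for this illustration. It assumes "duplicate" means the first 150 fields are identical, and that one 16-byte digest per unique row fits in memory.

```python
import csv
import hashlib
import io


def dedupe_and_trim(src, dst, keep=150, out_delimiter="|"):
    """Copy rows from src to dst, keeping the first `keep` fields and
    skipping rows already written. Returns the number of rows written."""
    reader = csv.reader(src)  # quote-aware: commas inside "..." stay in the field
    writer = csv.writer(dst, delimiter=out_delimiter, lineterminator="\n")
    seen = set()  # stores 16-byte digests instead of whole rows to limit memory
    written = 0
    for row in reader:
        trimmed = row[:keep]
        # Assumption: the unit-separator character never occurs in the data,
        # so joining on it cannot make two different rows look identical.
        joined = "\x1f".join(trimmed).encode("utf-8")
        key = hashlib.blake2b(joined, digest_size=16).digest()
        if key in seen:
            continue
        seen.add(key)
        writer.writerow(trimmed)
        written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory demo; for real files, open them with newline="".
    sample = io.StringIO('1,"a, b",x,extra\n1,"a, b",x,other\n2,"c",y,extra\n')
    out = io.StringIO()
    dedupe_and_trim(sample, out, keep=3)
    print(out.getvalue(), end="")
```

With pipe-delimited output, a field such as "a, b" is written without quotes because it no longer contains the delimiter. Passing out_delimiter="," keeps CSV output and re-quotes such fields.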
Hello everyone, I'm new here and this is my first post, so first of all I want to say that this is a great forum and I have managed to find most of my answers in these forums : )
So with that I ask you my first question:
I have an excel file which I saved as a csv. However the excel file... (3 Replies)
Hi team,
I have CSV files with 20 columns. I want to find the duplicates in a file based on column1, column10, column4, column6, column8 and column2. If those columns have the same values, the line should be treated as a duplicate record.
Can anyone help me find the duplicates?
Thanks in advance.
... (2 Replies)
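A rough sketch of the key-column idea described in this post, assuming every row has at least 10 fields and that the first occurrence of a key counts as the original. The names find_duplicates and KEY_COLUMNS are hypothetical.

```python
import csv
import io

# 1-based column numbers, in the order listed in the post.
KEY_COLUMNS = (1, 10, 4, 6, 8, 2)


def find_duplicates(src, key_columns=KEY_COLUMNS):
    """Return every row whose key columns match an earlier row."""
    seen = set()
    duplicates = []
    for row in csv.reader(src):
        # Build the comparison key from the selected columns only.
        key = tuple(row[c - 1] for c in key_columns)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


if __name__ == "__main__":
    base = [str(i) for i in range(1, 21)]
    same_key = base[:19] + ["changed"]  # differs only in column 20
    text = ",".join(base) + "\n" + ",".join(same_key) + "\n"
    for row in find_duplicates(io.StringIO(text)):
        print(",".join(row))
```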
Hi All,
I have a text file with three columns. I would like a simple script that removes lines in which column 1 has duplicate entries, but use the largest value in column 3 to decide which one to keep. For example:
Input file:
12345a rerere.rerere len=23
11111c fsdfdf.dfsdfdsf len=33 ... (3 Replies)
I have a .CSV file (file.csv) whose data are all enclosed in double quotes. Sample format of the file is as below:
column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in... (3 Replies)
I am trying to see if I can use awk to remove duplicates from a file. This is the file:
-==> Listvol <==
deleting /vol/eng_rmd_0941
deleting /vol/eng_rmd_0943
deleting /vol/eng_rmd_0943
deleting /vol/eng_rmd_1006
deleting /vol/eng_rmd_1012
rearrange /vol/eng_rmd_0943
... (6 Replies)
I have data as below:
123,"paul phiri",paul@yahoo.com,"po.box 23, BT","Eco Bank,Blantyre,Malawi"
I need the output to be:
123,"paul phiri",paul@yahoo.com,"po.box 23 BT","Eco Bank Blantyre Malawi" (5 Replies)
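One way to get from the input line to the wanted output, as a sketch only: replace each comma inside a double-quoted field, together with any spaces around it, with a single space. The helper name strip_inner_commas is made up, and the pattern assumes fields never contain escaped quotes ("").

```python
import re

# Matches one double-quoted field; assumes no escaped quotes inside it.
QUOTED = re.compile(r'"[^"]*"')


def strip_inner_commas(line):
    """Replace commas (and surrounding spaces) inside quoted fields with one space."""
    return QUOTED.sub(lambda m: re.sub(r"\s*,\s*", " ", m.group(0)), line)


if __name__ == "__main__":
    line = '123,"paul phiri",paul@yahoo.com,"po.box 23, BT","Eco Bank,Blantyre,Malawi"'
    print(strip_inner_commas(line))
```

Commas outside the quotes are untouched, so the field count stays the same.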
Hi,
I have a file of csv data, which looks like this:
file1:
1AA,LGV_PONCEY_LES_ATHEE,1,\N,1,00020460E1,0,\N,\N,\N,\N,2,00.22335321,0.00466628
2BB,LES_POUGES_ASF,\N,200,200,00006298G1,0,\N,\N,\N,\N,1,00.30887539,0.00050312... (10 Replies)
In the attached file I am trying to remove all the "" and , (quotes and commas) from $2 and $3 and the "" (quotes) from $4.
I tried the below as a start:
awk -F"|" '{gsub(/\,/,X,$2)} 1' OFS="\t" enhancer.txt > comma.txt
Thank you :). (6 Replies)
How to remove unwanted commas from a .csv file
Input file format
"Server1","server-PRI-Windows","PRI-VC01","Microsoft Windows Server 2012, (64-bit)","Powered On","1,696.12","server-GEN-SFCHT2-VMS-R013,server-GEN-SFCHT2-VMS-R031,server-GEN-SFCHT2-VMS-R023"... (5 Replies)
Discussion started by: ranjancom2000
5 Replies
cut
CUT(1) BSD General Commands Manual CUT(1)
NAME
cut -- select portions of each line of a file
SYNOPSIS
cut -b list [-n] [file ...]
cut -c list [file ...]
cut -f list [-d delim] [-s] [file ...]
DESCRIPTION
The cut utility selects portions of each line (as specified by list) from each file and writes them to the standard output. If the file
argument is a single dash ('-') or no file arguments were specified, lines are read from the standard input. The items specified by list can
be in terms of column position or in terms of fields delimited by a special character. Column numbering starts from 1.
list is a comma or whitespace separated set of increasing numbers and/or number ranges. Number ranges consist of a number, a dash (-), and a
second number and select the fields or columns from the first number to the second, inclusive. Numbers or number ranges may be preceded by a
dash, which selects all fields or columns from 1 to the first number. Numbers or number ranges may be followed by a dash, which selects all
fields or columns from the last number to the end of the line. Numbers and number ranges may be repeated, overlapping, and in any order. It
is not an error to select fields or columns not present in the input line.
The options are as follows:
-b list     The list specifies byte positions.
-c list     The list specifies character positions.
-d string   Use the first character of string as the field delimiter character. The default is the <TAB> character.
-f list     The list specifies fields, separated by the field delimiter character. The selected fields are output, separated by the field
            delimiter character.
-n          Do not split multi-byte characters.
-s          Suppresses lines with no field delimiter characters. Unless specified, lines with no delimiters are passed through unmodified.
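Because -d and -f split on every occurrence of the delimiter character and know nothing about quoting, a comma inside a double-quoted CSV field counts as a field boundary. The sketch below imitates that splitting in Python and compares it with a quote-aware parse. The helper names cut_fields and csv_fields are invented for the comparison, and the -s behaviour is not modelled.

```python
import csv


def cut_fields(line, count, delim=","):
    """Imitate `cut -d, -f1-count`: split on every delimiter, quoted or not."""
    return delim.join(line.split(delim)[:count])


def csv_fields(line, count):
    """Quote-aware alternative: parse the line as CSV, return the first fields."""
    return next(csv.reader([line]))[:count]


if __name__ == "__main__":
    line = '1,"Smith, John",NY,extra'
    print(cut_fields(line, 3))   # the quoted name is cut in two, NY is lost
    print(csv_fields(line, 3))   # three complete fields
```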
EXIT STATUS
cut exits 0 on success, 1 if an error occurred.
SEE ALSO
paste(1)
STANDARDS
The cut utility conforms to IEEE Std 1003.2-1992 (``POSIX.2'').
BSD December 21, 2008 BSD