Sponsored Content
Top Forums Shell Programming and Scripting CSV with commas in field values, remove duplicates, cut columns Post 302580066 by krishnix on Wednesday 7th of December 2011 10:35:37 AM
Old 12-07-2011
CSV with commas in field values, remove duplicates, cut columns

Hi

Description of input file I have:
-------------------------
1) CSV with double quotes for string fields.
2) Some string fields have Comma as part of field value.
3) Have Duplicate lines
4) Have 200 columns/fields
5) File size is more than 10GB

Description of output file I need:
-------------------------------
1) Can be of CSV or Pipe delimited
2) But Comma within field value should remain
3) No Duplicate lines
4) I need only first 150 columns

Code I used till now:
-------------------
Code:
cat file.in | awk -F"," '!($1$2$3 in a){a[$1$2$3];print $0}' | cut -d, -f1-150 > file.out

But with this code, comma's within field value is treated as delimiter.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Remove duplicate commas after exporting excel file to csv

Hello everyone I'm new here and this is my first post so first of all I want to say that this is a great forum and I have managed to found most of my answers in these forums : ) So with that I ask you my first question: I have an excel file which I saved as a csv. However the excel file... (3 Replies)
Discussion started by: Spunkerspawn
3 Replies

2. Shell Programming and Scripting

shell script to remove extra commas from CSV outp file

Name,,,,,,,,,,,,,,,,,,,,Domain,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Contact,Phone,Email,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Location -----------------------,------------------------------------------------,-------,-----,---------------------------------,------------------------------------ ----... (1 Reply)
Discussion started by: sreenath1037
1 Replies

3. Shell Programming and Scripting

finding duplicates in csv based on key columns

Hi team, I have 20 columns csv files. i want to find the duplicates in that file based on the column1 column10 column4 column6 coulnn8 coulunm2 . if those columns have same values . then it should be a duplicate record. can one help me on finding the duplicates, Thanks in advance. ... (2 Replies)
Discussion started by: baskivs
2 Replies

4. Shell Programming and Scripting

Remove duplicates based on a field's value

Hi All, I have a text file with three columns. I would like a simple script that removes lines in which column 1 has duplicate entries, but use the largest value in column 3 to decide which one to keep. For example: Input file: 12345a rerere.rerere len=23 11111c fsdfdf.dfsdfdsf len=33 ... (3 Replies)
Discussion started by: anniecarv
3 Replies

5. Linux

How do I format a Date field of a .CSV file with multiple commas in a string field?

I have a .CSV file (file.csv) whose data are all enclosed in double quotes. Sample format of the file is as below: column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10 "12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in... (3 Replies)
Discussion started by: dhruuv369
3 Replies

6. Shell Programming and Scripting

Trying to remove duplicates based on field and row

I am trying to see if I can use awk to remove duplicates from a file. This is the file: -==> Listvol <== deleting /vol/eng_rmd_0941 deleting /vol/eng_rmd_0943 deleting /vol/eng_rmd_0943 deleting /vol/eng_rmd_1006 deleting /vol/eng_rmd_1012 rearrange /vol/eng_rmd_0943 ... (6 Replies)
Discussion started by: newbie2010
6 Replies

7. Shell Programming and Scripting

Shell script that should remove unnecessary commas between double quotes in CSV file

i have data as below 123,"paul phiri",paul@yahoo.com,"po.box 23, BT","Eco Bank,Blantyre,Malawi" i need an output to be 123,"paul phiri",paul@yahoo.com,"po.box 23 BT","Eco Bank Blantyre Malawi" (5 Replies)
Discussion started by: mathias23
5 Replies

8. Shell Programming and Scripting

Match columns from two csv files and update field in one of the csv file

Hi, I have a file of csv data, which looks like this: file1: 1AA,LGV_PONCEY_LES_ATHEE,1,\N,1,00020460E1,0,\N,\N,\N,\N,2,00.22335321,0.00466628 2BB,LES_POUGES_ASF,\N,200,200,00006298G1,0,\N,\N,\N,\N,1,00.30887539,0.00050312... (10 Replies)
Discussion started by: djoseph
10 Replies

9. Shell Programming and Scripting

Remove quotes and commas from field

In the attached file I am trying to remove all the "" and , (quotes and commas) from $2 and $3 and the "" (quotes) from $4. I tried the below as a start: awk -F"|" '{gsub(/\,/,X,$2)} 1' OFS="\t" enhancer.txt > comma.txt Thank you :). (6 Replies)
Discussion started by: cmccabe
6 Replies

10. Shell Programming and Scripting

How to remove unwanted commas from a .csv file?

how to remove unwanted commas from a .csv file Input file format "Server1","server-PRI-Windows","PRI-VC01","Microsoft Windows Server 2012, (64-bit)","Powered On","1,696.12","server-GEN-SFCHT2-VMS-R013,server-GEN-SFCHT2-VMS-R031,server-GEN-SFCHT2-VMS-R023"... (5 Replies)
Discussion started by: ranjancom2000
5 Replies
cut(1)							      General Commands Manual							    cut(1)

Name
       cut - cut out selected fields of each line of a file

Syntax
       cut -clist [file1 file2...]
       cut -flist [-dchar] [-s] [file1 file2...]

Description
       Use  the  command to cut out columns from a table or fields from each line of a file.  The fields as specified by list can be fixed length,
       that is, character positions as on a punched card (-c option), or the length can vary from line to line and be marked with a  field  delim-
       iter character like tab (-f option).  The command can be used as a filter.  If no files are given, the standard input is used.

       Use to make horizontal ``cuts'' (by context) through a file, or to put files together in columns.  To reorder columns in a table, use and

Options
       list	   Specifies  ranges  that must be a comma-separated list of integer field numbers in increasing order.  With optional - indicates
		   ranges as in the -o option of nroff/troff for page ranges; for example, 1,4,7; 1-3,8; -5,10 (short for 1-5,10);  or	3-  (short
		   for third through last field).

       -clist	   Specifies character positions to be cut out.  For example, -c1-72 would pass the first 72 characters of each line.

       -flist	   Specifies  the  fields  to be cut out.  For example, -f1,7 copies the first and seventh field only.	Lines with no field delim-
		   iters are passed through intact (useful for table subheadings), unless -s is specified.

       -dchar	   Uses the specified character as the field delimiter.  Default is tab.  Space or other characters with special  meaning  to  the
		   shell must be quoted.  The -d option is used only in combination with the -f option, according to XPG3 and SVID2/SVID3.

       -s	   Suppresses  lines  with  no	delimiter  characters.	 Unless  specified, lines with no delimiters are passed through untouched.
		   Either the -c or -f option must be specified.

Examples
       Mapping of user IDs to names:
       cut -d: -f1,5 /etc/passwd
       To set name to the current login name for the csh shell:
       set name=`who am i | cut -f1 -d" "`
       To set name to the current login name for the sh, sh5, and ksh shells:
       name=`who am i | cut -f1 -d" "`

Diagnostics
       "line too long"	   A line can have no more than 511 characters or fields.

       "bad list for c/f option"
			   Missing -c or -f option or incorrectly specified list.  No error occurs if a line has fewer fields than the list  calls
			   for.

       "no fields"	   The list is empty.

See Also
       grep(1), paste(1)

																	    cut(1)
All times are GMT -4. The time now is 08:24 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy