First, selecting 150 fields is tricky when you've got a complex delimiter. You could do it in awk, but only some versions of awk can handle it, and it probably wouldn't be fast enough. Do you have a C compiler?
Now that you can do that, I think you're going to need to sort your data in order to remove duplicates. The alternative, storing up to 10 gigabytes in memory during processing so you can tell whether a line is a duplicate or not, just isn't feasible. So use the field selection in combination with sort to remove duplicate lines:
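The command that originally followed is not preserved here. As a minimal sketch of the sort-based approach, assuming the selected fields have already been written one record per line to a file (the file name `fields.txt` and the function name `dedupe_lines` are hypothetical), `sort -u` sorts the lines and keeps one copy of each. Common implementations such as GNU sort spill to temporary files on disk, so the data does not have to fit in memory.

```shell
# Hypothetical helper: print the unique lines of a file, in sorted order.
# LC_ALL=C makes sort compare raw bytes, which is usually faster and
# avoids locale collation rules treating different lines as equal.
dedupe_lines() {
    LC_ALL=C sort -u -- "$1"
}

# Example usage (fields.txt is an assumed file name):
#   dedupe_lines fields.txt > unique.txt
```

Note that the output is sorted, so the original line order is lost; that is the trade-off for not holding every line in memory.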
Hello everyone. I'm new here and this is my first post, so first of all I want to say that this is a great forum and I have managed to find most of my answers in these forums : )
So with that I ask you my first question:
I have an excel file which I saved as a csv. However the excel file... (3 Replies)
Hi team,
I have CSV files with 20 columns. I want to find the duplicates in a file based on column1, column10, column4, column6, column8, and column2. If those columns have the same values, the record should be treated as a duplicate.
Can anyone help me find the duplicates?
Thanks in advance.
... (2 Replies)
Hi All,
I have a text file with three columns. I would like a simple script that removes lines in which column 1 has duplicate entries, but use the largest value in column 3 to decide which one to keep. For example:
Input file:
12345a rerere.rerere len=23
11111c fsdfdf.dfsdfdsf len=33 ... (3 Replies)
I have a .CSV file (file.csv) whose data are all enclosed in double quotes. Sample format of the file is as below:
column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in... (3 Replies)
I am trying to see if I can use awk to remove duplicates from a file. This is the file:
-==> Listvol <==
deleting /vol/eng_rmd_0941
deleting /vol/eng_rmd_0943
deleting /vol/eng_rmd_0943
deleting /vol/eng_rmd_1006
deleting /vol/eng_rmd_1012
rearrange /vol/eng_rmd_0943
... (6 Replies)
I have data as below:
123,"paul phiri",paul@yahoo.com,"po.box 23, BT","Eco Bank,Blantyre,Malawi"
I need the output to be:
123,"paul phiri",paul@yahoo.com,"po.box 23 BT","Eco Bank Blantyre Malawi" (5 Replies)
Hi,
I have a file of csv data, which looks like this:
file1:
1AA,LGV_PONCEY_LES_ATHEE,1,\N,1,00020460E1,0,\N,\N,\N,\N,2,00.22335321,0.00466628
2BB,LES_POUGES_ASF,\N,200,200,00006298G1,0,\N,\N,\N,\N,1,00.30887539,0.00050312... (10 Replies)
In the attached file I am trying to remove all the "" and , (quotes and commas) from $2 and $3 and the "" (quotes) from $4.
I tried the below as a start:
awk -F"|" '{gsub(/\,/,X,$2)} 1' OFS="\t" enhancer.txt > comma.txt
Thank you :). (6 Replies)
How do I remove unwanted commas from a .csv file?
Input file format:
"Server1","server-PRI-Windows","PRI-VC01","Microsoft Windows Server 2012, (64-bit)","Powered On","1,696.12","server-GEN-SFCHT2-VMS-R013,server-GEN-SFCHT2-VMS-R031,server-GEN-SFCHT2-VMS-R023"... (5 Replies)