CSV with commas in field values, remove duplicates, cut columns
Post 302580108 by Corona688 on Wednesday 7th of December 2011 02:17:10 PM
First, selecting 150 fields is tricky when you've got a complex delimiter like "," (quote, comma, quote). You could do it in awk, but only in some versions of awk, and it probably wouldn't be fast enough. Do you have a C compiler? If so, this keeps the first 150 fields of every line:

Code:
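#include <stdio.h>
#include <string.h>

#define FIELDS  150     // number of leading fields to keep

int main(void)
{
        char buf[32768];        // a longer line gets read in pieces by fgets

        while(fgets(buf, sizeof(buf), stdin))
        {
                int n;
                char *c=strstr(buf, "\",\""); // Find end of 1st field

                // Step over FIELDS-1 more delimiters, to the end of field FIELDS
                for(n=0; c && (n<(FIELDS-1)); n++)
                        c=strstr(c+1, "\",\"");

                // c points at the closing quote, so overwrite the comma after it
                if(c)  strcpy(c+1, "\n"); // end the line early
                fputs(buf, stdout); // print the line, shortened or not
        }

        return 0;
}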

Code:
$ gcc 150cols.c -o 150cols
$ cat 200cols
"1:asdf","2:asdf","3:asdf","4:asdf",...,"199:asdf","200:asdf"
$ ./150cols < 200cols > 150cols.out
$ cat 150cols.out
"1:asdf","2:asdf","3:asdf","4:asdf","5:asdf","6:asdf",...,"150:asdf"

Now that you can do that, I think you're going to need to sort your data in order to remove duplicates. The alternative, storing up to 10 gigabytes in memory during processing so you can tell whether a line's duplicate or not, just isn't feasible. So use it in combination with sort to remove duplicate lines:
Code:
sort -u < input | ./150cols > output
