CSV file: Find duplicates, save original and duplicate records in a new file
Post 302536521 by arvindosu on Tuesday 5th of July 2011 04:06:18 PM
Thanks a ton, Corona688! I'm not sure I was clear in my previous post. I also have files of about 15GB each. Sample data from one of these files looks like this:

72426459560 2010-06-2 ABC LC11100619758
95327GNFA4S 2010-06-2 ABC 97BCX3AMD10G
95327GNFA4S 2010-06-2 ABC 97BCX3AMKLMO
900278VGA4T 2010-06-2 ABC QVA697C8LAYMACBF
900278VG567 2010-06-2 ABC QVA697C8LAYMACBF

(column 3 would be the same for the entire 15GB file)

From this file, I want to find every record whose value in column one or in column four also appears in another record. The output would look something like this:

95327GNFA4S 2010-06-2 ABC 97BCX3AMD10G
95327GNFA4S 2010-06-2 ABC 97BCX3AMKLMO
900278VGA4T 2010-06-2 ABC QVA697C8LAYMACBF
900278VG567 2010-06-2 ABC QVA697C8LAYMACBF

I can run code that finds duplicates in column 1 only and saves them to ABC.txt, then rerun the same code for column 4 and save those to ABC2.txt. But that leaves me with two different files.

Instead, I want to save all the duplicates together with their original records (as in the example above) in a single new CSV file.

Last edited by arvindosu; 07-05-2011 at 05:16 PM..
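
One possible way to do what the post asks, sketched in Python rather than as a shell one-liner. It is not taken from the thread: the function name `find_duplicates`, its parameters, and the space delimiter (inferred from the sample lines above) are all assumptions. It reads the file twice: the first pass counts how often each value occurs in columns 1 and 4, and the second pass copies every record whose column-1 or column-4 value occurs more than once into one CSV file.

```python
import csv
from collections import Counter


def find_duplicates(in_path, out_path, key_cols=(0, 3), delimiter=" "):
    """Copy every row whose value in any key column occurs more than once.

    key_cols are zero-based, so (0, 3) means columns 1 and 4 of the post.
    Returns the number of rows written. Hypothetical helper, not from the thread.
    """
    min_fields = max(key_cols) + 1

    # Pass 1: count how often each value occurs in each key column.
    counts = {col: Counter() for col in key_cols}
    with open(in_path, newline="") as src:
        for row in csv.reader(src, delimiter=delimiter):
            if len(row) < min_fields:
                continue  # skip blank or short lines
            for col in key_cols:
                counts[col][row[col]] += 1

    # Pass 2: write rows that share a key value with at least one other row.
    written = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)  # output is comma-separated
        for row in csv.reader(src, delimiter=delimiter):
            if len(row) < min_fields:
                continue
            if any(counts[col][row[col]] > 1 for col in key_cols):
                writer.writerow(row)
                written += 1
    return written
```

Calling `find_duplicates("input.txt", "duplicates.csv")` (both file names are placeholders) on the five sample records above should write the four records from the desired output and skip the first one. The file is streamed line by line, but the counters hold one entry per distinct value in each key column, so on a 15GB file with mostly unique keys memory use could still be substantial; in that case a sort-based approach may be the better fit.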
 
