CSV file: Find duplicates, save original and duplicate records in a new file
Post 302536530 by Corona688 on Tuesday 5th of July 2011 04:27:07 PM
Does your data really have all those blank lines in it?
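
If it does, the blank lines could be dropped before any further processing. This is only a sketch: `strip_blank` is a made-up helper name, and it assumes a "blank" line is one that is empty or holds nothing but whitespace.

```shell
#!/bin/sh

# strip_blank (hypothetical helper): copy stdin to stdout, dropping
# lines that are empty or contain only whitespace.
strip_blank() {
        # NF is the number of fields; it is 0 on blank lines
        awk 'NF > 0'
}

# Small demonstration on inline data
printf 'a b\n\n   \nc d\n' | strip_blank
```

On the real file this would be something like `strip_blank < megadata.txt > megadata.clean.txt`, where `megadata.clean.txt` is just an example name.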

---------- Post updated at 02:27 PM ---------- Previous update was at 02:24 PM ----------

Assuming it doesn't actually have all those extra blank lines:

Code:
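#!/bin/sh

COL=4

# Break the file into livable chunks of at most 100K each, keeping
# lines whole. split names the pieces xaa, xab, ...
split -C 100K < megadata.txt

# Sort each chunk in place on column $COL only
for FILE in x??
do
        sort -k "$COL,$COL" < "${FILE}" > "${FILE}.tmp"
        mv "${FILE}.tmp" "${FILE}"
done

# Merge the sorted chunks so rows with equal keys end up adjacent,
# then print every row whose key matches the previous row's key.
sort -k "$COL,$COL" -m x?? | awk -v COL="${COL}" '{
        if(NR > 1 && $COL == LAST)
        {
                # Print the first row of the group once, then each repeat
                if(orig != "")
                {       print orig;     orig="";        }

                print;
        }
        else
        {
                LAST=$COL;
                orig=$0;
        }
}' > output.txt
rm -f x??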

With your input data, this will find:
Code:
900278VG567 2010-06-2 ABC QVA697C8LAYMACBF
900278VGA4T 2010-06-2 ABC QVA697C8LAYMACBF
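
The duplicate-detection step can be tried on its own with a tiny, already sorted file. This is a sketch rather than anything from the original thread: the `find_dups` name and the first and last sample rows are invented for illustration, and it assumes whitespace-separated columns with the input sorted on the chosen column.

```shell
#!/bin/sh

# find_dups COL (hypothetical helper): read rows sorted on column COL
# from stdin and print every row whose key occurs more than once.
find_dups() {
        awk -v COL="$1" '
                NR > 1 && $COL == LAST {
                        # First repeat: emit the saved original row once
                        if (orig != "") { print orig; orig = "" }
                        print
                        next
                }
                # New key: remember it and the row that introduced it
                { LAST = $COL; orig = $0 }
        '
}

# Sample data sorted on column 4; the first and last rows are made up
find_dups 4 <<'EOF'
900278VG111 2010-06-1 ABC AAA111
900278VG567 2010-06-2 ABC QVA697C8LAYMACBF
900278VGA4T 2010-06-2 ABC QVA697C8LAYMACBF
900278VGZZZ 2010-06-3 ABC ZZZ999
EOF
```

Only the two rows sharing QVA697C8LAYMACBF are printed. Because just the previous key and one saved row are held at any time, memory use stays flat no matter how large the sorted input is.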

I don't know of a way to check both columns in a single pass. That'd bring you back to the original problem of needing to store everything in memory at once to tell if there were duplicates.
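
For contrast, here is roughly what the in-memory approach looks like. It is a sketch under several assumptions that are not from the thread: the helper name `dups_either` is invented, the two key columns are passed as arguments, and a row counts as a duplicate if its value in either column occurs more than once. It keeps a counter for every distinct key, which is exactly the memory cost described above, so it only suits files whose keys fit in RAM.

```shell
#!/bin/sh

# dups_either FILE A B (hypothetical helper): print rows of FILE whose
# value in column A or column B appears more than once in the file.
# The file is read twice; all distinct keys are held in memory.
dups_either() {
        awk -v A="$2" -v B="$3" '
                # Pass 1: count how often each key occurs in each column
                NR == FNR { na[$A]++; nb[$B]++; next }
                # Pass 2: print rows with a repeated key in either column
                na[$A] > 1 || nb[$B] > 1
        ' "$1" "$1"
}

# Made-up sample: K1 repeats in column 1, V2 repeats in column 4
cat > small_sample.txt <<'EOF'
K1 2010-06-1 ABC V1
K2 2010-06-2 ABC V2
K1 2010-06-3 ABC V3
K3 2010-06-4 ABC V2
K4 2010-06-5 ABC V5
EOF

dups_either small_sample.txt 1 4
rm -f small_sample.txt
```

Rows come out in file order rather than grouped by key, since nothing is sorted here.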

Last edited by Corona688; 07-05-2011 at 05:45 PM..