07-05-2011
Thanks a ton, Corona688! I'm not sure I was clear in my previous post. Basically, I also have files of about 15 GB each. If I take one of these files, the sample data looks like this:
72426459560 2010-06-2 ABC LC11100619758
95327GNFA4S 2010-06-2 ABC 97BCX3AMD10G
95327GNFA4S 2010-06-2 ABC 97BCX3AMKLMO
900278VGA4T 2010-06-2 ABC QVA697C8LAYMACBF
900278VG567 2010-06-2 ABC QVA697C8LAYMACBF
(Column 3 is the same throughout the entire 15 GB file.)
From this file, I want to find duplicates in column 1 and in column 4, i.e. every record whose column-1 value or column-4 value appears more than once. The output would look something like this:
95327GNFA4S 2010-06-2 ABC 97BCX3AMD10G
95327GNFA4S 2010-06-2 ABC 97BCX3AMKLMO
900278VGA4T 2010-06-2 ABC QVA697C8LAYMACBF
900278VG567 2010-06-2 ABC QVA697C8LAYMACBF
I already have code that finds duplicates in column 1 only and saves them to ABC.txt; rerunning the same code for column 4 saves those to ABC2.txt. That gives me two different files.
Instead, I want to save all the duplicates, as the original records (as in the example above), in a single new CSV file.
Last edited by arvindosu; 07-05-2011 at 05:16 PM.
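One way to get that combined output in a single run is a two-pass scan, sketched below in Python under a few assumptions: fields are whitespace-separated, "column 1" and "column 4" are the first and fourth fields, and a record counts as a duplicate if either its column-1 value or its column-4 value occurs more than once in the file. The names `count_keys`, `duplicate_records`, and `write_duplicates_csv` are hypothetical. The first pass only counts key values and the second streams the records back out, so the 15 GB of records is never held in memory; the counters do still hold every distinct column-1 and column-4 value, so a file with a huge number of unique keys may need a sort-based approach instead.

```python
import csv
import sys
from collections import Counter
from typing import Iterable, Iterator, List

# Assumed 0-based positions of "column 1" and "column 4".
KEY_COLUMNS = (0, 3)


def count_keys(lines: Iterable[str]) -> List[Counter]:
    """Pass 1: count how often each value occurs in each key column."""
    counters = [Counter() for _ in KEY_COLUMNS]
    for line in lines:
        fields = line.split()
        if len(fields) <= max(KEY_COLUMNS):
            continue  # skip blank or short lines
        for counter, col in zip(counters, KEY_COLUMNS):
            counter[fields[col]] += 1
    return counters


def duplicate_records(lines: Iterable[str],
                      counters: List[Counter]) -> Iterator[List[str]]:
    """Pass 2: yield each record whose column-1 OR column-4 value repeats."""
    for line in lines:
        fields = line.split()
        if len(fields) <= max(KEY_COLUMNS):
            continue
        if any(counter[fields[col]] > 1
               for counter, col in zip(counters, KEY_COLUMNS)):
            yield fields


def write_duplicates_csv(in_path: str, out_path: str) -> None:
    """Read the input twice and write the duplicate records as CSV rows."""
    with open(in_path, encoding="utf-8") as f:
        counters = count_keys(f)
    with open(in_path, encoding="utf-8") as f, \
            open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for fields in duplicate_records(f, counters):
            writer.writerow(fields)


if __name__ == "__main__" and len(sys.argv) == 3:
    write_duplicates_csv(sys.argv[1], sys.argv[2])
```

Run it as, for example, `python find_dups.py input.txt duplicates.csv` (the script name is hypothetical). Each key column gets its own counter, so a value that appears once in column 1 and once in column 4 is not mistaken for a duplicate. Records are written in their original file order, so the sample above yields the four records of the expected output.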
9 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
Hi all
Please help me find a solution to my problem.
I have a text file that contains duplicate records.
Example:
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 ... (1 Reply)
Discussion started by: G.Aavudai
2. Shell Programming and Scripting
Dear All,
I have one file which looks like :
account1:passwd1
account2:passwd2
account3:passwd3
account1:passwd4
account5:passwd5
account6:passwd6
As you can see, there are two records for account1. Is there a shell command that can find out that account1 is the duplicate record in... (3 Replies)
Discussion started by: tiger2000
3. Shell Programming and Scripting
Hi,
I need to find duplicate records based on the first column:
ANU4501710430989 0000000W20389390
ANU4501710430989 0000000W67065483
ANU4501130050520 0000000W80838713
ANU4501210170685 0000000W69246611... (3 Replies)
Discussion started by: Murugesh
4. Shell Programming and Scripting
I have 2 files
"File 1" is delimited by ";" and "File 2" is delimited by "|".
File 1 below (3 records shown):
Doc1;03/01/2012;New York;6 Main Street;Mr. Smith 1;Mr. Jones
Doc2;03/01/2012;Syracuse;876 Broadway;John Davis;Barbara Lull
Doc3;03/01/2012;Buffalo;779 Old Windy Road;Charles... (2 Replies)
Discussion started by: vestport
5. Shell Programming and Scripting
FILE_ID extraction from file name and save it in CSV file after looping through each folders
My files are located on a UNIX server. I want to extract the file_id and file_name from each file and save them in a CSV file. How do I do that?
I have folders in a UNIX environment; the directory structure is... (15 Replies)
Discussion started by: princetd001
6. Shell Programming and Scripting
Hi, all
I want to sort a CSV file by timestamp, from oldest to newest, and save the output as a CSV file. Here is an example of my CSV file.
test.csv
SourceFile,DateTimeOriginal
/home/intannf/foto/IMG_0739.JPG,2015:02:17 11:32:21
/home/intannf/foto/IMG_0749.JPG,2015:02:17 11:37:28... (10 Replies)
Discussion started by: refrain
7. Shell Programming and Scripting
Hi,
I have another problem. I want to sort another CSV file by the first field.
result.csv
SourceFile,Airspeed,GPSLatitude,GPSLongitude,Temperature,Pressure,Altitude,Roll,Pitch,Yaw
/home/intannf/foto5/2015_0313_090651_219.JPG,0.,-7.77223,110.37310,30.75,996.46,148.75,180.94,182.00,63.92 ... (2 Replies)
Discussion started by: refrain
8. Shell Programming and Scripting
I have a CSV file with 30 to 40 columns.
I'm pasting just three columns to describe the problem.
I want to filter records: if column 1 matches CN or DN, then
check the values in column 2; if column 2 contains 1235, 1235, then the values in column 3 must be the sequence 2345, 2345,
and if column 2 contains 6789, 6789... (5 Replies)
Discussion started by: as7951
9. Shell Programming and Scripting
Hi Experts,
I have a CSV file with 30 to 40 columns.
I'm pasting just 2 columns to describe the problem.
I need to print an error if the combination below is not present in the file.
Check column 1 (DocumentNumber) and filter the rows where the value in the DocumentNumber field is the same.
For all such rows, the field... (7 Replies)
Discussion started by: as7951
lib(3pm) Perl Programmers Reference Guide lib(3pm)
NAME
lib - manipulate @INC at compile time
SYNOPSIS
use lib LIST;
no lib LIST;
DESCRIPTION
This is a small simple module which simplifies the manipulation of @INC at compile time.
It is typically used to add extra directories to perl's search path so that later "use" or "require" statements will find modules which are
not located on perl's default search path.
Adding directories to @INC
The parameters to "use lib" are added to the start of the perl search path. Saying
use lib LIST;
is almost the same as saying
BEGIN { unshift(@INC, LIST) }
For each directory in LIST (called $dir here) the lib module also checks to see if a directory called $dir/$archname/auto exists. If so
the $dir/$archname directory is assumed to be a corresponding architecture specific directory and is added to @INC in front of $dir.
To avoid memory leaks, all trailing duplicate entries in @INC are removed, so only the first occurrence of each directory is kept.
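As a rough illustration of the ordering rules above, here is a small Python model of what "use lib LIST" does to @INC. It is not lib.pm itself and covers only the steps this page describes: @INC is modelled as a plain list, and "use_lib", the "is_dir" callback (standing in for Perl's -d test), and the example paths and archname are hypothetical.

```python
from typing import Callable, List


def use_lib(inc: List[str], dirs: List[str], archname: str,
            is_dir: Callable[[str], bool]) -> List[str]:
    """Return a new @INC after `use lib dirs`, per the rules described above."""
    new_inc = list(inc)
    # Unshift each directory, walking LIST backwards so the directories
    # end up at the front of @INC in the order they were given.
    for d in reversed(dirs):
        new_inc.insert(0, d)
        # If $dir/$archname/auto exists, $dir/$archname goes in front of $dir.
        if is_dir(f"{d}/{archname}/auto"):
            new_inc.insert(0, f"{d}/{archname}")
    # Remove trailing duplicates: dict keys keep first-insertion order,
    # so only the first occurrence of each directory survives.
    return list(dict.fromkeys(new_inc))


if __name__ == "__main__":
    print(use_lib(["/usr/lib/perl5"], ["/home/me/lib", "/usr/lib/perl5"],
                  "i386-linux", lambda p: p == "/home/me/lib/i386-linux/auto"))
    # prints ['/home/me/lib/i386-linux', '/home/me/lib', '/usr/lib/perl5']
```

In the example, /usr/lib/perl5 is already in @INC, so the copy that "use lib" unshifts comes first and the older, trailing entry is the one dropped.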
Deleting directories from @INC
You should normally only add directories to @INC. If you need to delete directories from @INC take care to only delete those which you
added yourself or which you are certain are not needed by other modules in your script. Other modules may have added directories which
they need for correct operation.
The "no lib" statement deletes all instances of each named directory from @INC.
For each directory in LIST (called $dir here) the lib module also checks to see if a directory called $dir/$archname/auto exists. If so
the $dir/$archname directory is assumed to be a corresponding architecture specific directory and is also deleted from @INC.
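The deletion rules above can be modelled in a few lines of Python. This is a sketch, not lib.pm's code: @INC is modelled as a plain list, and "no_lib", the "is_dir" callback (standing in for Perl's -d test), and the example paths and archname are hypothetical.

```python
from typing import Callable, List


def no_lib(inc: List[str], dirs: List[str], archname: str,
           is_dir: Callable[[str], bool]) -> List[str]:
    """Return a new @INC after `no lib dirs`, per the rules described above."""
    doomed = set()
    for d in dirs:
        doomed.add(d)
        # The matching $dir/$archname is deleted too if $dir/$archname/auto exists.
        if is_dir(f"{d}/{archname}/auto"):
            doomed.add(f"{d}/{archname}")
    # Delete every instance of each named directory, not just the first one.
    return [entry for entry in inc if entry not in doomed]


if __name__ == "__main__":
    inc = ["/home/me/lib/i386-linux", "/home/me/lib", "/usr/lib/perl5", "/home/me/lib"]
    print(no_lib(inc, ["/home/me/lib"], "i386-linux",
                 lambda p: p == "/home/me/lib/i386-linux/auto"))
    # prints ['/usr/lib/perl5']
```

Because every instance is removed, a directory that another module also added would disappear as well, which is why the text above advises deleting only directories you added yourself.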
Restoring original @INC
When the lib module is first loaded it records the current value of @INC in an array @lib::ORIG_INC. To restore @INC to that value you can
say
@INC = @lib::ORIG_INC;
CAVEATS
In order to keep lib.pm small and simple, it only works with Unix filepaths. This doesn't mean it only works on Unix, but non-Unix users
must first translate their file paths to Unix conventions.
# VMS users wanting to put [.stuff.moo] into
# their @INC would write
use lib 'stuff/moo';
NOTES
In the future, this module will likely use File::Spec for determining paths, as it does now for Mac OS (where Unix-style or Mac-style paths
work, and Unix-style paths are converted properly to Mac-style paths before being added to @INC).
SEE ALSO
FindBin - optional module which deals with paths relative to the source file.
AUTHOR
Tim Bunce, 2nd June 1995.
perl v5.8.0 2002-06-01 lib(3pm)