CSV file: Find duplicates, save original and duplicate records in a new file
Hi Unix gurus,
Maybe it is too much to ask, but please take a moment and help me out. A very humble request to you gurus. I'm new to Unix and have only just started learning it. I have this project which is way too advanced for me.
File format: CSV file
File has four columns with no header
File Size is 120GB.
Here are a few sample rows:
There are duplicates in columns 1 and 4 (I know this for a fact).
I would like to find all the duplicates in columns 1 and 4. In the example above, I want rows 2 and 3 (since column 1 has duplicates) and also rows 4 and 5 (since column 4 has duplicates).
If this is too complicated, maybe I can look for duplicates in column 1 first, save them to a new file, and then look for duplicates in column 4. (Since I am new to Unix, maybe that's the way to go.)
I want to save all the duplicates along with the original records (as in the example above) in a new CSV file.
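One possible way to approach this is a two-pass awk over the file: the first pass counts how often each value occurs in columns 1 and 4, and the second pass prints every row whose value in either column was seen more than once. This is only a sketch under some assumptions: the function names `find_dups` and `find_dups_col` and the file names are made up for illustration, the fields are assumed to be plain comma-separated (no quoted commas), and the distinct values of the key columns are assumed to fit in memory, which may not hold for a 120GB file.

```shell
# Hypothetical helper: print every row whose column 1 OR column 4 value
# occurs more than once. The file is read twice (it is given twice to awk).
# Assumes plain comma-separated fields and keys that fit in memory.
find_dups() {
    awk -F',' '
        NR == FNR { c1[$1]++; c4[$4]++; next }   # pass 1: count the keys
        c1[$1] > 1 || c4[$4] > 1                  # pass 2: print repeated rows
    ' "$1" "$1"
}

# Hypothetical helper for the step-by-step route: duplicates in one column.
# $1 = file, $2 = column number
find_dups_col() {
    awk -F',' -v col="$2" '
        NR == FNR { n[$col]++; next }             # pass 1: count the keys
        n[$col] > 1                               # pass 2: print repeated rows
    ' "$1" "$1"
}

# Usage (file names are placeholders):
#   find_dups input.csv > duplicates.csv
#   find_dups_col input.csv 1 > dups_col1.csv
```

Rows come out in their original order, with each original record next to its duplicates only if they were adjacent in the input. If memory turns out to be the limit, sorting on the key column first is a common alternative, at the cost of losing the original order.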
---------- Post updated at 01:59 PM ---------- Previous update was at 01:56 PM ----------
For more clarity: My results would look like this:
Hi all,
Please help me by providing a solution for my problem.
I have a text file which contains duplicate records.
Example:
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 ... (1 Reply)
Dear All,
I have one file which looks like :
account1:passwd1
account2:passwd2
account3:passwd3
account1:passwd4
account5:passwd5
account6:passwd6
You can see there are two records for account1. Is there any shell command which can find out that account1 is the duplicate record in... (3 Replies)
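For a file like the one above, a minimal sketch could count the account names with awk and print each name the second time it shows up. The function name `dup_accounts` is hypothetical, and the sketch assumes the account name is everything before the first colon.

```shell
# Hypothetical helper: list each account name that appears more than once.
# The post-increment returns the old count, so the test is true exactly on
# the second occurrence and each duplicated name is printed only once.
dup_accounts() {
    awk -F':' 'seen[$1]++ == 1 { print $1 }' "$1"
}

# Usage (file name is a placeholder):
#   dup_accounts accounts.txt
```

Run against the six sample lines shown, this would print account1.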
Hi,
I need to find duplicate records in the first column:
ANU4501710430989 0000000W20389390
ANU4501710430989 0000000W67065483
ANU4501130050520 0000000W80838713
ANU4501210170685 0000000W69246611... (3 Replies)
I have 2 files
"File 1" is delimited by ";" and "File 2" is delimited by "|".
File 1 below (3 records shown):
Doc1;03/01/2012;New York;6 Main Street;Mr. Smith 1;Mr. Jones
Doc2;03/01/2012;Syracuse;876 Broadway;John Davis;Barbara Lull
Doc3;03/01/2012;Buffalo;779 Old Windy Road;Charles... (2 Replies)
FILE_ID extraction from file name and save it in CSV file after looping through each folders
My files are located on a UNIX server. I want to extract file_id and file_name from each file and save them in a CSV file. How do I do that?
I have folders in a Unix environment, directory structure is... (15 Replies)
Hi, all
I want to sort a CSV file by timestamp, from oldest to newest, and save the output as a CSV file. Here is an example of my CSV file.
test.csv
SourceFile,DateTimeOriginal
/home/intannf/foto/IMG_0739.JPG,2015:02:17 11:32:21
/home/intannf/foto/IMG_0749.JPG,2015:02:17 11:37:28... (10 Replies)
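Since the timestamps shown are zero-padded and ordered year to second, a plain text sort on the second field should also be a chronological sort. A rough sketch, assuming the first line is the header, the file paths contain no commas, and using the made-up function name `sort_by_time`:

```shell
# Hypothetical helper: keep the header line first, then sort the remaining
# rows on field 2 (the timestamp) as plain text, oldest first.
sort_by_time() {
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort -t',' -k2,2
}

# Usage (output name is a placeholder; do not redirect onto the input file):
#   sort_by_time test.csv > sorted.csv
```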
Hi,
I have another problem. I want to sort another csv file by the first field.
result.csv
SourceFile,Airspeed,GPSLatitude,GPSLongitude,Temperature,Pressure,Altitude,Roll,Pitch,Yaw
/home/intannf/foto5/2015_0313_090651_219.JPG,0.,-7.77223,110.37310,30.75,996.46,148.75,180.94,182.00,63.92 ... (2 Replies)
I have a CSV file with 30 to 40 columns.
Pasting just three columns for the problem description.
I want to filter records where column 1 matches CN or DN, then
check the values in column 2: if the column contains 1235, 1235, then the values in column 3 must be the sequence 2345, 2345,
and if column 2 contains 6789, 6789... (5 Replies)
Hi Experts,
I have a CSV file with 30 to 40 columns.
Pasting just 2 columns for the problem description.
I need to print an error if the combination below is not present in the file:
check column 1 (DocumentNumber) and filter the rows where the value in the DocumentNumber field is the same.
For all such rows, the field... (7 Replies)