removing duplicate records comparing 2 csv files: Post 302599911 by agama on Sunday 19th of February 2012 09:22:59 AM
Working from Franklin52's suggestion, this is probably all you need:

Code:
grep -v -f file2.csv file1.csv >output-file

I note that in your sample, file2 isn't actually a comma-separated list. If that is the case, the command above will be fine. However, if file2 really is a comma-separated list (as its name and your description imply), then you'll need to take a different approach.
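If file2 does turn out to be one comma-separated line, one possible approach (a rough sketch, not tested against your real data) is to split it into one pattern per line first and then feed that to grep. The sample file names and their contents below are made up for illustration, and I'm assuming the values in file2 are plain strings rather than regular expressions, hence -F.

```shell
# Hypothetical sample data (assumption): sample-file2.csv holds the
# values to drop as a single comma-separated line.
printf '%s\n' 'id1,alpha' 'id2,beta' 'id3,gamma' > sample-file1.csv
printf '%s\n' 'id1,id3' > sample-file2.csv

# Split the comma-separated list into one pattern per line.
# Empty lines are dropped because an empty pattern matches every line,
# which would make grep -v discard everything.
tr ',' '\n' < sample-file2.csv | grep -v '^$' > patterns.txt

# -F: treat patterns as fixed strings, -v: keep only lines that do not match.
grep -v -F -f patterns.txt sample-file1.csv > output-file
```

Bear in mind this matches substrings anywhere on the line, so a pattern of id1 would also remove a record containing id10. If that matters for your data, adding -w or comparing a specific field with awk would be the safer route.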
 
