Join files, omit duplicated records from one file


 
# 1  
Old 09-26-2017

Hello

I have 2 files, eg

Code:
more file1 file2
::::::::::::::
file1
::::::::::::::
1   fromfile1
2   fromfile1
3   fromfile1
4   fromfile1
5   fromfile1
6   fromfile1
7   fromfile1
::::::::::::::
file2
::::::::::::::
3   fromfile2
5   fromfile2

I want to merge these, but where a key appears in both files, include only the record from the second file. So the result is

Code:
1   fromfile1
2   fromfile1
3   fromfile2
4   fromfile1
5   fromfile2
6   fromfile1
7   fromfile1

Basically merging 2 files but omitting any records in file 1 which appear in file2 based on the key field.

I've started to cobble a script together which
  • makes a list of key fields from file2
  • loops round reading that file and uses grep -v to remove records with that key from file1
  • then uses uniq -d to keep only records which were duplicated (so I now have a copy of file1 but with only records 1,2,4,6,7)
  • then concatenate this file and file2

This only works if file2 has exactly 2 records.

This feels like something which should be simple but I can't figure it out. I suspect I should be able to use join or maybe awk to achieve what I want, but I can't get there and can't find anything through Google.

Can anyone suggest a more elegant solution to my approach? (& frankly one which works because mine doesn't)

Many thanks, Chris
# 2  
Old 09-26-2017
Hello CHoggarth,

Could you please try the following and let me know if this helps you.
Code:
awk 'FNR==NR{a[$1]=$0;next} ($1 in a){print a[$1];next} 1'  Input_file2  Input_file1

Output will be as follows.
Code:
1   fromfile1
2   fromfile1
3   fromfile2
4   fromfile1
5   fromfile2
6   fromfile1
7   fromfile1

Thanks,
R. Singh
# 3  
Old 09-26-2017
Try also
Code:
sort file[12] | uniq -uw1 | sort - file2
1   fromfile1
2   fromfile1
3   fromfile2
4   fromfile1
5   fromfile2
6   fromfile1
7   fromfile1

# 4  
Old 09-26-2017
Ravinder - thanks. This looks great when I change awk to nawk - at least as far as my spec went. I now realise I need to go back to the business & find out whether file2 might include records with keys which are not in file1 at all. If I try that with your solution those records are not included. Is there an easy amendment to your solution? eg file 2 also includes a record:
Code:
8     fromfile2

RudiC - thanks for the reply. I should have said I'm working on Solaris. Unfortunately uniq doesn't have a -w switch on my machine.

Chris
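
For the follow-up case, where file2 may hold keys that file1 lacks, one possible amendment is to print every file2 record first, then only the file1 records whose key was not seen in file2, and sort the result back into key order. This is a minimal sketch rather than a reply from the thread: the `seen` array and the `merged` output file are names chosen for illustration, the final sort assumes numeric keys in the first field, and on Solaris `nawk` would replace `awk`.

```shell
# Run in an empty scratch directory: the sample files are created in place.
# Data from the thread, plus key 8, which exists only in file2.
printf '%s\n' '1   fromfile1' '2   fromfile1' '3   fromfile1' '4   fromfile1' \
              '5   fromfile1' '6   fromfile1' '7   fromfile1' > file1
printf '%s\n' '3   fromfile2' '5   fromfile2' '8     fromfile2' > file2

# First file on the command line (file2): remember the key, print the record.
# Second file (file1): print only records whose key was not seen in file2.
# sort -n -k1,1 then restores numeric key order (assumes numeric keys).
awk 'NR == FNR { seen[$1]; print; next } !($1 in seen)' file2 file1 |
    sort -n -k1,1 > merged

cat merged
```

Because file2 is printed in full, its key 8 record survives even though file1 has no such key. Note that the `NR == FNR` test relies on file2 being non-empty; with an empty file2 the first-file branch would apply to file1 instead, which here still prints file1 unchanged.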
# 5  
Old 09-26-2017
Hi.

Some versions of Solaris could have GNU uniq et al installed:
Code:
OS, ker|rel, machine: SunOS, 5.11, i86pc
Distribution        : Solaris 11.3 X86
guniq uniq (GNU coreutils) 8.16
gawk GNU Awk 3.1.8

Best wishes ... cheers, drl
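
Where GNU uniq is not installed, join (which the first post suspected might do the job) is another option that avoids the -w switch. The following is only a sketch under stated assumptions: both files are already sorted on the key in the lexical order join expects (true for the single-digit sample keys), the keys are numeric for the final sort, and `merged_join` is an illustrative file name. join also rewrites the field separator to a single space, so records taken from file1 lose their original spacing.

```shell
# Run in an empty scratch directory: the sample files are created in place.
printf '%s\n' '1   fromfile1' '2   fromfile1' '3   fromfile1' '4   fromfile1' \
              '5   fromfile1' '6   fromfile1' '7   fromfile1' > file1
printf '%s\n' '3   fromfile2' '5   fromfile2' '8     fromfile2' > file2

# join -v 1 prints only the file1 records whose key has no match in file2.
# Those are concatenated with all of file2, then sorted numerically on the key.
{ join -v 1 file1 file2; cat file2; } | sort -n -k1,1 > merged_join

cat merged_join
```

This keeps every file2 record, including keys that file1 lacks, and drops the file1 records that file2 overrides.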