How to Remove duplicate value from file?


 
# 1  
Old 02-18-2013

If several branch codes exist for the same BIC code and one of them is XXX, only the row with branch code XXX should be kept; the other rows for that BIC code should be dropped.

For example, $7 is the BIC code and $8 is the branch code.
The input file is as follows:
Code:
BI	M	IT056868	UNICREDI	VIA	LIBHIA	UNCRIT2B	XXX	UNCR
BI	M	US001165	NEUBERGE	LLC	NEWYRK	NEUBUS33	253	NEUB
HH	M	IND90909	SBILIFES	HNI	NANANN	GGGGGGGG	142	UICC
HH	M	MNOOOO	        98989089	IIC	UMNKSS	MOHAN844	XXX	KLKL
HH	M	MNKKKKKK	90909090	JMV	MNJKMN	MOHAN844	256	LOPD
HH	M	MKLJHJKK	KIKIKIKI	JKJ	NMHMNM	MOHAN844	456	LOPS

Here $7 has three rows with the same BIC code, "MOHAN844", and different branch codes in $8. I want only the row whose branch code is XXX. If XXX is not present in $8 for a BIC code, I need all of its records.

Please help.

Moderator's Comments:
Mod Comment Use code tags, thanks.

Last edited by zaxxon; 02-18-2013 at 07:35 AM.. Reason: code tags
# 2  
Old 02-18-2013
I think this is more correct
Code:
awk '{if ($8=="XXX") {arr[$7]=$0} else if (!arr[$7]) {arr[$7]=$0}} END {for (i in arr) print arr[i]}' infile
BI      M       IT056868        UNICREDI        VIA     LIBHIA  UNCRIT2B        XXX     UNCR
HH      M       MNOOOO          98989089        IIC     UMNKSS  MOHAN844        XXX     KLKL
BI      M       US001165        NEUBERGE        LLC     NEWYRK  NEUBUS33        253     NEUB
HH      M       IND90909        SBILIFES        HNI     NANANN  GGGGGGGG        142     UICC

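This stores one line per distinct $7, preferring the line that has XXX in $8. Note that for (i in arr) returns the keys in no particular order, so the output order can differ from the input order.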

---------- Post updated at 13:09 ---------- Previous update was at 12:49 ----------

A slightly shorter version:
Code:
awk '$8=="XXX" {arr[$7]=$0} !arr[$7] {arr[$7]=$0} END {for (i in arr) print arr[i]}' infile

# 3  
Old 02-18-2013
Hi zaxxon and jotne,

Thank you very much for your quick reply.
But with this input:
Code:
BI    M    IT056868    UNICREDI    VIA    LIBHIA    UNCRIT2B    XXX    UNCR
BI    M    US001165    NEUBERGE    LLC    NEWYRK    NEUBUS33    XXX    NEUB
HH    M    IND90909    SBILIFES    HNI    NANANN    GGGGGGGG        UICC
HH    M    MNOOOOO    98989089    IIC    UMNKSS    MOHAN844    XXX    KLKL
HH    M    MNKKKKKK    90909090    JMV    MNJKMN    MOHAN844    256    LOPD
HH    M    MKLJHJKK    KIKIKIKI    JKJ    NMHlMM    MOHAN844    456    LOPS
HH    M    IND90909    SBILIFES    HNI    NANAAN    MSSMSSM    123    UIHH
HH    M    IND90909    SBILIFES    HNI    NAANAN    MSSSMSM    234    UIHH
HH    M    IND90909    SBILIFES    HNI    NANANAN    MSSSMSM    543    UIHH

the last three rows are not printed.
The condition is: if XXX is not present among the branch codes ($8) for a given BIC code ($7), all of its rows must be printed as usual. Duplicates should be removed only when XXX is present among the branch codes for that BIC code.

Last edited by radoulov; 02-18-2013 at 08:30 AM..
# 4  
Old 02-18-2013
Please use CODE TAGS.
I get this output, which looks correct to me:
Code:
BI      M       IT056868        UNICREDI        VIA     LIBHIA  UNCRIT2B        XXX     UNCR
HH      M       MNOOOOO         98989089        IIC     UMNKSS  MOHAN844        XXX     KLKL
HH      M       IND90909        SBILIFES        HNI     NAANAN  MSSSMSM         234     UIHH
HH      M       IND90909        SBILIFES        HNI     NANAAN  MSSMSSM         123     UIHH
BI      M       US001165        NEUBERGE        LLC     NEWYRK  NEUBUS33        XXX     NEUB
HH      M       IND90909        SBILIFES        HNI     NANANN  GGGGGGGG        UICC

Edit
OK, if I understand you correctly:
If any row for a given $7 has XXX in $8, keep only the row with XXX.
If no row for that $7 has XXX in $8, keep all rows with that $7.
This makes it much more complicated, since it cannot all fit into one simple array.

PS: always give an example of the expected output.
PS: a column is missing in row 3.
PS: you may have a typo (MSSMSSM vs. MSSSMSM).
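
One way around the single-array limit is to read the file twice. The following is only a sketch, not a tested fix for the poster's real data: it assumes whitespace-separated rows that all have nine fields, and the function name dedupe_bic and the file name sample.txt are made up for illustration. The first pass records which BIC codes ($7) have an XXX branch code ($8); the second pass prints the XXX rows for those codes and every row for the others.

```shell
#!/bin/sh
# Two-pass sketch (assumes whitespace-separated fields, nine per row).
# Pass 1 (NR == FNR): remember every BIC code ($7) that has an XXX branch ($8).
# Pass 2: print a row if its BIC code has no XXX row at all, or if the row is the XXX row.
dedupe_bic() {
    awk 'NR == FNR { if ($8 == "XXX") has_xxx[$7] = 1; next }
         !($7 in has_xxx) || $8 == "XXX"' "$1" "$1"
}

# Hypothetical sample file built from rows posted in this thread.
printf '%s\n' \
    'BI M IT056868 UNICREDI VIA LIBHIA UNCRIT2B XXX UNCR' \
    'HH M MNOOOOO 98989089 IIC UMNKSS MOHAN844 XXX KLKL' \
    'HH M MNKKKKKK 90909090 JMV MNJKMN MOHAN844 256 LOPD' \
    'HH M MKLJHJKK KIKIKIKI JKJ NMHlMM MOHAN844 456 LOPS' \
    'HH M IND90909 SBILIFES HNI NAANAN MSSSMSM 234 UIHH' \
    'HH M IND90909 SBILIFES HNI NANANAN MSSSMSM 543 UIHH' > sample.txt

dedupe_bic sample.txt
```

With this sample, the two non-XXX MOHAN844 rows are dropped and both MSSSMSM rows are kept, in input order. A row with an empty branch code (like row 3 in the input above) would shift the fields under default splitting, so real tab-separated data would probably need -F'\t'.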

Last edited by Jotne; 02-18-2013 at 08:33 AM..
# 5  
Old 02-18-2013
This will sort the output alphabetically by BIC code; not sure if you'd care about that...

Code:
crabshack:foo toki$ cat foo.txt
BI M IT056868 UNICREDI VIA LIBHIA UNCRIT2B XXX UNCR
BI M US001165 NEUBERGE LLC NEWYRK NEUBUS33 XXX NEUB
HH M IND90909 SBILIFES HNI NANANN GGGGGGGG UICC
HH M MNOOOOO 98989089 IIC UMNKSS MOHAN844 XXX KLKL
HH M MNKKKKKK 90909090 JMV MNJKMN MOHAN844 256 LOPD
HH M MKLJHJKK KIKIKIKI JKJ NMHlMM MOHAN844 456 LOPS
HH M IND90909 SBILIFES HNI NANAAN MSSMSSM 123 UIHH
HH M IND90909 SBILIFES HNI NAANAN MSSSMSM 234 UIHH
HH M IND90909 SBILIFES HNI NANANAN MSSSMSM 543 UIHH
crabshack:foo toki$ cat foo.sh
#!/bin/bash

file=foo.txt
# Distinct BIC codes (field 7), sorted
unique_bics=$( awk '{print $7}' "${file}" | sort -u )

for bic in ${unique_bics}; do
	# Prefer the XXX row for this BIC code; otherwise print all of its rows
	if grep_output=$( grep " ${bic} XXX " "${file}" ); then
		echo "${grep_output}"
	else
		grep " ${bic} " "${file}"
	fi
done

exit 0
crabshack:foo toki$ ./foo.sh
HH M IND90909 SBILIFES HNI NANANN GGGGGGGG UICC
HH M MNOOOOO 98989089 IIC UMNKSS MOHAN844 XXX KLKL
HH M IND90909 SBILIFES HNI NANAAN MSSMSSM 123 UIHH
HH M IND90909 SBILIFES HNI NAANAN MSSSMSM 234 UIHH
HH M IND90909 SBILIFES HNI NANANAN MSSSMSM 543 UIHH
BI M US001165 NEUBERGE LLC NEWYRK NEUBUS33 XXX NEUB
BI M IT056868 UNICREDI VIA LIBHIA UNCRIT2B XXX UNCR

# 6  
Old 02-18-2013
But I want to keep all records for a BIC code, even with duplicate branch codes, if XXX is not present among the branch codes for that BIC code.

---------- Post updated at 06:30 PM ---------- Previous update was at 06:10 PM ----------

Thanks zazzybob.
When I run this script it executes, but no output appears on the screen. Can you explain in more detail, or give me a version that stores the output in a file?
# 7  
Old 02-18-2013
As long as you specify the correct path to your input file in the file variable in the script, it should work. In the example below the input file is foo.txt and it exists in the same directory as foo.sh, which is also your present working directory.
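
To store the result in a file, the loop's output can be redirected. The sketch below is an assumption-laden variant of foo.sh, not the original script: the function name filter_to_file and the output name output.txt are made up, it expects space-separated rows as in the foo.txt listing, and it reports an unreadable input path instead of silently printing nothing.

```shell
#!/bin/sh
# Sketch: same idea as foo.sh, with input and output paths as arguments.
# filter_to_file and output.txt are hypothetical names.
filter_to_file() {
    infile=$1
    outfile=$2
    # A wrong input path is the likely reason for "no output", so check it first.
    if [ ! -r "$infile" ]; then
        echo "cannot read input file: $infile" >&2
        return 1
    fi
    for bic in $(awk '{print $7}' "$infile" | sort -u); do
        # Print the XXX row if one exists; otherwise print every row for this BIC code.
        grep " ${bic} XXX " "$infile" || grep " ${bic} " "$infile"
    done > "$outfile"
}

# Small space-separated sample taken from the rows in this thread.
printf '%s\n' \
    'HH M MNOOOOO 98989089 IIC UMNKSS MOHAN844 XXX KLKL' \
    'HH M MNKKKKKK 90909090 JMV MNJKMN MOHAN844 256 LOPD' \
    'HH M IND90909 SBILIFES HNI NAANAN MSSSMSM 234 UIHH' \
    'HH M IND90909 SBILIFES HNI NANANAN MSSSMSM 543 UIHH' > foo.txt

filter_to_file foo.txt output.txt
cat output.txt
```

The existing foo.sh can also be left unchanged and run as ./foo.sh > output.txt, which sends whatever it prints into output.txt.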