Delete only if duplicates found in each record


 
# 1  
Old 08-06-2014

Hi,

I have another problem. I have tried to solve it myself but failed.

inputfile
Code:
;;
ID	T08578
NAME	T08578
SBASE	30696
EBASE	32083
TYPE	P
func	just test
func	chronology
func	cholesterol
func	null
INT	30765-37333
INT	37154-37318
Link	5546
Link	8142
Link	5485
@@
ID	T09378
NAME	T09378
SBASE	35275
EBASE	35282
TYPE	W
func	Dito and barney
func	null
CODE	2.6
INT	21783-35274
Link	3899
@@
ID	T09386
NAME	T09386
SBASE	3505918
EBASE	3506467
TYPE	R
func	null
INT	5974-6088
@@
ID	T08594
NAME	T08594
SBASE	95156
EBASE	95174
TYPE	W
func	null
INT	9585-9562
@@


I need to remove the "func" line whose value is "null" whenever the same record contains another func line. Records are separated by "@@". If func<TAB>null is the only func line in a record (as in the last two records), it should remain as it is. The output should be:

Code:
;;
ID	T08578
NAME	T08578
SBASE	30696
EBASE	32083
TYPE	P
func	just test
func	chronology
func	cholesterol
INT	30765-37333
INT	37154-37318
Link	5546
Link	8142
Link	5485
@@
ID	T09378
NAME	T09378
SBASE	35275
EBASE	35282
TYPE	W
func	Dito and barney
CODE	2.6
INT	21783-35274
Link	3899
@@
ID	T09386
NAME	T09386
SBASE	3505918
EBASE	3506467
TYPE	R
func	null
INT	5974-6088
@@
ID	T08594
NAME	T08594
SBASE	95156
EBASE	95174
TYPE	W
func	null
INT	9585-9562
@@

I tried a couple of ways, but each attempt either deleted every "func" line or left the file unchanged. The last code I tried came from another thread here, but it deleted the "func" line with "null" in the first record only; the rest stayed the same. The code is below:

Code:
awk '!NF{$0=x}1' inputfile | awk 'gsub (/func/, "&") > 1 {sub (/\n func\tnull\n/,"\n")}1' RS=\n

I tried sed too, because I want to update the input file instead of creating another output file, but I did not get what I wanted. Any help is appreciated.
# 2  
Old 08-06-2014
Will the func<TAB>null always follow other func occurrences? Then try
Code:
awk '/@@/{CNT=0} /func.*null/ && CNT {next} /func/ {CNT++}1' file
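
The one-liner above depends on the ordering asked about: CNT is only non-zero once an earlier func line has been seen in the record, so a func<TAB>null that comes first would be kept. If that ordering cannot be guaranteed, a rough sketch like the following buffers each record and decides when it reaches "@@". The function name dedup_null, the helper emit_record, and the sample data are made up for illustration, and the exact field comparison assumes the fields are separated by real tabs.

```shell
# Throwaway sample: the first record has func<TAB>null BEFORE another func line,
# the second record has func<TAB>null as its only func line.
sample=$(mktemp)
printf 'ID\tA\nfunc\tnull\nfunc\talpha\n@@\nID\tB\nfunc\tnull\n@@\n' > "$sample"

# Buffer the lines of each record and print them when "@@" is reached.
# A func<TAB>null line is dropped only if the record holds another func line.
dedup_null() {
  awk -F'\t' '
    function emit_record(   i) {
      for (i = 1; i <= n; i++)
        if (!(isnull[i] && other)) print line[i]
      n = 0; other = 0
    }
    {
      line[++n] = $0
      isnull[n] = ($1 == "func" && $2 == "null")
      if ($1 == "func" && $2 != "null") other = 1
    }
    $0 == "@@" { emit_record() }
    END { emit_record() }   # flush trailing lines that lack a closing "@@"
  ' "$@"
}

dedup_null "$sample"
```

Comparing $1 and $2 exactly also avoids matching lines that merely contain the text "func" or "null" somewhere, which the regex version would treat as hits.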

# 3  
Old 08-06-2014
Hi RudiC,

Thanks so much for your prompt response. The func<TAB>null line is always the last func line in a record that has several. I tried your code, but it also removes func<TAB>null in records where it is the only func line.
Thanks.

Last edited by redse171; 08-06-2014 at 03:31 PM.. Reason: mistakes. remove the results as the codes work great.
# 4  
Old 08-06-2014
Not when I tried it:
Code:
;;
ID    T08578
NAME    T08578
SBASE    30696
EBASE    32083
TYPE    P
func    just test
func    chronology
func    cholesterol
INT    30765-37333
INT    37154-37318
Link    5546
Link    8142
Link    5485
@@
ID    T09378
NAME    T09378
SBASE    35275
EBASE    35282
TYPE    W
func    Dito and barney
CODE    2.6
INT    21783-35274
Link    3899
@@
ID    T09386
NAME    T09386
SBASE    3505918
EBASE    3506467
TYPE    R
func    null
INT    5974-6088
@@
ID    T08594
NAME    T08594
SBASE    95156
EBASE    95174
TYPE    W
func    null
INT    9585-9562
@@

So - where's the difference between "your file" and "your file copied and pasted to my system"?
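
One way to look for such a difference is to make invisible characters visible. The sketch below is only an illustration with a made-up one-line file that has a DOS line ending; od -c prints tabs as \t and carriage returns as \r, so spaces pasted in place of tabs or stray \r characters stand out.

```shell
# Throwaway file with a tab between the fields and a DOS (CR LF) line ending.
f=$(mktemp)
printf 'func\tnull\r\n' > "$f"

# Dump every byte; a tab shows up as \t and a carriage return as \r.
od -c "$f"

# Count the lines that end in a carriage return.
cr_lines=$(awk '/\r$/ {n++} END {print n+0}' "$f")
echo "$cr_lines"
```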
# 5  
Old 08-06-2014
I am so sorry, the code worked great! It was a typo I made when I wrote the code for my real file. So please ignore my previous post (I will delete it, though). Thanks a ton for your help.
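
The first post also asked about updating the input file itself rather than creating a separate output file. The thread does not cover that, but a common approach is to write to a temporary file and move it over the original only if awk succeeds. This is a minimal sketch using the awk command from post #2; the sample data and the .tmp suffix are arbitrary choices for illustration.

```shell
# Throwaway sample standing in for the real input file.
f=$(mktemp)
printf 'ID\tA\nfunc\talpha\nfunc\tnull\n@@\nID\tB\nfunc\tnull\n@@\n' > "$f"

# Filter into a temporary file, then replace the original only on success.
awk '/@@/{CNT=0} /func.*null/ && CNT {next} /func/ {CNT++}1' "$f" > "$f.tmp" &&
  mv "$f.tmp" "$f"

cat "$f"
```

Redirecting straight back to the same file (awk ... file > file) would truncate it before awk reads it, which is why the temporary file is needed. GNU awk 4.1 and later also offers -i inplace, if that version happens to be available.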