Formatting a file - Remove Duplicate


 
# 1  
Old 06-07-2011

Hi, I have a file in the following format. It contains table names and their aliases:

TABLE1
TABLE1 A
TABLE2
TABLE2 B
TABLE3
TABLE4
TABLE4 C
TABLE4

I get this output when formatting an SQL statement.

Problem: Whenever a table name appears with an alias, it has two entries in the file: one without the alias and one with it. I want to remove the first occurrence of the table name (the one without the alias). However, some table names appear without any alias at all, and I don't want to delete those. The same table can also be repeated, once with an alias and once without.

Solution: I just want to delete any bare table name line that immediately precedes the same table name with an alias, so the output should look like:

TABLE1 A
TABLE2 B
TABLE3
TABLE4 C
TABLE4

Hope I am clear enough.
Your help would be much appreciated. Thanks.
# 2  
Old 06-07-2011

Try sort -u

Code:
shell> less file
TABLE1
TABLE1 A
TABLE2
TABLE2 B
TABLE3
TABLE4
TABLE4 C
TABLE4

shell> sort -u file

TABLE1
TABLE1 A
TABLE2
TABLE2 B
TABLE3
TABLE4
TABLE4 C

I noticed that in your output you dropped TABLE1 but kept TABLE4. Was that intentional?
# 3  
Old 06-07-2011
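Try this (not tested):

Code:
# Hold each line back one step; drop a bare table name when the next line is that same name plus an alias.
awk 'NR > 1 && !(prev !~ / / && index($0, prev " ") == 1) { print prev } { prev = $0 } END { if (NR) print prev }' filename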

# 4  
Old 06-07-2011
Quote:
Originally Posted by ni2
Try sort -u

I noticed that in your output you dropped TABLE1 but kept TABLE4. Was that intentional?

Yes. The second occurrence of TABLE4 has no alias, so that entry shouldn't be deleted.
# 5  
Old 06-07-2011
Try sed:
Code:
  sed 'N;/\(.*\)\n\(\1 .\)/{s//\2/}' inputfile | sort -u
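
Note that the command above joins lines in fixed pairs (1+2, 3+4, ...), so it can miss a bare name that falls on an even line, and `sort -u` changes the original order. As a rough alternative, here is a minimal sketch that uses the common `$!N ... P;D` sliding-window idiom instead, so that every adjacent pair of lines is compared and the order is kept. The function name `drop_bare_before_alias` is made up for illustration, and the sketch assumes an alias is separated from the table name by a single space.

```shell
#!/bin/sh
# Hypothetical helper: print the file named in $1, skipping any line that is
# immediately followed by "<same text><space><alias>".
drop_bare_before_alias() {
  # $!N : append the next input line (except on the last line)
  # /^\(.*\)\n\1 /!P : print the first held line unless the second line
  #                    starts with the first line followed by a space
  # D   : drop the first held line and restart with the remainder
  sed '$!N;/^\(.*\)\n\1 /!P;D' "$1"
}

# Sample data from the first post, written to a temporary file.
sample=$(mktemp)
printf '%s\n' TABLE1 'TABLE1 A' TABLE2 'TABLE2 B' TABLE3 TABLE4 'TABLE4 C' TABLE4 > "$sample"
drop_bare_before_alias "$sample"
rm -f "$sample"
```

On the sample data this prints TABLE1 A, TABLE2 B, TABLE3, TABLE4 C, TABLE4 (one per line), which is the order requested in the first post. One caveat of this sketch: a line that already has an alias would also be dropped if the next line repeated it with one more word appended.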

# 6  
Old 06-07-2011
Quote:
Originally Posted by freakygs
Yes. The second occurrence of TABLE4 has no alias, so that entry shouldn't be deleted.

Try the sed example michaelrozar17 posted. It looks like it meets your requirement.