Remove duplicate lines after ignoring case and spaces between
# 1  
08-07-2015

Oracle Linux 6.5

Code:
$ cat someStrings.txt
GRANT select on MANHPRD.S_PROD_INT TO OR_PHIL;
GRANT select on  MANHPRD.S_PROD_INT TO OR_PHIL;
GRANT select on SCOTT.emp to JOHN;
grant select on scott.emp to john;
grant select on scott.dept to hr;

If you ignore case and any extra whitespace between words, there are only 3 distinct lines in the above .txt file:

Code:
### Distinct output
GRANT select on MANHPRD.S_PROD_INT TO OR_PHIL;
GRANT select on SCOTT.emp to JOHN;
grant select on scott.dept to hr;

How can I remove the duplicate lines, ignoring case and extra whitespace between words, to get the distinct output shown above?
# 2  
08-07-2015
Any attempts from your side?

---------- Post updated at 11:41 ---------- Previous update was at 11:40 ----------

Anyhow, try:
Code:
awk '
{gsub(/ +/," ")}
!T[toupper($0)]++
' file
GRANT select on MANHPRD.S_PROD_INT TO OR_PHIL;
GRANT select on SCOTT.emp to JOHN;
grant select on scott.dept to hr;
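For readers more at home in Python, here is a minimal sketch of what the awk program above does, assuming the input lines are already in a list (the function name `awk_style_dedup` is hypothetical). A set is enough here because only "seen before or not" matters. Like the awk version, it prints the space-squeezed form of each first occurrence, not the raw line.

```python
import re


def awk_style_dedup(lines):
    """Rough Python equivalent of: awk '{gsub(/ +/," ")} !T[toupper($0)]++'"""
    seen = set()                         # stands in for the awk array T
    out = []
    for line in lines:
        line = re.sub(r" +", " ", line)  # gsub(/ +/," "): squeeze runs of spaces
        key = line.upper()               # toupper($0)
        if key not in seen:              # first time this key appears
            seen.add(key)
            out.append(line)             # awk's default action: print the (squeezed) line
    return out


sample = [
    "GRANT select on MANHPRD.S_PROD_INT TO OR_PHIL;",
    "GRANT select on  MANHPRD.S_PROD_INT TO OR_PHIL;",
    "GRANT select on SCOTT.emp to JOHN;",
    "grant select on scott.emp to john;",
    "grant select on scott.dept to hr;",
]

for line in awk_style_dedup(sample):
    print(line)
```

Run against the sample data from the question, this prints the same three lines as the awk command.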

# 3  
08-07-2015
Thank you very much, RudiC. Your command works (although I didn't understand any of it).
I need to do some googling on the basics of awk.

It can also be put on one line, as shown below. Right?

Code:
# awk '{(gsub(/ +/," "))}!T[toupper($0)]++' someStrings.txt
GRANT select on MANHPRD.S_PROD_INT TO OR_PHIL;
GRANT select on SCOTT.emp to JOHN;
grant select on scott.dept to hr;

# 4  
08-07-2015
Yes. It can be shortened a little further, in a way that also handles TAB characters:
Code:
awk '{$1=$1}!A[toupper($0)]++' file
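Why `$1=$1` works: assigning to any field makes awk rebuild `$0` from its fields joined by OFS (a single space by default), and with the default FS the fields are split on runs of spaces and tabs, ignoring leading and trailing blanks. A rough Python sketch of that rebuild, assuming default FS and OFS (the helper names `rebuild_record` and `dedup_rebuilt` are hypothetical). Python's `str.split()` with no argument splits on any whitespace, which is close to, though not identical with, awk's default field splitting.

```python
def rebuild_record(line, ofs=" "):
    """Approximate awk's  $1=$1  with the default FS.

    str.split() with no argument splits on runs of whitespace and drops
    leading/trailing whitespace; the fields are then re-joined with OFS.
    """
    return ofs.join(line.split())


def dedup_rebuilt(lines):
    """Rough equivalent of: awk '{$1=$1} !A[toupper($0)]++'"""
    seen = set()
    out = []
    for line in lines:
        line = rebuild_record(line)  # {$1=$1}: normalize whitespace, tabs included
        key = line.upper()           # toupper($0)
        if key not in seen:
            seen.add(key)
            out.append(line)         # default action: print the rebuilt line
    return out


print(repr(rebuild_record("  GRANT\tselect on   SCOTT.emp to JOHN; ")))
# 'GRANT select on SCOTT.emp to JOHN;'
```

Unlike the `gsub(/ +/," ")` version, this also collapses tabs and trims leading and trailing whitespace before comparing.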


Last edited by Scrutinizer; 08-08-2015 at 02:23 AM.
# 5  
08-08-2015
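This version prints each first occurrence in its original form, whereas the awk versions print the whitespace-normalized form:
Code:
perl -ne 'print unless $seen{uc(s/\s+//gr)}++' someStrings.txt

The /r modifier requires Perl 5.14 or later. With an older Perl, such as the 5.10.1 that ships with Oracle Linux 6, an equivalent is:
Code:
perl -ne '($k = uc) =~ s/\s+//g; print unless $seen{$k}++' someStrings.txt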
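Note that `s/\s+//g` deletes whitespace entirely rather than squeezing it, which is a looser match than the awk versions: two lines that differ only in where a space falls also compare equal. A small Python sketch contrasting the two keys (both helper names are hypothetical, and the second test line is made up for illustration):

```python
import re


def squeeze_key(line):
    # awk-style key: collapse whitespace runs to one space, then upper-case
    return " ".join(line.split()).upper()


def strip_key(line):
    # perl-style key from post #5: delete all whitespace, then upper-case
    return re.sub(r"\s+", "", line).upper()


a = "grant select on scott.emp to john;"
b = "grant select on scott.emp tojohn;"  # hypothetical line with a missing space

print(squeeze_key(a) == squeeze_key(b))  # False: treated as different lines
print(strip_key(a) == strip_key(b))      # True: treated as duplicates
```

For the sample data in the question both keys give the same three lines; the difference only matters if some lines differ by a missing space.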

# 6  
08-10-2015
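A Python version, which prints each first occurrence in its original form:

Code:
seen = set()
with open("someStrings.txt") as f:
    for line in f:
        line = line.rstrip("\n")
        # Key: lower-case, with any run of spaces/tabs collapsed to one space
        key = " ".join(line.lower().split())
        if key not in seen:
            seen.add(key)
            print(line)  # first occurrence, as it appears in the file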

# 7  
08-10-2015
Quote:
Originally Posted by kraljic
Thank you very much Rudic. Your command works (although I didn't understand anything in it ).

Hello Kraljic,

Here is an explanation of the command RudiC posted.
Code:
awk '
{gsub(/ +/," ")}    # replace each run of spaces with one space (row 2 of your
                    # input has two spaces after "on"), so all lines match
!T[toupper($0)]++   # toupper() upper-cases the line; it indexes array T.
                    # T[...]++ yields the old count (0 the first time), then
                    # adds 1, so !T[...]++ is true only on first sight. awk
                    # runs "condition {action}" pairs; with no action, it prints.
' file              # the input file name
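To see the `!T[...]++` idiom step by step, here is a small Python emulation of awk's post-increment-then-test, assuming stand-in keys `A`, `A`, `B`, `B`, `C` for the five upper-cased, space-squeezed lines of the sample (lines 1 and 2 share a key, as do lines 3 and 4). The function name `post_increment_trace` is hypothetical.

```python
def post_increment_trace(keys):
    """Emulate awk's  !T[key]++  for each key in turn.

    Returns (key, old_count, printed) tuples: awk evaluates T[key]++ to the
    count *before* incrementing, and ! turns 0 into true.
    """
    T = {}                    # awk arrays start empty; missing entries act as 0
    trace = []
    for key in keys:
        old = T.get(key, 0)   # value of the expression T[key]++
        T[key] = old + 1      # the increment side effect
        trace.append((key, old, not old))
    return trace


for key, old, printed in post_increment_trace(["A", "A", "B", "B", "C"]):
    print(key, old, "print" if printed else "skip")
```

This prints `A 0 print`, `A 1 skip`, `B 0 print`, `B 1 skip`, `C 0 print`: only the lines whose count was still 0 are printed, which is why the output keeps exactly one line per key.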

Hope this helps.

Thanks,
R. Singh