Sponsored Content
Top Forums Shell Programming and Scripting Remove duplicate lines after ignoring case and spaces between Post 302951722 by RavinderSingh13 on Monday 10th of August 2015 04:28:37 AM
Old 08-10-2015
Quote:
Originally Posted by kraljic
Thank you very much Rudic. Your command works (although I didn't understand anything in it ).
Need to do some googling on the basics of awk.


It can be put in one line as well as shown below . Right ?

Code:
# awk '{(gsub(/ +/," "))}!T[toupper($0)]++' somestrings.txt
GRANT select on MANHPRD.S_PROD_INT TO OR_PHIL;
GRANT select on SCOTT.emp to JOHN;
grant select on scott.dept to hr;

Hello Kraljic,

Following is the explanation for command mentioned by RudiC sir.
Code:
 awk '
                        {(gsub(/ +/," "))}         ##### gsub is used for substitute operation, like here we are replacing the spaces which are unequal to a single spaces, like in row number 2 you have showed us in input space is NOt a single space. So that we can make equal length in between fields of each line.
!T[toupper($0)]++                                  ##### toupper is a utility by which we can covert any string/line to completly capital form. Here we are creating an array named T whose index is the complete line which  has been changed toupper cases now, !T[toupper($0)]++ means if the line haven't occur even a single time than make that specfici line's count as 1 and ! sign before aray T makes sure no lines should have count more than 1, so that we can have unique single time lines only. As we know awk works on 
                                                         condition and action format, means if any condition is RUE then action mentioned next to it should be perfoemed, here when any lines comes first time into array T then it will print it too because we haven't given any action and default action in awk is to print.
' file                                             ##### mentioning input file name here

Hope this helps.

Thanks,
R. Singh
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Removing duplicate lines ignore case

hi, I have the following input in file: abc ab a AB b c a C B When I use uniq -u file,the out put file is: abc ab AB c v B C (17 Replies)
Discussion started by: hellsd
17 Replies

2. Shell Programming and Scripting

how to remove duplicate lines

I have following file content (3 fields each line): 23 888 10.0.0.1 dfh 787 10.0.0.2 dssf dgfas 10.0.0.3 dsgas dg 10.0.0.4 df dasa 10.0.0.5 df dag 10.0.0.5 dfd dfdas 10.0.0.5 dfd dfd 10.0.0.6 daf nfd 10.0.0.6 ... as can be seen, that the third field is ip address and sorted. but... (3 Replies)
Discussion started by: fredao
3 Replies

3. UNIX for Dummies Questions & Answers

Remove Duplicate lines from File

I have a log file "logreport" that contains several lines as seen below: 04:20:00 /usr/lib/snmp/snmpdx: Agent snmpd appeared dead but responded to ping 06:38:08 /usr/lib/snmp/snmpdx: Agent snmpd appeared dead but responded to ping 07:11:05 /usr/lib/snmp/snmpdx: Agent snmpd appeared dead but... (18 Replies)
Discussion started by: Nysif Steve
18 Replies

4. UNIX for Dummies Questions & Answers

deleteing duplicate lines sing uniq while ignoring a column

I have a data set that has 4 columns, I want to know if I can delete duplicate lines while ignoring one of the columns, for example 10 chr1 ASF 30 15 chr1 ASF 20 5 chr1 ASF 30 6 chr2 EBC 15 4 chr2 EBC 30 ... I want to know if I can delete duplicate lines while ignoring column 1, so the... (5 Replies)
Discussion started by: japaneseguitars
5 Replies

5. Shell Programming and Scripting

sed ignoring case for search but respecting case for subtitute

Hi I want to make string substitution ignoring case for search but respecting case for subtitute. Ex changing all occurences of "original" in a file to "substitute": original becomes substitute Origninal becomes Substitute ORIGINAL becomes SUBSTITUTE I know this a little special but it's not... (1 Reply)
Discussion started by: kmchen
1 Replies

6. Shell Programming and Scripting

Remove duplicate lines

Hi, I have a huge file which is about 50GB. There are many lines. The file format likes 21 rs885550 0 9887804 C C T C C C C C C C 21 rs210498 0 9928860 0 0 C C 0 0 0 0 0 0 21 rs303304 0 9941889 A A A A A A A A A A 22 rs303304 0 9941890 0 A A A A A A A A A The question is that there are a few... (4 Replies)
Discussion started by: zhshqzyc
4 Replies

7. UNIX for Dummies Questions & Answers

Remove Duplicate Lines

Hi I need this output. Thanks. Input: TAZ YET FOO FOO VAK TAZ BAR Output: YET VAK BAR (10 Replies)
Discussion started by: tara123
10 Replies

8. Shell Programming and Scripting

Count duplicate lines ignoring certain columns

I have this structure: col1 col2 col3 col4 col5 27 xxx 38 aaa ttt 2 xxx 38 aaa yyy 1 xxx 38 aaa yyy I need to collapse duplicate lines ignoring column 1 and add values of duplicate lines (col1) so it will look like this: col1 col2 col3 col4 col5 27 xxx 38 aaa ttt ... (3 Replies)
Discussion started by: coppuca
3 Replies

9. Shell Programming and Scripting

Remove lines containing 2 or more duplicate strings

Within my text file i have several thousand lines of text with some lines containing duplicate strings/words. I would like to entirely remove those lines which contain the duplicate strings. Eg; One and a Two Unix.com is the Best This as a Line Line Example duplicate sentence with the word... (22 Replies)
Discussion started by: martinsmith
22 Replies

10. Shell Programming and Scripting

How to remove duplicate lines?

Hi All, I am storing the result in the variable result_text using the below code. result_text=$(printf "$result_text\t\n$name") The result_text is having the below text. Which is having duplicate lines. file and time for the interval 03:30 - 03:45 file and time for the interval 03:30 - 03:45 ... (4 Replies)
Discussion started by: nalu
4 Replies
TEXT2PS(L)																TEXT2PS(L)

NAME
text2ps - convert text files to PostScript SYNOPSIS
text2ps [ options ] [ files ] DESCRIPTION
Text2ps reads the input files (standard input if none are specified) and produces PostScript code which, when fed to a PostScript printer, will print the files. With text2ps it is possible to select any font, point size and number of columns. Options and files can be inter- mixed on the command line. Options are effective for all following files until they are overridden. Options Here follows a list of options that text2ps recognizes. Most numeric arguments are significant to one decimal place. Options are evalu- ated from left to right. Later options override earlier ones. -# n Print n copies of each page. (Default 1.) -c n Print in n columns. (Default 1.) -f font Print using font font. (Default Courier.) -p n Print with point size n. (Default 9.) -v n Use a vertical spacing of n points. If the vertical spacing is set to 0, the spacing will be 1.2 times the point size. (Default 0.) -l n Print n lines per column. When the line count is 0, print as many lines as will fit. (Default 0.) -r [p|l] Set the orientation to either portrait mode (p) or landscape mode (l). (Default p.) -b [+|-] Set page break mode. An argument + will force new files to be always printed on a new page (this is the default). After - new files will be put on the same page if there are still empty columns and the number of columns, the orientation or the number of copies didn't change. New files always start new columns. (Default -.) -mt n The top margin is n points. (Default 63.) -mb n The bottom margin is n points. (Default 63.) -ml n The left margin is n points. (Default 59.) -mr n The right margin is n points. (Default 59.) -mg n The inter-column gap is n points. (Default 25.) -t [+|-] If the argument is + the name of the file being printed will be printed on each page. If the argument is - the file name will not be printed. -t + implies -b +. -T text Print text as title on each page. This implies -t - and -b +. This option can be switched off by specifying -t - or -t +. (Default no title.) -F font Set the title font to font. (Default Helvetica.) -P n Set the title point size to n. (Default 12.) -B n Draw borders around each page. The number n specifies how to draw borders. N can have any of the following values or-ed in: 1 Draw a line along the left of the page. 2 Draw a line along the bottom of the page. 4 Draw a line along the right of the page. 8 Draw a line along the top of the page. 16 Draw a line between columns. This line does not connect to the lines along the top or bottom. 32 Draw a connecting line between the line between columns and the line along the top. 64 Draw a connecting line between the line between columns and the line along the bottom. When n is 0, no border lines are drawn. (Default no bordering lines.) -w n Tab stops are set every n spaces. Set the width of the TAB character. (Default 8.) -1 Sets up options to print in one column in portrait mode with the Courier font, so that you get 66 lines on a page. Equivalent to specifying the options -c 1 -f Courier -p 9 -v 0 -r p -l 0 -mt 63 -mb 63 -ml 59 -mr 59. This is the default. -2 Sets up options to print in two columns in landscape mode with the Courier font, so that you get two 66-line columns on a page. Equivalent to specifying the options -c 2 -f Courier -p 6 -v 0 -r l -l 0 -mt 63 -mb 63 -ml 59 -mr 59 -mg 25. Together with the -1 option, this is probably the most useful option. The name - means standard input. BUGS
Too many options. There is no way to specify where the title will be placed. If the font being used is not a constant width font and there are other characters than just tabs and spaces in front of a tab, the next character may not align properly. TEXT2PS(L)
All times are GMT -4. The time now is 07:41 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy