Avoid Duplicates in a file
# 1  
Old 03-24-2008

Hi Gurus,

I have a question about avoiding duplicates. I have a file abc.txt:
abc.txt
-------
READER_1_1_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl] occurred while parsing:[Error at (file /home/abc.xml, line 6, char 1 ): Invalid document structure.]; line number [6]; column number [1]
READER_1_3_1> Sun Mar 23 23:52:48 2008
READER_1_3_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl] occurred while parsing:[Error at (file /home/abc.xml, line 6, char 1 ): Invalid document structure.]; line number [6]; column number [1]
READER_1_2_1> Sun Mar 23 23:52:48 2008
READER_1_2_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl] occurred while parsing:[Error at (file /home/abc.xml, line 6, char 1 ): Invalid document structure.]; line number [6]; column number [1]
READER_1_1_1> Sun Mar 23 23:52:48 2008
READER_1_1_1> HIER_28058 XML Reader Error
READER_1_2_1> Sun Mar 23 23:52:48 2008
READER_1_2_1> HIER_28058 XML Reader Error
READER_1_3_1> Sun Mar 23 23:52:48 2008
READER_1_3_1> HIER_28058 XML Reader Error.


I get this file when I egrep the session log file for error-specific messages.

I want to remove the repeated lines from this file.
I tried to use the uniq command, but READER_1_3_1> varies in the abc.txt file above,
i.e., each line has 1, 2, or 3 in place of the ** in READER_1_**_1>.

So I am not getting the desired output.

Can anyone let me know how to get this done?

My output should be:
READER_1_1_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl] occurred while parsing:[Error at (file /home/abc.xml, line 6, char 1 ): Invalid document structure.]; line number [6]; column number [1]
READER_1_1_1> HIER_28058 XML Reader Error


Thanks & Regards,
San
# 2  
Old 03-24-2008
Use:
Code:
sort file1 | uniq -u > new-file-only-unique-lines

"uniq -u" prints only the lines that occur exactly once in its (sorted) input.
# 3  
Old 03-24-2008
My suggestion would be to write a Perl script using regular expressions to solve your issue. It may be easier to pull out your unique values that way.
# 4  
Old 03-24-2008
My question is: does every message appear with 2 duplicates in the input data?

You gave the abc.txt example, where every message is duplicated with 'READER_1_2_1>' and 'READER_1_3_1>' headers added. If all logs are presented that way, you can:

Code:
grep 'error message string' abc.txt | grep 'READER_1_1_1>'

I may have misunderstood you, and this answer may be too obvious for your question. I don't know whether the 'header' (the part before the '>' sign) changes.
# 5  
Old 03-25-2008
Quote:
Originally Posted by sysgate
use :
Code:
sort file1 | uniq -u > new-file-only-unique-lines

"uniq -u" prints only the lines that occur exactly once in its (sorted) input.
Thanks for your reply.
But here all the lines are unique, since reader_1 changes to reader_2; that is the only variation.

So if I use the uniq command I do not get the desired output.
# 6  
Old 03-25-2008
Quote:
Originally Posted by xonicman
My question is: does every message appear with 2 duplicates in the input data?

You gave the abc.txt example, where every message is duplicated with 'READER_1_2_1>' and 'READER_1_3_1>' headers added. If all logs are presented that way, you can:

Code:
grep 'error message string' abc.txt | grep 'READER_1_1_1>'

I may have misunderstood you, and this answer may be too obvious for your question. I don't know whether the 'header' (the part before the '>' sign) changes.
Thanks for your reply.

No. Since it reads from 3 sources, it gives out 3 errors. But the source is the same, so I get 3 similar messages with only the reader part varying.
In place of "reader" I may get some other word.
# 7  
Old 03-25-2008
All you need to trim a file down to unique lines is:

Code:
sort -u file

However, your lines aren't straight duplicates, so you'll need to pre-process the file. If your lines will all be the same except for the initial "reader" part, then just strip out any occurrence of READER_X_X_X.
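
Following that suggestion, one possible pre-processing step (a sketch; the READER_ pattern is taken from the sample abc.txt above, and note the prefix is removed from the output):

```shell
# Strip the READER_x_y_z> prefix, then keep one copy of each distinct message.
sed 's/^READER_[0-9]*_[0-9]*_[0-9]*> *//' abc.txt | sort -u
```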

ShawnMilo
 