How to remove duplicated lines?


 
# 1  
Old 06-24-2013

Hi, if I have a file like this:
Query=1
a
a
b
c
c
c
d
Query=2
b
b
b
c
c
e
.
.
.

How could I remove the duplicated lines under each query and show how many times each line occurred, like this:
Query=1
a 2
b 1
c 3
d 1
Query=2
b 3
c 2
e 1

Thanks!!!
# 2  
Old 06-24-2013
An awk approach:
Code:
awk '
        /Query/ {
                for ( k in A )
                        print k, A[k]
                split ( "", A )
                print $0
        }
        !/Query/ {
                A[$1]++
        }
        END {
                for ( k in A )
                        print k, A[k]
        }
' file
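
Note that awk does not specify the order in which `for ( k in A )` visits the keys, so the lines within each block may come out in any order. A minimal sketch of a variant that prints the values in the order they were first seen, assuming a POSIX awk (the function names `count_per_query` and `dump` and the arrays `key` and `cnt` are made up for this example):

```shell
# count_per_query [FILE...]: print each "Query=" header, then "value count"
# for every distinct line under it, in first-seen order.
count_per_query() {
    awk '
        # Print the counts of the current block, then reset for the next one.
        function dump(   i) {
            for (i = 1; i <= n; i++)
                print key[i], cnt[key[i]]
            split("", cnt)      # portable way to empty an array
            n = 0
        }
        /^Query=/ { dump(); print; next }
        NF {                    # skip blank lines
            if (!($1 in cnt))
                key[++n] = $1   # remember the order of first appearance
            cnt[$1]++
        }
        END { dump() }
    ' "$@"
}
```

Call it as `count_per_query file`. Because it counts with an array, the input does not have to be sorted.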

# 3  
Old 06-24-2013
uniq -c file will return the same kind of results as long as the list is sorted per Query:
Code:
$ uniq -c file
   1 Query=1
   2 a
   1 b
   3 c
   1 d
   1 Query=2
   3 b
   2 c
   1 e
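
The sorting matters because uniq only merges lines that sit next to each other. A small sketch of the difference, using a made-up sample block (the `block` helper is only there for the demonstration):

```shell
# A sample block whose values are not in sorted order.
block() { printf '%s\n' a c a d c b c; }

# Without sorting, no two equal lines are adjacent, so every line counts as 1.
block | uniq -c

# Sorting first brings equal lines together, so the counts are complete.
# The awk step swaps the columns to "value count".
block | sort | uniq -c | awk '{ print $2, $1 }'
```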

# 4  
Old 06-24-2013
Sorry, my file is actually not sorted.
The input looks like this:

Query=1
a
c
a
d
c
b
c
Query=2
...

What should I do then?

Thanks!!



# 5  
Old 06-24-2013
You can just use Yoda's approach; it doesn't need sorted input, so it should work fine.
# 6  
Old 06-24-2013
Thanks.
Could I ask another question?
If I have file1:
a
c
d
b

and file2:
a 33
b 55
c 66
d 77

How could I replace each line of file1 with its value from file2 and get output like this:
33
66
77
55

?

Thanks.


# 7  
Old 06-24-2013
I think we've missed the point because the file was not wrapped in CODE tags. If this is one file, then I'm assuming that you want the count for each block. The output requested has the values for b and c in each section.

Not the prettiest solution, but this might work:-
Code:
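#!/bin/ksh

# Split the input into one temporary file per Query section
mkdir /tmp/$$
while read -r line
do
   if [ "$line" != "${line#Query=}" ]
   then
      Section="${line#Query=}"
   else
      echo "$line" >> "/tmp/$$/$Section"
   fi
done < file_name

# The glob returns full paths, so strip the directory for the header.
# Sort first, because uniq only counts adjacent duplicates.
for file in /tmp/$$/*
do
   echo "Query=${file##*/}"
   sort "$file" | uniq -c
done
rm -r /tmp/$$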


Does that get you any closer? The output is the wrong way round for the counts, but that could be handled thus:-
Code:
....

for file in /tmp/$$/*
do
   echo "Query=${file##*/}"
   sort "$file" | uniq -c | while read col1 col2
   do
      echo "$col2 $col1"
   done
done


I hope that this helps.


Robin
Liverpool/Blackburn
UK

---------- Post updated at 03:55 PM ---------- Previous update was at 03:52 PM ----------

Oh, an update before I posted!

Okay, to address post number 6, try something like:-
Code:
#!/bin/ksh

while read -r line
do
   # The trailing space anchors the whole key, so "a" does not also match "ab"
   grep "^$line " file2 | cut -f2- -d" "
done < file1


Does that do it?
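
If file2 is large, running grep once per line of file1 gets slow. A sketch of a single-pass alternative in awk, assuming file2 holds one "key value" pair per line as in post 6 (the function name `lookup` and the array `val` are made up for this example):

```shell
# lookup MAPFILE KEYFILE: for each key in KEYFILE, print the value that
# MAPFILE ("key value" per line) assigns to it. Unknown keys are skipped.
lookup() {
    awk '
        NR == FNR { val[$1] = $2; next }   # first file: load key -> value
        $1 in val { print val[$1] }        # second file: print mapped value
    ' "$1" "$2"
}
```

Call it as `lookup file2 file1`. The `NR == FNR` test assumes the first file is not empty; the output follows the order of the second file.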



Robin