remove one of each similar lines in a file


 
# 1
11-23-2010

Hello folks
I have a question for you sed or grep gurus (awk would do too, but I would prefer one of the first two).
I have a file (f1) that says
(these are actually md5sums rather than numbers, but for simplicity, let's assume they are numbers):
Code:
1
2
3
4
5

And I have a file (f2) that says:
Code:
1|a
1|b
1|c
2|d
2|e
2|f
2|g
3|h
3|i
4|j
4|k
4|l
5|m
5|n

I would like to keep either:
- only the first line for each number
Code:
1|a
2|d
3|h
4|j
5|m

- or all the other lines for each number (I'll choose whichever is more efficient)
Code:
1|b
1|c
2|e
2|f
2|g
3|i
4|k
4|l
5|n

I've already worked miracles with sed and grep in earlier steps of my final script, so I hope someone can come up with something simple for this problem.

Here is what I have in bash (it works, but it is slow). Only f2 is needed in this example:
Code:
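while IFS= read -r l; do
    n="$md5"; md5="${l%%|*}"         # previous key, current key (text before the first '|')
    [ "$n" = "$md5" ] && echo "$l"   # same key as the previous line: 2nd or later occurrence
done < "f2" >> "$TMP1"               # open $TMP1 once for the whole loop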

In this script, the second and later lines for each md5 go to the $TMP1 file to be processed later.

All data are sorted by the first field.
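Since the data are sorted by the first field, each line only has to be compared with the key of the line before it, which is what the loop above does. A minimal awk sketch of the same idea, producing both outputs in a single pass, assuming the input is grouped by key (the function name `split_groups` and the temporary file names are hypothetical). Unlike the loop, it truncates the second output file instead of appending to it, and awk creates that file only if at least one key repeats:

```bash
# Sketch: split input already sorted (grouped) on field 1 into the first
# line of each key group (stdout) and all later lines (file given as $2).
# split_groups and the demo file names are hypothetical.
split_groups() {
    awk -F'|' -v rest="$2" '
        { key = $1 "" }                                       # force a string comparison of keys
        NR == 1 || key != prev { prev = key; print; next }    # new key group
        { print > rest }                                      # 2nd and later lines of the group
    ' "$1"
}

dir=$(mktemp -d)
printf '%s\n' '1|a' '1|b' '1|c' '2|d' '2|e' '2|f' '2|g' \
              '3|h' '3|i' '4|j' '4|k' '4|l' '5|m' '5|n' > "$dir/f2"
split_groups "$dir/f2" "$dir/rest" > "$dir/first"
cat "$dir/first"   # prints 1|a, 2|d, 3|h, 4|j and 5|m on separate lines
```

Because only the previous key is kept, memory use does not grow with the number of distinct keys.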

Thank you in advance.

Last edited by tukuyomi; 11-23-2010 at 02:06 PM. Reason: added information.
# 2
11-23-2010
Code:
awk -F\| 'A[$1]++==1' f1 f2
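The one-liner works because awk reads f1 first: every key already has a count of 1 when its first line in f2 arrives, so `A[$1]++==1` matches exactly that line, and later lines of the same key see 2, 3, and so on. Changing the test to `>1` gives the other output asked for in the question. A small sketch on the sample data, assuming every key of f2 appears exactly once in f1 (a key missing from f1 would match on its second line instead); the temporary directory and variable names are hypothetical:

```bash
# Sketch: the one-liner and its complement on the sample data from the
# question. The temporary directory and variable names are hypothetical.
dir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 > "$dir/f1"
printf '%s\n' '1|a' '1|b' '1|c' '2|d' '2|e' '2|f' '2|g' \
              '3|h' '3|i' '4|j' '4|k' '4|l' '5|m' '5|n' > "$dir/f2"

# f1 is read first, so A[key] is already 1 when the first f2 line of that
# key arrives; later f2 lines of the same key see 2, 3, ...
first=$(awk -F'|' 'A[$1]++ == 1' "$dir/f1" "$dir/f2")   # one line per key
rest=$(awk -F'|' 'A[$1]++ > 1' "$dir/f1" "$dir/f2")     # all other lines

# Same split on f2 alone, with no filtering by f1:
first_only=$(awk -F'|' '!A[$1]++' "$dir/f2")   # first line of each key
rest_only=$(awk -F'|' 'A[$1]++' "$dir/f2")     # 2nd and later lines

printf '%s\n' "$first"   # prints 1|a, 2|d, 3|h, 4|j and 5|m on separate lines
```

These keep one array entry per key, so, unlike a comparison with the previous line, they also work on unsorted input.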

# 3
11-23-2010
Wow, I think I'll go with this awk solution after all. :)
Thank you, Scrutinizer.
# 4
11-23-2010
Assuming both files are presorted, just like in your example, here is... the hard way:

Code:
# cat f1
1
2
3
4
5
# cat f2
1|azoep
1|fskl
1|gjldfk
1|gjldiropez
1|gmlds
2|dsfgk
2|jgkfdls
3|fjsdk
3|jkflsdql
# >output
# ksh mtst
# cat output
1|azoep
2|dsfgk
3|fjsdk

Code:
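# cat mtst
exec 3<f1                            # keys, sorted
exec 4<f2                            # key|value lines, sorted on the key
read -r -u3 n
read -r -u4 l
while [[ -n "$n" && -n "$l" ]]
do
        if [[ $n = "${l%%\|*}" ]]
        then
                print -r -- "$l"     # first f2 line for this key
                read -r -u3 n
                read -r -u4 l
        elif [[ $n < "${l%%\|*}" ]]; then
                read -r -u3 n        # f1 key has no (more) lines in f2
        else
                read -r -u4 l        # f2 key is behind the f1 key: skip
        fi
done >>output                        # open output once, not once per match
exec 3<&-
exec 4<&-
#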

If you have any ideas to make the code quicker, please suggest them. I am curious to see how fast we can tweak it (keeping it at the shell level).
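The script is written for ksh, but bash's `read` also accepts `-u fd`, so the same merge can be sketched as a bash function. It reads with `IFS= read -r` so backslashes and leading blanks survive, and it sends all output through one redirection rather than reopening a file per matched line, which is usually one of the cheaper shell-level speed-ups. The function name `first_match_merge` is hypothetical, and both files must be sorted in the collation order that `[[ < ]]` uses (the current locale):

```bash
#!/usr/bin/env bash
# Sketch: bash port of the ksh merge loop. first_match_merge is a
# hypothetical name. Both files must be sorted on the key in the same
# collation order that [[ < ]] uses (the current locale).
first_match_merge() {
    local n l key
    exec 3<"$1" 4<"$2"                  # fd 3: keys, fd 4: key|value lines
    IFS= read -r -u 3 n
    IFS= read -r -u 4 l
    while [[ -n $n && -n $l ]]; do
        key="${l%%|*}"                  # text before the first '|'
        if [[ $n == "$key" ]]; then
            printf '%s\n' "$l"          # first f2 line for this key
            IFS= read -r -u 3 n
            IFS= read -r -u 4 l
        elif [[ $n < $key ]]; then
            IFS= read -r -u 3 n         # f1 key has no (more) lines in f2
        else
            IFS= read -r -u 4 l         # f2 key is behind: skip this line
        fi
    done
    exec 3<&- 4<&-
}

dir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 > "$dir/f1"
printf '%s\n' '1|azoep' '1|fskl' '1|gjldfk' '1|gjldiropez' '1|gmlds' \
              '2|dsfgk' '2|jgkfdls' '3|fjsdk' '3|jkflsdql' > "$dir/f2"

# One redirection (here a command substitution) for the whole loop.
out=$(first_match_merge "$dir/f1" "$dir/f2")
printf '%s\n' "$out"   # prints 1|azoep, 2|dsfgk and 3|fjsdk on separate lines
```

For large files, a loop like this will typically still be slower than the awk one-liner, since the shell runs several builtins for every input line.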