Concatenate all duplicate lines in a file

# 1  
Old 04-29-2008

Hi All,

I have a zip file whose contents are in the following format:

794051400123|COM|24|0|BD|R|99.98
794051413727|COM|11|0|BD|R|28.99
794051415622|COM|23|0|BD|R|28.99
883929004676|COM|0|0|BD|R|28.99
794051400123|MOM|62|0|BD|R|99.98
794051413727|MOM|4|0|BD|R|28.99
794051415622|MOM|80|0|BD|R|28.99
883929004676|MOM|0|0|BD|R|28.99
883929017164|MOM|0|0|BD|R|39.99
794051400123|RNO|73|0|BD|R|99.98
794051413727|RNO|8|0|BD|R|28.99
794051415622|RNO|84|0|BD|R|28.99
883929004676|RNO|0|0|BD|R|28.99
794051400123|SOM|25|0|BD|R|99.98
794051415622|SOM|80|0|BD|R|28.99
883929004676|SOM|0|0|BD|R|28.99
883929017164|SOM|0|0|BD|R|39.99
.................................

I need to concatenate all duplicate lines (lines sharing the same first field), like this:

794051400123|COM|24|MOM|62|SOM|25|RNO|73
794051413727|COM|11||MOM|4|RNO|8
............
...

The file is nearly 30 MB, so this takes a lot of time. I have to do it within 15 minutes.

Please help me.

Thanks!
vaskar
# 2  
Old 04-29-2008
What happened to the dollar amount at the end of the line?
# 3  
Old 04-29-2008
Tested and working.

Code:
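#!/usr/bin/env python

import sys

# Map each first field (the key) to the field 2 / field 3 values collected for it.
merged = {}

with open(sys.argv[1], 'r') as infile:
    for line in infile:
        line = line.rstrip()
        # Skip blank lines and filler lines that are not pipe-delimited.
        if "|" in line:
            fields = line.split("|")
            # Append fields 2 and 3 to this key's list, creating it on first sight.
            merged.setdefault(fields[0], []).extend(fields[1:3])

for key, value in merged.items():
    print("%s|%s" % (key, "|".join(value)))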

# 4  
Old 04-30-2008
I don't know Python scripting. Is it possible to write a simple shell script? I have done it, but it takes a lot of time.
# 5  
Old 04-30-2008
Gift horse and all that ...

Assuming you mean take all values with an identical first field, and paste together fields 2 and 3 from all those lines, something like

Code:
sort file |
awk -F '|' '$1 == prev { collect=collect "|" $2 "|" $3; next }
{ if (collect) print collect; prev = $1; collect = $1 "|" $2 "|" $3; next }
END { if (collect) print collect }'

The sort might kill you if it's a very small or old system, but having sorted input makes the awk script very simple.
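For readers more comfortable with Python than awk, the same sort-then-group idea can be sketched as below. This is only an illustration of the approach, not a tested replacement for the awk script: the function name merge_sorted and the small sample list are made up for the example, and it assumes every data line is pipe-delimited with the key in the first field.

```python
from itertools import groupby


def merge_sorted(lines):
    """Yield one merged record per distinct first field (hypothetical helper)."""
    # Sorting whole lines brings identical first fields next to each other,
    # which is the same trick the sort | awk pipeline relies on.
    rows = (line.rstrip("\n").split("|") for line in sorted(lines) if "|" in line)
    for key, group in groupby(rows, key=lambda fields: fields[0]):
        merged = [key]
        for fields in group:
            # Keep only fields 2 and 3, as the awk script does.
            merged.extend(fields[1:3])
        yield "|".join(merged)


# A few lines taken from the sample data in the first post.
sample = [
    "794051400123|COM|24|0|BD|R|99.98\n",
    "794051413727|COM|11|0|BD|R|28.99\n",
    "794051400123|MOM|62|0|BD|R|99.98\n",
    "794051413727|MOM|4|0|BD|R|28.99\n",
    "794051400123|RNO|73|0|BD|R|99.98\n",
]

result = list(merge_sorted(sample))
for record in result:
    print(record)
```

Because the input is sorted first, only one group has to be held in memory at a time, at the cost of the sort itself.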
# 6  
Old 04-30-2008
Thanks! It is working.
# 7  
Old 08-25-2008
Hi,

I am facing a big problem: it gives me the wrong result, like this:

|COM|24|MOM|62|SOM|25|RNO|73
794051413727|COM|11||MOM|4|RNO|8

Please help me.

Thanks!
vaskar
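
The thread does not say what caused the missing first field. One guess is that some input lines are blank or begin with "|", since grouping on an empty key would produce a record like the first one above. A minimal sketch for checking that, assuming seven pipe-delimited fields per line as in the sample data (the function name find_suspect_lines and the test lines are invented for illustration):

```python
def find_suspect_lines(lines, expected_fields=7):
    """Return (line_number, reason, raw_line) for lines likely to break grouping."""
    suspects = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if not text.strip():
            suspects.append((number, "blank line", raw))
            continue
        fields = text.split("|")
        if fields[0].strip() == "":
            # An empty key would yield output that starts with "|".
            suspects.append((number, "empty first field", raw))
        elif len(fields) != expected_fields:
            suspects.append((number, "unexpected field count", raw))
    return suspects


# Made-up lines: one good record followed by three problem cases.
sample = [
    "794051400123|COM|24|0|BD|R|99.98\n",
    "\n",
    "|MOM|62|0|BD|R|99.98\n",
    "794051413727|COM|11|0|BD\n",
]

report = find_suspect_lines(sample)
for number, reason, raw in report:
    print("line %d: %s: %r" % (number, reason, raw))
```

If this reports nothing for the real file, the cause lies elsewhere and this guess can be ruled out.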