Remove duplicate lines from a 50 MB file size


 
# 1  
Old 11-29-2011

Hi,
Please help me write a command to delete duplicate lines from a file. The file is 50 MB. How can I remove duplicate lines from such a big file?
# 2  
Old 11-29-2011
https://www.unix.com/shell-programmin...ines-file.html
I didn't read that thread completely, but you might get some tips from it.
This User Gave Thanks to balajesuri For This Post:
# 3  
Old 11-29-2011
Please post what defines a line as a duplicate (an example line) and we will be able to help you! =o)
This User Gave Thanks to felipe.vinturin For This Post:
# 4  
Old 11-29-2011
The problem is that the file size is 100 MB to 150 MB.

input file:
---------
1,2, ,TTT,DDFG,
1,2, ,TTT,DDFG,
1,2, ,TTT,DDFG,
7,8, ,TTT,DDFG,
1,2, ,TTT,DDFG,
1,2, ,TTT,DDFG,

output file should be like:
1,2, ,TTT,DDFG,
7,8, ,TTT,DDFG,
# 5  
Old 11-29-2011
The solution proposed in the post balajesuri wrote should solve your problem:
Code:
awk '!x[$0]++' file > file.new

Let us know the result!
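
For anyone wondering why that one-liner works, here is a minimal sketch of the same idea spelled out. The array name `seen`, the function name `dedupe_keep_order`, and the temporary sample file are illustrative assumptions, not something from the thread; the sample lines are taken from post #4.

```shell
# Hypothetical helper (name not from the thread): print each distinct line
# once, keeping the first occurrence and the original line order.
dedupe_keep_order() {
    # seen[$0] is 0 (false) the first time a line appears, so !seen[$0] is
    # true and awk prints the line; the post-increment then marks it as seen,
    # so every later copy of the same line is skipped.
    awk '!seen[$0]++' "$1"
}

# Sample data modelled on post #4.
sample=$(mktemp)
printf '%s\n' \
    '1,2, ,TTT,DDFG,' \
    '1,2, ,TTT,DDFG,' \
    '7,8, ,TTT,DDFG,' \
    '1,2, ,TTT,DDFG,' > "$sample"

# Prints the 1,2 line once, then the 7,8 line.
dedupe_keep_order "$sample"
rm -f "$sample"
```

Note that awk keeps every distinct line in memory while it runs. That is usually not a concern for a file of 100 MB to 150 MB, but it is worth keeping in mind if almost every line is unique and the machine is short on RAM.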
# 6  
Old 11-29-2011
Thanks, all of you. awk '!x[$0]++' file is working.

# 7  
Old 11-30-2011
Code:
sort -u <file_name>
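
Two things to keep in mind with this approach: the output comes back sorted rather than in the original line order, and it goes to standard output unless redirected. A rough sketch of saving the result to a file follows; the function name `dedupe_sorted` and the file names are made up for illustration, and `LC_ALL=C` is an optional assumption added so that lines are compared byte by byte.

```shell
# Hypothetical helper: write the sorted, de-duplicated lines of $1 to $2.
dedupe_sorted() {
    # -u keeps one copy of each distinct line; -o names the output file.
    # LC_ALL=C makes sort compare raw bytes, so only byte-identical lines
    # are treated as duplicates.
    LC_ALL=C sort -u -o "$2" "$1"
}

# Sample data modelled on post #4.
in_file=$(mktemp)
out_file=$(mktemp)
printf '%s\n' \
    '7,8, ,TTT,DDFG,' \
    '1,2, ,TTT,DDFG,' \
    '1,2, ,TTT,DDFG,' > "$in_file"

dedupe_sorted "$in_file" "$out_file"
cat "$out_file"
rm -f "$in_file" "$out_file"
```

If the original order of the lines matters, the awk command from post #5 is the better fit, since it keeps the first occurrence of each line in place.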

