I have a large file (1.5 GB) and want to sort it.
I used the following AWK script to do the job.
The script works, but it is very slow and takes over an hour. I suspect this is because the file is not sorted.
Hi,
I presume you mean you want to dedupe the file (because that is what your script does, and that is what is in the title), not necessarily sort it.
You can try the difference between a plain awk dedupe and a sort-based one.
The awk version is typically a lot faster, because the file does not have to be sorted; whether the file is sorted or not should make no difference to the awk command.
They both dedupe, but the second one sorts as well.
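The actual commands did not survive into this copy of the thread; the comparison described above is typically between the classic awk dedupe one-liner and sort -u. The exact commands below are an assumption based on the surrounding description, not necessarily what the original post contained:

```shell
# Small demo file standing in for the 1.5 GB one (hypothetical contents).
printf 'pear\napple\npear\nbanana\napple\n' > file

# Dedupe without sorting: a[$0]++ is 0 (false) the first time a line is
# seen, so each line prints exactly once, in its original order.
awk '!a[$0]++' file
# -> pear
#    apple
#    banana

# Dedupe by sorting: output is sorted, but the whole file must be sorted first.
sort -u file
# -> apple
#    banana
#    pear
```

Note the trade-off on very large files: the awk version reads the file once but holds every unique line in memory, while sort -u may spill to temporary files on disk.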
------- Edit ---------
I just did a test with a 1.6 GiB file and it took under 3 minutes to dedupe it, so I would look closely at exactly what you are doing.
Are you deduping and then sorting?
Are you running out of memory and is your system paging/swapping?
Otherwise can you post the exact script/command that you are using?
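On the memory question above: the awk approach keeps every unique line in an in-memory array, so a file with many unique lines can push the system into paging (on Linux you can watch the si/so columns of `vmstat 1` to see whether it is swapping). sort, by contrast, spills to temporary files on disk. If memory turns out to be the problem, GNU sort's buffer-size and temp-directory flags give you control; these flags are GNU coreutils options and are not from the original thread:

```shell
# Demo stand-in for the real 1.5 GB file.
printf 'b\na\nb\nc\na\n' > file

# Cap sort's in-memory buffer at 512 MiB and put its temporary
# spill files on a filesystem with plenty of free space.
sort -u -S 512M -T /tmp file
# -> a
#    b
#    c
```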
Last edited by Scrutinizer; 10-13-2018 at 07:14 AM..