Just wanted to check: would the sort below be faster if I give it the temp folder path, or should I change the path to some other folder?
See, a single disk can do only one thing at a time: while it is reading a byte in one place, it cannot read (or write) a byte anywhere else.
Temporary files are (at least) written once and (at least) read once; your input file is (at least) read once, and your output file is written once. For all these tasks you want to involve different disks, so that while one file is being read or written, another can be read or written at the same time.
This should answer your question: ideally you want a separate disk for each of the three files involved. Perhaps the fastest disk should be assigned to the temporary files, because they are probably read and written the most often.
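For example, GNU sort lets you point its temporary files at a different disk with -T; a minimal sketch (the three paths are hypothetical, each assumed to sit on its own physical disk):

sort -T /disk2/tmp -o /disk3/sorted.txt /disk1/input.txt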
I could see that NLAP_TEMP is the fastest directory, and I added it to the sort command, but it doesn't seem to help. The awk command takes just 7 minutes; the issue is only with sort, which is taking a long time.
---------- Post updated at 05:30 AM ---------- Previous update was at 03:29 AM ----------
Please let me know how I can make sort faster. The file size is 4 GB and the sorting takes 3 hours. We have only one disk behind the TEMP folder, with 50 GB of space.
You received several hints in this thread on how to accelerate the sort process. What were the results of each? Did you consult man sort for additional options?
Did you check that the directories are on different physical disks? By that I mean you need to check that they are separate filesystems and what those filesystems are built from, not just that the directories are different (a quick df check is sketched after this list). What you may think of as a single update to a file will cause multiple updates on the disk. There are at least:-
 - the actual disk block for the data
 - the file's inode, updated with the last-modified time
 - the directory (for a new file or a rename) and its inode
 - the filesystem superblock (usually plural) when you get a new disk block from the free list by creating or extending the file
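To check, compare the filesystem that each directory reports; a quick sketch (directory names are hypothetical):-

df -P /data/input /tmp/sortwork /data/output

If the Filesystem column shows the same device for two of them, they share a filesystem; and even different filesystems can still sit on the same physical spindle (partitions, LVM, etc.), so check what they are built from too.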
You also have to consider contention from other processing, and if these are NFS-mounted filesystems then you have the overhead of network traffic on top of it.
I don't know how you have your disks provisioned. Can you explain it? If it is SAN, then that might be more difficult to speed up, and it depends on the disks at the back end, the fibre capacity, etc. At the other extreme, a PC with a single disk is just going to have contention even if you have a large disk cache.
Overall, if you have lots of data it is just going to take a while. I doubt I will be able to better the suggestions from my fellow learned members. How big is your input file anyway (in bytes and records)? If you try to do too much processing in one chunk, then you may also exhaust memory and cause your server to page/swap. Keeping this to discrete steps may alleviate that bottleneck but may cost more in disk I/O. It is difficult to tell.
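If paging is the worry, GNU sort can also cap its in-memory buffer with -S; a hedged sketch (the size and paths are assumptions, not tuned values):-

sort -S 512M -T /tmp/sortwork -o sorted.txt input.txt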
If you don't mind the 13th field still being there (given that they are all set to 9999), you might be able to save a little by stripping the pipeline right back and doing this:-
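(a sketch only; I am guessing at your file names and at the earlier awk/uniq step)

# guessed earlier pipeline: awk rewrites field 13, then a separate uniq
# awk '{ $13 = 9999; print }' infile | sort | uniq > outfile
# stripped-back version: leave field 13 alone and de-duplicate inside sort
sort -u -o outfile infile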
The -u flag on the sort saves the separate uniq process, and therefore all of its memory (less risk of paging/swapping) and the cost of passing the data between the two, so that might help.
I hope that this is useful, but there will always be a limit we will hit.