Multi thread awk command for faster performance


 
# 15  
Old 04-30-2012
Hi Corona,
Thanks a lot!
Actually, the filename was not getting printed.
I had a similar problem, for which I found the solution on this forum itself.
The code looks like this now.

Code:
awk -v ORS="" 'FNR==1 { printf("%s\n", FILENAME) }; /<.*/ , /.*<\/.*>/' *.txt

Big thanks to all you guys here.

Thanks,
Chetan.C
# 16  
Old 04-30-2012
Oh, I didn't realize you wanted the filename either, sorry. Your original didn't have that. This is why I prefer people tell me what they actually want, rather than "how do I make this piece of code run faster" -- I'm liable to make bad guesses about their requirements.

Glad you got it working!
# 17  
Old 04-30-2012
Hi.
Quote:
Originally Posted by Corona688
... This is why I prefer people tell me what they actually want, rather than "how do I make this piece of code run faster" -- I'm liable to make bad guesses about their requirements ...!
+1 ... cheers, drl
# 18  
Old 05-01-2012
Thanks, Corona.

Yes, I will make sure I post it right next time.
# 19  
Old 05-03-2012
Hi,

The script I'm running is executed simultaneously for multiple folders.
Is there a chance of the output overlapping because of these multiple processes?

I'm seeing overlapping data here.
Am I doing something wrong?

Thanks,
Chetan.C

---------- Post updated at 06:04 AM ---------- Previous update was at 04:56 AM ----------

This is the code

Code:
 
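#!/bin/bash

# Extract the tag-delimited ranges from every .txt file in one directory.
fast() {
    cd "$1" || return
    awk -v ORS="" 'FNR==1 { printf("\n%s\n", FILENAME) }; /<.*/ , /.*<\/.*>/' *.txt
}

# Start one background job per TLT* directory, then wait for all of them.
for dir in $(find /opt/app/idss/data01/cc002h/computes/test_scripts/Test_files/ -type d -name "TLT*"); do
    fast "$dir" &
done
wait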


Last edited by chetan.c; 05-03-2012 at 10:05 AM.. Reason: Missed "&" while pasting the code.
# 20  
Old 05-03-2012
Quote:
Originally Posted by chetan.c
Hi,

The script I'm running is executed simultaneously for multiple folders.
Is there a chance of the output overlapping because of these multiple processes?
It's overlapping because they're running literally at the same time, i.e., what you asked for. Save their output to separate files and combine them later. That may negate any benefit of parallelizing them, though, since you'll be doing two to three times as much disk access for the same amount of work!
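
A minimal sketch of the separate-files idea, not a tested drop-in. `run_all` and its two arguments are hypothetical names, the output directory is assumed to exist already, and the `TLT*` directories are assumed to sit directly under the base directory (the script above uses `find`, which also descends into subdirectories).

```shell
#!/bin/bash
# Sketch: each background job writes to its own file; merge after `wait`.

# Same awk extraction as in the posted script, for one directory.
fast() {
    cd "$1" || return
    awk -v ORS="" 'FNR==1 { printf("\n%s\n", FILENAME) }; /<.*/ , /.*<\/.*>/' *.txt
}

# run_all BASE OUT (hypothetical helper):
#   BASE holds the TLT* directories, OUT is an existing writable directory.
run_all() {
    local base=$1 out=$2 dir
    for dir in "$base"/TLT*/; do
        [ -d "$dir" ] || continue
        # One output file per directory, so concurrent jobs never share a file.
        fast "$dir" > "$out/$(basename "$dir").out" &
    done
    wait
    # Merge sequentially once every job has finished.
    cat "$out"/*.out > "$out/combined.txt"
}
```

Called as `run_all /path/to/Test_files /path/to/results`, this leaves one `.out` file per directory plus `combined.txt`. The merge step is the extra disk access mentioned above.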

I still don't believe there are large benefits to parallelizing this. Having 9 programs instead of 1 won't let them read from your disk 9 times faster. Measure the throughput you already have first.
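
One rough way to take that measurement, as a sketch only. `measure` is a hypothetical helper that times a single sequential pass of the same awk filter over one directory. Timing uses whole seconds, so it is only meaningful for runs that take a while, and files already in the page cache will read much faster than files coming from disk.

```shell
#!/bin/bash
# measure DIR (hypothetical helper): time one sequential awk pass over
# DIR/*.txt and report an approximate throughput.
measure() {
    local dir=$1 bytes start end secs
    start=$(date +%s)
    awk -v ORS="" '/<.*/ , /.*<\/.*>/' "$dir"/*.txt > /dev/null
    end=$(date +%s)
    secs=$(( end - start ))
    [ "$secs" -gt 0 ] || secs=1   # avoid dividing by zero on very fast runs
    # Count the input size after timing so the count does not pre-warm the cache.
    bytes=$(( $(cat "$dir"/*.txt | wc -c) ))
    echo "bytes=$bytes seconds=$secs MB_per_s=$(( bytes / secs / 1048576 ))"
}
```

If the single-process figure is already close to what the disk can deliver, running more copies in parallel is unlikely to help.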
# 21  
Old 05-03-2012
Hi Corona,

Thanks a lot.
So overlapping does happen in these cases.

The problem is that I have a large number of files to process, so I wanted to try processing more files at once.

I completely understand your take on I/O, but I have to speed up the processing, so I'm experimenting with the I/O itself.

Please let me know if there is something else that i can try.

Thanks a lot for your replies.

Thanks,
Chetan.C