Multithreading in reading file


 
# 1  
Old 11-23-2011
Multithreading in reading file

Dear all,

I have a huge XML file with the structure below:
<EMPLOYEE>
<RECORD id =aaa>
<Salary>99999</Salary>
<section>ssss</section>
</RECORD>
<RECORD id =bbb>
<Salary>77777</Salary>
<section>ssss</section>
</RECORD>
</EMPLOYEE>

This is a 50 GB file. I want to read it using multiple threads and write the Salary and section of each record (as salary~section) to multiple files, one per thread. I am trying to do this in C++.

After that I will have to merge these files. I am pretty new to threading concepts in C++. Can anyone please suggest a way of doing this?

Thanks
Arun
# 2  
Old 11-23-2011
For processing as simple as this, a single-threaded program is going to be far faster than your disk.

You can only process a file as fast as the disk can read it no matter how many threads you have. Files don't have a multithreading "go-faster" mode.

What exactly are you trying to do? What do you mean by "merge"? Describe what you're doing in more detail and we may be able to help track down the slow step.

P.S: Is this really what your XML looks like, or did you pretty it up for posting?
# 3  
Old 11-23-2011
Thanks for your reply. Below are the tasks I have been asked to do:

1. The file is about 100 GB. I was asked to read it with multiple threads using a producer-consumer scenario and write salary~Section to multiple files to split the 100 GB load.
2. After that is done, I need to merge the files and report the data in order of Section.
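
A minimal C++ sketch of the merge step in item 2, offered as an illustration only. It assumes each per-thread output is a run of `salary~section` lines, that a single run fits in memory for sorting, and that the function names (`section_of`, `sort_run`, `merge_runs`) are hypothetical. For 100 GB the runs would be files on disk rather than strings; the k-way merge itself only holds one line per run in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <ostream>
#include <queue>
#include <sstream>
#include <string>
#include <vector>

// Sort key of a "salary~section" line: everything after the first '~'.
std::string section_of(const std::string& line) {
    const auto p = line.find('~');
    return p == std::string::npos ? std::string() : line.substr(p + 1);
}

// Sort one run (one worker's output) by section.
// Assumption: the run is small enough to hold in memory.
std::string sort_run(const std::string& run) {
    std::istringstream in(run);
    std::vector<std::string> lines;
    for (std::string l; std::getline(in, l);) lines.push_back(l);
    std::stable_sort(lines.begin(), lines.end(),
                     [](const std::string& a, const std::string& b) {
                         return section_of(a) < section_of(b);
                     });
    std::string out;
    for (const auto& l : lines) { out += l; out += '\n'; }
    return out;
}

// K-way merge of runs that are each already sorted by section.
// Only the current head line of every run is kept in memory.
void merge_runs(const std::vector<std::istream*>& runs, std::ostream& out) {
    struct Head { std::string line; std::size_t run; };
    // Comparator for a min-heap ordered by (section, run index).
    auto later = [](const Head& a, const Head& b) {
        const std::string ka = section_of(a.line), kb = section_of(b.line);
        return ka != kb ? ka > kb : a.run > b.run;
    };
    std::priority_queue<Head, std::vector<Head>, decltype(later)> heap(later);
    std::string line;
    for (std::size_t i = 0; i < runs.size(); ++i) {
        assert(runs[i] != nullptr);
        if (std::getline(*runs[i], line)) heap.push({line, i});
    }
    while (!heap.empty()) {
        Head h = heap.top();
        heap.pop();
        out << h.line << '\n';
        // Refill from the run that just supplied the smallest line.
        if (std::getline(*runs[h.run], line)) heap.push({line, h.run});
    }
}
```

In practice each `std::istream*` would point at a `std::ifstream` opened on one sorted per-thread file, and `out` would be the final report file.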


Thanks,
Arun
# 4  
Old 11-23-2011
Your CPU's going to be faster than your disk whether you have one thread or 100 threads. Disks do not have a multithreading "go faster" mode.

If you were doing some difficult processing on the data, multithreading would make sense -- a multicore CPU can process several sets of data at once -- but there isn't any here. The toughest part is just splitting the file into records, which can't be multithreaded anyway, since you must read the file sequentially to figure out where records begin and end.
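
For illustration, here is a rough C++ sketch of the producer-consumer layout being discussed. It is not a recommendation, since the point above still holds: one thread has to read the file sequentially and split it into records, and only the per-record work is handed to workers. The names `RecordQueue`, `tag_text` and `split_load` are hypothetical, the sketch assumes one element per line as in the posted sample, and it uses plain string searching rather than a real XML parser. Each worker appends to its own string here; a real program would give each worker its own output file.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <istream>
#include <mutex>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Bounded queue of raw record strings shared by one producer and many consumers.
class RecordQueue {
public:
    explicit RecordQueue(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }
    void push(std::string rec) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return q_.size() < capacity_; });
        q_.push(std::move(rec));
        not_empty_.notify_one();
    }
    void close() {  // called by the producer when the input is exhausted
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        not_empty_.notify_all();
    }
    bool pop(std::string& rec) {  // returns false once closed and drained
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        rec = std::move(q_.front());
        q_.pop();
        not_full_.notify_one();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    std::queue<std::string> q_;
    std::size_t capacity_;
    bool closed_ = false;
};

// Text between <tag> and </tag> in `s`, or "" if the element is absent.
std::string tag_text(const std::string& s, const std::string& tag) {
    const std::string open = "<" + tag + ">", close = "</" + tag + ">";
    const auto b = s.find(open);
    if (b == std::string::npos) return "";
    const auto e = s.find(close, b + open.size());
    if (e == std::string::npos) return "";
    return s.substr(b + open.size(), e - b - open.size());
}

// The calling thread is the producer; `workers` consumers each fill one output.
std::vector<std::string> split_load(std::istream& in, unsigned workers) {
    assert(workers > 0);
    RecordQueue queue(1024);
    std::vector<std::string> outputs(workers);  // one slot per worker, never resized
    auto consume = [&queue](std::string& out) {
        std::string rec;
        while (queue.pop(rec))
            out += tag_text(rec, "Salary") + "~" + tag_text(rec, "section") + "\n";
    };
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) pool.emplace_back(consume, std::ref(outputs[i]));

    std::string line, rec;
    bool in_record = false;
    while (std::getline(in, line)) {  // sequential read: this part cannot be parallel
        if (line.find("<RECORD") != std::string::npos) { in_record = true; rec.clear(); }
        if (in_record) { rec += line; rec += '\n'; }
        if (in_record && line.find("</RECORD>") != std::string::npos) {
            queue.push(std::move(rec));
            rec.clear();
            in_record = false;
        }
    }
    queue.close();
    for (auto& t : pool) t.join();
    return outputs;
}
```

Which worker receives which record is not deterministic, so the per-worker outputs differ from run to run; only their union is fixed.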

Sounds a bit contrived honestly. Is this homework?
# 5  
Old 11-23-2011
Thanks for your reply, and BTW this is not homework. If multithreaded reading is not a feasible solution, can you suggest an approach and algorithm for doing this in single-threaded mode?

Thanks,
Arun
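
One possible single-threaded approach, sketched in C++ under loud assumptions: the file looks exactly like the posted sample (one element per line, `<Salary>` and `<section>` inside each `<RECORD>`), and simple string searching stands in for a real XML parser. The function names `tag_text`, `extract` and `extract_to_string` are hypothetical. The file is streamed once, line by line, so memory use stays small regardless of file size.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Text between <tag> and </tag> in `line`, or "" if the element is absent.
std::string tag_text(const std::string& line, const std::string& tag) {
    assert(!tag.empty());
    const std::string open = "<" + tag + ">", close = "</" + tag + ">";
    const auto b = line.find(open);
    if (b == std::string::npos) return "";
    const auto start = b + open.size();
    const auto e = line.find(close, start);
    if (e == std::string::npos) return "";
    return line.substr(start, e - start);
}

// Stream the input once and write one "salary~section" line per record.
// Returns the number of records written.
std::size_t extract(std::istream& in, std::ostream& out) {
    std::string line, salary, section;
    std::size_t records = 0;
    while (std::getline(in, line)) {
        if (line.find("<Salary>") != std::string::npos) {
            salary = tag_text(line, "Salary");
        } else if (line.find("<section>") != std::string::npos) {
            section = tag_text(line, "section");
        } else if (line.find("</RECORD>") != std::string::npos) {
            // End of record: emit what was collected and reset for the next one.
            out << salary << '~' << section << '\n';
            salary.clear();
            section.clear();
            ++records;
        }
    }
    return records;
}

// Convenience wrapper for trying the function on a small in-memory sample.
std::string extract_to_string(const std::string& xml) {
    std::istringstream in(xml);
    std::ostringstream out;
    extract(in, out);
    return out.str();
}
```

For the real file, `in` would be a `std::ifstream` opened on the XML and `out` a `std::ofstream` for the result. If the real XML is laid out differently (attributes, nesting, several elements per line), this line-based matching would not be enough.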
# 6  
Old 11-23-2011
The data you have right now is extremely easy. If the data's actually different that could make it very hard. So please answer this question from earlier:
Quote:
Originally Posted by Corona688
Is this really what your XML looks like, or did you pretty it up for posting?
# 7  
Old 11-23-2011
I know that by reading the file record by record and using an XML parser we can print this into another file. The real thing is that I was given this file and asked to do it in multithreaded mode using an XML parser. Later I will be given the original XML file, and I will have to modify this code to make it work with that.

Also, note that this is a 100 GB file, so processing it with a single reader will take time. That is my thinking; I may be wrong.

Last edited by arunkumar_mca; 11-23-2011 at 05:11 PM..