Need to optimize the loop to load parallelly


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Need to optimize the loop to load parallelly
# 1  
Old 05-22-2017
Need to optimize the loop to load parallelly

Hi,

I'm trying to load the huge amount of records in the database. I did the successful load but it took more time to load as numbers are huge.

Here what I have -

1. create a database table (t) with 2 columns- Not an issue

2. create a script to load huge amount of data - Here I would like to the parallel execution for faster load.

Code:
perl -e 'for ($i=0; $i<=4294968230; $i++) { printf "%d,%d,\n", $i,$i*10; }' | vsql -p $PORT -c "COPY t FROM STDIN  DELIMITER ',' direct";

The main command that does the load for me is
Code:
perl -e 'for ($i=0; $i<=4294968230; $i++) { printf "%d,%d,\n", $i,$i*10; }'

I would like my load to be faster, either by running this parallelly or using some other technique.

Please suggest something and let me know the best possible way to optimize it.

Thanks,
Mannu

Last edited by Mannu2525; 05-22-2017 at 09:01 AM.. Reason: To rectify the number of columns used
# 2  
Old 05-22-2017
So, you first have to consider why your data load is deemed to be slow. Each record being written will need several things to happen to actually get it committed into the database. You will be building a very large memory buffer to shovel this through a pipe, so this may slow you down a little. There are global server resources to consider too, so I will deal with them next:-

Available memory
If there is little free memory, you server might start paging/swapping when you drive your import. This is costly to your elapse time because disk access is relatively slow compared in real memory

Disk access time
Using local disks can have various issues with performance contention for actual media, controllers etc. There is also the issue that updates to local disks (not SAN or RAID protected) are likely to be slower because the OS will have to wait for the data to really be written to the physical media. Using a SAN or RAID controller usually has the benefits of write cache, so the actual physical write is done asynchronously and the OS can carry on.

You can also cause contention if you are reading a large file and the updates are being written to the same physical (local) disk or, of course, other processing is running.

--
When considering the database, you have to consider the operations it will have to perform based on how you have it operating.

Appropriate Memory allocations
There are probably several setting for this, but an important one is that there is very likely to be a fixed/real/pinned value of memory that can never be swapped. Setting it too low can cause your database to have performance problems as it wrestles with the limitations, however setting it too high can cause memory issues for other needs on your server and slow the whole OS down (usually exhibited with paging)

If there are no resource issues, the DB will still need to:-
  • read/accept the record
  • validate the input matches the target table
  • check for constraints
  • initiate the disk write to the log(s) to allow either transaction rollback or redo/recovery
  • disk write to the table
  • disk write to related indexes
  • disk write to related tables (triggers, etc.)
There's lots for a database to do to create 4,294,968,230 records, so yes, it will take a while. Why so many records anyway?

I'm not sure if you can give database hints on loading data like you can when parallelising SELECT statements, but splitting your data and forcing in several parallel updates may not make any improvement.


I hope this gives you something to consider,
Robin
# 3  
Old 05-23-2017
I am going to guess: your code is an example but not what you are really loading.

If really it is what you are loading, then it appears to be a total waste of resources.
You could write a simple database procedure or trigger to take a number and multiply it by 10.

There are database optimizations you can easily do, example:
Does the table have index(es)? If yes, turn them off during the load. After, rebuild them.

You will have to check resources during a trail run - rbatte explained that really well.

Since you do not mention what UNIX and db you are using, we cannot help you on measuring resource usage.

If you want that help, tell us about your system: OS, shell, and database (like oracle, sybase, mysql, etc.).

It is up to you to help us to help you.
This User Gave Thanks to jim mcnamara For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Looking to optimize code

Hi guys, I feel a bit comfortable now doing bash scripting but I am worried that the way I do it is not optimized and I can do much better as to how I code. e.g. I have a whole line in a file from which I want to extract some values. Right now what I am doing is : STATE=`cat... (5 Replies)
Discussion started by: Junaid Subhani
5 Replies

2. Shell Programming and Scripting

Optimize my mv script

Hello, I'm wondering if there is a quicker way of doing this. Here is my mv script. d=/conversion/program/out cd $d ls $d > /home/tempuser/$$tmp while read line ; do a=`echo $line|cut -c1-5|sed "s/_//g"` b=`echo $line|cut -c16-21` if ;then mkdir... (13 Replies)
Discussion started by: whegra
13 Replies

3. Shell Programming and Scripting

To execute scripts parallelly

Hi I have set two set of scripts sets in a file which perform similar operations but with different script names for e.g.: 1st set of script1.txt: 1.sh 2.sh 3.sh 4.sh 2nd set of script2.txt: 1_1.sh 2_1.sh 3_3.sh 4_4.sh I want to execute these set of scripts parallelly in such... (16 Replies)
Discussion started by: rohit_shinez
16 Replies

4. Shell Programming and Scripting

Executes scripts parallelly based on their success

Hi Team , I have one Master.sh file which call X,Y,Z scripts , but here X may call again some sub scripts X_sub1.sh , X_sub2.sh Y calls Y_sub1.sh,Y_sub2.sh and similarly Z script also . Now requirement is Both X and Y should execute parallel bcz X and Y are independent... (9 Replies)
Discussion started by: chandini
9 Replies

5. Shell Programming and Scripting

Search patterns in multiple logs parallelly.

Hi All, I am starting a service which will redirect its out put into 2 logs say A and B. Now for succesful startup of the service i need to search pattern1 in log A and pattern2 in log B which are writen continuosly. Now my requirement is to find the patterns in the increasing logs A and B... (19 Replies)
Discussion started by: Girish19
19 Replies

6. UNIX for Dummies Questions & Answers

VPS has load 200, httpd load no activity, netstat nothing

Hello, on my hostserver i see one VPS of mine got load of 200.00 and netstat nothing (not a single blank line on netstat command) after some time, netstat started showing connections, but i see no excessive IP connections. tail -f /var/log/httpd/access_log shows no activity /var/log/messages ;... (1 Reply)
Discussion started by: postcd
1 Replies

7. Shell Programming and Scripting

pl help me to Optimize the given code

Pl help to me to write the below code in a simple way ... i suupose to use this code 3 to 4 places in my makefile(gnu) .. **************************************** @for i in $(LIST_A); do \ for j in $(LIST_B); do\ if ;then\ echo "Need to sign"\ echo "List A = $$i , List B =$$j"\ ... (2 Replies)
Discussion started by: pk_arun
2 Replies

8. Shell Programming and Scripting

Need help in wrting Load Script for a Load-Resume type of load.

hi all need your help. I am wrting a script that will load data into the table. then on another load will append the data into the existing table. Regards Ankit (1 Reply)
Discussion started by: ankitgupta
1 Replies

9. Shell Programming and Scripting

Running three scripts parallelly

Hi All, We have three shell script batch, which extract data from three different systems(oracle, db2, db2/400). By running each shell script batch, the data is extracted from respective systems. while the batch is running, job date, system_name, start_date and end_date will be inserted into... (1 Reply)
Discussion started by: anwarsait
1 Replies
Login or Register to Ask a Question