Samba troubleshooting / workaround?

 
# 1  
Old 05-11-2011

Hello,

I've set up an Ubuntu 10.04 server running Samba 3.4.7 as a domain controller / file server at a customer site. It works great most of the time, but I'm facing an intermittent problem. Of course I'm never on site when the problem occurs, so I cannot investigate in real time.

What happens is that one smbd process takes ~100% CPU time and keeps its open files locked. I have not figured out how this occurs, so I would need to set a higher debug level for the Samba log. However, I'm pretty sure the problem is caused by the Windows machines running CAD software (SolidWorks), which creates lots of temporary files that should be deleted when the assembly files are properly closed; it seems it does not always close those files properly. (Moreover, the CAD software tends to crash regularly according to the customer, but that has nothing to do with Samba: the same thing happens with Windows file servers.)

The effect on the Samba server: the load average is 1.0 or above, with one particular smbd process using ~100% CPU time, and the process owner is "root" instead of the Samba user that opened the files...

So far the only solution is to kill the "hung" process and delete all the messy temp files.

Since I will not be on this site in the near future to investigate this in more detail, I was thinking of a script that would monitor the load average and, if it is 1 or above, try to identify the smbd process causing the heavy load (based on owner = "root" and CPU usage > threshold value) and kill it (kill -15, then kill -9 if the kill -15 has not worked after a timeout of, let's say, 1 minute).
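A minimal Python sketch of the workaround described above, under stated assumptions: the file-serving daemon is named `smbd`, procps `ps` is available, and the 90% threshold and the function names (`parse_ps`, `find_suspects`, `terminate`, `check_once`) are hypothetical choices for illustration. It has not been tested on the server in question. Note that `ps` reports %CPU averaged over the lifetime of the process, so a long-lived process that only recently started spinning may stay under the threshold for a while.

```python
import os
import signal
import subprocess
import time

CPU_THRESHOLD = 90.0  # percent; assumed value, to be tuned on site
TERM_TIMEOUT = 60     # seconds to wait after SIGTERM before sending SIGKILL


def parse_ps(output):
    """Parse header-less `ps` output (pid, user, %cpu) into tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        rows.append((int(parts[0]), parts[1], float(parts[2])))
    return rows


def find_suspects(rows, threshold=CPU_THRESHOLD):
    """Return PIDs of root-owned processes at or above the CPU threshold."""
    return [pid for pid, user, cpu in rows if user == "root" and cpu >= threshold]


def is_alive(pid):
    """Signal 0 only checks whether the PID exists; nothing is delivered."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def terminate(pid, timeout=TERM_TIMEOUT, poll=1.0):
    """Send SIGTERM (kill -15), then SIGKILL (kill -9) if still alive after timeout."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "gone"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "terminated"
        time.sleep(poll)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "terminated"
    return "killed"


def check_once():
    """One monitoring pass: act only if the 1-minute load average is >= 1.0."""
    if os.getloadavg()[0] < 1.0:
        return []
    # Separate -o options with empty headers suppress the header line.
    out = subprocess.run(
        ["ps", "-C", "smbd", "-o", "pid=", "-o", "user=", "-o", "pcpu="],
        capture_output=True, text=True,
    ).stdout
    return [(pid, terminate(pid)) for pid in find_suspects(parse_ps(out))]
```

Something like this could be run as root from cron by calling `check_once()`. One caveat: the main smbd daemon also runs as root, so the owner test alone does not distinguish it from a stuck child; here it is only excluded as long as it stays below the CPU threshold.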

I know the best solution would be to fix the root cause of the problem, but as I said I will not have time to investigate in the near future, so a workaround is needed.

I found various scripts, some to monitor the load average and some to kill a process, but I'm not sure how to properly identify the "hung" smbd process (there could be several of them, and the CPU usage of course depends on the other running processes).

By the way, if anyone has already faced a similar issue with Samba, any advice would be appreciated.

Last edited by Manu.b; 05-11-2011 at 12:35 PM..
# 2  
Old 05-13-2011
Are you sure it's SMB that's hung and not the client? Does the problem persist even when the client machine is turned off?
# 3  
Old 05-13-2011
I ran some tests yesterday at the customer site (and increased the log level):
- The CAD software crashed after saving an assembly, and that did not cause the smbd process to "hang", so I was wrong on this point.
- I realised I cannot use the load average as a trigger in the script, as the server regularly goes above a load average of 1.0 when several users are saving/accessing large files at the same time or when the backup is running. I'll have to monitor the smbd processes themselves (and I am now reconsidering whether to use such a "kill process" script at all; I'll focus more on solving the root cause).
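Monitoring the process itself rather than the load average could look roughly like the following Python sketch. It assumes a Linux `/proc` filesystem; the helper names, the 90% threshold and the sample counts are hypothetical. The idea is to sample a process's CPU ticks over a short interval and only flag it when several consecutive samples stay high, so that a short burst during a large save or a backup is not mistaken for a hang.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may contain spaces,
    so split after the last ')' before counting fields.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is the state (field 3); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval, ticks_per_sec):
    """CPU usage in percent of one core over the sampling interval."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_sec * interval)


def is_sustained(samples, threshold=90.0, min_consecutive=5):
    """True if the last `min_consecutive` samples are all at or above threshold."""
    recent = samples[-min_consecutive:]
    return len(recent) == min_consecutive and all(s >= threshold for s in recent)


def sample_pid(pid, interval=5.0):
    """Measure one process's CPU usage over `interval` seconds (Linux only)."""
    hz = os.sysconf("SC_CLK_TCK")
    with open(f"/proc/{pid}/stat") as f:
        before = cpu_ticks(f.read())
    time.sleep(interval)
    with open(f"/proc/{pid}/stat") as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, interval, hz)
```

A monitoring loop would append each `sample_pid()` result to a per-PID list and log or alert when `is_sustained()` returns true. Logging first, rather than killing, also leaves a record to correlate with the Samba logs.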

One other thing the customer complains about is that access to the Samba server is sometimes very slow while the load average is quite low (no more than 0.1), independently of the client: there is a mix of Windows XP 32/64-bit and Windows 7 32/64-bit machines, and all users experience the same slowness at the same time when this happens.
I was on site once when this happened and could not see any reason for it in the Samba logs. Access to file shares on other servers was normal, and no errors were reported on the network switch for the ports of the server or the clients. The situation returned to normal a few minutes later, and I ran some tests on the file server (copying/reading small/medium/large files and measuring the bandwidth); the performance was as expected (60~70 MB/s).
But I'm not sure that the two problems are related.
I'm still convinced one cause of the problem is the CAD software, as only the files accessed by this software are affected; none of the office or calculation software experiences the same issue. And of course the users are running different flavors of the CAD software (different versions, some 32-bit and some 64-bit).
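For the read-throughput checks mentioned earlier in this post, a small Python sketch like the one below could be run from a client against a file on the mounted share. The function name and chunk size are hypothetical, and MB here means 10^6 bytes. Repeated reads of the same file may be served from a cache and then overstate what the server delivers.

```python
import time


def read_throughput(path, chunk_size=1024 * 1024):
    """Read a file sequentially; return (bytes_read, MB per second)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    # Guard against a zero elapsed time on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / 1e6 / elapsed
```

Running it periodically on small, medium and large files and logging the results with a timestamp would give a baseline to compare against the next time users report a slowdown.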