Serious un-pingable stumper of a problem...
Post 302161471 by jjinno on Thursday 24th of January 2008 07:02:13 PM

I have been banging my head against a network issue at work recently. I believe the problem is in the Layer 2 domain, but "the powers that be" think it looks more like a problem with the server's port. And the biggest problem of all is that EVERYBODY in the Engineering Department uses this file server...

The symptoms are as follows:
  • A Samba share is exported from "FileServ_1" to my desktop. While I have a file open for read/write, I lose the file (that is, the persistent connection drops), and my application prompts me to save a local copy (lucky me).
  • As soon as that happens (being prepared), I switch to a shell and start a ping to "FileServ_1"; in a second shell I bypass DNS and ping its IP address directly; in a third I use a remote session on a completely different subnet to ping "FileServ_1" as well; and finally I run a traceroute from both my desktop and the remote session.
  • ALL of the pings time out, and every traceroute shows the last hop as the dead zone.
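A minimal sketch of the "leave a ping running and wait for it to drop" approach, so nobody has to sit watching a shell all day: it probes the server about once a second and prints only the start and end of each outage, which can then be lined up against the switch's topology-change log. It assumes Linux iputils `ping` (where `-W` is the reply timeout in seconds); the function names are hypothetical.

```python
import subprocess
import sys
import time
from datetime import datetime


def probe(host):
    """Send one ICMP echo request; return True if a reply came back.

    Assumes Linux iputils ping: -c 1 sends a single request and -W 1 waits
    at most one second for the reply.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def probe_forever(host, interval=1.0):
    """Yield (timestamp, reachable) pairs indefinitely, one probe per interval."""
    while True:
        yield datetime.now(), probe(host)
        time.sleep(interval)


def outage_windows(samples):
    """Collapse (timestamp, reachable) samples into (start, end) outage windows.

    start is the time of the first lost probe; end is the first successful
    probe after it, or None if the host was still down at the last sample.
    Each window is yielded as soon as it closes, so this also works live.
    """
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first lost probe opens a window
        elif ok and start is not None:
            yield start, ts  # first good probe closes it
            start = None
    if start is not None:
        yield start, None  # still unreachable when the samples ran out


if __name__ == "__main__" and len(sys.argv) > 1:
    host = sys.argv[1]
    # probe_forever never ends, so every window printed here has an end time.
    for start, end in outage_windows(probe_forever(host)):
        print(f"{host} unreachable from {start:%Y-%m-%d %H:%M:%S} "
              f"until {end:%H:%M:%S}", flush=True)
```

Run it, for example, as `python3 pingwatch.py FileServ_1` (the file name is arbitrary) from the desktop and from the remote subnet at the same time. Because a lost probe waits up to a second before the next one is sent, the timestamps are only good to a second or two. If both hosts report the same windows, that suggests the loss is on the part of the path they share, near the server, rather than on either client's side.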

Although "the powers that be" make a strong case, I have noticed the switch reporting "network topology changes" (indicating a loop?), and I have been able to connect to "FileServ_1" over its serial console and watch it while it is supposedly "down"... the only problem is that it never thinks it is down.
  • Eth1 (which, until last week, was the only port plugged in) never reports any issues (at least not at the default log levels), and from what I can see there is no way to tell whether the ICMP packets are dying on the way in or on the way out.
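On the in-versus-out question, one low-volume option (a sketch, not a tested procedure) is to leave a capture limited to ICMP running on FileServ_1 itself, for example `tcpdump -l -n -i eth1 icmp > icmp.txt`, and afterwards list the echo requests that never got a reply. With Eth0 and Eth1 now in a failover pair, the right interface may be the bonded device rather than eth1; that is an assumption about the setup. The parser below assumes tcpdump's default one-line text format for ICMP echo at normal verbosity; the function name and regex are illustrative.

```python
import re
import sys

# One tcpdump -n line per packet, e.g.
# 12:00:01.000000 IP 10.1.1.5 > 10.1.1.9: ICMP echo request, id 77, seq 3, length 64
ECHO = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\S+): ICMP echo (?P<kind>request|reply), "
    r"id (?P<id>\d+), seq (?P<seq>\d+)"
)


def unanswered(lines):
    """Return (timestamp, src, id, seq) for every echo request with no reply.

    Requests and replies are paired on (pinging host, ICMP id, sequence number).
    """
    requests = []    # requests in capture order
    replied = set()  # (pinging host, id, seq) keys that got a reply
    for line in lines:
        m = ECHO.search(line)
        if m is None:
            continue  # not an ICMP echo line (ARP, other traffic, etc.)
        ident, seq = int(m["id"]), int(m["seq"])
        if m["kind"] == "request":
            requests.append((line.split()[0], m["src"], ident, seq))
        else:
            # A reply is addressed back to the pinger, so its dst is the request's src.
            replied.add((m["dst"], ident, seq))
    return [r for r in requests if r[1:] not in replied]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as capture:  # text saved from tcpdump -n
        for ts, src, ident, seq in unanswered(capture):
            print(f"{ts} no reply to {src} (id {ident}, seq {seq})")
```

Read against an outage window: requests that arrive with no reply point at the server itself; requests that arrive and are answered while the client still times out point at the return path; and no requests at all during the outage point at the inbound path (switch or cabling).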

Finally, as if things were not bad enough, last week they decided to make Eth0 a redundant failover for Eth1... which, amazingly, seems to have shrunk the problem from "a few minutes of un-ping" to "a few seconds of un-ping"... and instead of happening 10 times a day, it now happens only once or twice.

So, first things first (unless you have better ideas): I am wondering how to turn up the logging of ICMP (that's kernel level, right?) and possibly of the Eth* interfaces, so that I don't have to resort to sniffing all day until it happens. If nothing else, I would like to diagnose this problem correctly and get something done about it.

Any help?
 

IBV_UC_PINGPONG(1)            USER COMMANDS            IBV_UC_PINGPONG(1)

NAME
    ibv_uc_pingpong - simple InfiniBand UC transport test

SYNOPSIS
    ibv_uc_pingpong [-p port] [-d device] [-i ib port] [-s size] [-r rx depth] [-n iters] [-l sl] [-e] HOSTNAME
    ibv_uc_pingpong [-p port] [-d device] [-i ib port] [-s size] [-r rx depth] [-n iters] [-l sl] [-e]

DESCRIPTION
    Run a simple ping-pong test over InfiniBand via the unreliable connected (UC) transport.

OPTIONS
    -p, --port=PORT        use TCP port PORT for initial synchronization (default 18515)
    -d, --ib-dev=DEVICE    use IB device DEVICE (default first device found)
    -i, --ib-port=PORT     use IB port PORT (default port 1)
    -s, --size=SIZE        ping-pong messages of size SIZE (default 4096)
    -r, --rx-depth=DEPTH   post DEPTH receives at a time (default 1000)
    -n, --iters=ITERS      perform ITERS message exchanges (default 1000)
    -l, --sl=SL            use SL as the service level value of the QP (default 0)
    -e, --events           sleep while waiting for work completion events (default is to poll for completions)

SEE ALSO
    ibv_rc_pingpong(1), ibv_ud_pingpong(1), ibv_srq_pingpong(1)

AUTHORS
    Roland Dreier <rolandd@cisco.com>

BUGS
    The network synchronization between client and server instances is weak, and does not prevent incompatible options from being used on the two instances. The method used for retrieving work completions is not strictly correct, and race conditions may cause failures on some systems.

libibverbs            August 30, 2005            IBV_UC_PINGPONG(1)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.