Over the weekend an application on the server crashed suddenly.
I did not know where to start investigating, so I first looked at /var/adm/syslog/syslog.log, and this is what I found:
Code:
Dec 17 00:38:02 L28bi01 sshd[126]: error: accept: No buffer space available
Dec 17 00:38:02 L28bi01 sshd[24333]: error: setsockopt SO_KEEPALIVE: Invalid argument
Dec 17 00:38:07 L28bi01 sshd[24379]: error: setsockopt SO_KEEPALIVE: Invalid argument
Dec 17 00:38:21 L28bi01 sshd[24445]: error: PAM: No account present for user for illegal user UlGLXBTX from 10.61.1.55
Dec 17 00:38:21 L28bi01 sshd[24447]: error: PAM: No account present for user for illegal user anonymous from 10.61.1.55
Dec 17 00:38:26 L28bi01 sshd[24511]: error: PAM: No account present for user for illegal user guest from 10.61.1.55
Dec 17 00:38:27 L28bi01 sshd[24515]: error: PAM: No account present for user for illegal user IyoYLEnT from 10.61.1.55
Dec 17 00:38:28 L28bi01 sshd[24517]: error: PAM: No account present for user for illegal user shelladmin from 10.61.1.55
Dec 17 00:38:31 L28bi01 sshd[24524]: error: PAM: Authentication failed for root from 10.61.1.55
Dec 17 00:38:31 L28bi01 sshd[24525]: error: PAM: No account present for user for illegal user netscreen from 10.61.1.55
Dec 17 00:38:33 L28bi01 sshd[24528]: error: PAM: No account present for user for illegal user admin from 10.61.1.55
Dec 17 00:38:38 L28bi01 sshd[24534]: error: PAM: Authentication failed for root from 10.61.1.55
Dec 17 00:38:58 L28bi01 sshd[24542]: error: PAM: No account present for user for illegal user admin1 from 10.61.1.55
Dec 17 00:39:06 L28bi01 sshd[24552]: error: PAM: No account present for user for illegal user admin from 10.61.1.55
Dec 17 00:39:18 L28bi01 sshd[24561]: error: PAM: No account present for user for illegal user emailswitch from 10.61.1.55
Dec 17 00:39:22 L28bi01 sshd[24584]: error: PAM: No account present for user for illegal user product from 10.61.1.55
Dec 17 00:39:23 L28bi01 sshd[24599]: error: PAM: No account present for user for illegal user admin from 10.61.1.55
Dec 17 00:39:27 L28bi01 sshd[24621]: error: PAM: Authentication failed for root from 10.61.1.55
Dec 17 00:39:29 L28bi01 sshd[24626]: error: PAM: No account present for user for illegal user n3ssus from 10.61.1.55
Dec 17 00:39:31 L28bi01 sshd[24632]: error: PAM: Authentication failed for root from 10.61.1.55
Dec 17 00:41:01 L28bi01 sshd[126]: error: accept: No buffer space available
Dec 17 00:41:01 L28bi01 sshd[25366]: error: setsockopt SO_KEEPALIVE: Invalid argument
Dec 17 00:41:55 L28bi01 sshd[26128]: error: PAM: No account present for user for illegal user cisco from 10.61.1.55
Dec 17 00:42:00 L28bi01 sshd[26134]: error: PAM: No account present for user for illegal user Cisco from 10.61.1.55
Dec 17 00:42:02 L28bi01 sshd[26142]: error: PAM: No account present for user for illegal user admin from 10.61.1.55
Dec 17 00:42:04 L28bi01 sshd[26175]: error: PAM: No account present for user for illegal user from 10.61.1.55
Dec 17 00:42:10 L28bi01 sshd[26254]: error: PAM: No account present for user for illegal user manage from 10.61.1.55
Dec 17 00:42:15 L28bi01 sshd[26273]: error: PAM: No account present for user for illegal user monitor from 10.61.1.55
Dec 17 00:42:19 L28bi01 sshd[26280]: error: PAM: No account present for user for illegal user ftp from 10.61.1.55
Dec 17 00:42:54 L28bi01 sshd[26792]: error: PAM: No account present for user for illegal user Fortimanager_Access from 10.61.1.55
Dec 17 00:42:54 L28bi01 sshd[26791]: error: PAM: No account present for user for illegal user nessus_oJgOWh46 from 10.61.1.55
Dec 17 00:42:56 L28bi01 sshd[26791]: error: PAM: No account present for user for illegal user nessus_oJgOWh46 from 10.61.1.55
Dec 17 00:43:27 L28bi01 sshd[26926]: error: setsockopt SO_KEEPALIVE: Invalid argument
The error that looks most related to this problem is "No buffer space available".
When I googled this error I found no solid solution. Some say it is memory pressure, and some say to check the kernel value "tcp_conn_request_max", but I do not see this value on the server at all.
However, the application logs show these errors:
Code:
File: data.c, Line: 2963, Time: 2017.12.17 00:36:56, RC: -23
Text: CL_receive_message failed
Error during 'read'
System error: Connection timed out
File: data.c, Line: 2963, Time: 2017.12.17 00:37:46, RC: -23
Text: CL_receive_message failed
Error during 'read'
System error: Connection timed out
File: data.c, Line: 2963, Time: 2017.12.17 00:37:46, RC: -23
Text: CL_receive_message failed
Error during 'read'
System error: Connection timed out
File: data.c, Line: 825, Time: 2017.12.17 00:38:52, RC: -28
Text:
Connection between client and server was terminated
File: data.c, Line: 918, Time: 2017.12.17 00:38:52, RC: -28
Text:
Connection between client and server was terminated
File: data.c, Line: 3564, Time: 2017.12.17 00:43:27, RC: -20
Text:
Socket option error
System error: Invalid argument
File: dta_ids.c, Line: 4027, Time: 2017.12.17 00:43:27, RC: 0
Text: DaTA shutting down: ids clients finished
File: dta_ids.c, Line: 4052, Time: 2017.12.17 00:43:28, RC: 0
Text: DaTA shutting down: std clients finished
File: dta_ids.c, Line: 4078, Time: 2017.12.17 00:43:31, RC: 0
Text: DaTA shutting down: file queues synchronized
Could this be a network issue?
How do I investigate this problem? I need to find its root cause (RCA). Please help.
Possibly the system has run out of network buffer resources. Has the load/number of users been steadily increasing?
Network configurations determine the number of buffers available for network packets (of various different sizes) arriving and departing, and also the maximum number of connections.
Exactly, and some security scanning devices, though described as "non-intrusive", manage to put you in just this kind of embarrassing situation... Why do these messages occur?
Code:
No account present for user for illegal user admin from 10.61.1.55
Tons of login attempts for various users would make me very suspicious, even though they seem to come from the local area network. Even more so as they include four failed attempts for root. Identify the machine the attempts come from and check it for malware.
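To see at a glance which machine the attempts come from and which account names are being tried, the sshd lines can be tallied. The following is only a minimal sketch: the function name `tally_failures` and the regular expression are my own, written against the two message shapes visible in the syslog excerpt above, and may need adjusting for other sshd versions.

```python
import re
from collections import Counter

# Assumption: only the two PAM message shapes from the excerpt are matched.
PAM_FAILURE = re.compile(
    r"sshd\[\d+\]: error: PAM: "
    r"(?:No account present for user for illegal user(?: (?P<bad_user>\S+))?"
    r"|Authentication failed for (?P<real_user>\S+))"
    r" from (?P<source>\S+)"
)


def tally_failures(lines):
    """Count failed sshd logins per source address and per user name."""
    by_source = Counter()
    by_user = Counter()
    for line in lines:
        match = PAM_FAILURE.search(line)
        if match is None:
            continue  # not a PAM login failure (e.g. the accept/setsockopt errors)
        # An attempt with a blank user name is recorded as "<empty>".
        user = match.group("real_user") or match.group("bad_user") or "<empty>"
        by_source[match.group("source")] += 1
        by_user[user] += 1
    return by_source, by_user


if __name__ == "__main__":
    # Three lines copied from the excerpt, used here as demo input.
    demo = [
        "Dec 17 00:38:21 L28bi01 sshd[24447]: error: PAM: No account present for user for illegal user anonymous from 10.61.1.55",
        "Dec 17 00:38:31 L28bi01 sshd[24524]: error: PAM: Authentication failed for root from 10.61.1.55",
        "Dec 17 00:38:02 L28bi01 sshd[126]: error: accept: No buffer space available",
    ]
    sources, users = tally_failures(demo)
    print(sources.most_common())
    print(users.most_common())
```

For the real log, pass an open file instead of the demo list, e.g. `tally_failures(open("/var/adm/syslog/syslog.log"))`. A single source address trying many generic account names in a few minutes would point at a scanner or an attack rather than at real users.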
I have seen almost exactly this before; it was a product called "Foundstone". The Wintel team had deployed it straight out of the box and it caused mayhem on the Unix estate.
I would think that this is one of two things.
You have a security breach and you're going to have a significant issue.
Or, from what I can see, there is some kind of scanner running, and it needs to be configured or stopped before it starts locking accounts and causing other issues.
I would be tempted to speak to the other teams and find out what has changed. Also watch for it happening on a regular basis (weekly, monthly, etc.).
It could also be that someone has got something to evaluate and doesn't understand the implications.
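One way to watch for a regular schedule is to bucket the failures by date and hour and compare the buckets over a few weeks. This is a rough sketch only: `failures_per_hour` is a hypothetical helper of mine, and it assumes the usual syslog timestamp layout ("Dec 17 00:38:21 ...") seen in the excerpt.

```python
import re
from collections import Counter

# Assumption: lines start with a classic syslog timestamp, which has no year.
STAMP = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) (?P<hour>\d{2}):\d{2}:\d{2} "
)


def failures_per_hour(lines, needle="error: PAM:"):
    """Count lines containing `needle`, keyed by (month, day, hour)."""
    buckets = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = STAMP.match(line)
        if match:
            key = (match.group("month"), int(match.group("day")), int(match.group("hour")))
            buckets[key] += 1
    return buckets


if __name__ == "__main__":
    # Demo input copied from the excerpt.
    demo = [
        "Dec 17 00:38:21 L28bi01 sshd[24447]: error: PAM: No account present for user for illegal user anonymous from 10.61.1.55",
        "Dec 17 00:39:27 L28bi01 sshd[24621]: error: PAM: Authentication failed for root from 10.61.1.55",
    ]
    for key, count in sorted(failures_per_hour(demo).items()):
        print(key, count)
```

If the same burst shows up at the same hour every week or month, that fits a scheduled scan better than an intruder.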
As advised earlier in the thread, find the machine and beat the user up - you have an excuse!