We had alerts fire for socket exceptions in a Web App hosted on Azure App Service. The service connects to an SFTP server, and that connection is where the exception occurs:
{
  "Type":"System.AggregateException",
  "HResult":-2146233088,
  "Message":"One or more errors occurred. (Unable to establish the socket.)",
  "Source":null,
  "StackTrace":null,
  "InnerException":{
    "HResult":-2146233088,
    "Message":"Unable to establish the socket.",
    "Source":"SomeCompnay.SuperSecret..WebApi",
    "StackTrace":" at async Task<JobStatusCode> SomeCompnay.SuperSecret..WebApi.Services.UploadJobService.RunJobAsync(JobStartInfo jobInfo, CancellationToken ct) in /home/jenkins/agent/workspace/some-service/UploadJobService/UploadJobService.cs:line 156",
    "InnerException":{
      "HResult":-2147467259,
      "Message":"A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.",
      "Source":"Renci.SshNet",
      "StackTrace":" at void Renci.SshNet.Abstractions.SocketAbstraction.ConnectCore(Socket socket, IPEndPoint remoteEndpoint, TimeSpan connectTimeout, bool ownsSocket)\r\n at void Renci.SshNet.Abstractions.SocketAbstraction.Connect(Socket socket, IPEndPoint remoteEndpoint, TimeSpan connectTimeout)\r\n at Socket Renci.SshNet.Connection.ConnectorBase.SocketConnect(string host, int port, TimeSpan timeout)\r\n at Socket Renci.SshNet.Connection.DirectConnector.Connect(IConnectionInfo connectionInfo)\r\n at void Renci.SshNet.Session.Connect()\r\n at ISession Renci.SshNet.BaseClient.CreateAndConnectSession()\r\n at void Renci.SshNet.BaseClient.Connect()\r\n at Task SomeCompany.SuperSecret.SftpClient.RenciSftpClient.Connect(CancellationToken ct) in /home/jenkins/agent/workspace/settlement-file-processor_master/utility/SftpClient/RenciSftpClient.cs:line 89\r\n at async Task<JobStatusCode> SomeCompnay.SuperSecret..WebApi.Services.UploadJobService.RunJobAsync(JobStartInfo jobInfo, CancellationToken ct) in /home/jenkins/agent/workspace/some-service/UploadJobService/UploadJobService.cs:line 133",
      "SocketErrorCode":"TimedOut",
      "ErrorCode":10060,
      "NativeErrorCode":10060,
      "Type":"System.Net.Sockets.SocketException"
    },
    "HttpStatusCode":503,
    "Error":null,
    "Type":"SomeCompnay.SuperSecret..WebApi.JobException"
  },
  "InnerExceptions":[
    {
      "HResult":-2146233088,
      "Message":"Unable to establish the socket.",
      "Source":"SomeCompnay.SuperSecret..WebApi",
      "StackTrace":" at async Task<JobStatusCode> SomeCompnay.SuperSecret..WebApi.Services.UploadJobService.RunJobAsync(JobStartInfo jobInfo, CancellationToken ct) in /home/jenkins/agent/workspace/some-service/UploadJobService/UploadJobService.cs:line 156",
      "InnerException":{
        "HResult":-2147467259,
        "Message":"A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.",
        "Source":"Renci.SshNet",
        "StackTrace":" at void Renci.SshNet.Abstractions.SocketAbstraction.ConnectCore(Socket socket, IPEndPoint remoteEndpoint, TimeSpan connectTimeout, bool ownsSocket)\r\n at void Renci.SshNet.Abstractions.SocketAbstraction.Connect(Socket socket, IPEndPoint remoteEndpoint, TimeSpan connectTimeout)\r\n at Socket Renci.SshNet.Connection.ConnectorBase.SocketConnect(string host, int port, TimeSpan timeout)\r\n at Socket Renci.SshNet.Connection.DirectConnector.Connect(IConnectionInfo connectionInfo)\r\n at void Renci.SshNet.Session.Connect()\r\n at ISession Renci.SshNet.BaseClient.CreateAndConnectSession()\r\n at void Renci.SshNet.BaseClient.Connect()\r\n at Task SomeCompany.SuperSecret.SftpClient.RenciSftpClient.Connect(CancellationToken ct) in /home/jenkins/agent/workspace/settlement-file-processor_master/utility/SftpClient/RenciSftpClient.cs:line 89\r\n at async Task<JobStatusCode> SomeCompnay.SuperSecret..WebApi.Services.UploadJobService.RunJobAsync(JobStartInfo jobInfo, CancellationToken ct) in /home/jenkins/agent/workspace/some-service/UploadJobService/UploadJobService.cs:line 133",
        "SocketErrorCode":"TimedOut",
        "ErrorCode":10060,
        "NativeErrorCode":10060,
        "Type":"System.Net.Sockets.SocketException"
      },
      "HttpStatusCode":503,
      "Error":null,
      "Type":"SomeCompnay.SuperSecret..WebApi.JobException"
    }
  ]
}
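The innermost exception is a `System.Net.Sockets.SocketException` with `SocketErrorCode` `TimedOut` (10060, `WSAETIMEDOUT`), thrown from `SocketAbstraction.ConnectCore`. In other words, the TCP handshake with the SFTP server never completed, so SSH.NET failed before any SSH or SFTP negotiation started. One way to confirm this independently of SSH.NET is a plain TCP connect with a timeout over the same network path. The sketch below is in Python for brevity; the function name, host, port, and timeout are placeholders, not values from our setup.

```python
import socket
from typing import Tuple


def tcp_connect_probe(host: str, port: int = 22, timeout: float = 10.0) -> Tuple[bool, str]:
    """Attempt a plain TCP connection (no SSH handshake) and report the outcome.

    Hypothetical diagnostic helper. Returns (True, "connected") on success,
    otherwise (False, reason). A reason of "timed out" is the same symptom
    as the TimedOut / 10060 SocketException in the log.
    """
    try:
        # Resolves the host and performs the TCP three-way handshake only.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No answer within `timeout`: matches the failure SSH.NET reported.
        return False, "timed out"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing listens on the port.
        return False, "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and other socket-level errors.
        return False, f"error: {exc}"
```

For example, `tcp_connect_probe("sftp.example.com", 22)` (a hypothetical host) returning `(False, "timed out")` would point at the network path between the app's instances and the server rather than at the SFTP client library.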
Restarting the service does not resolve the issue.
Workaround
Microsoft suggested scaling the App Service plan up and then back down as a workaround. This makes sense: moving to a different pricing tier (SKU) places the app on a different pool of servers, so it ends up running on different hosts.
Consequences
This fixed the immediate issue, but it broke our connections to Azure SQL and other resources. Their firewall rules allow the service's possible outbound IP addresses, and those addresses changed when the app moved to different servers. As a stop-gap, we will add a Logic App that scales the plan up and down on a set schedule, and another app that automatically updates the firewall rules on our dependencies.
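At its core, the firewall-update app is a set difference: compare the addresses the firewall currently allows for this app with the app's current possible outbound IPs, add the missing ones, and remove the stale ones. App Service exposes the outbound IPs as a comma-separated string in the web app's `possibleOutboundIpAddresses` property (for example via `az webapp show --query possibleOutboundIpAddresses`). The Python sketch below covers only that planning step. The function names and the one-rule-per-IP layout are assumptions, and actually creating or deleting rules on Azure SQL or other resources is left out. `allowed` must contain only rules that belong to this app; otherwise unrelated rules would be scheduled for removal.

```python
import ipaddress
from typing import Iterable, Set, Tuple


def parse_ip_list(csv: str) -> Set[str]:
    """Parse a comma-separated IP list (the format of possibleOutboundIpAddresses)
    into a set of normalized address strings."""
    ips = set()
    for part in csv.split(","):
        part = part.strip()
        if part:
            # ip_address() validates the entry and normalizes its text form.
            ips.add(str(ipaddress.ip_address(part)))
    return ips


def plan_firewall_changes(allowed: Iterable[str], outbound_csv: str) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_remove) so the app's allowed IPs match its outbound IPs.

    Hypothetical helper: assumes one single-address firewall rule per IP and
    that `allowed` lists only the rules owned by this app.
    """
    current = {str(ipaddress.ip_address(ip)) for ip in allowed}
    desired = parse_ip_list(outbound_csv)
    to_add = desired - current     # outbound IPs the firewall does not allow yet
    to_remove = current - desired  # stale IPs left over from the previous hosts
    return to_add, to_remove


# Example with documentation-range addresses (not real values):
to_add, to_remove = plan_firewall_changes(["203.0.113.10", "203.0.113.11"], "203.0.113.11,198.51.100.7")
print(sorted(to_add), sorted(to_remove))  # ['198.51.100.7'] ['203.0.113.10']
```

Computing the plan separately from applying it makes it easy to log or review the changes before the script touches any firewall.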
TODO: Add the gist of the firewall rule update script.
Long-term Fix
Ideally, these exceptions should not occur at all. We suspect an interaction between SSH.NET, the SFTP library we use, and the Azure hosting environment. We don't think client lifetime is the cause: the client is not injected as a singleton, so each job gets a short-lived instance that is disposed of when the job is done. SSH.NET has several long-standing open issues related to socket exceptions, which are common for any socket-level library.
We plan to evaluate other SFTP clients.