Inconsistent IIS ARR 502 error for NuGet push from VSTS

We’re getting inconsistent IIS ARR 502 errors with NuGet pushes from VSTS. I hadn’t seen this until earlier this week. When I run the push from my laptop, I don’t have any issues.

From VSTS, it’s fairly consistent in failing, though on occasion it will succeed once - but only on this instance (running Octopus v2018.6.1). Most of our clusters still run a much older version, so we’re experimenting in production with a select few Octopus instances, and this is one issue that’s come up that I don’t even know how to troubleshoot effectively.

Here’s the error from VSTS:

We have an AWS ELB in front of 2 IIS ARR servers. Both servers have 50% distribution and URL rewrite down to the 2-node Octopus HA cluster. Through IIS monitoring I’ve seen both HA nodes throw the error, so it’s not just one of the servers. Both servers have exactly the same URL bindings within Octopus as well as SSL certs (as do IIS ARR and the ELB). I’m not a whiz with IIS logs, but the inconsistency is downright confusing.

One odd thing I’ve seen in the Octopus logs is this:

```
Unhandled error on request: https://octopus.accenture.com/DevArchMaturity_9347/api/octopusservernodes/VW106830 817a6b64976b4774ab1294c8d216699f by gates.yuxiang.chi@accenture.com : 'VW106830' is not a valid document Id. It should have the format '-'. Example: 'Projects-101'.
System.Exception: 'VW106830' is not a valid document Id. It should have the format '-'. Example: 'Projects-101'.
   at Octopus.Core.Resources.DocumentIdParser.AssertValidDocumentId(String documentId)
   at Octopus.Server.Web.Infrastructure.Api.ModifyResponseDescriptor`2.Responder.Execute()
   at Octopus.Server.Web.Infrastructure.Api.Responder`1.ExecuteRegistered()
   at Octopus.Server.Web.Infrastructure.Api.Responder`1.Respond(TDescriptor options, NancyContext context)
   at System.Dynamic.UpdateDelegates.UpdateAndExecute3[T0,T1,T2,TRet](CallSite site, T0 arg0, T1 arg1, T2 arg2)
   at CallSite.Target(Closure , CallSite , Object , Object , NancyContext )
   at Octopus.Server.Web.Infrastructure.OctopusNancyModule.<>c__DisplayClass14_0.<get_Routes>b__1(Object x)
   at CallSite.Target(Closure , CallSite , Func`2 , Object )
   at Nancy.Routing.Route.<>c__DisplayClass4.b__3(Object parameters, CancellationToken context)
```

I’m not sure whether this is related to our issue, but it’s very peculiar and I’ve never seen anything like it before.

Any thoughts/ideas/logs to hunt down? This is very similar to this earlier post: Bad Gateway Error, but no solution was ever reported there.

Thanks,

Ian

Adding server diagnostic logs. OctopusDeploy-636646812487936449.zip (750.3 KB)

Never an easy one with your instances, right Ian? :smile:

About the package push issue

The hive mind discussed this one in a call today and we couldn’t think of anything in Octopus that might be causing this. The facts that 1) you are running behind IIS ARR and 2) the HTML error in the Octo.exe call comes from IIS make us feel it’s an environmental issue related to the network setup.

Unless I missed something, your zip file contains only Octopus logs. Is there any chance you can also send the IIS logs from the time of the issue? The HTML we see in the VSTS screenshot is coming straight from IIS, so odds are we’ll find more info in its logs rather than in Octopus’. Ideally we’d get the IIS and Octopus logs from the exact same time as the issue.
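If it helps while gathering them, a small script like this can pull just the 502 entries out of the IIS logs. This is a rough sketch on my part: it assumes the default W3C extended log format (a `#Fields:` header naming the columns, including the standard `sc-status` field); the log path in the usage note is only an example.

```python
def find_502s(log_path):
    """Scan a W3C-format IIS log and return the entries with sc-status == 502."""
    fields, hits = [], []
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("#Fields:"):
                # The header names the columns for the entries that follow.
                fields = line[len("#Fields:"):].split()
                continue
            if not line or line.startswith("#"):
                continue  # skip other comment/metadata lines and blanks
            row = dict(zip(fields, line.split()))
            if row.get("sc-status") == "502":
                hits.append(row)
    return hits
```

Run it against the active log (by default somewhere under `C:\inetpub\logs\LogFiles\`) and cross-reference the timestamps of the hits with the Octopus server log.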

You also mentioned that this runs just fine from your VM but fails from a build agent. What’s the difference between these two machines, network- and domain-wise?

About the Unhandled error in the Octopus Logs

In the error it seems that someone tried to POST to https://octopus.accenture.com/DevArchMaturity_9347/api/octopusservernodes/VW106830, which is not a valid URL. If you look at the end of it, it references a VM FQDN instead of an Octopus ID like Projects-1. This makes me believe gates.yuxiang.chi (or someone using his API key) wrote a custom script with a bug in it, OR that the Octopus.Client version the script uses is out of date.
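For what it’s worth, the error message suggests the server expects IDs shaped like `Projects-101` (a collection name, a dash, a numeric part). A tiny check along those lines, purely to illustrate why `VW106830` gets rejected (the regex is my approximation of the message, not Octopus’ actual internal parsing rule):

```python
import re

# Approximation of the "<Collection>-<number>" shape the error message
# describes; Octopus' real parser may accept more than this.
DOC_ID = re.compile(r"^[A-Za-z]+-\d+$")

def looks_like_document_id(value):
    return bool(DOC_ID.match(value))
```

So `Projects-101` passes while the machine name `VW106830` has no dash-separated numeric part and fails, which is why the request blows up before it ever reaches a server node document.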

For the sake of an organized conversation, you might wanna split these 2 into separate tickets, as they don’t seem related :slight_smile:

Looking forward to seeing those IIS logs (I might be the first person in history to utter those words).

Regards,
Dalmiro.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.