The archive entry was compressed using an unsupported compression method

I recently started receiving a strange error on some (otherwise working) deployments to certain machines. It started with 1 machine on 1 deployment, and we ran several checks (DISM /CheckHealth, sfc /scannow, a .NET repair, chkdsk, etc.) to make sure there was nothing wrong with the machine. We found nothing from any of our checks. We thought it might be a corrupt transfer, so we deleted the cached nupkg file, yet received the same error. Since it was 1 machine in 1 release deployment we didn't fret much.
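(For reference, the checks were roughly the following, run from an elevated prompt; the exact switches may have varied:)

    DISM /Online /Cleanup-Image /CheckHealth   # component store health check
    sfc /scannow                               # system file integrity scan
    chkdsk C: /scan                            # online, read-only scan of the system drive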

Then a new machine in a new deployment for a completely different project threw this error.

Then 2 new machines had this error when running a script from task console.

Since it has happened on both package deployments and ad-hoc script execution from the task console, I don't know of a solid process or step I can point to as a direct culprit.

We haven't been able to determine anything special or specific about these machines. It's the same VC redist and .NET installations across all of them. Any guidance would be very helpful, thanks!

Activity failed with error 'The archive entry was compressed using an unsupported compression method.
Server exception: System.IO.InvalidDataException: The archive entry was compressed using an unsupported compression method.
   at System.IO.Compression.Inflater.Inflate(FlushCode flushCode)
   at System.IO.Compression.Inflater.ReadInflateOutput(Byte* bufPtr, Int32 length, FlushCode flushCode, Int32& bytesRead)
   at System.IO.Compression.Inflater.ReadOutput(Byte* bufPtr, Int32 length, Int32& bytesRead)
   at System.IO.Compression.Inflater.InflateVerified(Byte* bufPtr, Int32 length)
   at System.IO.Compression.DeflateStream.ReadCore(Span`1 buffer)
   at System.IO.Compression.DeflateStream.Read(Byte[] array, Int32 offset, Int32 count)
   at System.IO.BinaryReader.InternalRead(Int32 numBytes)
   at System.IO.BinaryReader.ReadInt32()
   at Newtonsoft.Json.Bson.BsonDataReader.ReadNormal()
   at Newtonsoft.Json.Bson.BsonDataReader.Read()
   at Newtonsoft.Json.JsonReader.ReadForType(JsonContract contract, Boolean hasConverter)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
   at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType)
   at Newtonsoft.Json.JsonSerializer.Deserialize[T](JsonReader reader)
   at Halibut.Transport.Protocol.MessageExchangeStream.ReadBsonMessage[T]()
   at Halibut.Transport.Protocol.MessageExchangeStream.Receive[T]()
   at Halibut.Transport.Protocol.MessageExchangeProtocol.ProcessReceiverInternalAsync(IPendingRequestQueue pendingRequests, RequestMessage nextRequest)'

Hey @hardKOrr ,

Thanks for reaching out.

This definitely seems like strange behavior. I’m not sure I’ve seen this yet.

Are there any commonalities between the machines exhibiting this behavior that set them apart from the working ones? Antivirus, OS, hardware, etc.?

Were these machines working at some point and now they aren't? If so, did anything change in the meantime?

Could you please direct message me a full task log of the failure?

Are there any commonalities with the steps/scripts that may be causing this to happen?

Please let me know what you think.

Thanks,
Jeremy

I spent a little more time digging around, and it seems like I get the error on task scripts when the C: drive is full. I cleared space on 1 of the failing task machines and that cleared up the problem there. However, the release deployment machines do not have a full disk. So I think I will focus on the deployment issue for now (I will include a full task log for the script console failure, but again that seems to just be a full disk). I haven't been able to single out any differences between the machines failing the project deployments and any other successful machines.

I suppose it may not have been 100% clear from my previous post, but it is just these specific combinations (Project A + Machine A, or Project B + Machine B) that are failing. I can execute different actions against those machines and not receive the error, so these machines currently work for all deployments except the individual combination that fails with this error. There have been no changes to the machines that I know of, but I am normally only called in after a problem occurs and I do not regularly check machine states.

Both projects install an MSI and both use the same script to do so; however, there is almost 10 GB of free space for relatively small installations, so I do not believe it's a disk space issue for the project deployments. There are commonalities between the MSIs, but those commonalities are not unique to these 2 projects; it's things like SQL and certificate handling that we do in several other products.

Hey @hardKOrr,

Thanks for the further information.

Do those projects have other machines in them that do deploy successfully, or are the projects deploying to individual machines?

Thanks,
Jeremy

Both projects use roles. The role designation for one of the projects is very specific and expected to be 1 machine in a tenant. The other project deploys to the role 'All', which is as encompassing as it sounds, so that deployment was pushing to more than 1 machine at a time in a tenant.

Thanks for the clarification. In the project deploying to All, are you able to discern any difference between the 1 machine with the issue vs the others that are successfully deploying? Would you be able to send through a task log of that (or is it one of the three you sent to me)? Is there anything in the OctopusTentacle.txt logs at the time of the failures?

One of the logs sent was for the ‘All’ Deployment (64654).

There is nothing in the Tentacle log other than the CheckServices calls coming from the watchdog.

Hey @hardKOrr,

I took a look at the log and a couple of things were a bit different.

Would you be able to put a command at the beginning of the script that checks the SHA hash of the file, to make sure it's getting there without any issue? Here is a link for that in case you need it: https://support.microsoft.com/en-us/help/889768/how-to-compute-the-md5-or-sha-1-cryptographic-hash-values-for-a-file
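For example, something like this (just a sketch; the package path below is a placeholder, so point it at whichever file the script actually consumes):

    # Compute a SHA-256 hash of the transferred file so it can be compared against the source copy
    # (certutil -hashfile <file> SHA256 is another option)
    Get-FileHash -Path "C:\Octopus\Files\MyPackage.1.2.3.nupkg" -Algorithm SHA256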

I also noticed that the physical memory on the failing machine is quite a bit lower than on the one where it succeeds. It does have ~5GB remaining, but do you think it's possible the task you're asking it to do is eating that up? I'm guessing that's unlikely, but I wanted to mention it.

Would you be able to copy the file to the machine with a method other than Octopus, then run the script locally outside of Octopus and see if it succeeds or fails? If you can do this as the service account Octopus is running as, that could be helpful information as well. I see it's running as NT SERVICE\OctopusDeploy Tentacle.

Although it’s unlikely to be the issue, the CLR version of the machine that failed was 4.0.30319.42000 while the working one was 4.0.30319.36543. Is the other machine you were talking about with the issue at version 42000?

Please let me know if you have any questions.

Thanks,
Jeremy

I have manually confirmed the files on both of the failing machines previously.
The installer does not have a very heavy overhead, I can’t imagine that it would be taking up the remaining 5 GB either.

Manual execution of the MSI, as well as scripted execution of the MSI has been successful for both failing machines. I have not been able to run the script as the virtual service user, as there’s no login method for it outside of the tentacle service itself running.

The other failing machine(s) were of the 30319.42000 CLR version, however there are also (several) succeeding machines with the same version.

Edit: One last note I had forgotten: when the deployment starts for these failing machines, the Octopus Server UI reports (nearly instantly) a VERY high estimated time for the deployment to complete, anywhere from 2 hours up to 9 hours. It seems the Octopus Server may know something is awry when it starts, whether it's aware that it knows or not. True to form, the script will act as if it's executing for hours and hours, yet there is no indication that the scripts are executing at all.

Hey @hardKOrr,

Thanks for all the information. I’m going to speak with some engineers regarding this and see if they have any avenues they’d like us to try.

Please let me know if you have any questions or concerns in the meantime.

Thanks,
Jeremy

Thanks, I'll see if I can poke at a few more things in the meantime myself. All of our systems are nearly identical (or at least fall into identical groups), so it's been strange to find these randomly failing machines.

Hey @hardKOrr,

Just to confirm, when you said you’ve manually confirmed the files, does that mean you’ve done a SHA checksum and confirmed the server and the local versions are the same?

As a test, can you disable delta compression and see if the issue goes away?
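(If it helps, I believe delta compression can be disabled per project with a project variable along these lines, but please verify the variable name against the documentation for your server version:)

    # Assumed variable name; set as a project variable to skip delta compression during package acquisition
    Octopus.Acquire.DeltaCompressionEnabled = False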

Please let me know what you think.

Thanks,
Jeremy

I confirmed the SHA checksum of the downloaded package on the Tentacle, as well as of the files extracted from that package, against the package (and extracted files) from the build output. All of the SHA checksums match.

I had previously deleted all existing packages from C:\Octopus\Files for the failing project and forced a full redownload that way. I am running one now with the same deletion plus delta compression disabled; it's listing 5 hours for the expected finish time, so it appears it will run into the same issue again.
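(The cache clean-up was essentially the following; the package ID is a placeholder for the failing project's package:)

    # Delete the cached copies of the failing project's package so the next deployment re-downloads it in full
    Remove-Item "C:\Octopus\Files\MyPackage.*" -Force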

Hi @hardKOrr,

Thanks for confirming the checksum.

Please let me know how that test goes.

Best,
Jeremy

It failed again in the same way.

Hey @hardKOrr,

Thanks for testing it.

Let me get back to the engineers and see what they think regarding next steps. This does seem environmental in some way, since some targets work but others don't; the question is what the difference is.

I did ask about the immediate, very long time estimate on the step, and they said it's calculated based on previous attempts of that step.

Please let me know if you have any questions in the meantime.

Thanks,
Jeremy

Hey @hardKOrr,

I have some more stuff for you to test. I had some colleagues look at the logs and they think the package stuff might be a red herring. They’re thinking this may be network related due to the Halibut messages in the log.

Would you be able to set up Tentacle Ping? This will help us figure out if there are any communications errors after some time running.

In addition to that, you could also test by running curl on the target to download the package from the Octopus feed. If you can do this a number of times, we may be able to eliminate the Tentacle as a possible issue here and perhaps shed some light on a possible network issue.
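Something along these lines should do it (a rough sketch only; the server URL, package URL, and API key are placeholders, and the exact download endpoint may differ depending on your feed and server version):

    # Repeatedly download the package from the Octopus built-in feed and hash each copy,
    # watching for transfers that fail or produce a different hash.
    $apiKey = "API-XXXXXXXXXXXXXXXX"                                              # placeholder API key
    $url    = "https://your-octopus/api/packages/packages-MyPackage.1.2.3/raw"    # placeholder package URL
    1..20 | ForEach-Object {
        $out = "C:\Temp\pkg_$_.nupkg"
        curl.exe -s -H "X-Octopus-ApiKey: $apiKey" -o $out $url
        Get-FileHash -Path $out -Algorithm SHA256
    }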

Please let me know what you think.

Thanks,
Jeremy

I’ll get working on this as soon as that server frees up. Thanks again for the assistance here!

You’re very welcome! Please let me know how it goes.

Well, I didn't see the complete run and didn't output to a file, but the last time I checked, the server running Tentacle Ping was at 5000 successful pings and 0 failed.