Hi Team,
We are seeing intermittent failures while acquiring packages during deployments in a few environments that have 50+ deployment targets. 26 packages are acquired on each of these machines. In some deployments, package acquisition fails for a random package on one machine, while the same package downloads successfully on the other machines in the same deployment.
Error:
Downloading NuGet package ABC v1.0.0 from feed: ‘http://abc:8081/xxx/xxx/xxx/’
Failed to download package ABC v1.0.0 from feed: ‘http://abc:8081/xxx/xxx/xxx/’
Could not find Zip file Directory at the end of the file. File may be corrupted.
SharpCompress.Common.ArchiveException
at SharpCompress.Common.Zip.SeekableZipHeaderFactory.SeekBackToHeader(Stream stream, BinaryReader reader, UInt32 headerSignature)
at SharpCompress.Common.Zip.SeekableZipHeaderFactory.d__3.MoveNext()
at SharpCompress.Archives.Zip.ZipArchive.d__16.MoveNext()
at SharpCompress.LazyReadOnlyCollection`1.LazyLoader.MoveNext()
at Calamari.Integration.Packages.NuGet.LocalNuGetPackage.ReadMetadata(String filePath)
at System.Lazy`1.CreateValue()
at System.Lazy`1.LazyInitValue()
at Calamari.Integration.Packages.PackageName.FromFile(String path)
at Calamari.Integration.Packages.Download.NuGetPackageDownloader.DownloadPackage(String packageId, IVersion version, Uri feedUri, ICredentials feedCredentials, String cacheDirectory, Int32 maxDownloadAttempts, TimeSpan downloadAttemptBackoff)
at Calamari.Integration.Packages.Download.NuGetPackageDownloader.DownloadPackage(String packageId, IVersion version, String feedId, Uri feedUri, ICredentials feedCredentials, Boolean forcePackageDownload, Int32 maxDownloadAttempts, TimeSpan downloadAttemptBackoff)
at Calamari.Integration.Packages.Download.PackageDownloaderStrategy.DownloadPackage(String packageId, IVersion version, String feedId, Uri feedUri, FeedType feedType, ICredentials feedCredentials, Boolean forcePackageDownload, Int32 maxDownloadAttempts, TimeSpan downloadAttemptBackoff)
at Calamari.Commands.DownloadPackageCommand.Execute(String commandLineArguments)
We have noticed that when this error occurs, the package file is present on the machine that reported the failure, but its size is wrong and the file is corrupt. To fix the error, we have to manually delete the file and retry the deployment. We have found idle timeout errors in our feed endpoint logs, although there is no network contention and there are no IOPS issues.
This should be handled more gracefully than with a runtime error, perhaps by adding logic to delete the corrupt package and retry (as is already done when fetching a package from the feed endpoint hits a normal timeout). Ideally, if a package file has not been downloaded completely, it should have a different file name indicating that the download is incomplete; that is not the case in these errors.
We would also like some help understanding the idle timeouts, if possible.
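To make the request above concrete, here is a minimal Python sketch of the behaviour we have in mind. It is not Calamari's actual implementation; `acquire_package`, the `fetch` callback and the `.partial` suffix are hypothetical names used only for illustration. The idea is to download to a temporary name, check that the zip's end-of-file directory is present, and only then rename the file to its final name. A corrupt file is deleted and the download is retried.

```python
import os
import time
import zipfile
from typing import Callable, Optional


def acquire_package(
    fetch: Callable[[str], None],  # hypothetical: writes the package bytes to the given path
    final_path: str,
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> str:
    """Download to '<final_path>.partial', validate, then rename into place."""
    # A cached file without a readable zip directory is treated as corrupt and removed.
    if os.path.exists(final_path):
        if zipfile.is_zipfile(final_path):
            return final_path
        os.remove(final_path)

    partial_path = final_path + ".partial"  # hypothetical suffix marking an incomplete download
    last_error: Optional[Exception] = None

    for attempt in range(1, max_attempts + 1):
        try:
            fetch(partial_path)
            # is_zipfile looks for the end-of-central-directory record,
            # which is missing when a download was cut short.
            if not zipfile.is_zipfile(partial_path):
                raise OSError("downloaded file is not a complete zip archive")
            os.replace(partial_path, final_path)  # rename only after validation
            return final_path
        except OSError as error:
            last_error = error
            if os.path.exists(partial_path):
                os.remove(partial_path)  # never leave a truncated file behind
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)

    raise RuntimeError(
        f"failed to acquire {final_path} after {max_attempts} attempts"
    ) from last_error
```

With this approach, a file under the final package name is always a validated archive, so a truncated download caused by an idle timeout would be retried automatically instead of needing manual cleanup.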
Server details:
Octo version: 2020.2.15
Instance type: C5.2xlarge
Number of nodes in cluster: 1
Task cap: 50
Octopus.Acquire.MaxParallelism : 50
Octopus.Action.MaxParallelism : 100
Old case we raised for the same issue:
The final package repository behind our cache/forwarder machine has been changed from ProGet to Nexus. The forwarder machine itself also runs Nexus, and it did so even when the final repository was ProGet.
Thanks and Regards,
Devan