Poor compression of >20GB exe/msi/cab sample
-
@spwolf said in Poor compression support:
@diskzip a lot of repeat data that works well with a 1.5GB dictionary… you can set plzma to 2000M in settings, and mt to 1, and let’s see how it works.
I tested with a 720m dictionary and that got it down another 400MB. But that’s about the limit of my 12GB laptop.
When it comes to testing, a test case that requires 18GB-25GB of free RAM is just too hard and obscure to test. It would be better to have a sample that can use proper multithreading and a reasonable dictionary that users will actually end up using - for instance, 128m and 8t.
When it comes to comparing lzma2 to our plzma with the lzmarec entropy coder, you should see around a 2%-4% improvement, all other things being equal.
For us, while lzmarec is nice and always shows an improvement over lzma2 at the same settings, it is not the main point of the PA format… more important are all the other codecs - mp3, lepton/jpeg, reflate for pdf/docx/deflate, bwt for text, mt ppmd for some multimedia files, a deduplication filter for everything that runs at 50-60MB/s, etc. - and how it all works automatically and multithreaded.


So following the order of these instructions, the custom dictionary setting was lost. I had to repeat that step - glad I double-checked. Not the most intuitive UI, if you are open to a bit of negative feedback.
Another negative tidbit: it took about 1 minute for the operation to initiate (for the compressing-files window to appear) after I clicked the Finish button.
Not the best user experience really, but I am excited to see what actual compression savings will result.
-
@diskzip yeah, I noticed I posted the steps in the wrong order, but I figured you would work it out… we have to reset the settings so users who enter wrong ones can go back to defaults; otherwise, users can easily save a profile with those settings and then always use that profile.
-
@diskzip said in Poor compression support:
@spwolf Is it necessary to restrict PA to only one thread? DiskZIP obtained this result on two threads, not one.
How about other filters - do I need to override any of those settings as well, especially in light of the large number of binaries included in my distribution?
No, you don’t need to do anything else… you are actually using 7z.exe and LZMA2, right? For memory, LZMA2 uses 2 threads per dictionary - so in this case it is 11.5 x 1536M. plzma is different not only due to its different entropy coder; it is also a parallel version of LZMA, so multiple threads are used for both compression and extraction. It also has a larger maximum dictionary, at 2000M.
Of course, even with mt1, multiple threads are used, depending on files, size, and extension - for instance, the lzmarec entropy coder uses more than one thread anyway, and we also always apply some extra filters.
In any case, what is the maximum dictionary you use in your product? I am sure it is not 1.5G, since that would be 18GB of RAM usage?
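As a quick sanity check of that formula in Python (assuming the 11.5x-per-2-threads rule of thumb quoted in this thread, which is a forum rule of thumb rather than an official 7-Zip number):

def lzma2_mem_gib(dict_mib, threads):
    # LZMA2 spends roughly 11.5x the dictionary size per pair of threads.
    thread_pairs = max(1, threads // 2)
    return 11.5 * dict_mib * thread_pairs / 1024

print(lzma2_mem_gib(1536, 2))  # ~17.25 GiB -- matches the ~17.x GB reported later
print(lzma2_mem_gib(1024, 2))  # ~11.5 GiB -- same ballpark as the ~10 GB reported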
-
@spwolf said in Poor compression support:
you are actually using 7z.exe and LZMA2, right? […] In any case, what is the maximum dictionary you use in your product? I am sure it is not 1.5G, since that would be 18GB of RAM usage?
DiskZIP doesn’t invoke 7z.exe; we have our own low-level wrapper around 7-Zip. Unlike PowerArchiver, though, we don’t implement our own custom algorithm(s) or change the default 7-Zip compression in any way (other than exposing 7-Zip functionality in a nice, structured API with callbacks, etc.). We also license this 7-Zip library to third parties for their use.
The result with PA using your exact settings is 2.86 GB; I am at a loss to understand why PA has performed so poorly on this data set.
Our dictionary is indeed exactly 1.5 GB - this is the current 7-Zip maximum (and even this already presents some problems with extraction on 32-bit systems, due to memory fragmentation). It is LZMA2, of course, with 2 threads.
I may have misreported the memory requirements - but don’t blame me, blame the Windows Task Manager! I see it going up to 17.x GB (so call it 18 GB) with the 1.5 GB dictionary. With a 1 GB dictionary, it goes up to 10 GB (give or take a gigabyte).
-
@diskzip interesting, I got 2.83G with a 720m dictionary… The sample just has a lot of similar files, so a large dictionary with LZMA does wonders there. Doesn’t seem like there is anything else to it.
Memory usage is 11.5x the dictionary size for each 2 threads in the LZMA2 mt setting.
But how many users have the >=24GB of RAM required for such a setting, though?
-
@nikkho said in Poor compression support:
PowerArchiver 17.00.90 (Optimize Strong): 3,398,179,937 bytes
I used a 1GB dictionary on PA, and the result was reduced to 2.79GB:
- PowerArchiver 17.00.91 (Optimize Strong 1GB): 3,004,242,466 bytes
- PowerArchiver 17.00.90 (Optimize Strong): 3,398,179,937 bytes
-
@nikkho said in Poor compression support:
I used a 1GB dictionary on PA, and the result was reduced to 2.79GB:
- PowerArchiver 17.00.91 (Optimize Strong 1GB): 3,004,242,466 bytes
I tried both 7-Zip Ultra and PA Strong at 128m: 7z was 4.59GB, while PA was 3.17GB.
This large difference is likely due to rep working on the similar files. But rep has a limit of 2GB, so it likely misses a lot when it comes to 20GB samples. It sure is nice to work at mt8 and have it done in a third of the time, though :)
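As a toy illustration of why that window limit matters (a fixed-block Python sketch; rep’s real rolling-hash matching is finer-grained than this, so take it only as the shape of the problem):

import hashlib

BLOCK = 1 << 20            # 1 MiB blocks (toy granularity)
WINDOW = 2 << 30           # a 2 GB history limit, like rep's

def dedup(blocks):
    """blocks: iterable of BLOCK-sized bytes objects, in stream order."""
    seen = {}              # content hash -> index of last occurrence
    out = []
    for idx, block in enumerate(blocks):
        key = hashlib.sha1(block).digest()
        prev = seen.get(key)
        if prev is not None and (idx - prev) * BLOCK <= WINDOW:
            out.append(("copy", prev))      # duplicate within the window
        else:
            out.append(("literal", block))  # new, or too far back to reference
        seen[key] = idx
    return out

In a 20GB sample, any duplicate lying more than 2GB behind its twin falls out of the window and gets stored again - which is exactly what a huge LZMA dictionary catches instead.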
-
Well, our target here is at least 2.48 GB, which is what DiskZIP achieves with an out-of-the-box 7-Zip compression engine under the hood.
I was hoping for PA to reduce that further, to the neighborhood of 2 GB even - or at least a symbolic reduction over the “raw” upload size.
It is great to see my own product outperforming all else, but in the interest of advancing the state of the art in compression, I would hope for more third-party competition :)
-
It should be possible to reach a better result with PA.
- plzma should support a larger window than 1536M
- a1/lzmarec mode might provide a few % better compression than a0/lzma
- rep1 dedup filter has parameters that can be tweaked too. Or it might be better to disable it when p/lzma with a huge window is used.
- reflate might work on some files.
- x64flt3 exe filter should be better than bcj2
- deltb filter should have some effect on exes too
- we can tweak file ordering
Atm we don’t have a PC with >20GB of memory around, so we can’t do these experiments.
And anyway, I’d not expect that much gain here, because we don’t have LZX recompression atm,
which is what is necessary for many of these cab/msi files.
As to .7z files, I guess I can integrate my existing lzma recompressor easily enough, but it won’t have that much effect.
-
@eugene said in Poor compression support:
It should be possible to reach a better result with PA. […]
Interesting thoughts. I have lost access to the 32 GB RAM machine for the next 10 days or so, but I will be glad to retest as soon as I have it back. In the meanwhile, I have a 16 GB RAM machine which I will try to retest on.
Some thoughts:
1. (plzma window) I tried with 2 GB per the instructions.
2. (a1/lzmarec) How to configure these?
3. (rep1 dedup) I was counting on dedup for huge savings. Would it conflict with LZMA or would it be best to enable it?
4. (reflate) I don’t think there are many ZIP streams in the dataset.
5. (x64flt3) That sounds very exciting. Is it a custom PA filter? Is it for 64-bit binaries only, or does it also cover 32-bit binaries?
6. (deltb) Same as #5.
7. (file ordering) This must be tweaked; even DiskZIP cannot compress well unless the file ordering is sorted instead of “random” (a sorting sketch follows at the end of this post).
For LZX recompression, you probably won’t be hampered by digital signatures (whenever you end up adding it), right?
On that note - some of the LZX’s may have Microsoft’s delta repacks, which may be more problematic than just ordinary LZX decompression.
The one piece of good news for the .7z files is that they are all stored uncompressed/raw - so the only benefit lost is proper file sorting across the bigger data set.
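To illustrate the ordering point, here is a minimal Python sketch (an illustration only, not DiskZIP’s or PA’s actual sort): grouping by extension and then base name keeps similar binaries adjacent in the solid stream, which is what a large dictionary then exploits.

import os

def solid_order(paths):
    # Cluster likely-similar files: extension first, then base name, so
    # duplicates and near-duplicates end up adjacent in the solid stream.
    return sorted(paths, key=lambda p: (os.path.splitext(p)[1].lower(),
                                        os.path.basename(p).lower()))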
-
@diskzip do you plan to add full MT support for 7z? I think that is a must-have if you want people to use your tool over 7z. Otherwise, it is much easier to test 7z by just using 7zFM, since we can use 8t CPUs properly and it cuts testing time on 20GB files by a significant margin (35 min vs 140 min for this test on my computer).
Or does DiskZIP do anything else for 7-Zip that affects compression - are results different between 7z and DiskZIP using 7z?
-
In the meanwhile, I have a 16 GB RAM machine which I will try to retest on.
You should be able to use a 1G dictionary there, at least.

I tried with 2 GB per the instructions.
2GB is wrong; I suggested 2000M - 2GB is 2048M.
The problem is, the bt4 matchfinder uses 32-bit indexes, and there’s a dual buffer for the window (to avoid special handling for wrap-around). Then there are also some special margins, so using precisely 2^32/2 for the window size is also impossible.
I’m not sure about the precise maximum window size, so you can start with 1536M and try increasing it, I guess.
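A rough picture of that constraint in Python (my reading of the dual-buffer scheme as described above, not the exact 7-Zip/plzma arithmetic):

index_space = 2 ** 32              # bt4 match-finder indexes are 32-bit
max_window = index_space // 2      # window is buffered twice to avoid wrap-around
print(max_window // 2 ** 20)       # 2048 MiB in theory; the special margins push
                                   # the usable maximum below that, hence 2000M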
How to configure these?
Try looking around in all the options windows/tabs? Otherwise, just try testing with 7zdll/7zcmd - you should have the links?

I was counting on dedup for huge savings. Would it conflict with LZMA or would it be best to enable it?
The current dedup filter (rep1) only supports up to a 2000M window too, due to the same issues as lzma, so it would only hurt lzma compression when both use the same window. Compression-wise, srep should be better atm, but it is also slower and relies on temp files too much. And in any case, you should understand that we can’t just use Bulat’s tools in a commercial app. In fact, a dedup filter improvement is planned; I’m just busy with other codecs atm.

I don’t think there are many ZIP streams in the dataset.
There are plenty of cab archives with MSZip compression though. Like all .msu files, for example.

That sounds very exciting. Is it a custom PA filter? Is it for 64-bit binaries only, or does it also cover 32-bit binaries?
It does more or less the same as bcj2 for 32-bit binaries (hopefully better), and it also adds support for RIP addressing in x64 binaries.

Same as #5.
Yes. The normal delta filter in 7z simply subtracts all bytes with a given step. (For example, it would be delta:4 for 16-bit stereo wavs.) deltb, in contrast, is an adaptive delta filter which tries to detect binary tables in the data. It is not very good for multimedia, but can be quite helpful for exes.
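A minimal Python sketch of that fixed-step delta transform (the adaptive table detection deltb adds is not shown; this is an illustration, not 7-Zip’s source):

def delta_encode(data, step):
    out = bytearray(data)
    # Walk backwards so each subtraction uses the original, unmodified byte.
    for i in range(len(out) - 1, step - 1, -1):
        out[i] = (out[i] - out[i - step]) & 0xFF
    return bytes(out)

def delta_decode(data, step):
    out = bytearray(data)
    # Walk forwards, adding back the already-reconstructed earlier byte.
    for i in range(step, len(out)):
        out[i] = (out[i] + out[i - step]) & 0xFF
    return bytes(out)

# step=4 suits 16-bit stereo WAV: each byte is predicted by the byte at the
# same channel/byte position one frame (4 bytes) earlier.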
For LZX recompression, you probably won’t be hampered by digital signatures (whenever you end up adding it), right?
All our recompression is lossless, so hashes/crcs/signatures should still match after decoding, because the extracted archive is exactly the same. There is a bigger problem with LZX though - it supports window sizes up to 2M and does optimal parsing, so a reflate equivalent for LZX might turn out too slow, or would generate too much recovery data (if the optimal parsing is not reproduced in the recompressor). But at least for LZX it might still be possible, while for LZMA it likely isn’t.

On that note - some of the LZX’s may have Microsoft’s delta repacks, which may be more problematic than just ordinary LZX decompression.
Yes, there is also LZMS, which is a newer LZX upgrade with support for >2M windows, x64 code preprocessing, etc. And then MS also uses quite a few other compression algorithms (xpress, quantum, LZSS, …). But it is a lot of work to write a recompressor even for a single format, so we don’t have any plans for these atm. It is much more interesting to look into direct applications of what we already have first, like reflate-based recompression for png/zip/pdf, adding a level/winsize detector to reflate, etc.

The one piece of good news for the .7z files is that they are all stored uncompressed/raw - so the only benefit lost is proper file sorting across the bigger data set.
Yes, it could be a good idea to write recompressors for popular archive formats even without support for their codecs - just turn the archive into a folder, and extract whatever data in the archive corresponds to the file names in the archive.
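As a toy illustration of the level-detector idea in Python (a brute-force sketch, not how reflate actually searches): find the zlib setting that reproduces the original deflate stream bit-for-bit, so the data can be stored decompressed and rebuilt losslessly.

import zlib

def find_zlib_level(compressed):
    # Brute-force the zlib level that byte-for-byte reproduces the stream.
    raw = zlib.decompress(compressed)
    for level in range(1, 10):
        if zlib.compress(raw, level) == compressed:
            return level        # exact match: store raw, no recovery data needed
    return None                 # mismatch: a recompressor must emit a diff

When no level reproduces the stream (a different encoder, custom parsing), reflate-style tools fall back to storing a small correction diff - the “recovery data” mentioned above.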
-
@spwolf said in Poor compression support:
do you plan to add full MT support for 7z? […] are results different between 7z and DiskZIP using 7z?
DiskZIP is fully multi-threaded, but the default compression profiles all favor smaller archive size over processing speed, so you would need to edit your compression settings in the DiskZIP GUI to spread usage over more cores. I am escalating this request internally to see where the magic happens here.
Note that with standard 7-Zip (or DiskZIP, which consumes standard 7-Zip through a structured DLL interface), you need to limit thread counts to two to obtain the best results. While LZMA2 has been optimized to spread the workload across multiple threads, doing so always does substantial harm to the compression savings realized.
DiskZIP does not do anything that affects compression, so results should be 100% identical between 7-Zip and DiskZIP.
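For reference, the stock 7-Zip command line matching the settings discussed here would look something like this (archive and file names are placeholders):

7z a sample.7z <input files> -m0=lzma2 -mx=9 -md=1536m -mmt=2

-mmt=2 keeps LZMA2 to a single 2-thread coder sharing one dictionary; higher values split the input into independently compressed streams, which is where the ratio loss comes from.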
-
@diskzip said in Poor compression support:
DiskZIP is fully multi-threaded […] results should be 100% identical between 7-Zip and DiskZIP.
I could not find anything; even for smaller sets that need less memory, it only uses around 20% of my CPU (an 8-thread CPU), while with the same files and settings 7z will use up to 100%.
With only the dictionary changed to d128M, I get a 60MB smaller file using 7-Zip than using DiskZIP. Something you can try on your end as well - maybe some other setting needs to be changed?
At that point, PA is smaller by 960M… with a big LZMA dictionary, it is basically used as dedup. I tested d768m with 7z and the difference went down to 30-40M.
It will be interesting to see more results on my test computers once I am back from vacation, in some 15 days. I will be able to test various settings at that point, while right now I can only use a laptop and it takes more than 2 hours.
-
@spwolf said in Poor compression support:
do you plan to add full MT support for 7z? […]
OK, DiskZIP uses all available CPU cores with a 16 MB dictionary or smaller, and a maximum of 3 CPU cores with a 32 MB dictionary. A 64 MB dictionary or larger results in a core limit of 2.
Apparently these numbers are heuristic limits from a long time ago. Do you think we should move up the dictionary limits somewhat?
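In code form, the heuristic reads roughly like this (a Python paraphrase of the limits described above, not DiskZIP’s actual source):

def core_limit(dict_mb, available_cores):
    if dict_mb <= 16:
        return available_cores          # small dictionary: use every core
    if dict_mb <= 32:
        return min(3, available_cores)
    return min(2, available_cores)      # 64 MB or larger: cap at 2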
-
@spwolf said in Poor compression support:
Memory usage is 11.5x the dictionary size for each 2 threads in the LZMA2 mt setting.
The LZMA decoder is simple, but the PPMd decoder is complex. LZMA2 is better than LZMA. LZMA2 compression does not replace (supersede) LZMA compression; LZMA2 is merely an additional “wrapper” around LZMA. With LZMA2, data is split into blocks, but each block is still compressed by “normal” LZMA. Because individual blocks are compressed separately, processing the blocks can be parallelized, which allows for multi-threading. LZMA2 also allows “uncompressed” blocks, to better deal with already-compressed inputs.
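A minimal Python sketch of that idea (illustrative only - it mimics the block splitting with ordinary XZ streams, not the real LZMA2 container format):

import lzma
from concurrent.futures import ThreadPoolExecutor

CHUNK = 8 * 1024 * 1024   # independent block size; real LZMA2 chunking differs

def compress_block(block):
    # Each block is compressed by "normal" LZMA, independently of the rest.
    return lzma.compress(block, preset=9)

def parallel_compress(data, threads=4):
    blocks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    # CPython releases the GIL inside liblzma, so threads give a real speedup.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(compress_block, blocks))

The ratio cost of multi-threading is visible here: a block cannot reference matches in earlier blocks, which is exactly the trade-off discussed in this thread.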
-
@winstongel said in Poor compression support:
With LZMA2, data is split into blocks, but each block is still compressed by “normal” LZMA. […]
That’s a dramatic oversimplification. LZMA2 substantially hurts compression ratios when more than two threads are used (which is why DiskZIP limits itself to two threads in virtually all compression scenarios with a non-minuscule [at least by today’s standards] dictionary size).
-
@diskzip said in Poor compression support:
OK, DiskZIP uses all available CPU cores with a 16 MB dictionary or smaller […] Do you think we should move up the dictionary limits somewhat?
We’re pushing out an update soon, which adds a new “All” parameter to the multi-threading/hyper-threading setting.
This new “All” parameter will be the default in the regular and high compression profiles. Only the extreme compression profiles will stick to the previous “Yes” setting.
When “All” is selected here, all cores will be used. When “Yes” is selected, the previous heuristics will apply (2 cores in most scenarios, as above).
-
@diskzip said in Poor compression support:
We’re pushing out an update soon, which adds a new “All” parameter to the multi-threading/hyper-threading setting. […] When “All” is selected, all cores will be used.
It is a very nice feature. But I am not sure whether “All” should mean all physical cores, or all threads in the case of Hyper-Threading and similar.
-
@nikkho Oh, of course - “All” equals all available logical cores, including hyper-threaded as well as physical cores.
