Filename character encoding with the zip plugin

jPV
Posts: 600
Joined: Sat Mar 26, 2016 10:44 am
Location: RNO

Post by jPV

When you create archives with zip.hwp (zip.AddFile/NewName, for instance), it seems to accept filenames only in UTF-8 encoding, which becomes visible as soon as the names contain umlauts. This is a problem because Amiga/MorphOS programs can't display such names correctly, and you get the typical UTF-8 garbage when listing or unpacking the files. The archives do work on Windows, though.
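The "typical UTF-8 garbage" is what happens when the two bytes UTF-8 uses for an umlaut are read back as two separate 8-bit characters. The short Python sketch below reproduces the effect; it assumes the reading program interprets names as ISO 8859-1 (the Amiga's Western European charset) and is only an illustration, not how MorphOS or xad actually decode names. The function name is made up for this example.

```python
def as_seen_by_latin1_program(name: str) -> str:
    """Hypothetical helper: encode a filename as UTF-8 (as the archive
    stores it), then decode the raw bytes as ISO 8859-1, the way an
    8-bit program assuming that charset would display them."""
    raw = name.encode("utf-8")          # e.g. "ä" -> b"\xc3\xa4"
    return raw.decode("iso-8859-1")     # each byte becomes one character

for name in ["Käse.txt", "Brötchen.txt", "Müsli.txt"]:
    print(name, "->", as_seen_by_latin1_program(name))
```

Each umlaut turns into two characters starting with "Ã" (for example "Käse.txt" becomes "KÃ¤se.txt"), which matches the garbage described above.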

For comparison, I packed an archive with the command-line zip command on MorphOS. This time the filenames were readable on MorphOS (with unzip, xad, etc.), but they were garbage on Windows.

But then comes the interesting part: when I created a new zip archive with umlauted filenames on Windows, the names were shown correctly on both Windows and MorphOS!

So, does anyone know which character set Windows uses when it creates these archives? Could zip.hwp use it too? Any other solutions or suggestions for creating more universally readable archives? Could the character set be made configurable in Hollywood?
airsoftsoftwair
Posts: 5425
Joined: Fri Feb 12, 2010 2:33 pm
Location: Germany

Re: Filename character encoding with the zip plugin

Post by airsoftsoftwair

It's currently not possible because zip.hwp only supports UTF-8. IIRC the zip specification only supports UTF-8 or no encoding at all. zip.hwp could be made to store entries in whatever the system encoding on the Amiga is, but those archives would probably only show correct filenames on Windows if its standard 8-bit encoding is CP1252, which is almost identical to the Amiga's ISO 8859-1 Western European charset. So I'm not sure it's a good idea, because you'll always find systems where you run into trouble. For example, I don't think names in those archives would show correctly if the Amiga uses an Eastern European charset. AFAIU this will only work if the Windows and Amiga systems use similar default 8-bit charsets, like ISO 8859-1 and CP1252 on Western European systems.

I'm also a little surprised that MorphOS's zip handler doesn't handle UTF-8 correctly. Have you reported that to the MorphOS authors? I think this should really be working in 2021...
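For context on how a ZIP reader is supposed to tell UTF-8 names apart: the ZIP format (PKWARE's APPNOTE) marks a UTF-8 filename by setting bit 11 of the entry's general purpose flag; when that bit is clear, readers fall back to a legacy charset. The sketch below uses Python's standard zipfile module, not zip.hwp, to show this. CPython sets the flag only when a name doesn't fit in plain ASCII, and decodes names without the flag as CP437. The helper name is hypothetical.

```python
import io
import zipfile

# General purpose bit 11 ("language encoding flag"): when set, the
# entry's filename is stored as UTF-8.
UTF8_FLAG = 0x800

def write_and_inspect(name: str):
    """Hypothetical helper: write one entry with Python's zipfile into an
    in-memory archive, read it back, and return (name, bit 11 set?)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, b"hello")
    # The writer leaves buf open because we passed a file object.
    with zipfile.ZipFile(buf) as zf:
        info = zf.infolist()[0]
        return info.filename, bool(info.flag_bits & UTF8_FLAG)

for name in ["readme.txt", "Käse.txt"]:
    print(write_and_inspect(name))
```

A reader that ignores bit 11 will treat the UTF-8 bytes of "Käse.txt" as an 8-bit charset, which produces exactly the kind of garbage described at the start of the thread.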
airsoftsoftwair
Posts: 5425
Joined: Fri Feb 12, 2010 2:33 pm
Location: Germany

Re: Filename character encoding with the zip plugin

Post by airsoftsoftwair

Did some more research on this, and it looks like the legacy encoding for zip files is CP437, i.e. the old DOS charset. UTF-8 support was introduced later, which is probably why Amiga unarchivers don't handle it correctly. I've now added support for storing entries as CP437, which should solve this problem.


- New: zip.AddFile(), zip.RenameFile(), and zip.AddDirectory() support a new table tag named "Encoding"
  now; this can be used to set the charset encoding to be used; this defaults to #ZIP_FL_ENC_UTF_8 but
  you can also set it to #ZIP_FL_ENC_CP437 which is the traditional encoding used in ZIP files so you
  should use this for maximum compatibility with older platforms
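The two values of the new "Encoding" tag boil down to a choice of name bytes and flag bits for each entry. The Python function below is a hypothetical illustration of that choice under the ZIP format's rules (UTF-8 with bit 11 set, CP437 with bit 11 clear); it is not zip.hwp's code, and `encode_entry_name` is an invented name. In Hollywood itself you would simply pass the Encoding tag to zip.AddFile(), zip.RenameFile(), or zip.AddDirectory() as described in the changelog.

```python
from typing import Tuple

# General purpose bit 11: "file name is stored as UTF-8".
UTF8_FLAG = 0x800

def encode_entry_name(name: str, encoding: str = "utf-8") -> Tuple[bytes, int]:
    """Hypothetical helper: return (raw name bytes, flag bits) for a ZIP entry.

    "utf-8": store the name as UTF-8 and set bit 11 so readers know.
    "cp437": store the traditional DOS encoding and leave bit 11 clear;
             names with characters CP437 lacks raise UnicodeEncodeError.
    """
    if encoding == "utf-8":
        return name.encode("utf-8"), UTF8_FLAG
    if encoding == "cp437":
        return name.encode("cp437"), 0
    raise ValueError(f"unsupported encoding: {encoding!r}")

print(encode_entry_name("Käse.txt"))           # UTF-8 bytes, bit 11 set
print(encode_entry_name("Käse.txt", "cp437"))  # "ä" becomes the single byte 0x84
```

With CP437, an umlaut is stored as one byte that older unarchivers know how to map. The trade-off is the one raised earlier in the thread: characters outside CP437, such as "Ł" or "ź" in Polish names, can't be stored this way at all.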