Tool does not handle special characters in filename #2
The tool does not work when the file name has special characters, e.g. a comma, or German Umlauts.
Hey! Thank you for the report :)
Can you give me an example file I can use for my tests? Thanks!
Hey, sure.
Sorry, I assumed it was obvious, but if you don't have access to a German keyboard, it may not be so trivial.
Attached, please find 3 files with umlauts and other special characters in their name.
The SRT files are empty, but the error still reproduces.
The zip file also contains the log of me running Get-ChildItem in PowerShell and piping the names to subscleaner.
By the way, I was mistaken, looks like commas in filenames work.
Thanks,
schalli110
repro.zip
That is perfect, thank you very much! I'll work on this and open a PR once it's ready 🫡
assigned to @rogs
mentioned in merge request !2
mentioned in commit cf619272d3
@schalli110 Looking a little deeper into this issue and your log, it looks like this is a Windows-specific problem, since I can't reproduce it on Linux or macOS.
I have pushed a possible fix, but please let me know if this doesn't fix it!
Hey Roger,
you're right, this appears to be some sort of Windows or Python stupidity.
On cmd.exe, I can change the codepage to UTF-8 and also start subscleaner with
python -X utf8 subscleaner.py
and then it passes umlauts correctly. Unfortunately, PowerShell always seems to add the BOM to a UTF-8 string, even when I tell it not to, and Python does not like that.
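To illustrate the BOM problem described above, here is a minimal Python sketch of how piped bytes with a leading BOM could be decoded safely. This is not subscleaner's actual code: `read_piped_names` is a hypothetical helper, and `Größe.srt` is a made-up file name used only for the demonstration.

```python
import codecs


def read_piped_names(raw: bytes) -> list[str]:
    """Decode piped bytes as UTF-8, dropping a leading BOM if one is present."""
    # "utf-8-sig" behaves like "utf-8" but skips a leading EF BB BF sequence.
    text = raw.decode("utf-8-sig")
    # splitlines() handles both \n and \r\n; blank lines are ignored.
    return [line.strip() for line in text.splitlines() if line.strip()]


# Simulate a shell that prepends a BOM to the piped file names.
piped = codecs.BOM_UTF8 + "Tür.srt\r\nGröße.srt\r\n".encode("utf-8")
names = read_piped_names(piped)
print(names)
```

In a real script, something like `sys.stdin.reconfigure(encoding="utf-8-sig")` (Python 3.7+) might achieve the same effect on standard input, though I have not checked that against subscleaner itself.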
A possible workaround would be to be able to pass the filename as a command line parameter instead of through a pipe.
That seems to work without codepage shenanigans in both cmd.exe and PowerShell, e.g.:
Get-ChildItem .\Tür.srt | % { python subscleaner.py $_.FullName }
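For what it's worth, the command-line-parameter idea could look roughly like the sketch below: prefer file names passed as arguments and fall back to piped lines only when none are given. `collect_filenames` is a hypothetical helper for illustration, not a function that exists in subscleaner.

```python
def collect_filenames(argv: list[str], stdin_lines) -> list[str]:
    """Prefer names given as arguments; fall back to piped lines otherwise."""
    if argv:
        # Arguments arrive already decoded by the interpreter, so no
        # pipe encoding or BOM handling is involved on this path.
        return list(argv)
    # Fallback: one file name per piped line, ignoring blank lines.
    return [line.strip() for line in stdin_lines if line.strip()]


# Arguments win when present; the piped lines are used only as a fallback.
print(collect_filenames(["Tür.srt"], []))
print(collect_filenames([], ["Tür.srt\n", "\n"]))
```

In an actual script the two inputs would presumably be `sys.argv[1:]` and `sys.stdin`.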
I dug around a bit more and found this:
https://bugs.python.org/issue21927
The workaround is to teach Powershell to not write the BOM:
Then subscleaner works as expected:
returns
That's perfect. Thank you for debugging!