Enhancement
Critical
Major
Detail
Detail
Detail
#25035
StructuredTextControl has problems with special characters in image filenames
The StructuredTextControl has at least the following problems with images that have special characters in their names:
- Characters that are encoded in URLs cause the image to be deleted immediately after upload. The cause is the cleanup code that deletes images that are no longer used. It compares the name of each stored image with the names of the images in use, but the names of the images in use are URL-encoded. Thus Koala - 128px.jpeg is compared with Koala%20-%20128px.jpeg, no match is found, and the image is deleted as "no longer used".
- Even if this is fixed, such images are still not displayed. The image URL is then Koala%2520-%2520128px.jpeg: the percent signs of the already encoded name have been encoded again, so the name is encoded twice. This is probably why the image is not found.
- Characters that are not included in ISO-8859-1 cause a problem when sending the response to the client after uploading the image:
java.io.CharConversionException: Not an ISO 8859-1 character: €
    at javax.servlet.ServletOutputStream.print(ServletOutputStream.java:130)
    at com.top_logic.layout.wysiwyg.ui.StructuredTextControl.sendResponse(StructuredTextControl.java:735)
    at com.top_logic.layout.wysiwyg.ui.StructuredTextControl.uploadFile(StructuredTextControl.java:635)
The problem here is that the StructuredTextControl correctly declares that the response is encoded in UTF-8, but the implementation of javax.servlet.ServletOutputStream used by Jetty apparently ignores the character encoding and uses ISO-8859-1 instead. (To understand the source code: Java characters are internally encoded in UTF-16. This coincides with ISO-8859-1 as long as the upper byte is not set.)
{{{#!java
public void print(String s) throws IOException {
    if (s==null) s="null";
    int len = s.length();
    for (int i = 0; i < len; i++) {
        char c = s.charAt (i);
        //
        // XXX NOTE: This is clearly incorrect for many strings,
        // but is the only consistent approach within the current
        // servlet framework. It must suffice until servlet output
        // streams properly encode their output.
        //
        if ((c & 0xff00) != 0) { // high order byte must be zero
            String errMsg = lStrings.getString("err.not_iso8859_1");
            Object[] errArgs = new Object[1];
            errArgs[0] = Character.valueOf(c);
            errMsg = MessageFormat.format(errMsg, errArgs);
            throw new CharConversionException(errMsg);
        }
        write (c);
    }
}
}}}
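The high-order-byte check quoted above can be reproduced in isolation. The following is only a minimal sketch: the class and method names are hypothetical and not part of StructuredTextControl or the servlet API. It shows that Ä (U+00C4) passes the check while € (U+20AC) does not, and that a writer explicitly bound to UTF-8 encodes € without an error.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the framework.
class Latin1CheckSketch {

    /** Same test as in the quoted print(String): the upper byte of the char must be zero. */
    static boolean fitsIso88591(char c) {
        return (c & 0xff00) == 0;
    }

    /** Encodes the text through a PrintWriter that is explicitly bound to UTF-8. */
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter writer =
            new PrintWriter(new OutputStreamWriter(buffer, StandardCharsets.UTF_8));
        writer.print(text);
        writer.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the sketch independent of the source file encoding.
        System.out.println(fitsIso88591('\u00C4'));     // A-umlaut: prints true
        System.out.println(fitsIso88591('\u20AC'));     // euro sign: prints false
        System.out.println(writeUtf8("\u20AC").length); // prints 3 (UTF-8 bytes E2 82 AC)
    }
}
```

This explains why umlauts alone do not trigger the exception, whereas the euro sign does.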
Enhancement
- Before comparing whether images can be deleted, the image names are now decoded.
- The decoding must not be done after com.top_logic.layout.wysiwyg.ui.StructuredTextControl.linkImageSource(String, Element) as suggested in the patch, because at that point the encoded string has already been set as the src value of the image, although the decoded variant is needed there as well. Therefore the decoding is now done earlier, in com.top_logic.layout.wysiwyg.ui.StructuredTextControl.getImageID(String, String).
- In com.top_logic.layout.wysiwyg.ui.StructuredTextControl.sendResponse(DisplayContext, FileItemBinaryData, String) the PrintWriter is now used, because it handles strings and respects the character encoding, while the OutputStream is meant for binary data and ignores the encoding when printing strings.
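The effect of decoding before the comparison, and of encoding a name twice, can be sketched with java.net.URI. This is an illustration under assumptions, not the actual code of StructuredTextControl: the class ImageNameSketch and its methods encode, decode and isUsed are hypothetical names, and the real control may encode and decode names differently.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Set;

// Hypothetical demo class, not the actual implementation.
class ImageNameSketch {

    /** Percent-encodes a file name as a relative URI path; '%' is always quoted. */
    static String encode(String name) {
        try {
            return new URI(null, null, name, null).getRawPath();
        } catch (URISyntaxException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    /** Reverses exactly one level of percent-encoding. */
    static String decode(String encoded) {
        return URI.create(encoded).getPath();
    }

    /** Compares the stored name with the decoded names taken from the src values. */
    static boolean isUsed(String storedName, Set<String> encodedSrcNames) {
        for (String src : encodedSrcNames) {
            if (decode(src).equals(storedName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String name = "Koala - 128px.jpeg";
        String once = encode(name);
        System.out.println(once);              // Koala%20-%20128px.jpeg
        System.out.println(encode(once));      // Koala%2520-%2520128px.jpeg
        System.out.println(once.equals(name)); // false: the naive comparison fails
        System.out.println(isUsed(name, Set.of(once))); // true: decoded comparison matches
    }
}
```

Decoding a twice-encoded name once yields Koala%20-%20128px.jpeg rather than the original name, which is consistent with the image not being found in the second problem described above.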
Test
In the documentation, insert images whose file names contain special characters, e.g. ÄÖÜ, €, or spaces. The images must be displayed and must still be present after saving.
Related tickets
Came up in the context of #19000