
HTTP response decoding behaves differently on Windows and Linux

My Java application downloads some files, encoded in either UTF-8 or ISO-8859-1, from a Bitbucket repository.
I know in advance which charset each file uses.

My app runs fine on my local Windows machine (I use Eclipse JEE with a Tomcat 9 server).

I deployed this application on a Red Hat virtual machine running the same version of Tomcat, and there the accented characters (é, è, à, ù, ï) are replaced by the unknown character �.


Here is the code I wrote to get this data:
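import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.Charset;

import org.apache.commons.io.IOUtils;

public static String getFileContentFromRepository(String url) throws IOException {
        HttpURLConnection connection = getConnection(url);
        try {
            connection.connect();

            //The following function returns the charset of the file. (Proven to work)
            Charset repoCharset = getCharset();

            // Decode the response bytes with the file's known charset; the stream is closed automatically
            try (InputStream connectionDataStream = connection.getInputStream()) {
                return IOUtils.toString(connectionDataStream, repoCharset);
            }
        } finally {
            // Release the connection even if reading the stream fails
            connection.disconnect();
        }
}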
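The code passes repoCharset explicitly to IOUtils.toString, so that decoding step should give the same String on both platforms; the platform default charset only comes into play where a conversion is done without naming a charset (for example String.getBytes() with no argument, or writing the String out later). The following standalone sketch illustrates both effects, including how a lone ISO-8859-1 byte decoded as UTF-8 turns into the � character. The class name CharsetRoundTrip and its decode helper are hypothetical, chosen only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {

    // Decodes bytes with an explicitly given charset; the platform default plays no part here
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "é" written as a Unicode escape so the result does not depend on javac's source encoding
        String e = "\u00e9";

        // In ISO-8859-1, "é" is the single byte 0xE9
        byte[] latin1Bytes = e.getBytes(StandardCharsets.ISO_8859_1);

        // Correct charset: same result on Windows and Linux
        System.out.println(decode(latin1Bytes, StandardCharsets.ISO_8859_1).equals(e)); // true

        // Wrong charset: 0xE9 alone is not valid UTF-8, so it is replaced by U+FFFD ("�")
        System.out.println(decode(latin1Bytes, StandardCharsets.UTF_8).equals("\uFFFD")); // true

        // Explicit charset when encoding: the same bytes on every platform
        System.out.println(Arrays.toString(e.getBytes(StandardCharsets.UTF_8))); // [-61, -87]

        // No charset given: Charset.defaultCharset() is used, so this line can differ between platforms
        System.out.println(Charset.defaultCharset() + " -> " + Arrays.toString(e.getBytes()));
    }
}
```

Passing a charset at every byte/text boundary, rather than relying on the default, is what makes the output independent of the machine it runs on.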

How can I get the same results on both platforms?


Answer

The problem came from the JVM on Linux using UTF-8 as its default charset.
Adding the argument -Dfile.encoding=ISO-8859-1 to $CATALINA_OPTS in Tomcat's configuration (for example in bin/setenv.sh) solved my problem.
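To confirm which default each JVM actually uses, and that the option from $CATALINA_OPTS is picked up, a small check like the one below may help. It is only a sketch: ShowDefaultCharset and describe() are hypothetical names, and inside the web application the same two values could simply be logged at startup instead. The option has to be passed on the JVM command line (which is what $CATALINA_OPTS does), because the default charset is fixed when the JVM starts. Note also that since JDK 18 (JEP 400) the default charset is UTF-8 on all platforms, so this Windows/Linux difference mainly concerns older JDKs.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {

    // Collects the two values that decide how Java converts text when no charset is specified
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run with the same java binary and -D options as Tomcat to see what the server sees
        System.out.println(describe());
    }
}
```

Running it once on the Windows machine and once on the Red Hat VM makes any difference in the default charset visible directly.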

User contributions licensed under: CC BY-SA