Encoding filter for Java web applications

By tompson

Dealing correctly with encodings is one of the most important things in Java web applications (if not in Java in general). The best way to avoid trouble with different encodings is to use a single encoding throughout the entire web application. The encoding of choice is UTF-8, which can represent almost every known written language.

The first thing to ensure is that every response from your server tells the client (browser) which encoding to use. You do this by setting the Content-Type meta tag in your HTML pages:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

and by defining the contentType in the page directive of your JSP:

<%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="utf-8"%>

The pageEncoding attribute tells the JSP compiler the encoding in which the JSP file is stored on disk.

Now your content should be delivered correctly to the client. The next important thing is to decode the data the client sends to you with the correct encoding. Most application containers use a hard-coded default encoding to decode request data. For Apache Tomcat the default is ISO-8859-1.

If you have an input form that is delivered in UTF-8, the browser will submit the form data in the same encoding. You then get a problem if you call request.getParameter in your application and the parameter contains non-ASCII characters: the container decodes the UTF-8 bytes as ISO-8859-1 and returns garbled text. To tell the application server the right encoding, you have to call request.setCharacterEncoding before a request parameter is accessed for the first time; after that the call has no effect.
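To make the problem concrete, here is a minimal sketch in plain Java, without any servlet container, of what happens to UTF-8 form bytes when they are decoded as ISO-8859-1. The class name RequestDecodingDemo, the decode helper and the sample letters are made up for illustration; a real container does this decoding internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustration only: decodes the same request bytes with two different charsets. */
final class RequestDecodingDemo {

    /** Hypothetical stand-in for the decoding step a container performs on request data. */
    static String decode(byte[] rawBody, Charset containerCharset) {
        return new String(rawBody, containerCharset);
    }

    public static void main(String[] args) {
        // Two Slovak letters (s and c with caron), written as escapes so the
        // source file encoding does not matter.
        String submitted = "\u0161\u010D";

        // The browser submits the form in the page encoding, UTF-8 here.
        byte[] rawBody = submitted.getBytes(StandardCharsets.UTF_8);

        String wrong = decode(rawBody, StandardCharsets.ISO_8859_1);
        String right = decode(rawBody, StandardCharsets.UTF_8);

        System.out.println("bytes sent:            " + rawBody.length);      // 4
        System.out.println("chars as ISO-8859-1:   " + wrong.length());      // 4
        System.out.println("UTF-8 round trip ok:   " + right.equals(submitted)); // true
    }
}
```

Each of the two letters takes two bytes in UTF-8, so decoding with ISO-8859-1 (one byte per character) yields four unrelated characters instead of two.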

The simplest way to call request.setCharacterEncoding before any parameter is read is a servlet Filter like the following:

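package net.einwaller.filters;

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Sets the request encoding before any other code reads a parameter.
public class EncodingFilter implements Filter {

  // Default, used when no "encoding" init parameter is configured.
  private String encoding = "utf-8";

  public void doFilter(ServletRequest request,
      ServletResponse response, FilterChain filterChain)
      throws IOException, ServletException {

    // Must happen before the first getParameter call.
    request.setCharacterEncoding(encoding);
    filterChain.doFilter(request, response);
  }

  public void init(FilterConfig filterConfig)
      throws ServletException {
    // Allow web.xml to override the default encoding.
    String encodingParam = filterConfig
        .getInitParameter("encoding");
    if (encodingParam != null) {
      encoding = encodingParam;
    }
  }

  public void destroy() {
    // nothing to do
  }
}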

You have to configure this filter in your web.xml so that it runs for every request:

<filter>
  <filter-name>EncodingFilter</filter-name>
  <filter-class>net.einwaller.filters.EncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>EncodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

If you follow all these tips, your web application will not have any encoding problems, EXCEPT with GET requests. In GET requests the parameters are encoded into the URL, not into the body of the request as in POST requests. The application container handles these parameters differently.

Tomcat again uses ISO-8859-1 by default to decode those GET parameters, regardless of which encoding your filter sets on the request. To change this behavior you have to change the connector configuration in the server.xml of your Tomcat, as described in the Tomcat HTTP connector documentation. The URIEncoding attribute defines one fixed encoding that the server uses for the URL of every request. What we want to use instead is the useBodyEncodingForURI attribute, which tells the server to use the encoding defined for the body for the GET parameters too.
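The effect on GET parameters can be reproduced outside the container. The following is a rough sketch using java.net.URLEncoder and java.net.URLDecoder from the standard library; the class QueryDecodingDemo and its two helper methods are hypothetical names, and Tomcat's own query string parser is a different implementation that is only being imitated here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Illustration only: the same percent-encoded value decoded with two charsets. */
final class QueryDecodingDemo {

    /** Percent-encodes a parameter value the way a browser does for a UTF-8 page. */
    static String encodeParam(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    /** Decodes a percent-encoded value with the charset the server is configured for. */
    static String decodeParam(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String original = "\u0161\u010D"; // two Slovak letters, as escapes

        String onTheWire = encodeParam(original, "UTF-8");
        System.out.println("in the URL:      " + onTheWire); // %C5%A1%C4%8D

        // Server configured for ISO-8859-1: the value comes back garbled.
        String garbled = decodeParam(onTheWire, "ISO-8859-1");
        System.out.println("ISO-8859-1 ok?   " + garbled.equals(original)); // false

        // Server decoding the URL as UTF-8: the value survives.
        String correct = decodeParam(onTheWire, "UTF-8");
        System.out.println("UTF-8 ok?        " + correct.equals(original)); // true
    }
}
```

The URL itself carries no charset information, only percent-encoded bytes, which is why the server side has to be told which encoding to assume.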

13 Responses to “Encoding filter for Java web applications”

  1. Klaus Says:

    A very good summary of tips to get UTF-8 to work with Tomcat !
    Additionally I use the following attribute in HTML form tags to ensure UTF-8 encoding of form data:
    accept-charset="UTF-8"

  2. Mirko Says:

    Hello,
    first, thank you for your good article. But I have a problem with GET parameter encoding. POST requests are OK. If I use, for example, a GET request URL with Slovak letters like šč.. it is badly encoded. I use Tomcat 5.5. Thanks in advance.

  3. tompson Says:

    @Mirko

    did you write the request yourself into the address bar of your browser? which browser do you use?

  4. Idetrorce Says:

    very interesting, but I don’t agree with you
    Idetrorce

  5. Melina Says:

    very interesting. i’m adding in RSS Reader

  6. UTF-8 with Tomcat | metrixon.de Says:

    [...] Thomas Einwaller’s Blog [...]

  7. Shamim Says:

    Very good article

  8. Jordi Pradel Says:

    Very interesting article!

    Tried with Tomcat 6.0 and it worked. In fact, I’m using a Spring charset filter that does basically what you describe here.

    But I still have one small problem: GET requests from forms work perfectly, but the very same request typed directly, by hand, in the browser, does not work: Again, Tomcat doesn’t decode the request parameters properly. Any ideas?

  9. tompson Says:

    @Jordi: if you type in the URL into your browser the browser does not send any information about the encoding to the server – this is the reason why useBodyEncodingForURI has no effect in this case

    the only solution is to use URIEncoding to set the encoding to one specific encoding – the problem is that different browsers use different encodings (depending on the operating system and language settings)

  10. John Says:

    @ Mirko

    I’m sure you already found a solution – for all of us not knowing how to deal with that, the following URL contains the solution Mirko was looking for:

    http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/

  11. Ruifelfix Says:

    The first post is intense

  12. Bill Bartmann Says:

    Cool site, love the info.

  13. Demi Moore Says:

    “You cannot escape the responsibility of tomorrow by evading it today.” – Abraham Lincoln
    Demi Moore
