Ultimate solution to weird UTF character encoding problem

Because databases can store characters in different encodings, you may sometimes find strange characters (like —) appearing in your WordPress posts. This happens most often when you move the site to a different server, since the database has to be exported and imported along the way.
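To make the failure concrete, here is a minimal Python sketch of how such characters can arise. It assumes the text was stored as UTF-8 bytes and then read back as Windows-1252 (cp1252), which is one common cause but not the only one. The function name `misread_utf8` is made up for this illustration.

```python
def misread_utf8(text: str, wrong_charset: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong charset.

    cp1252 leaves a few byte values undefined, so some inputs will raise
    UnicodeDecodeError instead of producing garbled text.
    """
    return text.encode("utf-8").decode(wrong_charset)


if __name__ == "__main__":
    # "\u2014" is the long dash, written as an escape to keep the source ASCII.
    for sample in ["\u00e9", "\u00fc", "\u2014"]:
        print(repr(sample), "->", repr(misread_utf8(sample)))
```

Each single character turns into two or three unrelated ones, which is the pattern you typically see in affected posts.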

There are several ready-made solutions to this problem, such as modifying the character set before and after the database restore, as described in this article.

There is also a ready-made WordPress plugin called bbWP2UTF8 that will try the conversion to the UTF-8 charset.

If that does not help, there is one more option: the UTF8 Sanitize plugin. It replaces the weird characters with their valid counterparts on the fly. However, this also slows down your blog, because every page has to be parsed and modified by the plugin each time it is loaded.
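The plugin's source is not shown here, so the following is only a rough Python sketch of the general idea: swapping known garbled sequences for their intended characters when a page is rendered. The `REPLACEMENTS` table and the `sanitize` function are hypothetical, cover just a handful of characters, and assume UTF-8 text that was misread as Windows-1252.

```python
# Characters we want to recover (hypothetical, deliberately short list):
# long dash, right single quote, left double quote, e-acute, u-umlaut.
INTENDED = "\u2014\u2019\u201c\u00e9\u00fc"

# Build the lookup table: garbled sequence -> intended character.
# Each key is what the UTF-8 bytes of the value look like when they are
# misread as Windows-1252 (an assumption about how the damage happened).
REPLACEMENTS = {ch.encode("utf-8").decode("cp1252"): ch for ch in INTENDED}


def sanitize(text: str) -> str:
    """Replace every known garbled sequence in text with its counterpart."""
    for garbled, intended in REPLACEMENTS.items():
        text = text.replace(garbled, intended)
    return text
```

Because a function like this has to run over the content on every page load, the cost grows with the size of the table and the page, which is consistent with the slowdown mentioned above.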

I have created a modified version of the plugin that adds an option to change all posts in the database permanently.

Download Modified UTF8-Sanitize Plugin

To use it, first check the first option, 'Outputting posts', and try your site. If everything looks OK, select the third option, 'Process all posts on saving options', and update the options. This goes through all the posts in the database and applies the changes. After that you can safely deactivate the plugin. Make sure you back up your database before trying anything!
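Conceptually, the 'process all posts' step is a one-time pass over the posts table. The sketch below illustrates that idea in Python against an in-memory SQLite table standing in for the WordPress database. The table layout is simplified, and `fix_text`, `process_all_posts` and `demo` are hypothetical names, not the plugin's actual code. It assumes the damage came from UTF-8 text misread as Windows-1252.

```python
import sqlite3


def fix_text(text: str) -> str:
    """Undo one round of UTF-8-read-as-cp1252 damage; leave other text alone."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not damaged in the assumed way, so keep it unchanged


def process_all_posts(conn: sqlite3.Connection) -> int:
    """Rewrite every post whose content changes under fix_text; return the count."""
    rows = conn.execute("SELECT ID, post_content FROM wp_posts").fetchall()
    changed = 0
    for post_id, content in rows:
        fixed = fix_text(content)
        if fixed != content:
            conn.execute(
                "UPDATE wp_posts SET post_content = ? WHERE ID = ?",
                (fixed, post_id),
            )
            changed += 1
    conn.commit()
    return changed


def demo() -> list:
    """Run the pass on a tiny stand-in table and return the resulting contents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_content TEXT)")
    conn.executemany(
        "INSERT INTO wp_posts (post_content) VALUES (?)",
        [("caf\u00c3\u00a9",), ("plain text",)],
    )
    process_all_posts(conn)
    query = "SELECT post_content FROM wp_posts ORDER BY ID"
    return [row[0] for row in conn.execute(query)]
```

A pass like this changes the stored data for good, which is why the backup advice above matters: once the rows are rewritten, only the backup can restore them.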

Posted in: WordPress

15 Comments

  1. Ploeter
    5 weeks ago

    Hello Vladimir,

    In many topics on our site we have a problem with character sets. However, in newly written topics there's no problem at all. This suggests that our database is set up correctly.
    We had a database crash last week (caused by a 'nice' security plugin) and had to import a backup. Since then, strange characters such as FFFD icons have appeared on our site.

    Example with FFFD-icons (this is Dutch ;))
    �We kwamen uit de informatiebesprekingen. Bos en Balkenende zeiden tegen mij: �Jan, ga jij maar als eerste naar buiten�.
    If a word contains an accent and is placed between quotation marks, for instance: "Héllo!", this word appears as: Hllo .

    Would your Modified UTF8-Sanitize Plugin be useful for us?

    By the way: since we switched to a new theme (OneRoom), search engines are no longer able to find us, even though our site (detelegravin.nl) is indexed by Google.
    Hopefully you can help with this problem too. If so, I will put a 125 px by 170 px banner of your book on the front page of the site.

    • 4 weeks ago

      I can only suggest you try it out - it won't hurt

  2. Shawn Swanson
    Sep 29th, 2009

    Thanks for this plugin. I tried everything and this saved me a bunch of time. You are my hero.

  3. Miguel A Manrique
    Sep 18th, 2009

    Great post. Your marvellous plugin didn't work on my site, but I am grateful to you for the tip: "plugin called bbWP2UTF8, that will try the conversion to UTF charset." That plugin works correctly! Many thanks, Vladimir! I had problems with several blogs for many months, and I just fixed them with your help!

  4. ErnestoJustiniano.org
    Aug 4th, 2009

    OK, I will try to explain my problem in Spanglish:
    1) When I post from inside WordPress (meaning from the admin area), any accented letter is stored as an accented letter. Example: á is á, ¿ is ¿, and so on. [written as-is in the database]
    2) When I post using Windows Live Writer, accented characters are written as entities, for example ¿ becomes &iquest. I can see this when I go to edit the article. [acute, rdaquo, iquest... written in the database]
    3) Both 1 and 2 display correctly in the browser, but not when I edit the article. So my database can contain titles in either form 1 or form 2.
    4) The interesting thing is that this happens only with the title of the article. The content is written correctly in both cases.
    5) I noticed it because my posts go to Twitter (displayed correctly) and then to Facebook (displayed incorrectly, with acute, iquest, rdaquo ...).
    6) I have checked my database tables. The WordPress tables are utf8_general, while the collation for the database as a whole is latin. Yesterday I created a new database with utf8 tables and utf8 collation, and the problem persists. So it is not a question of database format.
    7) I deactivated all plugins and the problem persists.
    8) Information from the people at my new dedicated server: I have enabled suPHP. I have also installed the latest version of PHP (5.2.10) and libXML (2.7.3), provided by cPanel.
    9) I have tried changing my theme and the problem persists.

    What can I do to resolve this? I would greatly appreciate your answer. TY

  5. David Ma
    Jul 15th, 2009

    Just wanted to say thanks for the excellent plugin! Worked like a charm and saved me quite a bit of time and frustration. Most appreciated.

  6. Jul 14th, 2009

    You are welcome, it's amazing to save :)

  7. Sparkybarkalot
    Jul 14th, 2009

    Vladimir, thank you very, very much for your work on this. Thorsten's excellent recommendation would have helped me AVOID this problem, but since I was already HAVING the problem and couldn't go back, your modified plugin saved. None of my other attempts (MySQL search and replace, for example) worked and you saved me!!!

  8. Thorsten Albrecht
    Jul 7th, 2009

    The reason those characters appear ("you may encounter problem with strange characters (like —) appearing in your WordPress posts") has a simple explanation: characters that are already encoded as UTF-8 in the database are somehow being "re-encoded" to UTF-8 a second time.

    This happens, for example, if you transfer a MySQL 4.0 database, which is set to latin1 but contains UTF-8 data from WordPress(!), to a MySQL 5 database. The dump (generated with mysqldump) contains the correctly UTF-8 encoded characters, but the import uses the wrong client connection charset (latin1), because the dump carries no information that the characters are UTF-8. During the import the dump is converted to UTF-8 again, and you get the weird characters.

    Solution: tell the import process to use the right client connection charset, i.e. UTF-8. This is done with the statement SET NAMES utf8 (MySQL's name for the charset is utf8, not UTF-8), or, in the mysqldump file, by inserting the following at the beginning:
    /*!40101 SET NAMES utf8 */;

    No obscure WordPress plugins, e.g. the UTF8-Sanitize plugin, are needed.

    Thorsten
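As an illustration of the fix described in this comment, the following Python sketch prepends the SET NAMES statement to the contents of a dump unless one already appears near the top. The function name `ensure_set_names` and the check of only the first 50 lines are assumptions made for this example; only the statement itself comes from the comment. The dump is handled as raw bytes so that the sketch itself does not re-encode anything.

```python
# The statement suggested in the comment, as raw bytes plus a newline.
SET_NAMES = b"/*!40101 SET NAMES utf8 */;\n"


def ensure_set_names(dump: bytes, head_lines: int = 50) -> bytes:
    """Return the dump with a SET NAMES line at the top if its head has none.

    Only the first head_lines lines are inspected, so the phrase occurring
    later inside post data does not count as a charset declaration.
    """
    head = b"\n".join(dump.splitlines()[:head_lines]).upper()
    if b"SET NAMES" in head:
        return dump  # the dump already declares a connection charset
    return SET_NAMES + dump
```

In practice you would read the dump file in binary mode, pass its contents through the function, and write the result to a new file before importing it, keeping the original dump as a backup.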

  9. Joel
    Apr 9th, 2009

    You're a genius!!
    You saved my day. The original plugin didn't work, this worked like a charm :)

  10. lm
    Mar 3rd, 2009

    awesome worked for me
    thanks

  11. Lars
    Dec 12th, 2008

    I've tried the above-mentioned plugins with version 2.7 and none worked; however, UTF-8 Database Converter succeeded in converting the funny characters back to æøå. See
    http://wordpress.org/extend/plugins/utf-8-database-converter/installation/

    Yours

  12. DaveM
    Dec 6th, 2008

    Vladimir - I was wondering if there was anything terribly wrong with simply commenting out:

    define('DB_CHARSET', 'utf8');

    in the wp-config.php file?

    It seems to work for me!

  13. John Locke
    Nov 30th, 2008

    Greetings.
    I wanted to display the titles of the last three posts on another site that uses charset=iso-8859-2.
    I did it by having that site connect externally to my database and fetch post_title.
    The problem is that the encoding is different, so some symbols appear instead of šđčćž.
    Do you know how I can set this up? I tried using iconv() and mb_convert_encoding(), but it didn't work.
    Thanks in advance.

About Vladimir

Hi! My name is Vladimir Prelovac. I am a computer engineer by profession and an adventurer by state of mind.

"I would love to change the world, I just don't have the source code yet."
