Ultimate solution to weird UTF character encoding problem

Because databases can store characters in different encodings, you may sometimes see strange characters (like â€” where a dash should be) appearing in your WordPress posts. This happens especially when moving the site to a different server, a process during which your database has to be exported and imported.
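
To see where such characters come from, here is a minimal Python sketch. It is not part of any plugin mentioned in this post, and the function name garble is made up for illustration. It assumes the most common failure mode: text stored as UTF-8 bytes gets read back as Windows-1252, which is the character set MySQL calls latin1.

```python
def garble(text: str) -> str:
    """Simulate the import bug: UTF-8 bytes misread as Windows-1252."""
    utf8_bytes = text.encode("utf-8")   # how the characters are really stored
    return utf8_bytes.decode("cp1252")  # how a latin1 connection interprets them


# A dash, an accented letter and a curly apostrophe, before and after.
for original in ["—", "é", "’"]:
    print(repr(original), "->", repr(garble(original)))
```

Each non-ASCII character turns into two or three unrelated characters, which is why a single dash shows up as a short run of garbage.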

There are several ready-made solutions to this problem, such as modifying the character set before and after the database restore, as described in this article.

There is also a ready-made WordPress plugin called bbWP2UTF8 that attempts the conversion to the UTF-8 charset.

If that does not help, there is one more solution: try the UTF8 Sanitize plugin. It replaces the weird characters with their valid counterparts on the fly. However, this also slows down your blog, because every time a page is loaded its content has to be parsed and changed by the plugin.
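
The kind of replacement involved can be sketched in a few lines of Python. This is only an illustration, assuming the damage is UTF-8 text that was misread as Windows-1252. It is not the actual code of UTF8 Sanitize, and the function name repair is hypothetical.

```python
def repair(text: str) -> str:
    """Undo one round of UTF-8-read-as-Windows-1252 damage, if present."""
    try:
        # Turn the garbled characters back into the bytes they came from,
        # then decode those bytes as the UTF-8 they originally were.
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Either the text cannot be expressed in Windows-1252 or the bytes
        # are not valid UTF-8, so it was probably fine: leave it untouched.
        return text


print(repair("Ã©"))  # garbled input
print(repair("é"))   # already correct input, returned as is
```

Running something like this on every page load is the cost mentioned above; applying it once to the stored text makes the fix permanent. Note that a few bytes have no Windows-1252 character at all, so a string that passed through them cannot be recovered this way and is returned unchanged.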

I have created a modified version of the plugin that adds an option to change all posts in the database permanently.

Download Modified UTF8-Sanitize Plugin

To use it, first check the first option, 'Outputting posts', and test your site. If everything looks OK, select the third option, 'Process all posts on saving options', and update the options. This goes through all the posts in the database and applies the changes. After this you can safely deactivate the plugin. Make sure you back up your database before trying anything!


Posted in: WordPress

24 Comments

  1. Feb 22nd, 2012

    Worked perfectly, was having issues after a site migration, pages weren't always loading properly, garbled text on existing posts and pages, RSS feed was all messed up - this seems to have cleared the lot up, excellent work!

  2. Peter Dontak
    Jan 3rd, 2012

    some utf-8 tips can be found here:
    http://www.utf-8.de

  3. marion
    Sep 29th, 2011

    Well I tried your plugin and it works - almost. It went and changed all the strange characters like "â€" as promised but instead of changing to the proper character I now have characters like "’"

    • marion
      Sep 29th, 2011

      meant "& r s q u o ;"

  4. Aug 20th, 2010

    I just found a solution and it works. I just followed these instructions and my blog was normal again. Yay!!!
    http://trevinchow.com/blog/2007/09/19/strange-characters-after-wordpress-upgrade/

  5. Aug 20th, 2010

    Hi,

    I installed your plugin and it doesn't seem to be fixing any of the broken characters. I hit the first option and then reloaded a blog post and the characters were still there. Any help would be great!

    Josh

    • Shahar
      Aug 22nd, 2010

      Try exporting and re-importing your database via phpMyAdmin and setting 'Character set of the file:' to utf-8. Worked for me :)

  6. Mar 27th, 2010

    Many thanks for a plugin that has solved a big headache while moving my server.

    (and to commenter above, the dump file did have /*!40101 SET NAMES utf8 */ in it, but that didn't work. The weird characters still needed cleaning up)

  7. Ploeter
    Oct 9th, 2009

    Hello Vladimir,

    In many topics on our site we have a problem with character sets. However, in new written topics there's no problem at all. This might prove that our database is set correctly.
    We had a database crash last week (caused by a ' nice ' security-plugin) and had to import a backup. But from that moment strange characters, like FFFD-icons appeared on our site.

    Example with FFFD-icons (this is Dutch ;))
    �We kwamen uit de informatiebesprekingen. Bos en Balkenende zeiden tegen mij: �Jan, ga jij maar als eerste naar buiten�.
    If a word contains an accent and placed between quote signs, for instance: "Héllo!", this word appears as: Hllo .

    Would your Modified UTF8-Sanitize Plugin be useful for us?

    By the way: since we started using a new theme (OneRoom), search engines are no longer able to find us, even though our site ( detelegravin.nl ) is indexed by Google.
    Hopefully you can help with this problem too. If so, I will put a 125 px by 170 px banner of your book on the frontpage of the site.

    • Oct 12th, 2009

      I can only suggest you try it out - it won't hurt

  8. Sep 29th, 2009

    Thanks for this plugin. I tried everything and this saved me a bunch of time. You are my hero.

  9. Sep 18th, 2009

    Great post. Your marvellous plugin didn't work on my site, but I am grateful to you for the tip: "plugin called bbWP2UTF8, that will try the conversion to UTF charset." That plugin works correctly! Many thanks, Vladimir! I had problems with several blogs for many months, and I just fixed them with your help!

    • Sep 18th, 2009

      You are welcome :)

  10. Aug 4th, 2009

    OK, I will try to explain my problem in Spanglish:
    1) When I post from inside WordPress (that is, from the admin area), any accented letter stays an accented letter. Example: á is á, ¿ is ¿, and so on. [That is what gets written to the database.]
    2) When I post using Windows Live Writer, á is stored as &aacute; and ¿ as &iquest;, which I can see when I go to edit the article. [acute, rdaquo, iquest... get written to the database.]
    3) Both 1 and 2 display correctly in the browser, but not when I edit the article. So my database can have titles in either form 1 or form 2.
    4) The interesting thing is that this happens only with the article title. The content is written correctly in both cases.
    5) I noticed it because my posts go to Twitter (shown correctly) and then to Facebook (shown incorrectly, with acute, iquest, rdaquo ...).
    6) I have checked my database tables. The WordPress tables are utf8_general, and the collation for the whole database is latin. Yesterday I made a new database with utf8 tables and utf8 collation, and the problem persists. So it is not a question of database format.
    7) I deactivated all plugins and the problem persists.
    8) Information from the people at my new dedicated server: "I have enabled suPHP. I have also installed the latest version of PHP (5.2.10) and libXML (2.7.3), provided by cPanel."
    9) I have tried changing my theme and the problem persists.

    Now what can I do to resolve this? I would appreciate your answer a lot. TY

  11. Jul 15th, 2009

    Just wanted to say thanks for the excellent plugin! Worked like a charm and saved me quite a bit of time and frustration. Most appreciated.

  12. Jul 14th, 2009

    You are welcome, it's amazing to save :)

  13. Sparkybarkalot
    Jul 14th, 2009

    Vladimir, thank you very, very much for your work on this. Thorsten's excellent recommendation would have helped me AVOID this problem, but since I was already HAVING the problem and couldn't go back, your modified plugin saved the day. None of my other attempts (MySQL search and replace, for example) worked and you saved me!!!

  14. Thorsten Albrecht
    Jul 7th, 2009

    The background to why those characters appear ("you may encounter problem with strange characters (like —) appearing in your WordPress posts") has a simple explanation: characters that are already encoded as UTF-8 in the database get "re-encoded" to UTF-8 a second time.

    This happens, for example, if you transfer a MySQL 4.0 database, which is set to latin1 but contains WordPress data in UTF-8(!), to a MySQL 5 database. The dump (generated with mysqldump) contains the correctly UTF-8 encoded characters, but the import uses the wrong client connection character set (latin1) because the dump carries no information that it contains UTF-8 characters. During the import the dump is converted to UTF-8 again, and you get the weird characters.

    Solution: tell the import process to use the right client connection character set, i.e. UTF-8. This is done with the statement SET NAMES utf8; in a mysqldump file, insert this line at the beginning:
    /*!40101 SET NAMES utf8 */;

    No obscure WordPress plugins, e.g. the UTF8-Sanitize Plugin, are needed.

    Thorsten

    • Shahar
      Apr 13th, 2010

      I solved it by importing via phpMyAdmin and setting 'Character set of the file:' to utf-8

  15. Apr 9th, 2009

    You're a genius!!
    You saved my day. The original plugin didn't work, this worked like a charm :)

  16. lm
    Mar 3rd, 2009

    awesome worked for me
    thanks

  17. Lars
    Dec 12th, 2008

    I've tried the above-mentioned plugins with version 2.7 and none worked; however, UTF-8 Database Converter succeeded in converting the funny characters back to æøå. See
    http://wordpress.org/extend/plugins/utf-8-database-converter/installation/

    Yours

  18. DaveM
    Dec 6th, 2008

    Vladimir - I was wondering if there was anything terribly wrong with simply commenting out:

    define('DB_CHARSET', 'utf8');

    in the wp-config.php file?

    It seems to work for me!

  19. John Locke
    Nov 30th, 2008

    Greetings.
    I wanted to display the titles of the last three posts on another site that uses charset=iso-8859-2.
    I did it by having that site connect externally to my database and fetch post_title.
    The problem is that the encoding is different, so some symbols appear instead of šđčćž.
    Do you know how to set this up? I tried using iconv() and mb_convert_encoding(), but it didn't work.
    Thanks in advance.
