Because databases can store characters in different encodings, you may sometimes see strange characters (like —) appear in your WordPress posts. This is especially common when moving a site to a different server, since the database has to be exported and imported along the way.
There are several ready-made solutions to this problem, such as modifying the character set before and after the database restore, as described in this article.
There is also a WordPress plugin called bbWP2UTF8 that attempts to convert the database to the UTF-8 charset.
If that does not help, there is one more option: the UTF8 Sanitize plugin. It replaces the weird characters with their valid counterparts on the fly. However, this also slows down your blog, because every page has to be parsed and changed by the plugin each time it is loaded.
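As a rough illustration of what on-the-fly replacement means, here is a minimal Python sketch. It is not the plugin's actual code (the plugin is written in PHP); the `REPLACEMENTS` table and the `sanitize` function are hypothetical names made up for this example, and the table covers only three common garbled sequences.

```python
# Hypothetical lookup table: garbled sequence -> intended character.
# Escapes are used so the example does not depend on the file's own encoding.
REPLACEMENTS = {
    "\u00e2\u20ac\u201d": "\u2014",  # garbled em dash
    "\u00e2\u20ac\u2122": "\u2019",  # garbled right single quotation mark
    "\u00c3\u00a9": "\u00e9",        # garbled e with acute accent
}


def sanitize(text: str) -> str:
    """Replace each known garbled sequence with the character it stands for."""
    for broken, fixed in REPLACEMENTS.items():
        text = text.replace(broken, fixed)
    return text


garbled_post = "caf\u00c3\u00a9 \u00e2\u20ac\u201d open late"
clean_post = sanitize(garbled_post)
print(clean_post)
```

Running this on every page view is what costs time; applying it once to the stored posts avoids the repeated work.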
I have created a modified version of the plugin that adds an option to change all posts in the database permanently.
Download Modified UTF8-Sanitize Plugin
To use it, first check the first option, 'Outputting posts', and try your site. If everything looks OK, select the third option, 'Process all posts on saving options', and update the options. This goes through all the posts in the database and applies the changes. After that you can safely deactivate the plugin. Make sure you back up your database before trying anything!
Posted in: WordPress
Hi! My name is Vladimir Prelovac. I am a computer engineer by profession and an adventurer by state of mind.
15 Comments
Hello Vladimir,
In many topics on our site we have a problem with character sets. However, in newly written topics there is no problem at all. This suggests that our database is set up correctly.
We had a database crash last week (caused by a 'nice' security plugin) and had to import a backup. Since then, strange characters such as FFFD icons have appeared on our site.
Example with FFFD icons (this is Dutch ;)):
�We kwamen uit de informatiebesprekingen. Bos en Balkenende zeiden tegen mij: �Jan, ga jij maar als eerste naar buiten�.
If a word contains an accent and is placed between quotation marks, for instance "Héllo!", it appears as: Hllo .
Would your Modified UTF8-Sanitize Plugin be useful for us?
By the way: since we switched to a new theme (OneRoom), search engines are no longer able to find us, even though our site (detelegravin.nl) is indexed by Google.
Hopefully you can help with this problem too. If so, I will put a 125 px by 170 px banner of your book on the front page of the site.
I can only suggest you try it out - it won't hurt
Thanks for this plugin. I tried everything and this saved me a bunch of time. You are my hero.
Great post. Your marvellous plugin didn't work on my site, but I am grateful to you for the tip: "plugin called bbWP2UTF8, that will try the conversion to UTF charset." That plugin works correctly! Many thanks, Vladimir! I had problems with several blogs for many months, and I just fixed them with your help!
You are welcome :)
OK, I will try to explain my problem in Spanglish:
1) When I post from inside WordPress (meaning from the admin), any accented letter stays an accented letter. Example: á is á, ¿ is ¿, and so on. [written as such in the database]
2) When I post using Windows Live Writer, á becomes &aacute; and ¿ becomes &iquest;, which I can see when I go to edit the article. [writes acute, rdaquo, iquest... to the database]
3) Both 1 and 2 display correctly in the browser, but not when I edit the article. So my database can contain titles in form 1 or form 2.
4) The interesting thing is that this happens only with the title of the article. The content is written correctly in both cases.
5) I noticed it because my posts go to Twitter (displayed correctly) and then to Facebook (displayed incorrectly, with acute, iquest, rdaquo ...).
6) I have checked my database tables. The WordPress tables are utf8_general, and the collation for the database as a whole is latin. Yesterday I created a new database with utf8 tables and utf8 collation, and the problem persists. So it is not a question of database format.
7) I deactivated all plugins and the problem persists.
8) Information from the people at my new dedicated server: "I have enabled suPHP. I have also installed the latest version of PHP (5.2.10) and libXML (2.7.3), provided by cPanel."
9) I have tried changing my theme and the problem persists.
What can I do now to resolve this? I would appreciate your answer a lot. TY
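The symptom in points 2 and 5 looks like titles being stored as HTML entity names rather than as the characters themselves. As a sketch only, in Python rather than WordPress's PHP, this is what decoding such entities back to characters looks like; `stored_title` is an invented sample value, not data from the site in question.

```python
import html

# Invented sample: a title saved with HTML entities instead of accented letters.
stored_title = "&iquest;Qu&eacute; pas&oacute;?"

# html.unescape turns named entities back into the characters they stand for.
decoded_title = html.unescape(stored_title)

print(decoded_title)
```

A browser performs this decoding itself, which would explain why the titles look right on the site while a consumer that treats the title as plain text shows the entity names.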
Just wanted to say thanks for the excellent plugin! Worked like a charm and saved me quite a bit of time and frustration. Most appreciated.
You are welcome, it's amazing to save :)
Vladimir, thank you very, very much for your work on this. Thorsten's excellent recommendation would have helped me AVOID this problem, but since I was already HAVING the problem and couldn't go back, your modified plugin saved the day. None of my other attempts (MySQL search and replace, for example) worked and you saved me!!!
The reason those characters appear ("you may encounter problem with strange characters (like —) appearing in your WordPress posts") has a simple explanation: characters that are already encoded as UTF8 in the database are somehow being "re-encoded" to UTF8.
This happens, for example, if you transfer a MySQL 4.0 database that is set to latin1 but contains WordPress data in UTF8(!) to a MySQL 5 database. The dump (generated with mysqldump) contains the correctly UTF8-encoded characters, but the import uses the wrong client connection character set (latin1) because the dump carries no information about the UTF8 encoding. During the import the dump is converted to UTF8 again, and you get the weird characters.
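The double encoding described here can be reproduced in a few lines of Python. This is only a sketch of the mechanism, not of what MySQL does internally, and it assumes that "latin1" behaves like Windows-1252 (Python's `cp1252`) for the bytes involved; the variable names are illustrative.

```python
# Text as WordPress stored it: an em dash, already encoded as UTF-8 bytes.
original = "\u2014"
utf8_bytes = original.encode("utf-8")        # three bytes: E2 80 94

# The faulty import treats those bytes as latin1 text (assumed cp1252 here)...
misread = utf8_bytes.decode("cp1252")        # three separate characters

# ...and converts that text to UTF-8 a second time.
double_encoded = misread.encode("utf-8")     # now eight bytes instead of three

# Undoing the two steps in reverse order recovers the original character.
repaired = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")

print(len(utf8_bytes), len(double_encoded), repaired == original)
```

The repair step only works while every intermediate character still maps back to a single byte, which is one reason to fix the import itself rather than patch the data afterwards.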
Solution: tell the import process to use the right client connection character set, i.e. UTF8. This is done with the statement SET NAMES utf8, or in the mysqldump file by inserting the following at the beginning:
/*!40101 SET NAMES utf8 */;
No obscure WordPress plugins, such as the UTF8-Sanitize plugin, are needed.
Thorsten
You're a genius!!
You saved my day. The original plugin didn't work, this worked like a charm :)
awesome worked for me
thanks
I've tried the above-mentioned plugins with version 2.7 and none worked; however, UTF-8 Database Converter succeeded in converting the funny characters back to æøå. See
http://wordpress.org/extend/plugins/utf-8-database-converter/installation/
Yours
Vladimir - I was wondering if there was anything terribly wrong with simply commenting out:
define('DB_CHARSET', 'utf8');
in the wp-config.php file?
It seems to work for me!
Greetings.
I wanted to display the titles of the last three posts on another site that uses charset=iso-8859-2.
I did it by having that site connect externally to my database and fetch post_title.
The problem is that the encoding is different, so some symbols appear instead of šđčćž.
Do you know how I can set this up? I tried using iconv() and mb_convert_encoding() but it didn't work.
Thanks in advance.
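A minimal sketch of the conversion being asked about, written in Python rather than PHP and assuming the titles really do arrive from the database as UTF-8 bytes. The sample title and variable names are made up for the example.

```python
# Sample title containing the five letters mentioned above: š đ č ć ž.
title = "\u0161\u0111\u010d\u0107\u017e"

# What a UTF-8 database connection would hand over (assumption).
utf8_bytes = title.encode("utf-8")

# Decode as UTF-8, then re-encode for a page served as iso-8859-2.
latin2_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-2")

print(len(utf8_bytes), len(latin2_bytes))
```

All five letters exist in ISO-8859-2, so the conversion itself is lossless. If the bytes coming from the database are not actually UTF-8, for example because the connection character set differs, the same conversion would fail or produce symbols, which is worth checking before blaming iconv() or mb_convert_encoding().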