Iframe Srcdoc Equals UTF-8 Issue Primer Tutorial

Do you remember how with Javascript document.querySelectorAll Client Pre-emptive Iframe Tutorial, recently, we said …

Why can’t we manage this new functionality in the one pass through the “onload” event logic? Well, any self-respecting webpage content will contain both apostrophe and double quote characters (let alone line feeds and carriage returns) (but we can if we can get to a Javascript DOM statement like document.getElementById('ifsd').srcdoc=atob(('' + ioissrc).split(';base64,')[1]).replace('</bo' + 'dy>', ' <style> ' + selectorplusis + '</style> </bo' + 'dy>'); )

… ? Well, that is true: initializing an iframe’s srcdoc attribute at the same time as the iframe is created can be tricky for HTML data of any complexity. Recently, though, we realized that the …


document.getElementById('ifsd').srcdoc=atob(('' + ioissrc).split(';base64,')[1]).replace('</bo' + 'dy>', ' <style> ' + selectorplusis + '</style> </bo' + 'dy>');

… can be problematic, too, with UTF-8 (unicode) data (perhaps to do with UTF-16 surrogate pairs, though we are not sure). We discovered this while running that recent web application “Testing out document.querySelectorAll”, in the blog posting thread owning the blog post above, against the absolute URL (thanks Wikipedia) that Javascript document.querySelectorAll Textarea Placeholder Tutorial has a penchant for using …


HTTP://www.wikipedia.org/wiki/Einstein

… we discovered it outputting strings like …


Kingdom of WÃ¼rttemberg

… rather than, the better …


Kingdom of Württemberg

… leading us down an “irrelevant PHP file_get_contents encoding problem garden path”. Then today’s “proof of concept” fgc_utf_fix.php’s live run, which simplifies (and thus pares down) the methodologies of that “Testing out document.querySelectorAll” project by decoupling it and putting it back together, plus a good hour of calm logical reasoning, led us to deduce that file_get_contents was not the problem. Rather, it is the [iframe].srcdoc=[HTMLcontent] assignment that causes the issue when that [HTMLcontent] contains UTF-8 unicode data. That makes sense. The atob() call feeding that assignment returns a string with one character per decoded byte, so any UTF-8 character that takes more than one byte comes back as two or more mis-mapped characters.
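To see the mis-mapping in isolation, here is a minimal sketch that needs no iframe at all. The helper name utf8ToBase64 is made up for this illustration (it is not part of the project above), and we are assuming the page content reaches the browser as base64 of its UTF-8 bytes, the way a “;base64,” data URI would carry it.

```javascript
// Hypothetical helper (an assumption for this sketch): encode a string's
// UTF-8 bytes as base64, as a ";base64," data URI would carry them.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = '';
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one character per byte
  }
  return btoa(binary);
}

const original = 'Kingdom of W\u00FCrttemberg'; // "Kingdom of Württemberg"
const base64 = utf8ToBase64(original);

// atob() hands back one character per byte, so the two UTF-8 bytes of
// "ü" (0xC3 0xBC) arrive as two separate characters.
const viaAtob = atob(base64);

console.log(viaAtob);
console.log(original.length, viaAtob.length);
```

The decoded string comes out one character longer than the original, which is the doubled-up character we were seeing in the iframe.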

But then we stumbled upon the excellent Function to fix ut8 special characters displayed as 2 characters (utf-8 interpreted as ISO-8859-1 or Windows-1252) and adapted its PHP code into a Javascript function equivalent that could help put “Humpty Dumpty back together again”. Cute, huh?!
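For comparison, here is a sketch of a different way to get a similar repair in Javascript. To be clear, this is not the function we adapted from that PHP code: it is an alternative that leans on the browser’s TextDecoder, and the name repairUtf8 is hypothetical. It assumes the garbled string holds one character per byte, which is what atob() produces.

```javascript
// Hypothetical helper (not the function adapted in the post): re-read a
// "one character per byte" string as UTF-8.
function repairUtf8(binaryString) {
  const bytes = new Uint8Array(binaryString.length);
  for (let i = 0; i < binaryString.length; i++) {
    const code = binaryString.charCodeAt(i);
    // A code above 255 means this is not a byte string, so leave it alone.
    if (code > 0xff) return binaryString;
    bytes[i] = code;
  }
  try {
    // fatal: true makes invalid UTF-8 throw instead of inserting U+FFFD.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (e) {
    // Not valid UTF-8, so the string was probably not mis-decoded.
    return binaryString;
  }
}

const garbled = 'Kingdom of W\u00C3\u00BCrttemberg'; // "Kingdom of WÃ¼rttemberg"
console.log(repairUtf8(garbled));
```

In the statement above, that would mean wrapping the atob(…) call, as in repairUtf8(atob(…)), before the replace and the srcdoc assignment. Strings that are already correct, or that are plain ASCII, pass through unchanged.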
