Htmlentities() Versus Markup Possibility?


son

I have built a very simple CMS so my friend can use a user-friendly interface to update the content on his four web pages (he is not very IT-savvy). So far, everything works fine. Now a new problem has arisen: he would like to insert hyperlinks and lists. Until now I have used this for the main text:

$heading = htmlentities($row['heading']);
echo "<h1>" . $heading . "</h1>";
$text = $row['text'];
echo htmlentities($text);

This allows certain French characters, for example, to display correctly. If he enters a link as:

<a href="www.external-domain.co.uk" title="Go to external domain">link text</a>

the htmlentities function makes the browser display the code exactly as typed, and the source code shows something like:

&lt;a href=&quot;www.external-domain.co.uk&quot; title=&quot;Go to external domain&quot;&gt;link text&lt;/a&gt;

For the more experienced programmer: how would you allow a user to mark up text correctly? As my friend uses a lot of special characters, I cannot really remove the htmlentities() function. Any help is kindly appreciated.

Thanks,
Son


What Character set are you using for the site? UTF-8 should be able to handle the French issue for you.
I have it at UTF-8, but when I tried it with £ sign for example (I know this one is not French, but also being used) it only displays correctly with htmlentities function....Son

Something is not set to UTF-8 then... either your DB, or your HTTP header, or meta tag, or the PHP file is not saved as UTF-8... SOMETHING is not using UTF-8, and it's taking down everything else with it.

When you allow markup, I'd suggest two things:

1. Use a pluggable WYSIWYG editor, like TinyMCE, to reduce the possibility of poor coding (which is very possible when the person is not tech savvy).
2. On the server, validate the input against the complete or partial XHTML DTD or Schema, wrapping it inside your own element if you have to. This will make sure that even if he decides to play programmer, he won't be able to pollute the final markup with invalid code.
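
One way to narrow down which link in the chain is not UTF-8 is to test the bytes PHP actually receives from the database. The sketch below is only an illustration: `is_utf8` is a hypothetical helper name, and it assumes the mbstring extension is available. If values read from the database fail this check, the script's own database connection is a likely suspect (an assumption, not something confirmed in this thread), since a connection that is not set to UTF-8 can hand back a pound sign as a single ISO-8859-1 byte.

```php
<?php
// Hypothetical helper: reports whether a string is valid UTF-8.
// Assumes the mbstring extension is loaded.
function is_utf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

$latin1Pound = "\xA3";      // pound sign as a single ISO-8859-1 byte
$utf8Pound   = "\xC2\xA3";  // pound sign as a two-byte UTF-8 sequence

var_dump(is_utf8($latin1Pound)); // bool(false)
var_dump(is_utf8($utf8Pound));   // bool(true)
```

Running the same check on `$row['text']` right after the query would show whether the data arrives as UTF-8 before any output function touches it.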

2. On the server, validate the input against the complete or partial XHTML DTD or Schema, wrapping it inside your own element if you have to. This will make sure that even if he decides to play programmer, he won't be able to pollute the final markup with invalid code.
Two things:

All settings should be correctly set to UTF-8: the MySQL charset is UTF-8 Unicode, the MySQL connection collation is utf8_unicode_ci, and the relevant fields in the table show utf8_general_ci. As I use Dreamweaver to produce my web pages, I could verify that the PHP page is saved with the default encoding, which is Unicode (UTF-8). The header is:

<?php header('Content-Type: text/html; charset=utf-8'); ?>

The meta tag is:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

I have a similar issue with a different website, and I thought that this is why I have to use the htmlentities() function. Another friend claimed that I have to use charset=iso-8859-1 instead of UTF-8, but I am getting all confused now... If a pound sign should show properly when everything you listed is set to UTF-8, why is it not doing so? Is there something else I need to check?

Son

Forgot to ask what the validation mentioned above means. I did not understand what you meant...
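
As a rough illustration of what server-side validation could look like, the sketch below wraps the submitted fragment in its own element and checks only that it is well-formed XML. This is a weaker stand-in for the DTD or Schema validation suggested above, not the same thing. `is_well_formed_fragment` is a hypothetical name, the DOM extension is assumed to be enabled, and named entities such as `&nbsp;` or a bare `&` would be rejected by this check.

```php
<?php
// Hypothetical helper: true if the fragment parses as well-formed XML
// once wrapped in a single root element. Assumes the DOM extension.
function is_well_formed_fragment(string $fragment): bool
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok  = $doc->loadXML('<div>' . $fragment . '</div>');

    // Restore the previous libxml error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

var_dump(is_well_formed_fragment('<ul><li>one</li></ul>')); // bool(true)
var_dump(is_well_formed_fragment('<ul><li>one</ul>'));      // bool(false)
```

Input that fails the check could be rejected with a message asking the editor to fix the markup, rather than being stored.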

You could always use str_replace to replace the entities you want with characters after using htmlentities.
Is this the 'normal' way of doing this? There must be a lot of website developers who come across the same issues I am facing. Would you advise using the htmlentities function for all data retrieved from the database (the data entry I cannot control)? I used it for all fields retrieved, including the meta tag fields. I am so new to this, but want to make sure it is up to scratch and does not throw up unnecessary errors for my friend (and for the owner of the other website I built with similar functionality). How would you build a system where another user can enter the data to be displayed on a website?

Son
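
The str_replace suggestion quoted above could look roughly like this: encode everything with htmlentities first, then turn a fixed list of encoded tags back into real tags. `encode_then_restore` is a hypothetical name, the tag list is an assumption, and the sketch only handles tags without attributes, so links would need a different approach.

```php
<?php
// Hypothetical helper: encode all markup, then restore a small
// allow-list of attribute-less tags with str_replace.
function encode_then_restore(string $text): string
{
    // Explicit UTF-8 so the result does not depend on the PHP version's default.
    $encoded = htmlentities($text, ENT_QUOTES, 'UTF-8');

    // Assumed allow-list; extend as needed.
    $allowed = ['<ul>', '</ul>', '<ol>', '</ol>', '<li>', '</li>'];
    $escaped = array_map(
        fn (string $tag): string => htmlentities($tag, ENT_QUOTES, 'UTF-8'),
        $allowed
    );

    return str_replace($escaped, $allowed, $encoded);
}

echo encode_then_restore('<ul><li>a & b</li></ul><script>');
// <ul><li>a &amp; b</li></ul>&lt;script&gt;
```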

For my content management system I give the users either a single-line text field or an HTML editor, depending on what they're editing. Since the HTML editor produces HTML content, I don't transform it further; I'm assuming that the people editing the site have some reasonable understanding of what they're doing. If they don't, then they're probably not using the HTML editor in source mode, so all they're adding is formatting stuff, links, lists, etc. If I wanted to stop them from using certain things I would use strip_tags to remove all tags except the ones I want to keep. I haven't seen any problems with accented characters though, and I probably wouldn't use htmlentities for that because I don't want to convert everything to HTML entities, just the accented characters. I would probably use str_replace to replace just the characters I want to replace instead of everything. Replacing all characters with HTML entities in order to only get accented characters to show up seems a little heavy-handed.
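
A minimal sketch of the strip_tags approach described above, assuming a hypothetical helper name and an allow-list chosen for links and lists. Note that strip_tags leaves the attributes of allowed tags untouched and keeps the text inside removed tags, so on its own it is not a full defence against malicious input.

```php
<?php
// Hypothetical helper: keep only a small allow-list of tags.
// The list of tags is an assumption for illustration.
function keep_basic_markup(string $html): string
{
    return strip_tags($html, '<a><ul><ol><li><p><strong><em>');
}

echo keep_basic_markup('<p>See <a href="http://example.com">this</a><script>alert(1)</script></p>');
// <p>See <a href="http://example.com">this</a>alert(1)</p>
```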


I assumed that all special characters on an English keyboard would also cause problems, but found that:

!"$%^&*()_+-=][}{#~'@?/.>,<|

display without any problems. On an English keyboard it is only the £ sign (which is great). The French currency symbol also needs str_replace.

Would you use str_replace on the upload form or on the page which displays the data? There are 5 different fields to enter/display, and I hope to use a general function which can be applied centrally to all of them (this is easier to maintain as well when new changes are necessary).

You are saying that using htmlentities is heavy-handed. When would you then make use of the function? In my ignorance I thought I had found a great solution. Does it add a lot to processing time?

In addition, when you give the user an HTML editor, does this mean the text is entered into the database for retrieval? How would you integrate the text editor with the insert queries?

And last very important question: Do I really have a UTF-8 problem, or would my issues arise for anyone? I really appreciate your help :-)

Son

Would you use str_replace on upload form or on page, which displays data?
I generally prefer to store the original data in the database and transform the output, but it's up to you.
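
A sketch of that idea with one central output function: the database keeps the raw text and every field passes through the same transformation just before display. `render_row` is a hypothetical name, 'heading' and 'text' are the fields from the first post, and the replacement list is an assumption. The file must itself be saved as UTF-8 for the literal characters to match.

```php
<?php
// Hypothetical helper: apply one output transformation to every listed field.
function render_row(array $row, array $fields): array
{
    // Assumed replacement list: only the characters that caused trouble.
    $search  = ['£', '€'];
    $replace = ['&pound;', '&euro;'];

    $out = [];
    foreach ($fields as $field) {
        // Missing fields become empty strings instead of raising notices.
        $out[$field] = str_replace($search, $replace, (string) ($row[$field] ?? ''));
    }
    return $out;
}

$row  = ['heading' => 'Prices', 'text' => 'From £5 or €6'];
$html = render_row($row, ['heading', 'text']);
echo '<h1>' . $html['heading'] . '</h1>' . $html['text'];
// <h1>Prices</h1>From &pound;5 or &euro;6
```

Because the stored data stays untouched, the replacement list can change later without rewriting anything in the database.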
You are saying that using htmlentities is heavy-handed. When would you then make use of the function?
When I want to convert all characters, not just some.
In addition, when you give user a HTML editor, does this mean the text is entered into database for retrieval? How would you integrate the text editor with the insert queries?
The interface elements don't use the database, it's just a normal interface element, like a regular <input> or textarea field. You get the value of it, and do what you want with it. The CMS I've got is using ExtJS, which includes an HTML editor field, and I can get the value of the field like any other form input. There's an HTML editor in the third example on this page: http://extjs.com/deploy/dev/examples/form/dynamic.html
And last very important question: Do I really have a UTF-8 problem or would my issues arise for anyone?
I'm not sure, I try to work with character sets as little as possible, only when it becomes a problem. I don't have that problem much, but I'm really not sure what I'm doing so that it's not an issue. Maybe after I finish optimizing my application and database structure and writing reports and adding new features I'll get around to reading about character sets. I've got my database tables using utf8, and I send output using json_encode, which expects unicode text, and everything seems to be working.

Justsomeguy,

Thanks a lot for your explanations. It really made my day! It is very helpful for me to learn from an experienced programmer like you :-)

Will try to do some more reading about the licensing options for Ext JS. As far as I understood, you can integrate it for free into your applications as long as you share your final source code, but I am not entirely sure...

Son
