davej Posted September 26, 2013 (edited)
When HTML is written in a foreign language (for example, view the source of http://www.bild.de/), there are all sorts of accented characters in the HTML file. When do special characters NOT need to be converted to entities of the form &name; (such as &auml;)? http://www.w3schools.com/tags/ref_entities.asp
I'm also struck by the apparent fact that all foreign programmers are stuck learning English words just to be able to program.
Ingolme Posted September 26, 2013
If you save the file as UTF-8 (and declare it, e.g. with <meta charset="utf-8"> or the HTTP Content-Type header), you don't need to do anything with special characters. If you use a single-byte encoding, then any character outside that encoding needs to be converted to an entity. Examples of single-byte encodings are ISO-8859-1, Windows-1252 and many others. Personally, I just use UTF-8 for everything.
Yes, foreign programmers do need to learn English, or at least memorize the keywords and function names.
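A minimal Python sketch of the difference described above (Python is an assumption; nobody in the thread posted code, and the sample string is made up). Encoding to UTF-8 writes every character directly, while encoding to ISO-8859-1 with Python's built-in `xmlcharrefreplace` error handler turns any character outside that encoding into a numeric character reference.

```python
# Characters a single-byte encoding can represent become one byte each;
# characters it cannot represent become numeric references (&#NNNN;).
text = "Stra\u00dfe kostet 5 \u20ac"  # "Straße kostet 5 €"

utf8_bytes = text.encode("utf-8")  # every character encoded directly
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(utf8_bytes)    # b'Stra\xc3\x9fe kostet 5 \xe2\x82\xac'
print(latin1_bytes)  # b'Stra\xdfe kostet 5 &#8364;'
```

Here "ß" exists in ISO-8859-1 (byte 0xDF) and stays a single byte, while "€" does not and becomes &#8364;. Windows-1252 does contain the euro sign (byte 0x80), so which characters need references depends on the single-byte encoding you pick.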
justsomeguy Posted September 26, 2013
There are also programming languages that are not based on English. Most languages are English-based because most of the people who designed the first ones spoke English, but there are plenty of others now. http://en.wikipedia.org/wiki/Non-English-based_programming_languages
davej Posted September 27, 2013
I had never realized that UTF-8 was a flexible, multi-byte, multilingual scheme. What about &, <, > and quote characters that occur in text?
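To make "multi-byte" concrete, a small Python sketch (the sample characters are arbitrary) that prints how many bytes UTF-8 uses for a few code points. ASCII characters keep their single-byte values, which is why a plain-English file is byte-for-byte identical in ASCII and UTF-8.

```python
# UTF-8 is variable-width: ASCII characters take 1 byte,
# other characters take 2, 3 or 4 bytes.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, an emoji

for ch in samples:
    encoded = ch.encode("utf-8")
    # Print the code point, the byte count and the bytes in hex.
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) -> {encoded.hex(' ')}")
```

This prints 1 byte (41) for U+0041, 2 bytes (c3 a9) for U+00E9, 3 bytes (e2 82 ac) for U+20AC and 4 bytes (f0 9f 98 80) for U+1F600.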
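Ingolme Posted September 27, 2013
Those need escaping whatever the encoding, because that has to do with the HTML parser and not the encoding. In text content, < and & should always be escaped (&lt; and &amp;); escaping > as &gt; is customary though rarely required. Quotes only need escaping (&quot; or &#39;) inside an attribute value delimited by that same quote character.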
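As a sketch in Python, the standard library's `html.escape` applies these rules. It replaces & before the other characters, so the & in a freshly produced &lt; is not escaped a second time. The input string is made up for illustration.

```python
import html

# A made-up piece of text containing HTML's special characters.
text = 'if a < b && b > c then say "hi"'

# Default (quote=True): escapes & < > " and ', safe inside attribute values too.
print(html.escape(text))
# quote=False: escapes only & < >, enough for ordinary text content.
print(html.escape(text, quote=False))
```

The first call gives `if a &lt; b &amp;&amp; b &gt; c then say &quot;hi&quot;`; the second leaves the double quotes alone.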
davej Posted September 27, 2013
Makes me wonder what their keyboards look like to create all these accented characters.
Ingolme Posted September 27, 2013
Spanish keyboards have four keys right next to the "Enter" key with diacritics. These are dead keys: press the diacritic, and whichever letter you press next gets the diacritic on it. If that letter isn't meant to take the diacritic, the diacritic is output on its own, followed by the letter. I don't know about other countries; I'm Spanish.
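The dead-key behaviour just described can be sketched in Python with Unicode normalization. This is a hypothetical illustration, not how any real keyboard driver works: the accent-to-combining-mark table and the function name `dead_key` are assumptions. The letter plus combining mark is composed with NFC; if Unicode has a single precomposed character the result is that character, otherwise the accent is emitted on its own followed by the letter.

```python
import unicodedata

# Hypothetical table: spacing accent on the dead key -> Unicode combining mark.
COMBINING = {
    "\u00b4": "\u0301",  # acute accent      -> combining acute
    "`": "\u0300",       # grave accent      -> combining grave
    "\u00a8": "\u0308",  # diaeresis         -> combining diaeresis
    "^": "\u0302",       # circumflex        -> combining circumflex
}

def dead_key(accent: str, letter: str) -> str:
    """Combine a dead-key accent with the next letter typed (sketch)."""
    composed = unicodedata.normalize("NFC", letter + COMBINING[accent])
    if len(composed) == 1:
        return composed      # a precomposed character exists, e.g. "á"
    return accent + letter   # no such character: accent, then the letter
```

For example, `dead_key("\u00b4", "a")` returns "á" and `dead_key("\u00a8", "u")` returns "ü", while `dead_key("\u00b4", "x")` returns "´x" because Unicode has no precomposed x with acute.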
davej Posted September 28, 2013
I wrote a little utility some time ago in VB.NET to convert Linux (LF) text files to DOS (CR-LF) line endings. I'm going to have to take another look at that code and make sure it warns me if it finds a multi-byte character.
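A minimal sketch of such a converter, written in Python rather than VB.NET; the function name `lf_to_crlf` and its return shape are hypothetical. Working on raw bytes is safe for UTF-8 because the LF byte 0x0A never occurs inside a multi-byte UTF-8 sequence (all of those bytes are 0x80 or above), so the same test for bytes of 0x80 or above doubles as the "warn me" check.

```python
from __future__ import annotations


def lf_to_crlf(data: bytes) -> tuple[bytes, list[int]]:
    """Convert LF line endings to CR-LF and list non-ASCII byte offsets.

    Hypothetical helper for illustration; offsets refer to the input bytes.
    """
    # Collapse any existing CR-LF first so it doesn't become CR-CR-LF.
    converted = data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    # Bytes >= 0x80 belong to a multi-byte UTF-8 character (or to a
    # single-byte encoding such as ISO-8859-1), so record where they are.
    non_ascii = [i for i, b in enumerate(data) if b >= 0x80]
    return converted, non_ascii
```

For example, `lf_to_crlf("caf\u00e9\nok\n".encode("utf-8"))` returns the converted bytes together with `[3, 4]`, the two bytes of the UTF-8 "é", which a caller could turn into a warning.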