
Explain " & < > etc. in context


nomnex


When I write normal text, do I have to replace characters such as & and " with entities (&amp;, &quot;, etc.)? Why, and when?

Example 1 (which one is correct?):

a. <p>Some text "Some Text" some text</p>

or

b. <p>Some text &quot;Some Text&quot; some text</p>

Case 2 (statement): Some characters do not display in normal text unless I use a character entity of the form "&name;".

Example: [...]use the <start="5"> attribute[...] will not display as normal text, because the < and > characters are parsed as an HTML tag. I have to write [...]use the &lt;start="5"&gt; attribute[...] to display the < and > characters as normal text.

Case 2 (question): Which one is now the correct one, and why?

c. [..]use the &lt;start="5"&gt; attribute[..]

or

d. [..]use the &lt;start=&quot;5&quot;&gt; attribute[..]

Could you specifically explain the last question before I change all the " and & characters that appear as plain text in my paragraphs to &quot; and &amp;? I would like to know whether that is needed or not. Thanks in advance.


The quote, ampersand and angle bracket characters are all part of the HTML syntax. Any such characters that are non-syntactic must be turned into entities. So... d. would be correct.
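
As a minimal sketch of that rule, assuming the page text is produced with Python, the standard library's html.escape turns these characters into entities. The sample string is the one from the question; the variable names are only illustrative.

```python
import html

# Text from the question that should appear literally on the page.
raw = 'use the <start="5"> attribute'

# html.escape replaces &, < and >; with quote=True (the default) it also
# replaces " and ', so the result is safe in text and in attribute values.
escaped = html.escape(raw)
print(escaped)  # use the &lt;start=&quot;5&quot;&gt; attribute

# Round trip: html.unescape (like a browser) turns the entities back.
restored = html.unescape(escaped)
print(restored == raw)  # True
```

The printed line corresponds to option d. in the question.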


Hi Synook,
Do you mean that any of the characters listed in the two links below cannot be typed as is (using the corresponding key on the keyboard), and instead have to be replaced by {&name;} or {&#number;}?
a. http://www.w3schools.com/html/html_entities.asp
b. http://www.w3schools.com/tags/ref_entities.asp
Link a. states:
from link a, "Character Entities": Some characters are reserved in HTML. For example, you cannot use the greater than or less than signs within your text because the browser could mistake them for markup. If we want the browser to actually display these characters we must insert character entities in the HTML source.
That does not say exactly which ones. E.g. I use a Japanese keyboard. Is it correct that I can NOT type {¥} (yen), but instead have to type {&yen;} or {&#165;}? And yet this rule does not apply when I type the {$} (USD) sign, which is not part of the list (160-255)? What is the reason for these differences? Both are currencies.
Link b goes into more detail:
The higher part of ISO-8859-1 (codes from 160-255) contains the characters used in Western European countries and some commonly used special characters.
But it adds to the confusion: codes from 160-255 include some French and German accented letters (not all of them). I don't think French and German speakers type half of their accented letters using their keyboard and the other half (codes from 160-255) using {&name;} or {&#number;}?
Another point regarding the characters {" ' &}: do I also need to replace them when they are in a text block inside <pre> and/or <code> tags, or is that an exception?
Lastly, if I write a scenario (that's just an example) with many {" "} (quotes marking the start and end of dialogue), instead of filling the pages with {&quot;}, would it make sense to use the short quotation <q> tag for this purpose, or would the tag be inappropriate?
Thanks to anyone taking the time to answer these questions.

Do you mean that any of the characters listed in the two links below cannot be typed as is (using the corresponding key on the keyboard), and instead have to be replaced by {&name;} or {&#number;}?
No, it's just ", <, >, and &.
That does not say exactly which ones. E.g. I use a Japanese keyboard. Is it correct that I can NOT type {¥} (yen), but instead have to type {&yen;} or {&#165;}? And yet this rule does not apply when I type the {$} (USD) sign, which is not part of the list (160-255)? What is the reason for these differences? Both are currencies.
You can enter the yen symbol directly if your character set contains it. If your character set doesn't contain it, you need to use its entity. The dollar sign is part of the base ASCII set and is therefore present in all ASCII-compatible character sets. This is because ASCII was created in the US, not Japan.
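
A rough way to check this yourself, sketched in Python: try to encode the character in the document's character set and see whether it fails. The helper name can_type_directly is hypothetical, and the three encodings are just examples.

```python
def can_type_directly(char: str, charset: str) -> bool:
    """Return True if `char` can be stored as-is in `charset`."""
    try:
        char.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# "$" is plain ASCII; the yen sign is U+00A5, outside ASCII
# but present in ISO-8859-1 and UTF-8.
for charset in ("ascii", "iso-8859-1", "utf-8"):
    print(charset, can_type_directly("$", charset), can_type_directly("¥", charset))
```

In this sketch only the pure ASCII case reports False for the yen sign, which is the case where an entity such as &yen; or &#165; would be needed.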
But it adds to the confusion: codes from 160-255 include some French and German accented letters (not all of them). I don't think French and German speakers type half of their accented letters using their keyboard and the other half (codes from 160-255) using {&name;} or {&#number;}?
If you use ISO-8859-1, you can directly enter the characters that are present in it. All others must be encoded using entities (try it!). Fortunately, most people, French-speaking or otherwise, do not enter text by the rules of HTML on a regular basis.
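
To "try it" without a browser, here is a small Python sketch. It assumes the text is written out in a chosen character set and lets Python's built-in "xmlcharrefreplace" error handler substitute a numeric character reference for anything that set lacks. The function name and sample sentence are made up for illustration.

```python
def encode_for_charset(text: str, charset: str) -> bytes:
    # Characters the charset cannot represent become &#NNN; references;
    # everything else is stored directly.
    return text.encode(charset, errors="xmlcharrefreplace")

sample = "Un café coûte 3 € ou 400 ¥"

# ISO-8859-1 holds é, û and ¥, but not the euro sign.
print(encode_for_charset(sample, "iso-8859-1"))
# b'Un caf\xe9 co\xfbte 3 &#8364; ou 400 \xa5'

# Plain ASCII holds none of them.
print(encode_for_charset(sample, "ascii"))
# b'Un caf&#233; co&#251;te 3 &#8364; ou 400 &#165;'
```

This step only deals with characters missing from the character set; the markup characters still have to be escaped separately.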
Another point regarding the characters {" ' &}: do I also need to replace them when they are in a text block inside <pre> and/or <code> tags, or is that an exception?
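Yes, you need to use entities for them. The only way to avoid having to encode them is to surround the block of text with the <![CDATA[ ... ]]> markup declaration, which specifies everything inside it as character data only, and not markup. Note that this works only in XHTML served as XML; HTML parsers do not treat CDATA sections that way in ordinary content.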
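
The following Python sketch illustrates why <pre> is no exception: an HTML parser still decodes entities and still recognizes tags inside it. It uses the standard library's html.parser module as a stand-in for a browser, which is an assumption for the demonstration; the Collector class is hypothetical.

```python
from html.parser import HTMLParser

class Collector(HTMLParser):
    """Records the start tags and text the parser reports."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)

parser = Collector()
parser.feed("<pre>a &lt; b <em>c</em></pre>")
parser.close()

seen_tags = parser.tags
seen_text = "".join(parser.text)
print(seen_tags)  # ['pre', 'em']
print(seen_text)  # a < b c
```

The <em> inside <pre> is still reported as a tag, and &lt; is still decoded to <, so literal markup characters in preformatted text need the same escaping as anywhere else.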
Lastly, if I write a scenario (that's just an example) with many {" "} (quotes marking the start and end of dialogue), instead of filling the pages with {&quot;}, would it make sense to use the short quotation <q> tag for this purpose, or would the tag be inappropriate?
If it is actually a short quote, then use <q>. If it isn't, don't use it.
In the end, entities are used for three reasons: [a] to encode syntactically meaningful characters, [b] to encode characters that are not present in the document's character set, and [c] for convenience and portability.

After reading your answer, I visited http://www.w3.org/ looking for paragraphs in which text quotes {"} are duly replaced by {&quot;}. There are none (see the sample below). Am I missing something?

[May 2010</span>  | <a title="Archive: XProc Standard Defines Way to Organize and Share XML Workflows" href="/News/2010#entry-8793">Archive</a></p></div> <div class="description expand_description">	<p>Today W3C announced a powerful tool for managing XML-rich processes such as business processes used in enterprise environments. The W3C Recommendation "<a href="/TR/2010/REC-xproc-20100511/">XProc: An XML Pipeline Language</a>," provides a standard framework for composing XML processes. XProc streamlines the automation, sequencing and management of complex computations involving XML by leveraging existing technologies widely adopted in the enterprise setting. "XML is tremendously versatile," said Norman Walsh, MarkLogic, and one of the co-editors of the specification. "Just off the top of my head, I can name standard ways to store, validate, query, transform, include, label, and link XML. What we haven't had is any standard way to describe how to combine them to accomplish any particular task. That's what XProc provides." Read more in the <a href="/2010/05/xproc-pr">press release</a> and learn more about <a href="/standards/xml/">XML</a>.</p> </div></div>

"XML is tremendously versatile,"

Shouldn't it be &quot;XML is tremendously versatile,&quot;? Does it mean that the web consortium does not apply its own standards, or is it different?


The characters <, >, &, ", ' must be escaped in the contexts where they have a special meaning. At other places, they don't have to be escaped, but they could be.

"<" and "&" have a special meaning everywhere in a document.

">" only has a special meaning within an opening or a closing tag, i.e. it's valid to have

<div>some > none</div>

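but it's best to avoid the following and write &gt; instead (strictly, a literal > inside a quoted attribute value is permitted, but escaping it is the safer habit)

<div title="some > none">equation</div>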

Single and/or double quotes have a special meaning within an attribute's value. i.e. it's valid to have

<div>It's a "thing"</div>

<div class="It's my life">song</div>

<div class='This "thing" he said'>song</div>

but it's invalid to have

<div class='It's my life'>song</div>

<div class="This "thing" he said">song</div>
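
The same distinction can be sketched in Python, assuming the markup is generated by code: html.escape with quote=False suits element content, while quote=True (the default) also covers quotes for attribute values. The two wrapper names are hypothetical.

```python
import html

def escape_text(s: str) -> str:
    # Element content: only &, < and > are replaced.
    return html.escape(s, quote=False)

def escape_attr(s: str) -> str:
    # Attribute value: " and ' are replaced as well.
    return html.escape(s, quote=True)

title = 'This "thing" he said'

print(f"<div>{escape_text(title)}</div>")
# <div>This "thing" he said</div>

print(f'<div class="{escape_attr(title)}">song</div>')
# <div class="This &quot;thing&quot; he said">song</div>

print(escape_attr("It's my life"))
# It&#x27;s my life
```

Escaping both kinds of quotes in attribute values means the result is safe whichever quote character delimits the attribute.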

As for other characters... if you set up your document to use UTF-8 encoding (e.g. by going to Notepad's "Save As..." menu, and selecting it in the "Encoding" dropdown), and serve it up as one (with PHP's header() function or HTML's meta element; see this part of the HTML spec for details), you won't have to escape any characters besides the above ones.


boen_robot, thank you. Your examples helped me understand the nuances.

Archived

This topic is now archived and is closed to further replies.
