
creating a readability function


real_illusions


Hi all,

Is there a way of creating a text readability function or script?

* Calculate the average number of words you use per sentence.
* Calculate the average number of syllables per word.
* Multiply the average number of syllables per word by 84.6 and add it to the average number of words per sentence multiplied by 1.015.
* Subtract the result from 206.835.
* Algorithm: 206.835 - (1.015 * average_words_sentence) - (84.6 * average_syllables_word)

That's open information for the Flesch Reading Ease, and it normally results in a number between 0 and 100.

Now, where do you start? How do you determine what text to use? And how do you parse an HTML document for all the text? Or at least all the text within the <p> tags.

http://www.phpro.org/examples/Get-Text-Between-Tags.html

I've tried the scripts there, mainly the ones using the DOM extensions for PHP, but without success: I just get a blank page. I'm using file_get_contents("http://www.example.com"); to get the HTML.

Then how do you calculate the average words and the average syllables?

Any help would be appreciated.
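The formula in the last bullet translates almost directly into code. Here is a minimal sketch in PHP, assuming the word, sentence and syllable counts have already been obtained some other way. The function name flesch_reading_ease and the choice to return 0 for empty text are assumptions made for illustration, not part of the published formula.

```php
<?php
// Hypothetical helper: Flesch Reading Ease from three raw counts.
function flesch_reading_ease(int $wordCount, int $sentenceCount, int $syllableCount): float
{
    if ($wordCount === 0 || $sentenceCount === 0) {
        // Assumption: score empty text as 0 instead of dividing by zero.
        return 0.0;
    }
    $avgWordsPerSentence = $wordCount / $sentenceCount;
    $avgSyllablesPerWord = $syllableCount / $wordCount;

    return 206.835 - (1.015 * $avgWordsPerSentence) - (84.6 * $avgSyllablesPerWord);
}

// Example: 100 words in 5 sentences with 150 syllables in total.
echo round(flesch_reading_ease(100, 5, 150), 1), "\n";
```

With those example counts the averages are 20 words per sentence and 1.5 syllables per word, so the score comes out at about 59.6. Very long sentences or very long words can push the result below 0, and very simple text can push it above 100.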


From the sounds of it, you need some sort of text evaluator class, with specific methods for analyzing and interpreting a given string.

1) You need a method that can break the string down into sentences, probably by splitting on each "." that appears. For each sentence, work out how many words there are, probably by counting the spaces in it (words = spaces + 1). From there, add up the words of all sentences and divide by the number of sentences.

2) I'm not sure how to analyze a word for syllables... that beats me, but finding the average would work much like the method for 1.

3/4/5) A simple method once you figure out step 2.

I would make one main method within the class that runs all those other methods in order. Like
$analyzer = new TextAnalyzer($text);
$analyzer->analyze();

You could still make those other methods public for use as a library, but calling one main analyze method (or having it run automatically on instantiation) would save you the trouble of having to call all those methods yourself.

To get all the text, you can probably use getElementsByTagName on the client side and submit the text via GET/POST to your PHP script.

my $.02 :)
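The same getElementsByTagName idea also works on the server with PHP's DOM extension, which avoids the round trip through the browser. This is only a sketch: the function name get_paragraph_texts is made up, and it assumes the dom extension is installed. Turning on libxml_use_internal_errors keeps the warnings that untidy real-world HTML tends to trigger out of the output.

```php
<?php
// Hypothetical helper: returns the plain text of every <p> element in an HTML string.
function get_paragraph_texts(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        // textContent drops nested tags such as <b> and keeps only their text.
        $text = trim($p->textContent);
        if ($text !== '') {
            $texts[] = $text;
        }
    }
    return $texts;
}

$html = '<html><body><h1>Title</h1><p>First <b>paragraph</b>.</p><p>Second one.</p></body></html>';
print_r(get_paragraph_texts($html));
```

In real use, $html would come from file_get_contents() as in the first post. If that call fails it returns false, so it is worth checking the result before parsing it.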


You can do preg_split() over each ". ", "? " and "! "* to get an array of the sentences (and therefore get their count and split them further).

You can then do explode() or another preg_split() over spaces to get all words within a sentence. To do better counting, you should probably exclude empty strings (created from overusing spaces like this) and lonely dashes of all kinds - like that for example.

For syllables... how do you formally define a syllable in English? In Bulgarian grammar, a syllable is formally defined as a sequence of letters with a single vowel in it. In English grammar, that's true, yet it's not the whole story (according to the Wikipedia entry, a word like "House" is one syllable). So... to find out all syllables, you can count the vowels (that would mean "a", "o", "u", "e", "i", "y"), but that's likely to be an overestimate.

[edit]Ah, yeah... that script above would probably give a much more realistic count[/edit]

*Note the space: You don't want to match ".NET" and "example.com".
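The splitting steps described in this reply can be sketched as follows. The function names are made up for illustration. Treating punctuation at the very end of the text as a sentence end is an addition to what the reply describes, and the vowel count is the deliberately crude estimate mentioned above.

```php
<?php
// Split on ".", "?" or "!" followed by whitespace or the end of the text.
// Requiring the whitespace keeps ".NET" and "example.com" in one piece.
function split_sentences(string $text): array
{
    return preg_split('/[.?!]+(?:\s+|$)/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Split a sentence on whitespace, skipping empty strings and lonely dashes.
function split_words(string $sentence): array
{
    $words = [];
    foreach (preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY) as $token) {
        // Keep only tokens that contain at least one letter or digit.
        if (preg_match('/[A-Za-z0-9]/', $token)) {
            $words[] = $token;
        }
    }
    return $words;
}

// Crude syllable estimate: one per vowel letter. Overestimates words like "house".
function count_vowels(string $word): int
{
    return (int) preg_match_all('/[aeiouy]/i', $word);
}

$text = 'It is a  big house - really. Visit example.com today! Done?';
$sentences = split_sentences($text);
$wordTotal = 0;
foreach ($sentences as $sentence) {
    $wordTotal += count(split_words($sentence));
}
echo count($sentences), " sentences, ", $wordTotal, " words\n";
echo 'Vowels in "house": ', count_vowels('house'), "\n";
```

For the sample text this finds 3 sentences and 10 words, with the double space and the lone dash ignored. "house" counts 3 vowels although it has one syllable, which is exactly the overestimate described above.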


Hmm... this isn't working how I expected it to.

I'm hoping that for each <p> tag found, I can split the contents into words (just realised I've missed out the step of counting the number of sentences). Then for each sentence (at the moment, each <p> tag), split it into words. For each word, find the syllables, count them, and add them up.

Unless there's a way of just counting the syllables per sentence, so as to skip the step of splitting the sentences into words.

Here's the code:

function get_p($file){
    $h1tags = preg_match_all("/(<p.*>)(\w.*)(<\/p>)/ismU",$file,$patterns);
    $res = array();
    array_push($res,$patterns[2]);
    array_push($res,count($patterns[2]));
    return $res;
}

$p = get_p($file);
if($p[1] != 0){
    echo "<br/>p Tags found: $p[1]</p>";
    foreach($p[0] as $key => $val){
        //echo "<li>" . htmlentities($val) . "</li>";
        $val = htmlentities($val);
        $words = preg_split('/ /', $val, -1, PREG_SPLIT_OFFSET_CAPTURE);
        //print_r ($words);
        foreach ($words as $k => $word) {

            // Regex Patterns Needed $triples = “dn\’t|eau|iou|ouy|you|bl$”;
            $doubles = "ai|ae|ay|au|ea|ee|ei|eu|ey|ie|ii|io|oa|oe|oi|oo|ou|oy|ue|uy|ya|ye|yi|yo|yu";
            $singles = "a|e|i|o|u|y";
            $vowels = '/(".$triples."|".$doubles."|".$singles.")/';
            $trailing_e = "/e$/";
            $trailing_s = "/s$/";

            // Cleaning up word endings
            $word = preg_replace($trailing_s, "", $word[0]);
            $word = preg_replace($trailing_e, "", $word[0]);

            // Count # of “vowels”
            preg_match_all($vowels, $word[0], $matches );

            $count = count($matches[0]);
            ++$count;

            //print_r ($matches);
        }
        echo $count;
        //return $syl_count;
        //echo count_syllables($syl_count);
    }
}

For example, if there were 8 <p> tags, this is what is currently output:

p Tags found: 8
11111111

The contents of the array from print_r ($matches); are appearing empty and I don't know why.
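Going only by the code as posted, two things would explain the empty $matches and the row of 1s. First, $vowels is built inside single quotes, so the three pattern lists are never interpolated, and the regex ends up with a literal $ (an end-of-string anchor) in the middle of each alternative, which can never match. Second, PREG_SPLIT_OFFSET_CAPTURE makes each $word an array, but after the first preg_replace() it is a plain string, so every later $word[0] is only its first character. Zero matches followed by ++$count then prints 1 for each paragraph.

A corrected sketch follows. The name count_syllables comes from the commented-out call in the post. The helper count_syllables_in_text, the punctuation stripping and the "at least one syllable" rule that replaces the unconditional ++$count are assumptions added here.

```php
<?php
// Rough syllable estimate for one English word: strip a trailing "s" and "e",
// then count vowel groups, trying longer groups before single vowels.
function count_syllables(string $word): int
{
    // Keep only letters and apostrophes so "House." and "house" behave alike.
    $word = preg_replace("/[^a-z']/", '', strtolower($word));
    if ($word === '') {
        return 0; // not a word, for example a lonely dash
    }

    $triples = 'dn\'t|eau|iou|ouy|you|bl$';
    $doubles = 'ai|ae|ay|au|ea|ee|ei|eu|ey|ie|ii|io|oa|oe|oi|oo|ou|oy|ue|uy|ya|ye|yi|yo|yu';
    $singles = 'a|e|i|o|u|y';
    // Concatenate so the three lists really end up inside the pattern.
    $vowels = '/(' . $triples . '|' . $doubles . '|' . $singles . ')/';

    // Clean up word endings.
    $word = preg_replace('/s$/', '', $word);
    $word = preg_replace('/e$/', '', $word);

    // Assumption: any real word has at least one syllable ("the" becomes "th").
    return max(1, (int) preg_match_all($vowels, $word));
}

// Adds up the estimate for every whitespace-separated word in a text.
function count_syllables_in_text(string $text): int
{
    $total = 0;
    foreach (preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $total += count_syllables($word);
    }
    return $total;
}

echo count_syllables_in_text('The table is in the house.'), "\n";
```

This is still a heuristic: "houses", for instance, loses both its "s" and its "e" and is counted as one syllable. For a readability score that level of error is usually tolerable, but the counts should not be read as exact.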

