How to detect non-English content?


I want to scan a website's content to check that it is written only in English.

I was thinking about using file_get_contents() to read the page, but I'm not sure how to proceed from there.

For example, I could use a regex like this:

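    // True only if $string contains no character outside A-Z, a-z and 0-9
    // (so spaces and punctuation also make the check fail)
    if (!preg_match('/[^A-Za-z0-9]/', $string)) {
        // treat $string as English
    }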

But that won't work directly, because file_get_contents() returns the raw HTML, including tags, scripts and so on, not just the visible text.
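A minimal sketch of one way around the HTML problem, assuming the page is UTF-8: strip the markup first and only then test the remaining text. The helper names visible_text() and is_ascii_text() are hypothetical; strip_tags(), html_entity_decode() and preg_replace() are standard PHP functions. The pattern [^\x20-\x7E] flags anything outside printable ASCII, so unlike [^A-Za-z0-9] it does not reject ordinary spaces and punctuation.

```php
<?php
// Sketch: turn raw HTML (e.g. from file_get_contents($url)) into plain
// visible text, then apply an ASCII-only check to that text.
// Assumes the HTML is UTF-8; helper names are hypothetical.

function visible_text(string $html): string
{
    // Drop <script> and <style> blocks entirely; strip_tags() would keep their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Remove the remaining tags and decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/u', ' ', $text));
}

function is_ascii_text(string $text): bool
{
    // True if every character is printable ASCII (space through ~).
    return !preg_match('/[^\x20-\x7E]/', $text);
}

$html = '<html><head><style>p{color:red}</style></head>'
      . '<body><p>Hello, world &amp; friends!</p></body></html>';
// In practice: $html = file_get_contents('https://example.com/');

$text = visible_text($html);
echo $text, "\n"; // Hello, world & friends!
echo is_ascii_text($text) ? "ASCII only\n" : "contains non-ASCII\n"; // ASCII only
```

Stripping tags with a regex is rough; for messy real-world markup, PHP's DOMDocument is likely more robust. A strict ASCII test will also reject many genuinely English pages, since entities like &nbsp; and typographic quotes decode to non-ASCII characters.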

Is there a way to detect (and block) content written in non-English languages or scripts, and how is this done?
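If a strict ASCII check is too strict, a softer sketch is to count only letters (\p{L} with the /u modifier) and measure what share of them fall outside A-Z/a-z, then compare that share against a threshold. The function names and the 0.05 default threshold below are arbitrary assumptions to tune on real pages. This detects non-Latin scripts such as Cyrillic, Chinese or Arabic; it cannot tell English from French or German, which use mostly the same letters, so that would need actual language identification.

```php
<?php
// Sketch: estimate what share of the letters in a UTF-8 string are not
// plain ASCII A-Z / a-z. Function names and threshold are hypothetical.

function non_ascii_letter_ratio(string $text): float
{
    // \p{L} matches any Unicode letter (Latin, Cyrillic, Han, Arabic, ...).
    $letters = preg_match_all('/\p{L}/u', $text);
    if (!$letters) {
        return 0.0; // no letters, or preg_match_all() failed (e.g. invalid UTF-8)
    }
    // A letter that is not A-Z / a-z, e.g. é, ж or 中.
    $other = preg_match_all('/(?![A-Za-z])\p{L}/u', $text);
    return $other / $letters;
}

function mostly_ascii_letters(string $text, float $maxRatio = 0.05): bool
{
    // 0.05 is an arbitrary default; punctuation and digits are ignored.
    return non_ascii_letter_ratio($text) <= $maxRatio;
}

echo mostly_ascii_letters('The quick brown fox') ? "yes\n" : "no\n"; // yes
echo mostly_ascii_letters('Привет, мир') ? "yes\n" : "no\n";         // no
```

Because only letters are counted, curly quotes or dashes on an English page do not affect the result, and a small threshold tolerates the occasional accented name.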


