
How to detect non-English content?


BrainPill


I want to scan a website's content to check that it contains only English.

I was thinking about using file_get_contents() to read the page, but it's unclear to me how to proceed from there.

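For example, I considered a regex check like this:

// true when $string contains only ASCII letters, digits, whitespace and punctuation
if (!preg_match('/[^A-Za-z0-9\s[:punct:]]/', $string)) { /* ASCII-only */ }

But I'm not sure that works on its own, because file_get_contents() also returns the HTML tags and so on.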
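Since file_get_contents() returns the raw markup, one approach is to strip the HTML first and only then run the character check on the visible text. This is a minimal sketch, assuming PHP 7.0+ and UTF-8 pages; the helper names extractText() and isAsciiOnly() are hypothetical, and "ASCII-only" is only a crude stand-in for "English".

```php
<?php
// Hypothetical helper: extract the visible text from an HTML string.
function extractText(string $html): string
{
    // Drop <script> and <style> blocks entirely; strip_tags() would keep their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;
    // Remove the remaining tags, then decode entities such as &amp; or &eacute; to UTF-8.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of ASCII whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}

// Hypothetical helper: true if the text contains only printable ASCII and whitespace.
function isAsciiOnly(string $text): bool
{
    // Any byte outside 0x20-0x7E (other than whitespace) counts as non-ASCII.
    return preg_match('/[^\x20-\x7E\s]/', $text) === 0;
}

// Usage (fetching a URL with file_get_contents() requires allow_url_fopen):
// $html = file_get_contents('https://example.com/');
// var_dump(isAsciiOnly(extractText($html)));
```

Keep in mind that an English page containing typographic quotes, a non-breaking space or a word like "café" will fail an ASCII-only test, and regex tag stripping is not a real HTML parser; DOMDocument is more robust for messy markup.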

Is there a way to block non-English languages, and if so, how is this done?
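Blocking whole "language groups" is easier to approximate by writing system (script) than by language. This is a sketch, assuming UTF-8 input and PHP's PCRE Unicode properties \p{L} (any letter) and \p{Latin} (Latin-script letters), which require the /u modifier; the helper names and the 0.9 threshold are hypothetical choices.

```php
<?php
// Hypothetical helper: fraction of letters in $text that belong to the Latin script.
function latinLetterRatio(string $text): float
{
    // Count all Unicode letters; preg_match_all() returns false on invalid UTF-8.
    $letters = preg_match_all('/\p{L}/u', $text);
    if ($letters === false) {
        throw new InvalidArgumentException('Text is not valid UTF-8');
    }
    if ($letters === 0) {
        return 1.0; // no letters at all (digits, punctuation), so nothing to reject
    }
    // Count only the letters of the Latin script (covers English, but also German, French, ...).
    $latin = preg_match_all('/\p{Latin}/u', $text);
    return $latin / $letters;
}

// Hypothetical policy: accept text when at least $threshold of its letters are Latin script.
function isMostlyLatin(string $text, float $threshold = 0.9): bool
{
    return latinLetterRatio($text) >= $threshold;
}
```

This flags text written mostly in Cyrillic, Chinese, Arabic and similar scripts, but German or Spanish text is also Latin script and passes. Telling English apart from other Latin-script languages needs statistical language identification (for example, character n-gram models), not a character-class regex.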

 
