BrainPill
Posted January 9, 2020

I want to scan a website's content to check that it contains only English. I was thinking of using file_get_contents() to read the page, but from there it's unclear how to proceed. A regex check like this:

if (!preg_match('/[^A-Za-z0-9]/', $string))

is a problem, because file_get_contents() also returns the HTML tags and so on. Is there a way to block non-English language groups, and how is this done?

dsonesuk
Posted January 9, 2020

If they use the lang attribute, you could search for that to identify the language the website is tailored for.
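
A minimal sketch of that lang-attribute check, assuming the page declares its language on the root <html> element. The function names detectDeclaredLang() and isDeclaredEnglish() and the example.com URL are made up for illustration; DOMDocument is used instead of a regex so the HTML tags do not get in the way.

```php
<?php
// Hypothetical helper: returns the lowercased lang attribute of the <html>
// element, or null when the page does not declare one.
function detectDeclaredLang(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world markup is often invalid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->documentElement;
    if ($root === null) {
        return null;
    }

    $lang = trim($root->getAttribute('lang'));
    return $lang === '' ? null : strtolower($lang);
}

// Hypothetical helper: true for "en" and regional variants such as "en-US".
function isDeclaredEnglish(string $html): bool
{
    $lang = detectDeclaredLang($html);
    return $lang !== null && preg_match('/^en([-_]|$)/', $lang) === 1;
}

// Usage (placeholder URL; file_get_contents() returns false on failure):
// $html = file_get_contents('https://example.com/');
// if ($html !== false && isDeclaredEnglish($html)) { /* accept the page */ }
```

Keep in mind that the lang attribute is only what the page author declared. It can be missing or wrong, so a page without it is "unknown" rather than "not English". Also note that the pattern /[^A-Za-z0-9]/ from the question matches spaces and punctuation too, so it would flag almost any real text even after the tags are stripped.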