
SEO negatives: content scraping, clickjacking, page rank theft


Guest So Called

Recommended Posts

Guest LH91325

I've done a lot of research into these topics very recently, and I'm alarmed. I've seen a lot of abnormal activity on my own site that I can't explain any other way except that I may be getting scraped or 'jacked. It makes no sense to get your webpages all nice, get the HTML and CSS to validate, get your PHP and MySQL and ASP and everything working well, and then have your domain, your content, and your vaunted page ranks stolen by people who can't create original content, or who find it easier to steal yours than create their own. Why aren't we discussing this? It serves no good purpose to create good, craftily constructed pages only to have all our efforts stolen.

Google "clickjacking", "content scraping", "page framing", "frame busting" or "frame busters" (both good), "anti-frame busting", "skyscraper sites", "page rank theft", "302 redirects".

After all my research I have only two good solutions; for every JavaScript protection there is a counter-offensive. Only two things look good to me:

(1) The header X-Frame-Options: DENY, which only helps with the latest browsers. (I added that to all my sites today. Cost: one line of PHP code.)

(2) A CSS countermeasure (AFAIK the only dependable measure today, or the last great hope of mankind): declare your whole page body { display: none; } and then use JavaScript to run var theBody = document.getElementsByTagName('body')[0]; theBody.style.display = "block"; which leaves the page hidden inside frames but turns it back on if JavaScript is enabled and the clickjacker or content scraper hasn't disabled it (in which case they get nothing). Bad news: those who have JavaScript disabled (or are running NoScript) also get nothing, even your normal site visitors who weren't scraped or 'jacked.

Discuss.
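For (1), that one line of PHP looks roughly like this, assuming it runs before any output is sent (e.g. at the top of a common include):

<?php
// Tell the browser to refuse to render this page inside any frame or iframe.
// (Use SAMEORIGIN instead of DENY if your own pages legitimately frame each other.)
header('X-Frame-Options: DENY');
?>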

<head>
  <style> body { display: none; } </style>  <!-- hide everything by default -->
</head>
<body>
  <script>
    if (self == top) {
      // Not framed: reveal the page.
      var theBody = document.getElementsByTagName('body')[0];
      theBody.style.display = "block";
    } else {
      // Framed: break out of the frame.
      top.location = self.location;
    }
  </script>

Note that this should be used in combination with a secure response header as well.

Credit: https://www.owasp.org/index.php/Clickjacking#Best-for-now_implementation

Edited by LH91325

I would worry about compromising accessibility. If a user does not have JavaScript enabled then they will not be able to see your website at all. Certainly, clickjacking is dangerous for users, but it is really the responsibility of browser developers to fix that. As for content scraping, it is not worth hindering a portion of your website's visitors to prevent it.


Guest LH91325

I've read many discussions to the contrary, where people had thriving websites producing income, then got hijacked and dropped entirely off the Google rankings. And yes, as I said in my post, no JavaScript = no page, which is not good for visitors who have it disabled.

As far as browser developers fixing it, they have, but unfortunately the fix only works for those who run the latest browsers. I've been logging visitors' browsers and versions for the last several months and I'm amazed at what old clunky browsers some people are still running. At least X-Frame-Options: DENY fixes clickjacking and content scraping via framing, and appears to have no negative effects, but as I said it only works in later browser versions.

At present I'm trying to figure out how much this is happening to my own site, if at all. I've got some very suspicious activity in my logs but I cannot tell for sure exactly what is causing it. One interesting experiment I might run: start logging which visitors have JavaScript enabled. I can think of several ways to do that. I could even give them a page version without countermeasures if I want. That's the fun thing about a good server-side script: you can dynamically adjust and send different content based upon each access at the time it is made.
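One rough way to do it (an untested sketch; beacon.php and js-visitors.log are just names I made up): have each page fire a tiny request that only happens when JavaScript actually runs, and log it on the server.

<script>
  // Fires only if JavaScript is enabled; copies that strip or block scripts never trigger it.
  new Image().src = '/beacon.php';
</script>

<?php
// beacon.php: record which visitors executed JavaScript.
$line = date('c') . ' ' . $_SERVER['REMOTE_ADDR'] . ' ' .
        (isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '-') . PHP_EOL;
file_put_contents(__DIR__ . '/js-visitors.log', $line, FILE_APPEND);
// Nothing to send back; an empty 204 keeps the request cheap.
header('HTTP/1.1 204 No Content');
?>

Comparing that log against the raw access log would show roughly what fraction of visitors the display:none countermeasure would lock out.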


You are saying that other websites repost your page content? Couldn't you search for your content and then blacklist those IPs? Better yet, redirect them to rubbish. Now maybe they don't display your content from the same IP that they scan and scrape with, but you might be able to encode identifying sequences that would allow you to identify the culprit.
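Something along these lines, maybe (a rough, untested sketch; the token scheme and file names are just placeholders):

<?php
// Stamp each response with a unique token and record who received it.
$token = md5(uniqid(mt_rand(), true));
$line = date('c') . ' ' . $token . ' ' . $_SERVER['REMOTE_ADDR'] . ' ' .
        (isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '-') . PHP_EOL;
file_put_contents(__DIR__ . '/served-tokens.log', $line, FILE_APPEND);
?>
<!-- bury the token somewhere harmless in the markup, for example: -->
<div class="wm-<?php echo $token; ?>"></div>

If your content later turns up on a scraper site still carrying one of those tokens, the log tells you which IP and user agent fetched that particular copy.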

Edited by davej

Guest So Called

They frame you. The actual request is coming from the visitor. I'm not even sure whether I've been framed yet, nor do I see any way to find out without writing some JavaScript that tests whether the page is framed and then reports back to my server. Except that their anti-frame-busting would probably kill that code too. What I mean is that there is no direct interaction between the scraper site and the site being framed, so there's nothing to blacklist.

If they're just copying your content and posting it somewhere else you can sometimes find that. I haven't found any of my site content yet, but I have found forum posts of mine that were scraped/stolen. I had a unique signature string, so one day I Googled it (just wondering if anybody was copying my signature) and I found the full forum articles on several or a dozen .INFO sites. Since I wasn't the forum owner I didn't do anything about it.
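For what it's worth, the test I mean would look roughly like this (a sketch only; framed.php is a made-up reporting script, and as I said, a scraper that strips or rewrites scripts defeats it):

<script>
  // If this page is being displayed inside someone else's frame, phone home.
  if (self != top) {
    var framer = '';
    try { framer = top.location.href; } catch (e) { /* cross-origin: unreadable */ }
    new Image().src = '/framed.php?framer=' + encodeURIComponent(framer || document.referrer);
  }
</script>

framed.php would just log $_GET['framer'] along with the visitor's IP, the same way as any other beacon.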

