Dealing With Content Thieves
You’ve spent months or years building your website and its content, and you’re proud of it. One day you find that some thrown-together website has copied your content and tried to pass it off as its own work.
This is incredibly frustrating. Worse, this duplicate content forces search engines to decide which website is the original and should rank higher. The copied content on the other website may even show up above yours in search results.
Here at AboutUs.org / AboutUs.com, our original content is regularly copied or scraped and republished on other websites without any attribution. I want to share what we’ve learned, so you can deal with any site that steals your content.
How can I find out if someone steals my content?
It’s a great idea to track people who are talking about you. You can set up a Google Alert for your business name, your website name, and for the titles of important pieces of content or blog posts on your site. Once you’ve done this, you will get an email when a word or phrase for which you’ve created an alert shows up on the web.
Another great way is to search in Google or Bing for a unique sentence from something you wrote, placing quotes around it so that the search engine searches for web pages that mention that exact phrase.
For example, if I wanted to check if some other site had copied this blog post, I might set up a Google Alert for “What to Do Before & After Your Blog Content is Scraped/Stolen/Copied” (the name of this post) and search in Google or Bing for a sentence from the post like “Another great way is to search in Google or Bing for a unique sentence from something you wrote” (with the quotes).
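If you check several sentences regularly, building the search links by hand gets tedious. Here is a minimal Python sketch of the quoted-phrase search described above. The helper name `exact_phrase_search_urls` is made up for illustration, and the URLs are simply the ordinary search pages of Google and Bing with a `q` parameter.

```python
from urllib.parse import urlencode


def exact_phrase_search_urls(sentence):
    """Return Google and Bing search URLs for an exact-phrase match."""
    # Wrapping the sentence in double quotes asks the engine for that exact phrase.
    query = urlencode({"q": f'"{sentence}"'})
    return {
        "google": f"https://www.google.com/search?{query}",
        "bing": f"https://www.bing.com/search?{query}",
    }


urls = exact_phrase_search_urls(
    "Another great way is to search in Google or Bing "
    "for a unique sentence from something you wrote"
)
for engine, url in urls.items():
    print(engine, url)
```

Open each printed link in a browser. Any result that isn’t your own site is worth a closer look.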
Can I prevent my content from being scraped?
Not really. You can try to keep bad actors out of your site, but the tools to do this are really gentlemen’s agreements — and bad actors don’t usually honor these.
You can give it a try, though. The robots.txt file on your website lets you request that certain bots or spiders not crawl your site. You can tell search engine spiders it’s okay to crawl your site, and ask all other bots not to. Here’s where the gentlemen’s agreement comes in. Robots.txt is a plain-text list of requests that courteous bots respect, not something your server enforces. But any website that would send out bots to grab content, and then republish that content without attribution, is unlikely to respect your robots.txt rules.
There’s one more thing to keep in mind. Disallowing all bots other than search engine bots can be risky. Sometimes search engines change the names of their bots, and you would have no way of knowing that unless you’re trawling the SEO blogs like a crazy person. If you don’t update the names of the bots you’re allowing, you could end up banning search engines from your site, and you wouldn’t know it until you noticed that your traffic had plummeted or that you weren’t showing up in Google or Bing.
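To make both the allow-list idea and its risk concrete, here is a small Python sketch. It uses the standard library’s `urllib.robotparser` to show how a polite bot would read a robots.txt that allows two search engines and asks everyone else to stay out. The rules are only an illustration, not a recommended configuration. `www.example.com`, `SomeScraperBot` and `RenamedSearchBot` are made-up names, and the last one stands in for a search engine that has renamed its crawler.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: allow two named search bots, ask all others to stay out.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://www.example.com/blog/my-post"  # hypothetical URL
for bot in ("Googlebot", "Bingbot", "SomeScraperBot", "RenamedSearchBot"):
    # can_fetch() reports what a well-behaved bot with this name would conclude.
    print(bot, parser.can_fetch(bot, page))
```

The two named bots are allowed and the other two are not. Note that this only models bots that choose to read the file. A scraper that ignores robots.txt is not stopped by any of it, while a renamed search bot that does honor the file is turned away.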
One thing you can do if you have a WordPress site
Many scrapers are not so smart and will just copy all the text of your blog post from your RSS feed and then publish it as their own. They will rarely include the byline (“By _____”), so someone reading your content on their website would just assume it was written by the copycat site.
If you have a WordPress blog or website, there is something you can do using a free plug-in called WordPress SEO by Yoast. In the RSS section, you can specify some text and links to accompany your blog post’s content in the RSS feed. For example: “[Name of post] was written by [author] on [name of site].” In that example, the name of the post would show up as the anchor text of a link to the original post on your site.
This way, if a dumb scraper steals your content, it will unintentionally give you credit and a backlink. That helps your SEO and makes it easier for Google to tell who the original source is. Note: Some scrapers are more sophisticated, and sometimes a human is involved who might spot the credit and take it out.
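If your site doesn’t run WordPress, you can add the same kind of credit line to feed items yourself. The Python sketch below is not how the Yoast plug-in works internally. It only shows the HTML that such a credit line amounts to. The function name `rss_attribution_footer` and the example URL are assumptions for illustration.

```python
from html import escape


def rss_attribution_footer(post_title, post_url, author, site_name):
    """Build an HTML credit line to append to a post's content in the RSS feed."""
    # The post title becomes the anchor text of a link back to the original post.
    link = f'<a href="{escape(post_url, quote=True)}">{escape(post_title)}</a>'
    return f"<p>{link} was written by {escape(author)} on {escape(site_name)}.</p>"


footer = rss_attribution_footer(
    post_title="Dealing With Content Thieves",
    post_url="https://www.example.com/blog/dealing-with-content-thieves",  # hypothetical
    author="Kristina Weis",
    site_name="AboutUs",
)
print(footer)
```

Escaping the title, author and URL keeps characters like “&” from breaking the feed’s markup.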
So your content has been copied. Now what?
Contact the site and ask nicely – but firmly – for attribution or removal
Look for contact information on the website itself. If that doesn’t work, check the site’s whois record on a site like DomainTools.com. The whois record will tell you either who owns the site or who registered it, or both.
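If you would rather look up a whois record from a script than from a website, the Python sketch below shows the general idea. It assumes the plain WHOIS protocol (a text query sent over TCP port 43) and uses `whois.iana.org` as a starting server, which usually answers with a referral to the registry that holds the full record. The helper names are made up, and real records vary a lot in their field names, so treat the field filter as a rough starting point.

```python
import socket


def whois_query(domain, server="whois.iana.org", port=43, timeout=10.0):
    """Send one WHOIS query and return the server's raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_fields(whois_text, wanted=("registrar", "registrant", "refer")):
    """Collect 'Key: value' lines whose key contains one of the wanted words."""
    found = []
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip() and any(w in key.lower() for w in wanted):
            found.append((key.strip(), value.strip()))
    return found


# Example use (needs network access):
#   print(extract_fields(whois_query("example.com")))
```

If the reply contains a “refer” line, run the query again against the server it names to get the registrar and registrant details.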
If you can’t get in touch with the website owner, or if the responsible person doesn’t respond appropriately, you’ll need to complain to a higher authority.
Talk to the people who control their website
Contact the website’s registrar or hosting company to let them know what this site has done. Explain that the owner hasn’t responded to your polite request.
Report the duplicate content to Google
- If you are confident that you have a copyright case, you can report the copyright infringement to Google.
- If a site is violating a law other than copyright, you can submit a legal removal request. This applies “if you have a court order establishing that a site is in violation of the law, or if you have identified a clear case of a legal violation for which Google has a removal responsibility,” according to Google.
- You can also try reporting spam to Google. If someone has copied your content without attribution, check the box next to “duplicate site or pages.” Note: Google doesn’t read every spam report. The company normally focuses its attention and action on larger offenders, to have the biggest possible impact on improving search.
Shine a light on the miscreants
- Let everyone know what happened to you, and about the site that grabbed your content. Talk about it in public venues like Twitter and your blog.
- Make sure the offending site’s online reputation reflects its bad behavior. Give the site a red rating and add an account of what it has done on MyWOT.com, a community that monitors website reputation. Try other consumer sounding board sites such as ComplaintsBoard.com, RipoffReport.com and SiteJabber.com.
Make sure your site is fully visible to search engines
- If you optimize your site well for search engines, you’ll have a much better chance of outranking a content thief in search results. For free tips on how to outrank copycat sites, read our SEO articles. If you’d like an in-depth analysis of your website’s SEO, search engine rankings, and social media presence, all compared to your top 3 competitors, check out what we offer.
The Internet Thrives on Trust
We love the openness, vibrancy and ever-changing nature of the web, and we love sharing content. All our content is available under an open license, so long as you attribute it to us and include a link back to our site.
Bad actors who take content and fail to attribute it break the trust that makes the web a great place to converse, learn and share. You can help by calling people out on their bad behavior and publicizing it. Don’t forget to praise people who do great things on the web, too.
This article was written by Kristina Weis of AboutUs.
Kristina is customer service and social media lead for AboutUs. She helps website owners who are trying to promote their businesses online. Her personal blog is at KristinaWeis.com and she tweets at @KristinaWeis.