Thoughts On Categories

Abstract

see also: CategoriesToCull

Gads, there are way too many categories for most of the articles and many of them either don't make sense or are too broad. We need to trim them down so they are usable.

Possible One-Time Solution

  • Look at adult categories and use the existing categories to tag all of these pages "AdultContent"
  • Identify and store all category names in which there are more than 100 references (VeryGeneralCategories)
    • Let's get some samples to determine at what level (if any) categories are useful. Obviously human-created categories are mostly useful, but the bot ones don't seem to be. I'll work on getting a list of categories together, so we can do a more in-depth review. Jason Parmer 13:57, 16 May 2007 (PDT)
    • Obviously, we'll need to keep Category:AboutUs Featured Site, which lists all of the past featured sites and is at around 400 items currently (but growing every day). -- TakKendrick
  • Delete these categories because they are too bulky and not helpful to users (do a manual review first; there may be some worth saving)
  • Delete all singleton categories (where there is only one page with that category) because these are likely too narrow
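
The steps above could be sketched roughly as follows. This is only an illustration: it assumes the page-to-category data has been exported into a plain dictionary, and the names classify_categories, VERY_GENERAL_THRESHOLD and keep are made up here, not part of any existing AboutUs tool. The keep set stands in for the manual review (for example Category:AboutUs Featured Site).

```python
from collections import Counter

# Hypothetical threshold, taken from the "more than 100 references" bullet.
VERY_GENERAL_THRESHOLD = 100


def classify_categories(page_categories, keep=frozenset()):
    """Split categories into very general, singleton, and everything else.

    page_categories maps a page title to the list of categories on that page.
    keep is a set of category names that must never be flagged for deletion.
    """
    counts = Counter()
    for categories in page_categories.values():
        # Count each category once per page, even if it is listed twice.
        counts.update(set(categories))

    very_general = {
        name for name, n in counts.items()
        if n > VERY_GENERAL_THRESHOLD and name not in keep
    }
    singletons = {
        name for name, n in counts.items()
        if n == 1 and name not in keep
    }
    remaining = set(counts) - very_general - singletons
    return very_general, singletons, remaining


# Tiny made-up sample: 150 pages share "Regional", one page has "Joes Bait Shop".
sample = {f"site{i}.com": ["Regional"] for i in range(150)}
sample["site0.com"] = ["Regional", "Joes Bait Shop", "Fishing"]
sample["site1.com"] = ["Regional", "Fishing"]

very_general, singletons, remaining = classify_categories(sample)
print(very_general, singletons, remaining)
```

The very_general set is what would be stored as VeryGeneralCategories; both flagged sets are candidates for review, not an automatic delete list.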

Data Points

  • 75,560,247 references to Categories
  • 10,451,207 Categories
  • 7,449,146 1 page Categories
  • 1,284,985 2 page Categories
  • 1,717,076 3+ page Categories
  • 59,548 100+ page Categories, accounting for 48,898,425 category references
  • of the 48 million, we have 18,804,318 left to go through in 51,775 categories
  • trimming 1, 2, and the 100+ page categories would leave us with 16,642,746 references to categories
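
The arithmetic behind the last bullet can be checked with a few lines. This sketch assumes every 1 page category carries one reference and every 2 page category carries two; the variable names are made up for the example. With the figures as listed, the subtraction comes to 16,642,706, which is 40 lower than the number quoted above, so one of the listed counts may be slightly off.

```python
# Figures copied from the Data Points list above.
total_references = 75_560_247
total_categories = 10_451_207
one_page = 7_449_146       # categories used on exactly one page
two_page = 1_284_985       # categories used on exactly two pages
three_plus = 1_717_076     # categories used on three or more pages
hundred_plus_refs = 48_898_425  # references held by the 100+ page categories

# The three size buckets add up to the total number of categories.
assert one_page + two_page + three_plus == total_categories

# Assumption: a 1 page category holds 1 reference, a 2 page category holds 2.
removed = one_page * 1 + two_page * 2 + hundred_plus_refs
remaining = total_references - removed
print(remaining)
```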

Ongoing component

  • The AboutUsBot should not create categories in new records that match the VeryGeneralCategories
  • When a user creates a new category, if it is a VeryGeneralCategory then they should be warned about that before saving (but still allowed if they want). This will actually give an opportunity to some people to "capture" a general category such as "Real Estate" but I'm not sure what to do about that.
    • Perhaps a similar mechanism for people trying to save a page with 20+ (or 30+) categories. They can do it, but they get a warning message. -- TakKendrick
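
A minimal sketch of the two ongoing checks, assuming the VeryGeneralCategories list is available as a set of names. The function names (normalize, bot_categories, save_warnings) and the default limit of 20 are hypothetical; none of this is existing AboutUsBot or wiki code.

```python
def normalize(name):
    """Compare category names without regard to case or extra spaces."""
    return " ".join(name.split()).casefold()


def bot_categories(proposed, very_general):
    """Categories the bot may still add: everything not on the general list."""
    blocked = {normalize(name) for name in very_general}
    return [name for name in proposed if normalize(name) not in blocked]


def save_warnings(categories, very_general, max_categories=20):
    """Return warning messages for a user save; the save is never blocked."""
    blocked = {normalize(name) for name in very_general}
    warnings = []
    for name in categories:
        if normalize(name) in blocked:
            warnings.append(
                f'"{name}" is a very general category; '
                "consider something more specific."
            )
    if len(categories) >= max_categories:
        warnings.append(
            f"This page has {len(categories)} categories; "
            "consider trimming the list."
        )
    return warnings


general = {"Real Estate", "Regional"}
print(bot_categories(["Real Estate", "Portland Homes"], general))
print(save_warnings(["real estate", "Portland Homes"], general))
```

The bot path silently drops the general categories, while the user path only warns, which matches the "still allowed if they want" idea above.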

Category Utilities

  • Ability to "Delete a worthless category" - which would delete all instances of a worthless category
  • Ability to "Rename a category" - which would rename all instances of a category to another name
    • For user-created categories this may not always be what we want to do. For example, if there are two categories that mean basically the same thing, like "Table Tennis" and "Ping Pong", it may be fine to let them both exist because that will continue to match the folksonomy that people have created. Where this utility may be useful is in renaming some of the major categories that were prepopulated to categories that are more wiki-ish.
      My thought on these very similar categories is that they should refer back to each other, i.e. Table Tennis and Ping Pong would each have a link to the other category. Maybe, maybe not? Blake and I have noticed the obvious drawback, which is that it isn't very easy to automate; it requires some manual culling. Nathan (talk) / 16:52, 13 June 2007 (UTC)
This isn't really doable until our categories are cut down a LOT from what they are now. I start gibbering in fear thinking about renaming, say, the "Regional" category. Jason Parmer 13:54, 16 May 2007 (PDT)
  • It would be great if each category had some "random from category" mechanism that allowed users to view a random page inside of that category. It could potentially also be paired with the "5 or more edits" idea the current random page has, to ensure that random searches would return useful pages. This would be a boon for big categories like "Real Estate". -- TakKendrick
    • Unfortunately the "5 or more edits" thing is kind of a trick. It works fine for what it does, but it does not and cannot give us random from category without variable load times with the current setup.
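
The delete and rename utilities could look something like the sketch below. It works on an in-memory dictionary of page title to category list, which is an assumption made for illustration; the real thing would have to edit wiki pages, and the function names are made up.

```python
def delete_category(page_categories, worthless):
    """Remove `worthless` from every page; return the number of pages changed."""
    changed = 0
    for title, categories in page_categories.items():
        if worthless in categories:
            page_categories[title] = [c for c in categories if c != worthless]
            changed += 1
    return changed


def rename_category(page_categories, old, new):
    """Replace `old` with `new` on every page; return the number of pages changed."""
    changed = 0
    for title, categories in page_categories.items():
        if old not in categories:
            continue
        renamed = []
        for name in categories:
            name = new if name == old else name
            # Skip duplicates in case the page already carries `new`.
            if name not in renamed:
                renamed.append(name)
        page_categories[title] = renamed
        changed += 1
    return changed


pages = {
    "example-a.com": ["Regional", "Ping Pong"],
    "example-b.com": ["Ping Pong", "Table Tennis"],
    "example-c.com": ["In"],
}
print(delete_category(pages, "In"))
print(rename_category(pages, "Ping Pong", "Table Tennis"))
print(pages)
```

Both functions report how many pages they touched, which gives a rough idea of the cost before trying something like the "Regional" rename on the live data.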

ParkingLots

I am sure there's a one-click fix... In Category:Linkfarm, which is just about 1000 strong and made up of what we have tagged as ParkingLots, I've removed the categories from the wikipages.

I've noticed:

  • Many of the wikipages in Category:Linkfarm (ParkingLots) do not have any categories or related domains.
  • Those that do tend to go overboard (way too specific, names as categories, "the", "for", URL overkill) and would not be useful to users, even if these pages were not ParkingLots.
  • Some of the categories found in Category:Linkfarm would be useful (Healthcare, Portland, Colleges, ...?), but they link to ParkingLots. On some level this diminishes the quality of our fledgling category function when viable categories link to ParkingLots.
  • Many of the categories are useless.

-- Kasey

Great Idea

It would be ideal if a bot could be created to create new specific categories from two non-specific categories. That is, if an article was in both Category:Portland and Category:Restaurants, add the article to Category:Portland Restaurants. --Eohippus 03:49, 22 July 2007 (PDT)
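
A rough sketch of what such a bot's core step might look like, assuming the page data is available as a dictionary of page title to category list. The function name add_combined_categories and the hand-written list of pairs are hypothetical; which pairs are worth combining would still need a human decision.

```python
def add_combined_categories(page_categories, pairs):
    """Add a specific category wherever both broad categories are present.

    pairs is a list of (broad_a, broad_b, specific) tuples.
    Returns the number of category additions made.
    """
    added = 0
    for categories in page_categories.values():
        present = set(categories)
        for broad_a, broad_b, specific in pairs:
            if broad_a in present and broad_b in present and specific not in present:
                categories.append(specific)
                present.add(specific)
                added += 1
    return added


pages = {
    "example-diner.com": ["Portland", "Restaurants"],
    "example-realty.com": ["Portland", "Real Estate"],
}
pairs = [("Portland", "Restaurants", "Portland Restaurants")]
print(add_combined_categories(pages, pairs))
print(pages["example-diner.com"])
```

The broad categories are left in place here; whether to remove them after adding the specific one is a separate choice.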

Categories to keep

please do not delete these:

I would also recommend keeping Category:Adult (sometimes generated by AboutUsBot from the page content), which is almost always an indicator of AdultContent. There are several other category names (for example, Category:BDSM) which should also be kept as an indicator of AdultContent, but note that Category:Gay and Category:Lesbian (and their equivalents in other languages) can refer to bars, literature or individual biographies, and are therefore not reliable indicators of AdultContent. --Eohippus 03:49, 22 July 2007 (PDT)

When I went through the initial list, my intention was to try to catch the widest array of adult content in one net, knowing there would be some mistakes. Looking at the categories, they were overwhelmingly adult content, so the idea was that the other information could be organized better once peeled away from the overused categories. As for the obvious adult content categories, I am pretty sure they are going to be collapsed into the one adult content category. I believe we are doing this for simplicity's sake... MarkDilley

An observation or two

...after hand-editing about 1000 articles to prune not only categories, but related domains.

First, hand-editing these leaves me feeling as if I am draining a swamp one cupful at a time. However, it needs to be done to encourage other users by making their "click-at-random" experience more enjoyable -- & tempt them to make their own edits. Frankly, no one wants to wade thru a swamp, wondering where the dry land is, but a few inconvenient puddles or a lake or two are a challenge. (And to extend this metaphor a little further, we still have the challenge of taming the woodlands.)

I'm not sure if this can be done well enough by a bot. A bot will catch many of the easy ones -- but I've found that where there is one useless category, there are many, & a bot will cut only the narrowest of trails thru this tangle. So I'm trying to figure out how to focus on a selected few that will create (to use my metaphor once again) a large enough clearing in the swamp that others will be attracted to it & want to build on what I have done.

Then there are a number of useless categories, like Category:In & Category:Good Value which were apparently created by the AboutUsBot from the text of the pages; although no category page exists, both appear on many pages -- "Category:In" has over a thousand. These I am purging with extreme prejudice. It's a similar situation with related domains: have a look at what links to Google.com. (::Shiver::)
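
One way to find more categories of this kind, sketched under the assumption that we can get a usage count per category and a list of category pages that actually exist. The function name, the min_uses cutoff and the sample numbers are all made up for illustration.

```python
def orphaned_bot_categories(usage_counts, category_pages, min_uses=1000):
    """List heavily used categories that have no category page of their own."""
    return sorted(
        name
        for name, uses in usage_counts.items()
        if uses >= min_uses and name not in category_pages
    )


# Made-up counts, not real AboutUs data.
usage_counts = {"In": 1200, "Good Value": 1500, "Portland": 5000, "Knitting": 12}
category_pages = {"Portland", "Knitting"}
print(orphaned_bot_categories(usage_counts, category_pages))
```

The result is a list to look over by hand, since a missing category page does not by itself prove the category is useless.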

Another point is that we have numerous duplicate categories, which need to be consolidated. I'm not talking about folksonomies here but examples like "Law", "Lawyer", "Lawyers", "Legal", "Law Firm", "Law Firms", "Lawfirms", etc. I can take care of a few of these, but quite frankly I'm not motivated to consolidate all of them -- & I worry that I might pick the wrong category title to consolidate into.
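
Finding candidates for consolidation could be partly automated. The sketch below groups names that differ only in case, spacing, punctuation or a trailing "s"; this normalization rule is an assumption chosen for the example, and the function names are made up. It deliberately does not pick which title to consolidate into.

```python
from collections import defaultdict


def duplicate_key(name):
    """Crude key: lowercase, letters and digits only, trailing "s" dropped."""
    key = "".join(ch for ch in name.casefold() if ch.isalnum())
    if key.endswith("s") and len(key) > 3:
        key = key[:-1]
    return key


def group_near_duplicates(names):
    """Return groups of two or more names that share the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[duplicate_key(name)].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


names = ["Law", "Lawyer", "Lawyers", "Legal", "Law Firm", "Law Firms", "Lawfirms"]
for group in group_near_duplicates(names):
    print(group)
```

On this list it pairs "Lawyer" with "Lawyers" and groups the three "Law Firm" spellings, but leaves "Law" and "Legal" alone, so the cases that depend on meaning still need a person.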

And last, some categories just need to be split -- or even created. An example that comes to mind is Category:Portland. We all might know that there's a Portland, Oregon & a Portland, Maine -- but there is also a Portland, Connecticut, & a peninsula in Britain called Portland. Then there's Category:Vancouver. -- Llywrch 10:30, 22 August 2007 (PDT)



