RationalizeArticleCapitalization

Revision as of 19:46, 25 May 2007 by Stephen Judkins



DevelopmentPriorities

This is the plan to bring case insensitivity to AboutUs. Things that content people will be interested in are in bold; everything else is an implementation detail. This is basically the plan I described to you on Friday. Please leave any comments or feedback on this page!

To get the "CaseSpace" title of a given article, we will do the following:

  1. Convert to lowercase: "CamelCase" becomes "camelcase".
  2. Strip spaces, underscores, question marks, and parentheses: "knock knock? whos_there" becomes "knockknockwhosthere".

Note that users should never see these CaseSpace titles. They are used only inside the system for disambiguation.
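The two normalization steps above can be sketched in a few lines of Python. This is only an illustration: the function name `casespace` is hypothetical, and the real implementation would presumably live in the wiki's own hook code rather than in Python.

```python
import re

# Characters the plan says to strip: spaces, underscores,
# question marks, and parentheses.
_STRIPPED = re.compile(r"[ _?()]")


def casespace(title: str) -> str:
    """Return the CaseSpace form of an article title (hypothetical helper)."""
    # Step 1: convert to lowercase.
    lowered = title.lower()
    # Step 2: strip the ignored characters.
    return _STRIPPED.sub("", lowered)


print(casespace("CamelCase"))                 # camelcase
print(casespace("knock knock? whos_there"))   # knockknockwhosthere
```

Two titles conflict exactly when their `casespace` values are equal, so the rest of the plan only ever needs to compare these strings.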

Step 1

  • Create au_page_lookup_scratch table, populate with pages
  • Add hooks to block creation of new pages that have casespace conflicts
  • Add hooks to add new pages to au_page_lookup_scratch table
  • Block creation of new pages that differ from existing pages only by capitalization.
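As a rough sketch of Step 1, the scratch table and the creation hook might behave like the following. Everything here except the table name `au_page_lookup_scratch` is an assumption: the column names, the function names, and the use of an in-memory SQLite database in place of the wiki's real database. The scratch table deliberately has no unique key, because existing pages may already conflict.

```python
import re
import sqlite3

_STRIPPED = re.compile(r"[ _?()]")


def casespace(title):
    """CaseSpace form: lowercase, then strip spaces, underscores, ?, ( and )."""
    return _STRIPPED.sub("", title.lower())


def build_scratch(conn, titles):
    """Create the scratch table and populate it with existing page titles."""
    # No unique constraint: existing pages may already collide.
    conn.execute(
        "CREATE TABLE au_page_lookup_scratch ("
        "casespace TEXT NOT NULL, title TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX idx_scratch_casespace ON au_page_lookup_scratch (casespace)"
    )
    conn.executemany(
        "INSERT INTO au_page_lookup_scratch VALUES (?, ?)",
        [(casespace(t), t) for t in titles],
    )


def try_create_page(conn, title):
    """Hypothetical creation hook.

    Returns the title of a conflicting existing page, or None if the new
    page was allowed and recorded in the scratch table.
    """
    key = casespace(title)
    row = conn.execute(
        "SELECT title FROM au_page_lookup_scratch WHERE casespace = ? LIMIT 1",
        (key,),
    ).fetchone()
    if row is not None:
        return row[0]  # block creation: a similar page already exists
    conn.execute("INSERT INTO au_page_lookup_scratch VALUES (?, ?)", (key, title))
    return None
```

For example, with "CaseSpace" already in the table, `try_create_page(conn, "cASEsPACE")` would report the conflict with "CaseSpace" instead of creating the page.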

Step 2

  • Create a page listing all pages with casespace conflicts, excluding redirects.
  • Have the content people resolve all casespace conflicts, excluding redirects.
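The conflict list for Step 2 could be generated along these lines. This is a minimal sketch that assumes the pages are available as `(title, is_redirect)` pairs; that input shape and the function name `conflict_groups` are illustrative assumptions, not part of the plan.

```python
import re
from collections import defaultdict

_STRIPPED = re.compile(r"[ _?()]")


def casespace(title):
    """CaseSpace form: lowercase, then strip spaces, underscores, ?, ( and )."""
    return _STRIPPED.sub("", title.lower())


def conflict_groups(pages):
    """Group non-redirect titles that share a CaseSpace form.

    `pages` is an iterable of (title, is_redirect) pairs. Only groups
    with more than one member are returned, keyed by CaseSpace form.
    """
    groups = defaultdict(list)
    for title, is_redirect in pages:
        if is_redirect:
            continue  # redirects are excluded from the conflict page
        groups[casespace(title)].append(title)
    return {key: sorted(titles) for key, titles in groups.items() if len(titles) > 1}
```

Each returned group is one set of pages that the content people would need to consolidate into a single article.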

Step 3

  • Create au_page_lookup table, with primary unique key for casespace name.
  • Move hooks from Step 1 to point to au_page_lookup
  • Populate it using the same method as in Step 1
  • Add hooks so that all index.php?title=... queries are checked against au_page_lookup and 302-redirected to their real homes if necessary
  • Redirect all URLs to their proper capitalization, e.g. http://www.aboutus.org/cASEsPACE redirects to http://www.aboutus.org/CaseSpace.
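A sketch of how the Step 3 lookup might work is shown below. Only the table name `au_page_lookup` comes from the plan; the column names, the `resolve` function, its `(status, title)` return shape, and the use of SQLite are assumptions made for illustration. The primary key on the casespace column means the table can only be populated once all conflicts from Step 2 are resolved.

```python
import re
import sqlite3

_STRIPPED = re.compile(r"[ _?()]")


def casespace(title):
    """CaseSpace form: lowercase, then strip spaces, underscores, ?, ( and )."""
    return _STRIPPED.sub("", title.lower())


def build_lookup(conn, titles):
    """Create au_page_lookup with a unique key on the CaseSpace name.

    Raises sqlite3.IntegrityError if two titles still conflict.
    """
    conn.execute(
        "CREATE TABLE au_page_lookup ("
        "casespace TEXT PRIMARY KEY, title TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO au_page_lookup VALUES (?, ?)",
        [(casespace(t), t) for t in titles],
    )


def resolve(conn, requested):
    """Decide how to answer a request for `requested` (hypothetical hook).

    Returns (200, title) for an exact match, (302, title) when the request
    should be redirected to the properly capitalized title, and
    (404, None) when no page has that CaseSpace form.
    """
    row = conn.execute(
        "SELECT title FROM au_page_lookup WHERE casespace = ?",
        (casespace(requested),),
    ).fetchone()
    if row is None:
        return (404, None)
    canonical = row[0]
    # MediaWiki treats underscores in URLs as spaces, so compare on that basis.
    if requested.replace("_", " ") == canonical:
        return (200, canonical)
    return (302, canonical)
```

Under these assumptions a request for "cASEsPACE" resolves to a 302 pointing at "CaseSpace", and a request for "SteveHabibRose" resolves to a 302 pointing at "Steve Habib Rose".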

Step 4

  • Figure out what to do with red/blue links in the parser.


Discussion

Could there be a way to separate collisions? For example, I would like to know the group of redirects as opposed to the full wiki page collisions. MarkDilley
Mark, I was planning on excluding redirects from the page. Instead, we would simply delete every redirect that has a CaseSpace conflict after all the "real" pages have been consolidated. User:Stephen Judkins

Number of duplicates per number of revisions

Duplicates Revisions
1 423
1 299
1 95
1 89
1 86
1 74
1 69
1 66
1 63
1 55
1 48
2 45
1 43
1 40
4 39
1 38
3 37
1 35
2 34
1 33
1 31
2 30
4 28
4 26
4 25
4 24
4 23
4 22
6 21
6 20
1 19
2 18
7 17
8 16
11 15
10 14
14 13
14 12
33 11
37 10
41 9
54 8
82 7
98 6
191 5
278 4
701 3
1434 2
42794 1

A case to consider

SteveHabibRose is a redlink, but when I click to edit it to create a redirect to Steve Habib Rose, it tells me that I cannot do so, because a similar page already exists, namely Steve Habib Rose. This might be a feature, but it strikes me as a bug. TedErnst

I will look into fixing it. If it's difficult I will forget about it, because after step 3 is complete this will be a non-issue (any casespace form of "stevehabibrose" will redirect to the same page). The current system is supposed to be a stopgap measure anyway. Stephen Judkins 11:58, 23 May 2007 (PDT)


