Difference between revisions of "RationalizeArticleCapitalization"


Revision as of 19:41, 21 May 2007

This is the plan to bring case insensitivity to AboutUs. Things that content people will be interested in are in bold; everything else is an implementation detail. This is basically the plan I described to you on Friday. Please leave any comments or feedback on this page!

To get the "CaseSpace" title of a given article, we will do the following:

  1. Convert to lowercase: "CamelCase" becomes "camelcase".
  2. Strip spaces, underscores, question marks, and parentheses: "knock knock? whos_there" becomes "knockknockwhosthere".
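The two steps above can be sketched in a few lines of Python. This is only an illustration: the function name casespace is made up here, and it assumes "spaces" means the literal space character rather than all whitespace.

```python
import re

# Characters removed in step 2: space, underscore, question mark, parentheses.
# Assumption: only the literal space is stripped, not tabs or other whitespace.
_STRIPPED = re.compile(r"[ _?()]")


def casespace(title: str) -> str:
    """Return the CaseSpace key for a title (hypothetical helper name)."""
    lowered = title.lower()          # step 1: convert to lowercase
    return _STRIPPED.sub("", lowered)  # step 2: strip the listed characters


print(casespace("CamelCase"))
print(casespace("knock knock? whos_there"))
```

Two titles conflict exactly when this function returns the same key for both.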

Note that users should never see these CaseSpace titles. They should only be used inside the system for disambiguation.

Step 1

  • Create au_page_lookup_scratch table, populate with pages
  • Add hooks to block new page creation with casespace conflicts
  • Add hooks to add new pages to au_page_lookup_scratch table
  • Block creation of new pages that differ from existing pages only by capitalization.
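As a rough sketch of Step 1, the scratch table and the creation hook might look like the following. Everything except the table name au_page_lookup_scratch is an assumption: the column names, the function names, and the use of Python with an in-memory SQLite database as a stand-in for the real wiki database and its hook system.

```python
import re
import sqlite3


def casespace(title: str) -> str:
    """CaseSpace key: lowercase, then strip space, _, ?, ( and )."""
    return re.sub(r"[ _?()]", "", title.lower())


def build_scratch_table(conn, titles):
    # No unique constraint yet: existing pages are expected to conflict.
    conn.execute(
        "CREATE TABLE au_page_lookup_scratch "
        "(casespace_title TEXT, page_title TEXT)"
    )
    conn.execute(
        "CREATE INDEX idx_scratch_casespace "
        "ON au_page_lookup_scratch (casespace_title)"
    )
    conn.executemany(
        "INSERT INTO au_page_lookup_scratch VALUES (?, ?)",
        [(casespace(t), t) for t in titles],
    )


def on_page_create(conn, new_title: str) -> bool:
    """Hypothetical hook: record the page and return True, or return
    False (block creation) if its CaseSpace key is already taken."""
    key = casespace(new_title)
    row = conn.execute(
        "SELECT 1 FROM au_page_lookup_scratch "
        "WHERE casespace_title = ? LIMIT 1",
        (key,),
    ).fetchone()
    if row is not None:
        return False
    conn.execute(
        "INSERT INTO au_page_lookup_scratch VALUES (?, ?)", (key, new_title)
    )
    return True


conn = sqlite3.connect(":memory:")
build_scratch_table(conn, ["CaseSpace", "casespace", "AboutUs"])
print(on_page_create(conn, "ABOUTUS"))   # blocked: conflicts with AboutUs
print(on_page_create(conn, "New Page"))  # allowed and recorded
print(on_page_create(conn, "new_page"))  # blocked: conflicts with New Page
```

The scratch table deliberately allows duplicate keys, since the existing conflicts are only cleaned up in Step 2.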

Step 2

  • Create a page listing all pages with casespace conflicts, excluding redirects.
  • Have the content people do their thing and resolve all casespace conflicts, excluding redirects.
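One way the conflict list for Step 2 could be produced is to group non-redirect pages by their CaseSpace key and keep the groups with more than one member. This is a sketch only; the function name find_conflicts and the (title, is_redirect) input shape are assumptions made for illustration.

```python
import re
from collections import defaultdict


def casespace(title: str) -> str:
    """CaseSpace key: lowercase, then strip space, _, ?, ( and )."""
    return re.sub(r"[ _?()]", "", title.lower())


def find_conflicts(pages):
    """pages: iterable of (title, is_redirect) pairs (assumed shape).

    Returns {casespace_key: sorted titles} for every key shared by two
    or more non-redirect pages. Redirects are skipped entirely.
    """
    groups = defaultdict(list)
    for title, is_redirect in pages:
        if is_redirect:
            continue
        groups[casespace(title)].append(title)
    return {key: sorted(titles) for key, titles in groups.items()
            if len(titles) > 1}


pages = [
    ("CaseSpace", False),
    ("Case Space", False),
    ("casespace", True),   # redirect: excluded from the report
    ("AboutUs", False),
]
for key, titles in find_conflicts(pages).items():
    print(key, titles)
```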

Step 3

  • Create au_page_lookup table, with primary unique key for casespace name.
  • Move the hooks from Step 1 to point to au_page_lookup
  • Populate using the same method as Step 1
  • Add hooks so all index.php?title=... queries are checked against au_page_lookup and 302'd to their real homes if necessary
  • Redirect all URLs to their proper capitalization, i.e. http://www.aboutus.org/cASEsPACE redirects to http://www.aboutus.org/CaseSpace.
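The lookup and redirect logic of Step 3 might be sketched as below. The table name au_page_lookup comes from the plan; the column names, the function names, the 404 case, and the use of Python with in-memory SQLite in place of the real database and request hooks are all assumptions.

```python
import re
import sqlite3


def casespace(title: str) -> str:
    """CaseSpace key: lowercase, then strip space, _, ?, ( and )."""
    return re.sub(r"[ _?()]", "", title.lower())


def build_lookup(conn, titles):
    # The primary key makes the CaseSpace name unique, so any conflict
    # left over from Step 2 raises sqlite3.IntegrityError on insert.
    conn.execute(
        "CREATE TABLE au_page_lookup "
        "(casespace_title TEXT PRIMARY KEY, page_title TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO au_page_lookup VALUES (?, ?)",
        [(casespace(t), t) for t in titles],
    )


def resolve(conn, requested_title: str):
    """Return (status, title): 200 if the request already uses the real
    title, 302 with the real title otherwise, 404 if nothing matches."""
    row = conn.execute(
        "SELECT page_title FROM au_page_lookup WHERE casespace_title = ?",
        (casespace(requested_title),),
    ).fetchone()
    if row is None:
        return (404, None)
    real_title = row[0]
    if real_title == requested_title:
        return (200, real_title)
    return (302, real_title)


conn = sqlite3.connect(":memory:")
build_lookup(conn, ["CaseSpace", "AboutUs"])
print(resolve(conn, "cASEsPACE"))  # (302, 'CaseSpace')
print(resolve(conn, "CaseSpace"))  # (200, 'CaseSpace')
```

Putting the uniqueness constraint in the table itself means the database, not just the hooks, rejects any new conflict.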

Step 4

  • Figure out what to do with red/blue links in the parser.


Discussion

Could there be a way to separate collisions? For example, I would like to know the group of redirects as opposed to the full wiki page collisions. MarkDilley
Mark, I was planning on excluding redirects from the page. Instead, we would simply delete every redirect that has a CaseSpace conflict after all the "real" pages have been consolidated. User:Stephen Judkins

Number of duplicates per number of revisions

Duplicates Revisions
1 1559
1 768
1 655
1 423
1 401
1 392
1 299
1 264
1 218
1 138
1 101
1 95
1 89
1 86
1 83
1 74
1 73
1 71
1 69
1 66
1 63
1 61
1 60
2 58
2 55
1 54
2 52
1 49
1 48
1 47
1 45
1 43
2 40
4 39
3 38
7 37
2 36
2 35
3 34
3 33
1 32
2 31
3 30
2 29
7 28
4 27
5 26
6 25
6 24
8 23
8 22
9 21
8 20
5 19
9 18
13 17
13 16
17 15
17 14
31 13
28 12
45 11
45 10
68 9
78 8
105 7
153 6
255 5
383 4
868 3
1801 2
43664 1


Retrieved from "http://aboutus.com/index.php?title=RationalizeArticleCapitalization&oldid=6885022"