WhoisParsing

Revision as of 12:53, 22 November 2007 by Hassan Javeed (Status: Nov 22 2007)




What (summary)

Extract every piece of information from a whois record, whichever domain registrar it comes from. WhoisParsing is one Task in the larger WhoisRefresh Project.

Why this is important

  • This project will give us access to whois record fields (administrative contacts, technical contacts, domain info, etc.) that can be used to update the information on all the web pages currently hosted on AboutUs.
  • Lets us add more fields to each domain, such as the expiry date.
  • Gives us mastery over our technology. We can change it easily, or adapt it for a different problem.
  • Allows us to turn off Apache and the large Java Tomcat process on our database master.

DoneDone

  • A stable of interesting test cases for each of the 20 largest registrars has been created and hand-audited.
  • All of the test cases are passing.

Steps to DoneDone

  • Write the whois record parsers for the top 20 registrars.
  • Pass one or two test cases for all the above 20 registrars.
  • Write address parsers for the top 20 countries and pass test cases for different formats of addresses.
    • Australia
    • Brazil
    • China
    • Canada
    • Germany
    • Japan
    • Netherlands
    • Spain
    • United States
    • United Kingdom
  • Write a wrapper that takes a domain name, fetches the whois record and then calls the parser on this record.
  • Write more test cases for the top 20 registrars.
    • Belgium Domains
    • Capitol Domain
    • DirectNic
    • Domain Discover
    • Domain Doorman
    • Dot Register
    • Dotster
    • Enom
    • Fabulous
    • Godaddy
    • Key Systems
    • Melbourne It
    • Moniker
    • NameKing
    • Network Solutions
    • Register.com
    • Schlund Partner
    • Tucows
    • Wild West
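
The wrapper step above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: the names `PARSERS`, `fetch_whois`, `find_registrar`, `default_parser` and `parse_domain` are all hypothetical, and the assumption that a record carries a "Registrar:" line is ours. The fetch function is passed in so the dispatch logic can be tested without a network connection.

```python
import socket
from typing import Callable, Dict, Optional

# Hypothetical parser signature: raw whois text in, dict of fields out.
Parser = Callable[[str], Dict[str, str]]

# Registry of per-registrar parsers, keyed by lower-cased registrar name.
PARSERS: Dict[str, Parser] = {}


def fetch_whois(domain: str, server: str, timeout: float = 10.0) -> str:
    """Fetch a raw record over the whois protocol (RFC 3912, TCP port 43)."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def find_registrar(record: str) -> Optional[str]:
    """Return the value of the first 'Registrar:' line (an assumed label)."""
    for line in record.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "registrar":
            return value.strip().lower()
    return None


def default_parser(record: str) -> Dict[str, str]:
    """Fallback: keep the first value seen for every 'Key: value' line."""
    fields: Dict[str, str] = {}
    for line in record.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


def parse_domain(domain: str, fetch: Callable[[str], str]) -> Dict[str, str]:
    """Fetch the record, pick the registrar's parser, fall back to the default."""
    record = fetch(domain)
    parser = PARSERS.get(find_registrar(record) or "", default_parser)
    return parser(record)
```

A caller would supply the fetcher, for example `parse_domain(name, lambda d: fetch_whois(d, server))`, where `server` is whichever whois host is appropriate for the domain.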

Status: Nov 22 2007

  • We added two more registrars.
  • Started work on the default parser.
  • Corrected small problems in the registrar parsers that were identified by running the parsers on a large data set.

Plan for Tomorrow

  • We plan to finish the top 50 registrars tomorrow; together they cover 90% of all domains.
  • Continue work on the default parser so that it can be finished by tomorrow.
  • Continue running the parsers on large data sets, identifying problems and correcting them.

Status: Nov 21 2007

  • We added about 8 or 9 registrars.
  • Corrected small problems in the registrar parsers that were identified by running the parsers on a large data set.
  • Added a few more test cases for the parsers.

Plan for Tomorrow

  • We plan to add a few more registrars tomorrow.
  • Work on a default parser that can extract the address from any registrar's whois record.
  • Continue running the parsers on large data sets, identifying problems and correcting them.

Excellent! The plan sounds great ... especially the default parser :-) Keep rockin! --Brandon 21:30, 21 November 2007 (PST)

Testing

For each "Test Domain" included in these tallies, we have an exhaustive test that is passing.

S#  Registrar           # "Interesting" Test Domains  Comment
1   GoDaddy             0                             ToSkip
2   Melbourne IT        0                             ToSkip
3   Enom                0                             ToSkip
4   Network Solutions   0                             ToSkip. Most records have Domain Status set to nil.
5   Belgium Domains     0                             ToSkip
6   Tucows              0                             ToSkip. Many UK tests failing; some have postal codes the same as the state.
7   Beijing Innovative  0                             ToSkip
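
One way to read "exhaustive" for a test domain is that every parsed field is compared against the hand-audited value, and that extra or missing fields also count as failures. The helper below is a hypothetical sketch of that comparison in Python; `field_mismatches` is not a name from the project.

```python
from typing import Dict, Optional, Tuple


def field_mismatches(
    expected: Dict[str, str], actual: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {field: (expected, actual)} for every field that differs.

    A field missing on either side is reported with None in that position,
    so an extra or a dropped field fails the test just like a wrong value.
    """
    mismatches = {}
    for field in sorted(set(expected) | set(actual)):
        want, got = expected.get(field), actual.get(field)
        if want != got:
            mismatches[field] = (want, got)
    return mismatches
```

A test for one domain then passes only when `field_mismatches(audited, parsed)` is empty.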


Formats of WhoIs For Registrars

Network Solutions

AEroiNstruments.com: Without Country in Address

c/o Network Solutions
P.O. Box 447  
Herndon, VA.  20172-0447
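
With no country line to go on, the last line of this address can still be recognised by its shape: city, two-letter state (here with a stray trailing period) and ZIP or ZIP+4. The following Python sketch shows one possible pattern; `parse_us_city_line` is a hypothetical name and the regular expression is an assumption about the format, not the project's parser.

```python
import re
from typing import Dict, Optional

# City, comma, two-letter state with an optional period, then ZIP or ZIP+4.
US_CITY_LINE = re.compile(
    r"^\s*(?P<city>[^,]+?)\s*,\s*"
    r"(?P<state>[A-Za-z]{2})\.?\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)


def parse_us_city_line(line: str) -> Optional[Dict[str, str]]:
    """Split a 'City, ST 12345-6789' line; return None if it does not fit."""
    match = US_CITY_LINE.match(line)
    if match is None:
        return None
    return {
        "city": match.group("city"),
        "state": match.group("state").upper(),
        "zip": match.group("zip"),
    }
```

A line that matches this pattern is a reasonable hint that the address is in the United States even when the country is omitted.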

AEroiNstruments.com: Administrative Contact and Technical Contact headings can share one line or appear on separate lines (separate lines shown)

Administrative Contact:
Technical Contact:

ArRail-Dental.com: China Address

     Y.P.Sun       
     Y.P.Sun
     Beijing Shengbin Company Limited
     304 Citic Building 2 No 19
     Jianguomenwai
     Beijing 100004
     100004
     CHN 
     999 999 9999 fax: 999 999 9999

Broadland-Gas.com: UK address

     BROADLAND GAS     
     35 The Street
     CARLTON  COLVILLE
     LOWESTOFT, SUFFOLK NR33 8JP 
     UK  

CommerceCenter.com: UK Address

  Lorien
  Felstead, Essex CM6 3LR 
  Felstead, ESSEX CM6 3LR 
  UK  
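
In the UK samples above, the postcode always closes the town line, so it can be peeled off the end before the rest is split on the comma. The sketch below uses a simplified postcode pattern (outward code, then inward code); it is an illustrative assumption in Python, does not validate every real postcode, and `split_uk_line` is a hypothetical name.

```python
import re
from typing import Dict, Optional

# Simplified UK postcode at the end of the line: outward code (e.g. NR33, CM6)
# followed by inward code (digit plus two letters, e.g. 8JP).
UK_POSTCODE = re.compile(
    r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\s*$", re.IGNORECASE
)


def split_uk_line(line: str) -> Optional[Dict[str, str]]:
    """Split 'TOWN, COUNTY POSTCODE'; return None if no postcode is found."""
    match = UK_POSTCODE.search(line)
    if match is None:
        return None
    rest = line[: match.start()].strip().rstrip(",").strip()
    town, _, county = rest.partition(",")
    return {
        "town": town.strip(),
        "county": county.strip(),
        "postcode": f"{match.group(1)} {match.group(2)}".upper(),
    }
```

Comparing the result case-insensitively also makes the two CommerceCenter.com town lines, which differ only in capitalisation, collapse to one value.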

DuoCuisines.com: France Address

  REVANCHE
  20 Rue Bernard Lazare
  LE CAILAR 30740
  FR  

Hricn.com: China Address

  jin yuan, xiong
  Guizhou huangping jiuzhouzhongxue
  Guizhou, Guizhou 550000
  CN  
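
The samples so far spell countries in more than one way: CHN (the ISO 3166-1 alpha-3 code) for ArRail-Dental.com, CN (alpha-2) for Hricn.com, and UK where ISO 3166-1 alpha-2 assigns GB. A small alias table can fold these into one form before the per-country address parser is chosen. The Python sketch below is an assumption about how that might look; `COUNTRY_ALIASES` and `normalize_country` are hypothetical names, and the table lists only the variants seen on this page.

```python
# Variants seen in the sample records, mapped to ISO 3166-1 alpha-2 codes.
COUNTRY_ALIASES = {
    "UK": "GB",   # common usage; the alpha-2 code for the United Kingdom is GB
    "CHN": "CN",  # alpha-3 code for China
}


def normalize_country(code: str) -> str:
    """Trim, upper-case and map known variants; pass other codes through."""
    code = code.strip().upper()
    return COUNTRY_ALIASES.get(code, code)
```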

JennaJoy.com: Name Missing

     Technical Contact:
           
     ATTN: JENNAJOY.COM
     c/o Network Solutions
     P.O. Box 447
     Herndon, VA 20172-0447
     570-708-8780

OnegaTelecoms.com: City and State are same

  Unit 2 Ground Floor Caxton Street
  Studios Caxton Street North
  London, LONDON E16 1JL 
  GB  
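
When the state field only repeats the city, as in "London, LONDON", one option is to drop the repeated state after parsing. The Python sketch below assumes that choice; `collapse_repeated_state` is a hypothetical helper, and keeping the city rather than the state is a design decision, not something the records dictate.

```python
from typing import Optional, Tuple


def collapse_repeated_state(
    city: str, state: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Return (city, state), with state set to None if it repeats the city."""
    city = city.strip()
    state = state.strip() if state else None
    # Compare case-insensitively so 'London' and 'LONDON' count as the same.
    if state and state.casefold() == city.casefold():
        return city, None
    return city, state
```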

TradeFootball.com: IE address

  10b Beckett Way 
  Parkwest Business Park, Clondalkin D22 00000
  IE  

XmlConference.com: Fax is NULL

     Daste, Kevin      
     323 Pine St
     New Orleans, LA 70118
     US
     504-208-1566 fax: null
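
The phone line in these samples takes three shapes: phone and fax, phone with a literal "null" fax, and phone alone (as in the JennaJoy.com record). A hedged Python sketch of splitting that line follows; `split_phone_fax` is a hypothetical name, and treating "null" as a missing value is our assumption about the intended behaviour.

```python
import re
from typing import Optional, Tuple


def split_phone_fax(line: str) -> Tuple[Optional[str], Optional[str]]:
    """Split 'PHONE fax: FAX' into (phone, fax); absent values become None."""
    parts = re.split(r"\bfax:", line, maxsplit=1, flags=re.IGNORECASE)
    phone = parts[0].strip() or None
    fax = parts[1].strip() if len(parts) == 2 else ""
    # The literal string 'null' appears when the registrant gave no fax.
    if fax.lower() in ("", "null"):
        return phone, None
    return phone, fax
```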

