WhoisParsing
What (summary)
Extract every piece of information in the whois record, across all domain registrars. WhoisParsing is one Task in the larger WhoisRefresh Project.
Why this is important
- This project will give us access to whois record fields (administrative contacts, technical contacts, domain info, etc.) that can be used to update the information on all the web pages currently hosted on AboutUs.
- Lets us add more fields to each domain, such as the expiration date.
- Gives us full control over our technology: we can change it easily or adapt it to a different problem.
- Allows us to turn off Apache and the large Java Tomcat process on our database master.
DoneDone
- A stable of interesting test cases for each of the 20 largest registrars has been created and hand-audited.
- All of the test cases are passing.
Steps to DoneDone
- Write the whois record parsers for the top 20 registrars.
- Pass one or two test cases for each of these 20 registrars.
- Write address parsers for the top 20 countries and pass test cases for their different address formats:
  - Australia
  - Brazil
  - China
  - Canada
  - Germany
  - Japan
  - Netherlands
  - Spain
  - United States
  - United Kingdom
- Write a wrapper that takes a domain name, fetches the whois record, and then calls the parser on this record.
- Write more test cases for the top 20 registrars:
  - Belgium Domains
  - Capitol Domain
  - DirectNic
  - Domain Discover
  - Domain Doorman
  - Dot Register
  - Dotster
  - Enom
  - Fabulous
  - GoDaddy
  - Key Systems
  - Melbourne IT
  - Moniker
  - NameKing
  - Network Solutions
  - Register.com
  - Schlund Partner
  - Tucows
  - Wild West
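The wrapper step above could look roughly like the following Python sketch. It assumes a thin-registry lookup for .com/.net (query whois.verisign-grs.com, then follow the "Whois Server:" referral to the registrar's own server) and picks a parser by the hostname of that registrar server. `fetch_whois`, `lookup`, the `parsers` mapping and the referral regex are hypothetical names for illustration, not the project's code; only the port-43 query/response exchange comes from the whois protocol itself (RFC 3912).

```python
import re
import socket
from typing import Callable, Dict, Mapping

WHOIS_PORT = 43  # whois runs over TCP port 43 (RFC 3912)
REGISTRY_SERVER = "whois.verisign-grs.com"  # assumption: thin registry for .com/.net

# Referral line such as "Whois Server: whois.example.com" or
# "Registrar WHOIS Server: whois.example.com". Using [ \t] instead of \s keeps
# an empty value from capturing the first word of the next line.
REFERRAL_RE = re.compile(
    r"^[ \t]*(?:Registrar[ \t]+)?Whois Server:[ \t]*(\S+)",
    re.IGNORECASE | re.MULTILINE,
)

Parser = Callable[[str], Dict[str, str]]
Fetcher = Callable[[str, str], str]


def fetch_whois(domain: str, server: str, timeout: float = 10.0) -> str:
    """Send one query line and read until the server closes the connection."""
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def lookup(domain: str, parsers: Mapping[str, Parser], default_parser: Parser,
           fetch: Fetcher = fetch_whois) -> Dict[str, str]:
    """Fetch a domain's whois record and run the parser for the server that answered."""
    thin = fetch(domain, REGISTRY_SERVER)
    referral = REFERRAL_RE.search(thin)
    if referral is None:
        return default_parser(thin)  # no referral: parse the registry's own answer
    server = referral.group(1).lower()
    record = fetch(domain, server)
    return parsers.get(server, default_parser)(record)
```

Keying parsers by whois server hostname avoids having to recognize the registrar from the record text, whose layout differs between registrars; a server with no dedicated parser falls through to the default parser. Because `fetch` is a parameter, tests can pass in canned records instead of touching the network.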
Status: Nov 22 2007
- We added two more registrars.
- Started work on the default parser.
- Fixed small problems in the registrar parsers that were found by running them on a large data set.
Plan for Tomorrow
- We plan to finish the top 50 registrars tomorrow; together they cover 90% of all domains.
- Continue work on the default parser so that it can be finished tomorrow.
- Keep running the parsers on large data sets to find and fix problems.
Status: Nov 21 2007
- We added about 8 or 9 registrars.
- Fixed small problems in the registrar parsers that were found by running them on a large data set.
- Added a few more test cases for the parsers.
Plan for Tomorrow
- We plan to add a few more registrars tomorrow.
- Work on a default parser that can extract addresses from any registrar's whois record.
- Keep running the parsers on large data sets to find and fix problems.
- Excellent! The plan sounds great ... especially the default parser :-) Keep rockin! --Brandon 21:30, 21 November 2007 (PST)
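One possible shape for the default parser, sketched in Python under the assumption that most records consist of `Key: value` lines plus contact or address blocks introduced by a bare `Key:` line and indented beneath it. `default_parse`, `FIELD_RE` and the lowercased keys are hypothetical; values are kept as lists because the same key can repeat within one record.

```python
import re
from typing import Dict, List, Optional

# "Key: value" or a bare "Key:". The lazy key stops at the first colon, so
# values such as timestamps keep their own colons.
FIELD_RE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9 ./()-]*?)\s*:\s*(.*?)\s*$")


def indent_of(line: str) -> int:
    return len(line) - len(line.lstrip())


def default_parse(record: str) -> Dict[str, List[str]]:
    """Hypothetical fallback parser for registrars without a dedicated parser."""
    fields: Dict[str, List[str]] = {}
    block_key: Optional[str] = None
    block_indent = 0
    for line in record.splitlines():
        if not line.strip():
            block_key = None  # a blank line closes any open block
            continue
        if block_key is not None and indent_of(line) > block_indent:
            # Indented under a bare "Key:" header: keep the line as-is,
            # even if it contains a colon (e.g. "fax: null").
            fields[block_key].append(line.strip())
            continue
        block_key = None
        match = FIELD_RE.match(line)
        if match is None:
            continue  # free text such as legal notices is ignored
        key = " ".join(match.group(1).lower().split())
        value = match.group(2)
        if value:
            fields.setdefault(key, []).append(value)
        else:
            fields.setdefault(key, [])
            block_key, block_indent = key, indent_of(line)
    return fields
```

The address lines collected under a block key could then be handed to a country-specific address parser.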
Testing
For each "Test Domain" included in these tallies, we have an exhaustive test that is passing.
No. | Registrar | # "Interesting" Test Domains | Comment |
---|---|---|---|
1 | GoDaddy | 0 | ToSkip |
2 | Melbourne IT | 0 | ToSkip |
3 | Enom | 0 | ToSkip |
4 | Network Solutions | 0 | ToSkip. Most records have Domain Status set to nil. |
5 | Belgium Domains | 0 | ToSkip |
6 | Tucows | 0 | ToSkip. Many UK tests fail; some have postal codes identical to the state. |
7 | Beijing Innovative | 0 | ToSkip |
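An "exhaustive" test can be read as: every hand-audited field matches, and the parser produces no field that the audit did not expect. A minimal Python sketch of such a check follows; `audit` and the flat field dictionary are assumptions for illustration, not the project's actual test format.

```python
from typing import Callable, Dict, List

Parser = Callable[[str], Dict[str, str]]


def audit(parser: Parser, record: str, expected: Dict[str, str]) -> List[str]:
    """Return readable mismatches between parsed fields and hand-audited fields."""
    parsed = parser(record)
    problems = []
    # Every audited field must be present with exactly the audited value.
    for key in sorted(expected):
        if parsed.get(key) != expected[key]:
            problems.append(f"{key}: expected {expected[key]!r}, got {parsed.get(key)!r}")
    # Extra fields are reported too, so the audit stays exhaustive.
    for key in sorted(parsed.keys() - expected.keys()):
        problems.append(f"{key}: not in the hand-audited fixture ({parsed[key]!r})")
    return problems
```

A test domain passes when `audit` returns an empty list; anything else lists exactly which fields need attention.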
Whois Record Formats by Registrar
Network Solutions
AEroiNstruments.com: Without Country in Address
c/o Network Solutions P.O. Box 447 Herndon, VA. 20172-0447
AEroiNstruments.com: Administrative Contact and Technical Contact headers on the same line (other records put them on separate lines)
Administrative Contact: Technical Contact:
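A sketch of how a parser might handle both layouts, assuming that a header line naming several roles (for example `Administrative Contact, Technical Contact:`) introduces one block shared by all of them, and that a blank line ends a block. `split_contacts`, `header_roles` and the role list are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical set of contact roles; extend as registrars require.
ROLE_RE = re.compile(
    r"Registrant|Administrative Contact|Technical Contact|Billing Contact",
    re.IGNORECASE,
)
# What may sit between role names on a header line: spaces, commas, colons, "and".
SEPARATOR_RE = re.compile(r"[\s,:]+|\band\b", re.IGNORECASE)


def header_roles(line: str) -> List[str]:
    """Return the roles named on a header line, or [] if the line is not a header."""
    roles = [m.group(0).title() for m in ROLE_RE.finditer(line)]
    leftover = SEPARATOR_RE.sub("", ROLE_RE.sub("", line))
    return roles if roles and not leftover else []


def split_contacts(record: str) -> Dict[str, List[str]]:
    """Assign each contact block to every role named in its header line."""
    contacts: Dict[str, List[str]] = {}
    current: List[str] = []
    for line in record.splitlines():
        roles = header_roles(line)
        if roles:
            current = roles
            for role in roles:
                contacts.setdefault(role, [])
        elif not line.strip():
            current = []  # a blank line ends the block
        else:
            for role in current:
                contacts[role].append(line.strip())
    return contacts
```

With this approach the same-line and separate-line layouts produce the same output shape, so downstream code never needs to know which one the registrar used.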
ArRail-Dental.com: China Address
Y.P.Sun Y.P.Sun Beijing Shengbin Company Limited 304 Citic Building 2 No 19 Jianguomenwai Beijing 100004 100004 CHN 999 999 9999 fax: 999 999 9999
Broadland-Gas.com: UK address
BROADLAND GAS 35 The Street CARLTON COLVILLE LOWESTOFT, SUFFOLK NR33 8JP UK
CommerceCenter.com: UK Address
Lorien Felstead, Essex CM6 3LR Felstead, ESSEX CM6 3LR UK
DuoCuisines.com: France Address
REVANCHE 20 Rue Bernard Lazare LE CAILAR 30740 FR
Hricn.com: China Address
jin yuan, xiong Guizhou huangping jiuzhouzhongxue Guizhou, Guizhou 550000 CN
JennaJoy.com: Name Missing
Technical Contact: ATTN: JENNAJOY.COM c/o Network Solutions P.O. Box 447 Herndon, VA 20172-0447 570-708-8780
OnegaTelecoms.com: City and State are the same
Unit 2 Ground Floor Caxton Street Studios Caxton Street North London, LONDON E16 1JL GB
TradeFootball.com: IE address
10b Beckett Way Parkwest Business Park, Clondalkin D22 00000 IE
XmlConference.com: Fax is NULL
Daste, Kevin 323 Pine St New Orleans, LA 70118 US 504-208-1566 fax: null
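The address samples above suggest parsing the last address line country by country. The Python sketch below assumes that line has the form "city, state postal [country code]", maps UK to GB, treats a missing country code as US (as in the Network Solutions c/o address), and treats a literal `null` value as missing. The patterns cover only the US, UK, France and China samples on this page; `PATTERNS`, `parse_last_line` and `clean` are hypothetical names.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for the final address line, applied after the
# trailing country code (if any) has been split off.
PATTERNS = {
    # "Herndon, VA. 20172-0447", "New Orleans, LA 70118"
    "US": re.compile(r"^(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\.?\s+(?P<postal>\d{5}(?:-\d{4})?)$"),
    # "LOWESTOFT, SUFFOLK NR33 8JP", "London, LONDON E16 1JL"
    "GB": re.compile(
        r"^(?P<city>[^,]+),\s*(?P<state>[A-Za-z .]+?)\s+"
        r"(?P<postal>[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2})$",
        re.IGNORECASE,
    ),
    # "LE CAILAR 30740"
    "FR": re.compile(r"^(?P<city>.+?)\s+(?P<postal>\d{5})$"),
    # "Guizhou, Guizhou 550000" (city and province may be identical)
    "CN": re.compile(r"^(?P<city>[^,]+),\s*(?P<state>[^\d,]+?)\s+(?P<postal>\d{6})$"),
}
ALIASES = {"UK": "GB"}  # registrars write both UK and GB


def parse_last_line(line: str) -> Optional[Dict[str, str]]:
    """Split 'city, state postal [CC]' into fields, or return None if no pattern fits."""
    body = line.strip()
    country = "US"  # assumption: no country code means a US address
    parts = body.rsplit(None, 1)
    if len(parts) == 2 and re.fullmatch(r"[A-Z]{2,3}", parts[1]):
        country = ALIASES.get(parts[1], parts[1])
        body = parts[0]
    pattern = PATTERNS.get(country)
    match = pattern.match(body) if pattern else None
    if match is None:
        return None
    fields = {k: v for k, v in match.groupdict().items() if v is not None}
    fields["country"] = country
    return fields


def clean(value: str) -> Optional[str]:
    """Treat empty values and the literal 'null' (as in 'fax: null') as missing."""
    value = value.strip()
    return None if value.lower() in ("", "null") else value
```

Each further country in the address-parser list would add one entry to `PATTERNS`; a line that fits no pattern (such as the IE sample) returns None, so it can be logged and turned into a new test case.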