05 January 2007

Composite Wikipedia Mad-Libs Microformats

imagine analysing
all wikipedia biographies
to determine
what sorts of facts
(birthplace, creative output)
are mentioned most often

now compose
the 1000 most common of these
into a generic
biography template
for no-man/everyman

(like the game of mad-libs)
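
(a rough sketch of the counting and composing steps above,
in python,
assuming each biography has already been reduced
to the set of fact kinds it mentions;
the sample data and the build_template helper
are made up for illustration)

```python
from collections import Counter

# hypothetical input: each biography reduced to the fact kinds it mentions
biographies = [
    {"birthplace", "birth date", "creative output"},
    {"birthplace", "birth date", "spouse"},
    {"birthplace", "creative output"},
]

def build_template(articles, size):
    """Return the `size` most frequently mentioned fact kinds, most common first."""
    counts = Counter()
    for fact_kinds in articles:
        counts.update(fact_kinds)
    # sort by descending count, then alphabetically so ties are deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [kind for kind, _ in ranked[:size]]

# the generic biography template: one blank per fact kind
template = build_template(biographies, size=3)
```

(the hard part this skips
is getting from raw article text
to those sets of fact kinds)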

now repeat this process 1000 times
for the 1000 most common
classes of article
(countries, cities, animals, historical events)

so we have 1000 article-templates
with 1000 fact-madlibs each

approximating the million most useful
kinds of facts

now pick any arbitrary webpage
and look for examples of those million facts

and assign metadata to that page
telling which article templates were used
which fact madlibs
and what madlib-fillers
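
(one way the tagging step might look,
assuming each fact madlib can be written
as a regular expression with a single blank;
the TEMPLATES table, the patterns and tag_page
are hypothetical,
and real pages would need far looser matching)

```python
import re

# hypothetical templates: article class -> {fact madlib name: regex with one blank}
TEMPLATES = {
    "biography": {
        "birthplace": re.compile(r"was born in ([A-Z][a-z]+)"),
        "occupation": re.compile(r"worked as an? ([a-z]+)"),
    },
    "city": {
        "population": re.compile(r"has a population of ([\d,]+)"),
    },
}

def tag_page(text):
    """Return metadata: which templates and madlibs matched, and the fillers found."""
    metadata = {}
    for template_name, madlibs in TEMPLATES.items():
        for madlib_name, pattern in madlibs.items():
            fillers = pattern.findall(text)  # one group, so a list of strings
            if fillers:
                metadata.setdefault(template_name, {})[madlib_name] = fillers
    return metadata

page = "Ada was born in London and worked as a mathematician."
metadata = tag_page(page)
```

(here metadata names the biography template,
its birthplace and occupation madlibs,
and the fillers London and mathematician)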

(how big a chunk
this will take
out of the semantic-web problem
depends on how many pages
fall outside those 1000 article-types)
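
(that last question can be put as a number:
the share of pages
where at least one article template matched;
a small sketch,
with made-up tagging results
and a hypothetical coverage helper)

```python
def coverage(page_metadata):
    """Fraction of pages where at least one article template matched."""
    if not page_metadata:
        return 0.0
    covered = sum(1 for metadata in page_metadata if metadata)
    return covered / len(page_metadata)

# hypothetical tagging results for four pages; an empty dict means nothing matched
results = [
    {"biography": {"birthplace": ["London"]}},
    {},
    {"city": {"population": ["8,000,000"]}},
    {"biography": {"occupation": ["poet"]}},
]
share = coverage(results)
```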