of the million most-frequent
words on the Web
based on these frequencies
we calculate the expected average distance
between every pair of words
and compare it to the
observed average distance
closer actual pairs
get connected by 'short' elastics
farther pairs by anti-elastics
the whole takes a semi-unique 3D shape
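
One way to picture the elastics, as a rough sketch only: treat each pair's observed/expected distance ratio as the rest length of a spring, so ratios under 1 behave like 'short' elastics and ratios over 1 push pairs apart like anti-elastics. The function name `relax`, the ratio-as-rest-length rule, and the step settings are assumptions made for illustration; the text does not specify any of them.

```python
import math
import random

def relax(ratios, dims=3, steps=500, rate=0.1, seed=0):
    """Place words in space so pair distances drift toward their ratios.

    ratios maps (word_a, word_b) -> observed / expected distance
    (hypothetical input; using it as a spring rest length is an assumption).
    """
    rng = random.Random(seed)
    words = sorted({w for pair in ratios for w in pair})
    pos = {w: [rng.uniform(-1, 1) for _ in range(dims)] for w in words}
    for _ in range(steps):
        for (a, b), rest in ratios.items():
            delta = [q - p for p, q in zip(pos[a], pos[b])]
            dist = math.sqrt(sum(d * d for d in delta)) or 1e-9
            # Positive pull draws the pair together, negative pushes it apart.
            pull = rate * (dist - rest) / dist
            for k in range(dims):
                pos[a][k] += pull * delta[k] / 2
                pos[b][k] -= pull * delta[k] / 2
    return pos

shape = relax({("cat", "dog"): 0.5, ("cat", "tax"): 2.0, ("dog", "tax"): 2.0})
```

Here 'cat' and 'dog' should settle nearer to each other than either does to 'tax'.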
we find the frequency of each word
within each top-level Yahoo
topic category (14 total)
and color it one of 14 colors
to reflect which topic
uses that word most
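
The coloring step might be sketched as below, assuming 'uses that word most' means the highest relative frequency within a topic (raw counts would favor whichever topic has the most text). The function `color_words`, the input layout, and the two-topic example are hypothetical.

```python
def color_words(topic_counts, palette):
    """Map each word to the color of the topic that uses it most.

    topic_counts: {topic: {word: count}} (assumed layout)
    palette:      {topic: color}
    """
    totals = {t: max(sum(c.values()), 1) for t, c in topic_counts.items()}
    words = set().union(*topic_counts.values())
    colors = {}
    for word in words:
        # Compare relative frequencies so large topics do not win by size.
        best = max(topic_counts,
                   key=lambda t: topic_counts[t].get(word, 0) / totals[t])
        colors[word] = palette[best]
    return colors

colors = color_words(
    {"sports": {"goal": 8, "the": 2}, "science": {"goal": 1, "atom": 5, "the": 4}},
    {"sports": "red", "science": "blue"},
)
```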
we rotate the 3D shape
until the 'stars'
of each of the 14 colors
are optimally clustered
and we flatten the shape
along that axis
the radial order of the words
we call 'yahoobetical order'
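
Once the shape is flattened, the radial order can be read off by sorting on angle around the center. This is a minimal sketch that assumes each word already has flattened (x, y) coordinates centered on the origin; `radial_order` is a made-up name.

```python
import math

def radial_order(points):
    """Sort words by their angle around the origin, counterclockwise from +x.

    points: {word: (x, y)} after flattening (assumed input).
    """
    def angle(word):
        x, y = points[word]
        return math.atan2(y, x) % (2 * math.pi)
    return sorted(points, key=angle)

order = radial_order({"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)})
```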
we temporarily maintain
yahoobetical order
but reposition each word
with nearer orbits for common words
distant orbits for uncommon ones
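
A possible reading of the orbit step: keep each word's angle and set its radius from its frequency. The negative log of relative frequency used here is an assumption (any decreasing function of frequency would give nearer orbits to common words); `orbit_positions` is a hypothetical helper.

```python
import math

def orbit_positions(angles, counts):
    """Keep each word's angle, set its radius from how common it is.

    angles: {word: angle in radians} (the yahoobetical order)
    counts: {word: occurrences}, every count > 0
    """
    total = sum(counts.values())
    positions = {}
    for word, theta in angles.items():
        # Assumed rule: radius = -log(relative frequency), so common = near.
        radius = -math.log(counts[word] / total)
        positions[word] = (radius * math.cos(theta), radius * math.sin(theta))
    return positions

stars = orbit_positions({"the": 0.0, "zebra": 1.0}, {"the": 90, "zebra": 10})
```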
we now consider word-pairs
on the web
and give them stars
with orbits based on frequency
and positions halfway
between their components
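
For word-pairs, 'halfway' can be taken as the circular midpoint of the two component angles, which handles wrap-around at the zero angle. The sketch below assumes that reading and reuses a negative-log radius for the orbit; `pair_star` and its parameters are illustrative names only.

```python
import math

def pair_star(angle_a, angle_b, pair_count, total_pairs):
    """Return (angle, radius) for a word-pair's star.

    The angle is the circular midpoint of the component angles; it is
    not meaningful when the components sit exactly opposite each other.
    The radius rule (-log of relative frequency) is an assumption.
    """
    mid = math.atan2(math.sin(angle_a) + math.sin(angle_b),
                     math.cos(angle_a) + math.cos(angle_b))
    radius = -math.log(pair_count / total_pairs)
    return mid, radius

angle, radius = pair_star(0.0, math.pi / 2, pair_count=5, total_pairs=1000)
```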
we add triplets, etc
up to phrases, sentences
paragraphs, pages, chapters, books
using hypothetical elastic links
to their components
determining their yahoobetical
positions
we loosen the original
yahoobetical ordering
of the individual words
and let the elastics
re-sort things
(expecting a not-too-dramatic
change)
this is recalibrated
yahoobetical order
we consider the full oeuvre
of any author
and count the word-frequencies
repositioning words' orbits to match
and we compare those
individual-author orbits
to the whole-web orbits
and re-map based on differences
so the words leCarre (say) uses
more than average
get distinctive orbits
and ditto, those he uses less
(we may choose to filter out
words and phrases used mainly
in only one or two titles
eg characters' names)
these words' orbits are his
stylistic fingerprint
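
The comparison of author orbits to whole-web orbits could be sketched as a smoothed log-ratio of relative frequencies: positive for words the author overuses, negative for words he underuses. The add-one smoothing, the `min_titles` cutoff of 3, and the name `fingerprint` are assumptions, and the counts in the example are invented.

```python
import math

def fingerprint(author_counts, web_counts, titles_per_word=None, min_titles=3):
    """Score each word by log(author frequency / web frequency).

    titles_per_word: optional {word: number of the author's titles using it};
    words found in fewer than min_titles titles are dropped (eg names).
    """
    titles_per_word = titles_per_word or {}
    vocab = set(author_counts) | set(web_counts)
    a_total = sum(author_counts.values()) + len(vocab)  # add-one smoothing
    w_total = sum(web_counts.values()) + len(vocab)
    scores = {}
    for word in vocab:
        if word in titles_per_word and titles_per_word[word] < min_titles:
            continue  # concentrated in one or two titles
        a_freq = (author_counts.get(word, 0) + 1) / a_total
        w_freq = (web_counts.get(word, 0) + 1) / w_total
        scores[word] = math.log(a_freq / w_freq)
    return scores

scores = fingerprint(
    {"the": 50, "tradecraft": 10, "smiley": 40},
    {"the": 900, "tradecraft": 1, "cat": 99},
    titles_per_word={"smiley": 2, "tradecraft": 5, "the": 9},
)
```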