Mining Resources
Automated Classification
• Starter Taxonomies
– Industry specific taxonomies or pods
– Prebuilt matching terms and clusters
– Apply to a document set and refine, or
– Derive from a training set through automated tools
– SEMIO – automatic generation of clusters, map to taxonomy
– Autonomy – drop document prototypes in defined folders
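As an illustration of applying a starter taxonomy to a document set, here is a minimal sketch. The category names, the term lists, and the `classify` function are all hypothetical, and the scoring is simple term counting, not the method used by any of the products named above.

```python
import re
from collections import Counter

# Hypothetical starter taxonomy: category -> prebuilt matching terms.
# A real industry taxonomy would be far larger and curated by experts.
STARTER_TAXONOMY = {
    "Banking": {"loan", "mortgage", "deposit"},
    "Pharma": {"trial", "dosage", "compound"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def classify(text, taxonomy=STARTER_TAXONOMY):
    """Return the category whose terms occur most often, or None if no term matches."""
    counts = Counter(tokenize(text))
    scores = {cat: sum(counts[t] for t in terms) for cat, terms in taxonomy.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Documents that come back as `None` are candidates for the "refine" step: they suggest terms or categories the starter taxonomy is missing.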
• Generation from Clusters
– Taxonomy engine makes a “best guess” pass at training set
– Experts rename and restructure the surfaced clusters
– Raven engine implementation
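The "best guess, then expert rename" workflow can be sketched roughly as follows. This is an assumption-laden toy: it uses greedy single-pass clustering on Jaccard word overlap, with a hypothetical `threshold` and stopword list, and makes no claim about how the Raven engine actually works.

```python
import re

# Hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "on"}

def tokens(text):
    """Return the set of non-stopword words in the text."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_guess_clusters(docs, threshold=0.2):
    """Greedy pass: put each document in the most similar cluster, or start a new one."""
    clusters = []  # each cluster: {"terms": set of words, "docs": list of indices}
    for i, doc in enumerate(docs):
        toks = tokens(doc)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(toks, cluster["terms"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"terms": set(toks), "docs": [i]})
        else:
            best["terms"] |= toks
            best["docs"].append(i)
    return clusters

def rename(clusters, names):
    """Expert step: attach human-chosen names to the surfaced clusters, in order."""
    return {name: cluster["docs"] for name, cluster in zip(names, clusters)}
```

A usage example: `rename(best_guess_clusters(training_docs), ["Lending", "Clinical"])` maps expert-chosen labels onto the machine-surfaced groups. Restructuring (merging or splitting clusters) would be a further manual step not shown here.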
• Relies heavily upon linguistic analysis
– Identifying meaning by part of speech, clustering
– Position in content indicating importance
– By structure (SEMIO, Raven) or Neural Net (Autonomy)
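To make "position in content indicating importance" concrete, here is a minimal sketch that weights a term by where it appears. The three weights and the function name are hypothetical choices for illustration; part-of-speech analysis is omitted, and none of this is taken from SEMIO, Raven, or Autonomy.

```python
import re
from collections import defaultdict

# Hypothetical weights: terms in the title count most, then the lead paragraph.
TITLE_WEIGHT = 3.0
LEAD_WEIGHT = 2.0
BODY_WEIGHT = 1.0

def words(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def position_weighted_terms(title, paragraphs):
    """Score each term by summing the weight of every position where it occurs."""
    scores = defaultdict(float)
    for w in words(title):
        scores[w] += TITLE_WEIGHT
    for idx, para in enumerate(paragraphs):
        weight = LEAD_WEIGHT if idx == 0 else BODY_WEIGHT
        for w in words(para):
            scores[w] += weight
    return dict(scores)
```

Under these assumed weights, a term that appears in both the title and the opening paragraph outranks one that appears only deep in the body, which is the intuition the bullet describes.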