cpdetector is a small yet clever framework for codepage detection that integrates different strategies. It may be used as a library for third-party software that accesses textual data over a network. It also includes a best-practice implementation in the form of a command line tool that allows sorting and transforming large collections of documents based on their codepage. Available strategies include: jchardet (exclusion, frequency analysis, and guessing), detection of the HTML charset property, and detection of the XML encoding declaration.
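The idea of trying several detection strategies in turn can be sketched in plain Java. This is only an illustration under stated assumptions, not cpdetector's own API: the class CodepageSketch and the methods fromPattern and detect are hypothetical names, the regular expressions are simplified, and the statistical jchardet step is left out. The sketch checks for an XML encoding declaration first, then for an HTML charset property, and otherwise returns a caller-supplied fallback.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; none of these names come from cpdetector.
public class CodepageSketch {

    // Simplified pattern for an XML encoding declaration.
    private static final Pattern XML_DECL = Pattern.compile(
            "<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']");

    // Simplified pattern for an HTML charset property inside a meta tag.
    private static final Pattern HTML_META = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Looks for a charset name in the leading bytes of a document.
    static Optional<Charset> fromPattern(byte[] head, Pattern pattern) {
        // ISO-8859-1 maps every byte to exactly one char, so ASCII markup
        // stays readable whatever the real encoding is.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = pattern.matcher(text);
        if (m.find()) {
            try {
                String name = m.group(1);
                if (Charset.isSupported(name)) {
                    return Optional.of(Charset.forName(name));
                }
            } catch (IllegalArgumentException e) {
                // Illegal charset name: treat as "no answer".
            }
        }
        return Optional.empty();
    }

    // Asks each strategy in order; the first one with an answer wins.
    static Charset detect(byte[] head, Charset fallback) {
        List<Function<byte[], Optional<Charset>>> strategies = List.of(
                b -> fromPattern(b, XML_DECL),
                b -> fromPattern(b, HTML_META));
        for (Function<byte[], Optional<Charset>> strategy : strategies) {
            Optional<Charset> result = strategy.apply(head);
            if (result.isPresent()) {
                return result.get();
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        byte[] html = "<html><head><meta charset=\"windows-1252\"></head></html>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(html, StandardCharsets.UTF_8));
    }
}
```

Ordering the strategies from most explicit (a declaration in the document) to least explicit (a fallback) keeps the cheap, reliable checks first; a statistical detector such as jchardet would naturally slot in before the fallback.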
License: Mozilla Public License (MPL)
Changes:
The proguard shrinker is now used, so the cpdetector jar is now more than ten times smaller. System.out is no longer used for logging in JChardetFacade. All packages were renamed with the prefix "info.monitorenter".