Document analysis using Hadoop and Mahout