This project sets out to create a powerful, robust, quick-to-deploy web scraper. A web scraper is a tool that extracts specific parts of web pages, rather than fetching the entire HTML page as a crawler would.
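To illustrate the distinction, here is a minimal sketch of what "extracting a specific part" can mean: pulling only the `<title>` text out of an HTML string. It is not this project's API. The class name `TitleScrapeSketch`, the method `extractTitle`, and the sample page are all hypothetical, and a regular expression stands in for a real HTML parser purely to keep the example dependency-free.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of this project's code base.
public class TitleScrapeSketch {

    // Matches the contents of the first <title> element, ignoring case
    // and allowing the title text to span several lines.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or an empty Optional if none is found.
    public static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed sample input; a crawler would store this whole string,
        // whereas a scraper keeps only the part it was asked for.
        String page = "<html><head><title> Example Page </title></head>"
                + "<body><p>Body text</p></body></html>";
        System.out.println(extractTitle(page).orElse("(no title)"));
    }
}
```

Regular expressions break down quickly on real-world markup, so a practical scraper would more likely rely on a proper HTML parser; the sketch only shows the idea of selecting one fragment from a page.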
The outcome of this project will be a tool, written in Java, that provides the following:
- A powerful, fast, reliable way of scraping the web
- Simple setup, with no cumbersome XML configuration to write
- Full automation support
- Extension points for parsing and exporting
See our road map to view our progress.