Data Toolbar

{{Short description|Computer software}}

{{Infobox Software

| name=Data Toolbar

| developer=DataTool Services

| operating system=Microsoft Windows

| genre=Browser toolbar, Web scraping

| website=[http://datatoolbar.com/ www.datatoolbar.com]

}}

Data Toolbar is a Web scraping add-on for the Internet Explorer, Mozilla Firefox, and Google Chrome Web browsers. It collects structured data from Web pages and converts it into a tabular format that can be loaded into a spreadsheet or database management program.{{ cite journal| title = A guide to the mortgage banking industry's leading providers of high-tech products and services | url = http://issuu.com/zackinpublications/docs/sme1101_online| journal=The Journal for Mortgage Banking Professionals | publisher = Zackin Publications |pages=14 |volume=25 |issue=2 |date = January 2011}}

== Algorithm ==

The program implements a variation of the generalized tree-matching algorithm that takes nested lists into account (Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran S. da Silva, Juliana S. Teixeira, [http://homepages.dcc.ufmg.br/~berthier/books_journal_papers/sigmod_record_2002.pdf A Brief Survey of Web Data Extraction Tools] {{Webarchive|url=https://web.archive.org/web/20110706162225/http://homepages.dcc.ufmg.br/~berthier/books_journal_papers/sigmod_record_2002.pdf |date=2011-07-06 }}, ACM SIGMOD Record, Volume 31, Issue 2). That is, within a given Web page, the program recursively traverses the branches of the page's DOM tree, aiming to detect nested lists of data items that match the format of the specified content. This approach is reported to have several advantages over simple string matching (Nitin Jindal, Bing Liu, [http://www.siam.org/proceedings/datamining/2010/dm10_081_jindaln.pdf A Generalized Tree Matching Algorithm Considering Nested Lists for Web Data Extraction], Proceedings of the Tenth SIAM International Conference on Data Mining, 2010).
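The recursive traversal described above can be illustrated with a small sketch. It is not Data Toolbar's code, and it does not implement the generalized algorithm of Jindal and Liu. As a stand-in it uses the classic Simple Tree Matching recurrence on a toy tree. The `Node` class, the `find_record_lists` function, the similarity threshold of 0.8, and the sample page are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a DOM element (hypothetical, not a browser API)."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def simple_tree_match(a: Node, b: Node) -> int:
    """Number of node pairs in a maximum order-preserving matching of a and b."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # table[i][j]: best matching between the first i children of a
    # and the first j children of b.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = max(
                table[i - 1][j],
                table[i][j - 1],
                table[i - 1][j - 1]
                + simple_tree_match(a.children[i - 1], b.children[j - 1]),
            )
    return table[m][n] + 1  # +1 for the matched roots


def size(node: Node) -> int:
    """Total number of nodes in the subtree."""
    return 1 + sum(size(c) for c in node.children)


def similarity(a: Node, b: Node) -> float:
    """Normalized match score in [0, 1]."""
    return 2 * simple_tree_match(a, b) / (size(a) + size(b))


def find_record_lists(root: Node, threshold: float = 0.8, min_items: int = 2):
    """Recursively collect (parent tag, run length) for runs of similar siblings."""
    found, run = [], []
    for child in root.children:
        if run and similarity(run[-1], child) >= threshold:
            run.append(child)
        else:
            if len(run) >= min_items:
                found.append((root.tag, len(run)))
            run = [child]
    if len(run) >= min_items:
        found.append((root.tag, len(run)))
    for child in root.children:  # descend to find nested lists
        found.extend(find_record_lists(child, threshold, min_items))
    return found


def item(extra=()):
    """Build one catalog entry: a link, a price span, and optional extras."""
    return Node("li", [Node("a"), Node("span"), *extra])


# A heading followed by a three-item catalog; the last item has an extra image.
page = Node("body", [Node("h1"), Node("ul", [item(), item(), item([Node("img")])])])

print(find_record_lists(page))  # [('ul', 3)]
```

The third item differs slightly from the other two, yet its similarity score of 6/7 still clears the threshold, so it stays in the same run. Tolerating such structural variation between records is the kind of behaviour that plain string matching on the page source cannot offer.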

== Features ==

  • Collection of data and images directly from Internet Explorer.
  • Collection of information from detail pages linked to the catalog.
  • Automatic processing of multi-page catalogs.
  • Support for irregular multi-row catalogs interspersed with advertisements.

== Similar tools ==

  • Automation Anywhere - Its Web Extractor is part of a larger automation system
  • [http://www.webextract.net Easy Web Extract] - Standalone application for Windows
  • [http://www.mozenda.com/web-content-extractor Mozenda] - Web-based service
  • [http://www.newprosoft.com Newprosoft] - Standalone application for Windows, includes an Agent
  • [http://www.outwit.com/ OutWit] - Standalone application and Firefox extension
  • [https://www.datascraping.co/data-extraction-software.aspx Data Scraping Studio] - Standalone application for Windows and Chrome extension
  • [https://www.diggernaut.com/ Diggernaut] - Web platform with standalone application for Windows, Linux, and macOS, and Google Chrome extension

== Sources ==