Geographic information retrieval
{{Short description|Technologies for discovering content about a place}}
Geographic information retrieval (GIR) systems, also called geographical information retrieval systems, are search tools for the Web, enterprise documents, and mobile local search that combine traditional text-based queries with location-based querying, such as a map or place names. Like traditional information retrieval systems, GIR systems index text and information from structured and unstructured documents, and they augment those indices with geographic information. GIR development and engineering aim to build systems that can reliably answer queries that include a geographic dimension, such as "What wars were fought in Greece?" or "restaurants in Beirut".{{Cite journal|title = Geographic Information Retrieval|journal = SIGSPATIAL Special|date = 2011-07-01|issn = 1946-7729|pages = 2–4|volume = 3|issue = 2|doi = 10.1145/2047296.2047297|first1 = Ross|last1 = Purves|first2 = Christopher|last2 = Jones|citeseerx = 10.1.1.130.3521|s2cid = 1940653}} Semantic similarity and word-sense disambiguation are important components of GIR.{{Cite journal|title = The semantics of similarity in geographic information retrieval|issue = 2|pages = 29–57|url = http://josis.net/index.php/josis/article/viewArticle/26|journal = Journal of Spatial Information Science|volume = 2011|access-date = 2015-09-12|doi = 10.5311/JOSIS.2011.2.26|date = 2011-05-25|last1 = Kuhn|first1 = Werner|last2 = Raubal|first2 = Martin|last3 = Janowicz|first3 = Krzysztof|doi-broken-date = 1 November 2024}} To identify place names, GIR systems often rely on natural language processing{{cite news|date=2003-08-21|title=MetaCarta: Putting Natural Language on the Map|publisher=GIS Monitor|archive-url=https://web.archive.org/web/20031003000954/http://www.gismonitor.com/news/newsletter/archive/082103.php#MetaCarta|archive-date=2003-10-03|url=http://www.gismonitor.com/news/newsletter/archive/082103.php#MetaCarta}} or other metadata to associate text documents with locations. Such georeferencing, geotagging, and geoparsing tools often need databases of location names, known as gazetteers.{{cite news|url=https://www10.giscafe.com/nbc/articles/view_article.php?section=Magazine&articleid=629476|title=The Space Between Maps, Search and Content|first=Susan|last=Smith}}{{cite news |url=https://www.bizjournals.com/boston/blog/mass-high-tech/2003/11/ware-withal-mit-rooted-metacarta-stakes.html|title=Ware-Withal: MIT-rooted MetaCarta stakes its claim with automatic geoparsing software|first=Elizabeth|last=Dinan|date=2003-11-10}}{{cite news|url=https://gisuser.com/2007/06/metacarta-unveils-first-geo-referencing-solution-to-support-arabic-and-spanish-languages/|title=MetaCarta Unveils First Geo-referencing Solution to Support Arabic and Spanish Languages|date=2007-06-20}}{{cite web|url=https://cga-download.hmdc.harvard.edu/publish_web/Annual_Spring_Workshops/2008_georef/presentations/MetaCarta_JFrank.pdf|title=Locating All Content|first1=John|last1=Frank|first2=Bob|last2=Warren}}
GIR architecture
GIR involves extracting and resolving the meaning of locations in unstructured text, a process known as geoparsing. After identifying mentions of places and locations in text, a GIR system indexes this information for search and retrieval. GIR systems can commonly be broken down into the following stages: geoparsing, text and geographic indexing, data storage, geographic relevance ranking with respect to a geographic query, and browsing of results, commonly with a map interface.
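The stages listed above can be illustrated with a minimal Python sketch. It is not the design of any particular GIR system: the two-entry gazetteer, the sample documents, and the function names (<code>geoparse</code>, <code>build_indexes</code>, <code>search</code>) are all hypothetical, geoparsing is reduced to exact token lookup, and relevance ranking is reduced to great-circle distance from a query point.

```python
import math
import re
from collections import defaultdict

# Hypothetical toy gazetteer: place name -> (latitude, longitude) in degrees.
GAZETTEER = {"beirut": (33.89, 35.50), "athens": (37.98, 23.73)}

# Hypothetical sample documents: doc id -> text.
DOCS = {
    1: "Best restaurants in Beirut",
    2: "Restaurants near the Acropolis in Athens",
    3: "A history of restaurants",
}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def geoparse(text):
    """Naive geoparsing: keep tokens that exactly match a gazetteer name."""
    return [t for t in tokenize(text) if t in GAZETTEER]

def build_indexes(docs):
    """Build a text inverted index and a separate geographic index."""
    text_index = defaultdict(set)  # term -> doc ids
    geo_index = defaultdict(set)   # place name -> doc ids
    for doc_id, text in docs.items():
        for term in tokenize(text):
            text_index[term].add(doc_id)
        for place in geoparse(text):
            geo_index[place].add(doc_id)
    return text_index, geo_index

def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def search(term, near, text_index, geo_index):
    """Documents containing term, ranked by distance from near to their closest place."""
    matching = text_index.get(term.lower(), set())
    best = {}
    for place, doc_ids in geo_index.items():
        d = distance_km(near, GAZETTEER[place])
        for doc_id in doc_ids & matching:
            best[doc_id] = min(d, best.get(doc_id, d))
    return sorted(best, key=best.get)

text_index, geo_index = build_indexes(DOCS)
results = search("restaurants", (33.9, 35.5), text_index, geo_index)
```

With these sample documents, a query for "restaurants" near Beirut ranks document 1 ahead of document 2 and omits document 3, which mentions no known place. A production system would replace the token lookup with real toponym recognition and disambiguation.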
Some GIR systems separate text indexing from geographic indexing, which enables the use of generic database joins{{cite book |url=https://www.postgis.us/ |title=PostGIS In Action |chapter=Chapter 15. Query performance tuning |publisher=Manning Publications |edition=Second}} or multi-stage filtering,{{cite web |url=https://lucene.apache.org/solr/guide/6_6/spatial-search.html |title=Apache Solr - Lucene Reference Guide - Spatial Search |access-date=2021-01-03}} while others combine the two for efficiency.{{cite web|url=http://www.metacarta.com/cartatrees.html|archive-url=https://web.archive.org/web/20030402012531/http://www.metacarta.com/cartatrees.html|archive-date=2003-04-02|title=CartaTrees Map Search Text Index}}
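One way to picture multi-stage filtering over separate indexes is the following Python sketch. It does not reproduce the Solr or PostGIS implementations cited above; the documents, coordinates, and function names are hypothetical. A text lookup first produces candidates, a cheap bounding-box test then discards most of them, and an exact great-circle distance check runs only on the survivors.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Hypothetical documents: doc id -> (text, (latitude, longitude) in degrees).
DOCS = {
    1: ("pizza restaurant", (33.89, 35.50)),
    2: ("pizza delivery", (34.29, 36.00)),
    3: ("book shop", (33.90, 35.48)),
    4: ("pizza oven", (37.98, 23.73)),
}

def text_candidates(term):
    """Stage 1: documents whose text contains the term."""
    return {doc_id for doc_id, (text, _) in DOCS.items() if term in text.split()}

def bbox_filter(doc_ids, centre, radius_km):
    """Stage 2: cheap test against a box enclosing the search circle.

    This sketch ignores the poles and the antimeridian.
    """
    lat, lon = centre
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    dlon = dlat / math.cos(math.radians(lat))
    kept = set()
    for doc_id in doc_ids:
        plat, plon = DOCS[doc_id][1]
        if abs(plat - lat) <= dlat and abs(plon - lon) <= dlon:
            kept.add(doc_id)
    return kept

def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def exact_filter(doc_ids, centre, radius_km):
    """Stage 3: exact distance check, run only on the box survivors."""
    return sorted(d for d in doc_ids if distance_km(centre, DOCS[d][1]) <= radius_km)

centre, radius_km = (33.89, 35.50), 50.0
stage1 = text_candidates("pizza")
stage2 = bbox_filter(stage1, centre, radius_km)
stage3 = exact_filter(stage2, centre, radius_km)
```

In this example document 2 lies near a corner of the bounding box, so it passes the coarse test but fails the exact 50 km check. That case is the reason for keeping the final stage.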
GIR must manage several forms of uncertainty, including the semantic ambiguity of place mentions in natural language text and the precision of positions.{{cite journal |title=Geographic information retrieval: Modeling uncertainty of user's context |journal=Fuzzy Sets and Systems |volume=196 |date=2012-06-01 |pages=105–124 |first1=Gloria |last1=Bordogna |first2=Giorgio |last2=Ghisalberti |first3=Giuseppe |last3=Psaila |doi=10.1016/j.fss.2011.04.005 |quote=Geographic information retrieval (GIR) is nowadays a hot research issue that involves the management of uncertainty and imprecision and the modeling of user preferences and context. Indexing the geographic content of documents implies dealing with the ambiguity, synonymy and homonymy of geographic names in texts. On the other side, the evaluation of queries specifying both content based conditions and spatial conditions on documents’ contents requires representing the vagueness and context dependency of spatial conditions and the personal user's preferences. }}
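The ambiguity of place names can be illustrated with a small Python sketch that scores the candidate referents of a name and reports a normalised confidence for each. The heuristic (a population prior plus a bonus for supporting context words) is only one simple possibility and is not taken from the cited sources. The candidate table, the rough population figures, the context words, and the <code>context_weight</code> parameter are illustrative assumptions.

```python
import math
import re

# Hypothetical candidate table; populations are rough illustrative values.
CANDIDATES = {
    "paris": [
        {"name": "Paris, France", "population": 2_100_000,
         "context": {"france", "seine", "louvre"}},
        {"name": "Paris, Texas", "population": 25_000,
         "context": {"texas", "lamar", "dallas"}},
    ],
}

def disambiguate(toponym, text, context_weight=5.0):
    """Return (name, confidence) pairs for a toponym, most likely first.

    Score = log10(population) + context_weight * (number of supporting
    context words found in the text); confidences are a softmax of the scores.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scored = []
    for cand in CANDIDATES.get(toponym.lower(), []):
        score = (math.log10(cand["population"])
                 + context_weight * len(words & cand["context"]))
        scored.append((score, cand["name"]))
    if not scored:
        return []
    total = sum(math.exp(score) for score, _ in scored)
    ranked = sorted(((math.exp(score) / total, name) for score, name in scored),
                    reverse=True)
    return [(name, confidence) for confidence, name in ranked]

no_context = disambiguate("Paris", "A weekend in Paris")
with_context = disambiguate("Paris", "The county fair in Paris, Texas")
```

Without supporting context the larger city is preferred, while the word "Texas" in the text shifts the top candidate to Paris, Texas. Positional precision would need separate handling, for example by storing an extent or error radius with each resolved location.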
GIR systems
- MetaCarta created{{cite news|url=https://www.nytimes.com/2002/01/14/business/technology-federal-agents-look-to-adapt-private-technology.html|title=Federal Agents Look to Adapt Private Technology|author=Jennifer 8. Lee|date=2002-01-14|work=New York Times}}{{cite news|archive-url=https://web.archive.org/web/20201231153329/https://www.economist.com/technology-quarterly/2003/03/15/the-revenge-of-geography|archive-date=2020-12-31|url=https://www.economist.com/technology-quarterly/2003/03/15/the-revenge-of-geography|title=The revenge of geography|date=2003-03-13|publisher=The Economist}}{{cite web|archive-url=https://web.archive.org/web/20040603075232/http://www.msnbc.msn.com/id/5092840/site/newsweek/|archive-date=2004-06-03|url=http://www.msnbc.msn.com/id/5092840/site/newsweek/|title=Making the Ultimate Map - When digital geography teams up with wireless technology and the Web, the world takes on some new dimensions|first=Steven|last=Levy|publisher=Newsweek|date=2004-06-07}} and patented{{cite patent|pridate=2000-02-22|status=granted|title=Spatially coding and displaying information|inventor1-first=John R.|inventor1-last=Frank|inventor2-first=Erik M.|inventor2-last=Rauch|inventor3-first=Karen|inventor3-last=Donoghue|issue-date=2006-10-03|patent-number=7117199|country-code=US|url=https://patents.google.com/patent/US7117199B2/en?oq=7117199}} one of the first commercial products to offer GIR capabilities.{{cite speech |url=https://slideplayer.com/slide/7872906/ |title=A confidence-based framework for disambiguating geographic terms |author1=Erik Rauch |author2=Michael Bukatin |author3=Kenneth Baker from MetaCarta |access-date=2021-01-03 }}{{cite conference |title=MetaCarta at GeoCLEF 2005 |quote=In Memoriam Erik Rauch |author=András Kornai, MetaCarta |publisher=GeoCLEF |year=2005 }}
- Frankenplace: a general-purpose geographic search engine.{{Cite book|url = http://dl.acm.org/citation.cfm?id=2736277.2741137|publisher = International World Wide Web Conferences Steering Committee|date = 2015-01-01|location = Republic and Canton of Geneva, Switzerland|isbn = 978-1-4503-3469-3|pages = 12–22|series = WWW '15|first1 = Benjamin|last1 = Adams|first2 = Grant|last2 = McKenzie|first3 = Mark|last3 = Gahegan| title=Proceedings of the 24th International Conference on World Wide Web | chapter=Frankenplace |doi = 10.1145/2736277.2741137|s2cid = 1639723}}
- Web-a-where{{cite conference |title=Web-a-where: geotagging web content |first1=Einat |last1=Amitay |first2=Nadav |last2=Har'El |first3=Ron |last3=Sivan |first4=Aya |last4=Soffer|author4-link=Aya Soffer |conference=SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval |date=July 2004 |pages=273–280 |doi=10.1145/1008992.1009040 |quote=We describe Web-a-Where, a system for associating geography with Web pages. Web-a-Where locates mentions of places and determines the place each name refers to. In addition, it assigns to each page a geographic focus --- a locality that the page discusses as a whole. }}
Study & Evaluation
The study of GIR systems has a rich history dating back to the 1970s and possibly earlier. See Ray Larson’s book Geographic information retrieval and spatial browsing{{cite book |title=Geographic information retrieval and spatial browsing |first=Ray R. |last=Larson |url=http://hdl.handle.net/2142/416 |year=1996 |publisher=Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign |hdl=2142/416 |isbn=0878450971 |issn=0069-4789 }} for references to much of the pre-Web literature on GIR.
In 2005 the Cross-Language Evaluation Forum added a geographic track, GeoCLEF. GeoCLEF was the first TREC-style evaluation forum for GIR systems and provided participants a chance to compare systems.{{Cite book|publisher = Springer Berlin Heidelberg|date = 2005-09-21|isbn = 978-3-540-45697-1|pages = 908–919|series = Lecture Notes in Computer Science|doi = 10.1007/11878773_101|language = en|first1 = Fredric|last1 = Gey|first2 = Ray|last2 = Larson|first3 = Mark|last3 = Sanderson|first4 = Hideo|last4 = Joho|first5 = Paul|last5 = Clough|first6 = Vivien|last6 = Petras| title=Accessing Multilingual Information Repositories | chapter=GeoCLEF: The CLEF 2005 Cross-Language Geographic Information Retrieval Track Overview | volume=4022 |editor-first = Carol|editor-last = Peters|editor-first2 = Fredric C.|editor-last2 = Gey|editor-first3 = Julio|editor-last3 = Gonzalo|editor-first4 = Henning|editor-last4 = Müller|editor-first5 = Gareth J. F.|editor-last5 = Jones|editor-first6 = Michael|editor-last6 = Kluck|editor-first7 = Bernardo|editor-last7 = Magnini|editor-first8 = Maarten de|editor-last8 = Rijke|citeseerx = 10.1.1.156.6368}}
Applications
GIR has many applications in geoweb, neogeography, and mobile local search and has been a focus of many conferences, including the ESRI Users Conferences and O'Reilly’s Where 2.0 conferences.{{cite speech|title=Local Search Faces Off - Craig Donato, Perry Evans, John Frank, Jeremy Kreitler, Shailesh Rao|date=2005-06-29|event=Where 2.0|url=http://itc.conversationsnetwork.org/shows/detail801.html|archive-url=https://web.archive.org/web/20130729205947/http://itc.conversationsnetwork.org/shows/detail801.html|archive-date=2013-07-29|access-date=2021-01-03|url-status=live}}{{cite journal |url=https://ieeexplore.ieee.org/document/1401768 |title=Local Search: The Internet Is the Yellow Pages |quote=Every day, millions of people use their local newspapers, classified ad circulars, Yellow Pages directories, regional magazines, and the Internet to find information pertaining to the activities of daily life… |first=Marty |last=Himmelstein |journal=Computer |year=2005 |volume=38 |issue=2 |pages=26–34 |publisher=Published by the IEEE Computer Society |doi=10.1109/MC.2005.65 }}
References
{{reflist}}