Tag cloud


[[File:Web 2.0 Map.svg|thumb|A tag cloud with terms related to Web 2.0]]

A tag cloud (also known as a word cloud or weighted list in visual design) is a visual representation of text data which is often used to depict keyword metadata on websites, or to visualize free-form text. Tags are usually single words, and the importance of each tag is shown with font size or color. Martin Halvey and Mark T. Keane, [http://www2007.org/htmlposters/poster988/ An Assessment of Tag Presentation Techniques] {{webarchive|url=https://web.archive.org/web/20170514121504/http://www2007.org/htmlposters/poster988/ |date=2017-05-14 }}, poster presentation at WWW 2007, 2007. {{Cite journal|last1=Helic|first1=Denis|last2=Trattner|first2=Christoph|last3=Strohmaier|first3=Markus|last4=Andrews|first4=Keith|date=2011|title=Are tag clouds useful for navigation? A network-theoretic analysis|journal=International Journal of Social Computing and Cyber-Physical Systems|language=en|volume=1|issue=1|pages=33|doi=10.1504/IJSCCPS.2011.043603|issn=2040-0721|doi-access=free}} When used as website navigation aids, the terms are hyperlinked to items associated with the tag.

History


In the language of visual design, a tag cloud (or word cloud) is one kind of "weighted list", as commonly used on geographic maps to represent the relative size of cities in terms of relative typeface size. An early printed example of a weighted list of English keywords was the "subconscious files" in Douglas Coupland's Microserfs (1995). A German appearance occurred in 1992.{{cite book |title=Tausend Plateaus. Kapitalismus und Schizophrenie |author=Gilles Deleuze, Felix Guattari |year=1992 |publisher=Merve-Verlag |isbn=978-3-88396-094-4}}

The specific visual form and common use of the term "tag cloud" rose to prominence in the first decade of the 21st century as a widespread feature of early Web 2.0 websites and blogs, used primarily to visualize the frequency distribution of keyword metadata that describe website content, and as a navigation aid.

The first tag clouds on a high-profile website were on the photo sharing site Flickr, created by Flickr co-founder and interaction designer Stewart Butterfield in 2004. That implementation was based on Jim Flanagan's Search Referral Zeitgeist, a visualization of Web site referrers. A copy of Jim Flanagan's Search Referral Zeitgeist was [https://web.archive.org/web/20041204231120/http://twiki.tensegrity.net/bin/view/Main/SearchReferralZeitgeist available at archive.org] but has since been blocked. In the comments of a [http://www.37signals.com/svn/archives/000937.php blog entry] {{webarchive|url=https://web.archive.org/web/20060426191534/http://www.37signals.com/svn/archives/000937.php |date=2006-04-26 }}, a user identified as Steve Minutillo attributed the idea to Jim Flanagan, stating that Flanagan's site had such displays in 2002. Tag clouds were also popularized around the same time by Del.icio.us and Technorati, among others.

Oversaturation of the tag cloud method and ambivalence about its utility as a web-navigation tool led to a decline of usage among these early adopters.{{cite web |url=http://www.readwriteweb.com/archives/tag_clouds_rip.php |title=Tag Clouds R.I.P.? |publisher=Readwriteweb.com |date=2011-03-30 |url-status=dead |archive-url=https://web.archive.org/web/20120319093314/http://www.readwriteweb.com/archives/tag_clouds_rip.php |archive-date=2012-03-19 }} Flickr gave a five-word acceptance speech for the 2006 "Best Practices" Webby Award, which simply stated "sorry about the tag clouds."{{cite web |url=http://www.webbyawards.com/press/archived-speeches.php#2006 |title=Welcome to the Webby Awards |publisher=Webbyawards.com |date=2011-10-28 |access-date=2013-07-27 |url-status=live |archive-url=https://web.archive.org/web/20060703183324/http://www.webbyawards.com/press/archived-speeches.php#2006 |archive-date=2006-07-03 }}

A second generation of software development discovered a wider diversity of uses for tag clouds as a basic visualization method for text data. Several extensions of tag clouds have been proposed in this context.

Types

[[File:Word population tagcloud 2011.png|thumb|Tag cloud created with the wordcloud package, using data from Country population. The proportional sizes of China and India were divided in half.]]

There are three main types of tag cloud applications in social software, distinguished by their meaning rather than appearance. In the first type, size reflects how often each tag has been applied to a single item, whereas in the second type, the cloud is global and the frequencies are aggregated over all items and users. In the third type, the cloud contains categories, with size indicating the number of subcategories.

= Frequency =

In the first type, size represents the number of times that tag has been applied to a single item.Bielenberg, K. and Zacher, M., [http://bielenberg.info/thesis.pdf Groups in Social Software: Utilizing Tagging to Integrate Individual Contexts for Social Navigation] {{webarchive|url=https://web.archive.org/web/20071008061841/http://bielenberg.info/thesis.pdf |date=2007-10-08 }}, Masters Thesis submitted to the Program of Digital Media, Universität Bremen (2006) This is useful as a means of displaying metadata about an item that has been democratically "voted" on and where precise results are not desired.

In the second, more commonly used type,{{Citation needed|date=July 2008}} size represents the number of items to which a tag has been applied, as a presentation of each tag's popularity.
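The difference between the first two types can be illustrated with a small sketch. The `(user, item, tag)` records and the function names below are hypothetical and chosen only for illustration; real tagging systems store this data in their own ways.

```python
from collections import Counter

# Hypothetical tagging records: (user, item, tag)
taggings = [
    ("alice", "photo1", "sunset"),
    ("bob", "photo1", "sunset"),
    ("carol", "photo1", "beach"),
    ("alice", "photo2", "beach"),
    ("bob", "photo3", "beach"),
]

def item_cloud(taggings, item):
    """First type: how many times each tag was applied to one item."""
    return Counter(tag for _, it, tag in taggings if it == item)

def global_cloud(taggings):
    """Second type: number of distinct items each tag has been applied to."""
    pairs = {(it, tag) for _, it, tag in taggings}
    return Counter(tag for _, tag in pairs)

photo1_weights = item_cloud(taggings, "photo1")
global_weights = global_cloud(taggings)
```

Here "sunset" dominates the cloud for `photo1`, while "beach" dominates the global cloud because it is attached to more items.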

= Significance =

Instead of frequency, the size can be used to represent the significance of words and word co-occurrences, compared to a background corpus (for example, compared to all the text in Wikipedia).{{Cite arXiv|last1=Schubert|first1=Erich|last2=Spitz|first2=Andreas|last3=Weiler|first3=Michael|last4=Geiß|first4=Johanna|last5=Gertz|first5=Michael|date=2017-08-11|title=Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding|eprint=1708.03569|class=cs.IR}} This approach cannot be used standalone, because it relies on comparing the document frequencies to expected distributions.
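As a rough illustration of weighting by significance rather than raw frequency, the sketch below scores each word by the logarithm of the ratio between its relative frequency in the document and its smoothed relative frequency in a background corpus. This is a generic log-ratio heuristic assumed here for illustration; it is not the method of the cited paper, and the function name and smoothing scheme are hypothetical.

```python
import math
from collections import Counter

def significance_scores(doc_counts, background_counts, smoothing=1.0):
    """Score words by log(document relative frequency / background relative frequency)."""
    doc_total = sum(doc_counts.values())
    bg_total = sum(background_counts.values())
    vocab = len(set(doc_counts) | set(background_counts))
    scores = {}
    for word, count in doc_counts.items():
        p_doc = count / doc_total
        # Additive smoothing so that words missing from the background do not divide by zero
        p_bg = (background_counts.get(word, 0) + smoothing) / (bg_total + smoothing * vocab)
        scores[word] = math.log(p_doc / p_bg)
    return scores

doc = Counter({"the": 50, "cloud": 10})
background = Counter({"the": 5000, "cloud": 10, "tree": 4990})
scores = significance_scores(doc, background)
```

With these made-up counts, "cloud" scores higher than "the" even though "the" is more frequent in the document, because "the" is just as common in the background.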

= Categorization =

In the third type, tags are used as a categorization method for content items. Tags are represented in a cloud where larger tags represent the quantity of content items in that category.

There are some approaches to construct tag clusters instead of tag clouds, e.g., by applying tag co-occurrences in documents.Knautz, K., Soubusta, S., & Stock, W.G. (2010). [http://www.phil-fak.uni-duesseldorf.de/fileadmin/Redaktion/Institute/Informationswissenschaft/stock/Knautz_Soubusta_Stock.pdf Tag clusters as information retrieval interfaces] {{webarchive|url=https://web.archive.org/web/20110717203420/http://www.phil-fak.uni-duesseldorf.de/fileadmin/Redaktion/Institute/Informationswissenschaft/stock/Knautz_Soubusta_Stock.pdf |date=2011-07-17 }}. Proceedings of the 43rd Annual Hawaii International Conference on System Sciences (HICSS-43), January 5–8, 2010. IEEE Computer Society Press (10 pages).

More generally, the same visual technique can be used to display non-tag data,{{cite arXiv |eprint=0710.2156|last1=Aouiche|first1=Kamel|title=Collaborative OLAP with Tag Clouds: Web 2.0 OLAP Formalism and Experimental Evaluation|last2=Lemire|first2=Daniel|last3=Godin|first3=Robert|class=cs.DB|year=2007}} as in a word cloud or a data cloud.

The term keyword cloud is sometimes used as a search engine marketing (SEM) term that refers to a group of keywords that are relevant to a specific website. In recent years tag clouds have gained popularity because of their role in search engine optimization of Web pages as well as supporting the user in navigating the content in an information system efficiently.{{cite journal | last1 = Helic | first1 = D. | last2 = Trattner | first2 = C. | last3 = Strohmaier | first3 = M. | last4 = Andrews | first4 = K. | year = 2011 | title = Are Tag Clouds Useful for Navigation? A Network-Theoretic Analysis | url = http://www.markusstrohmaier.info/documents/2011_JoSCCPS-socialcom2010_extended.pdf | journal = International Journal of Social Computing and Cyber-Physical Systems| volume = 1 | issue = 1| pages = 33–55 | doi = 10.1504/IJSCCPS.2011.043603 | doi-access = free }} Tag clouds as a navigational tool make the resources of a website more connected,Trattner, C.:[http://www.austria-lexikon.at/attach/User/Trattner%20Christoph/ctrattner_IADIS_WWW_journal.pdf Linking Related Content in Web Encyclopedias with search query tag clouds] {{webarchive|url=https://web.archive.org/web/20120615114901/http://www.austria-lexikon.at/attach/User/Trattner%20Christoph/ctrattner_IADIS_WWW_journal.pdf |date=2012-06-15 }}. IADIS International Journal on WWW/Internet, Volume 9, Issue 2, 2011 when crawled by a search engine spider, which may improve the site's search engine rank. 
From a user interface perspective they are often used to summarize search results to support the user in finding content in a particular information system more quickly. Trattner, C., Lin, Y., Parra, D., Yue, Z., Brusilovsky, P.: [http://www.austria-lexikon.at/attach/User/Trattner%20Christoph/ht076-trattner.pdf Evaluating Tag-Based Information Access in Image Collections] {{webarchive|url=https://web.archive.org/web/20120615160853/http://www.austria-lexikon.at/attach/User/Trattner%20Christoph/ht076-trattner.pdf |date=2012-06-15 }}. In Proceedings of the 23rd ACM Conference on Hypertext and Social Media (HT 2012). ACM, New York, NY, USA, 2012

Visual appearance

Tag clouds are typically represented using inline HTML elements. Tags can appear in alphabetical order, in random order, sorted by weight, and so on. Sometimes, further visual properties are manipulated in addition to font size, such as the font color, intensity, or weight. Lohmann, S., Ziegler, J., Tetzlaff, L. [http://www.uni-due.de/~s400268/Lohmann09-interact.pdf Comparison of Tag Cloud Layouts: Task-Related Performance and Visual Exploration] {{webarchive|url=https://web.archive.org/web/20091007173622/http://www.uni-due.de/~s400268/Lohmann09-interact.pdf |date=2009-10-07 }}, T. Gross et al. (Eds.): INTERACT 2009, Part I, LNCS 5726, pp. 392–404, 2009. The most popular layout is a rectangular tag arrangement with alphabetical sorting, laid out sequentially line by line. The choice of layout should be driven by the expected user goals. Some prefer to cluster the tags semantically so that similar tags appear near each other. Hassan-Montero, Y., Herrero-Solana, V. [http://www.nosolousabilidad.com/hassan/improving_tagclouds.pdf Improving Tag-Clouds as Visual Information Retrieval Interfaces] {{webarchive|url=https://web.archive.org/web/20060813162618/http://www.nosolousabilidad.com/hassan/improving_tagclouds.pdf |date=2006-08-13 }}. InSciT 2006: Mérida, Spain. October 25–28, 2006. {{cite arXiv |eprint=cs/0703109|last1=Kaser|first1=Owen|title=Tag-Cloud Drawing: Algorithms for Cloud Visualization|last2=Lemire|first2=Daniel|year=2007}} Salonen, J. 2007. [http://matriisi.ee.tut.fi/hypermedia/julkaisut/2007-salonen-som-clouds.pdf Self-organising map based tag clouds – Creating spatially meaningful representations of tagging data] {{webarchive|url=https://web.archive.org/web/20081224093118/http://matriisi.ee.tut.fi/hypermedia/julkaisut/2007-salonen-som-clouds.pdf |date=2008-12-24 }}. Proceedings of the 1st OPAALS conference, 26–27 November 2007, Rome, Italy. Others use embedding techniques such as t-SNE to position words. 
Edges can be added to emphasize the co-occurrences of tags and visualize interactions. Heuristics can be used to reduce the size of the tag cloud whether or not the purpose is to cluster the tags.
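A minimal sketch of the common case, an alphabetically sorted cloud built from inline HTML links, might look as follows. The `/tags/` URL prefix and the `tag-size-N` class names are assumptions made for this example; a stylesheet would have to map each class to an actual font size.

```python
from html import escape
from urllib.parse import quote

def render_tag_cloud(sizes, base_url="/tags/"):
    """Return an HTML fragment with one inline <a> element per tag, sorted alphabetically.

    `sizes` maps each tag to an integer size step; `base_url` and the
    `tag-size-N` class names are hypothetical.
    """
    links = []
    for tag in sorted(sizes, key=str.lower):
        href = escape(base_url + quote(tag))
        links.append(f'<a href="{href}" class="tag-size-{sizes[tag]}">{escape(tag)}</a>')
    return " ".join(links)

fragment = render_tag_cloud({"web": 3, "Ajax": 1, "R&D": 2})
```

Because the output is plain inline markup, the browser wraps the tags line by line, which yields the sequential rectangular layout described above.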

Tag cloud visual taxonomy is determined by a number of attributes: tag ordering rule (e.g. alphabetically, by importance, by context, randomly, ordered for visual quality), shape of the entire cloud (e.g. rectangular, circle, given map borders), shape of tag bounds (rectangle, or character body), tag rotation (none, free, limited), and vertical tag alignment (sticking to typographical baselines, free). A tag cloud on the web must address the problems of modeling and controlling aesthetics and of constructing a two-dimensional layout of tags, all of which must be done in a short time on a volatile browser platform. Tag clouds to be used on the web must be in HTML, not graphics, to make them robot-readable; they must be constructed on the client side using the fonts available in the browser; and they must fit in a rectangular box. Marszałkowski, J., Mokwa, D., Drozdowski, M., Rusiecki, L., Narożny, H. [http://www.sciencedirect.com/science/article/pii/S0952197617301422 Fast algorithms for online construction of web tag clouds], Engineering Applications of Artificial Intelligence 64, pp. 378–390, 2017.

Data clouds


A data cloud or cloud data is a data display which uses font size and/or color to indicate numerical values.{{cite web |title=ManyEyes Visualization and Commentary: World Population Data Cloud. |url=http://services.alphaworks.ibm.com/manyeyes/view/SIk76IsOtha6qFGgix3cI2- |last=Apel |first=Warren |access-date=2007-08-26 |url-status=live |archive-url=https://web.archive.org/web/20071029115347/http://services.alphaworks.ibm.com/manyeyes/view/SIk76IsOtha6qFGgix3cI2- |archive-date=2007-10-29 }} It is similar to a tag cloud{{cite web |title=ManyEyes Visualization: Ad cloud |url=http://services.alphaworks.ibm.com/manyeyes/view/Sh3S9FsOtha6OdUrBNWFF2- |last=Wattenberg |first=Martin |access-date=2007-03-12 |url-status=live |archive-url=https://web.archive.org/web/20080214102610/http://services.alphaworks.ibm.com/manyeyes/view/Sh3S9FsOtha6OdUrBNWFF2- |archive-date=2008-02-14 }} but instead of word count, displays data such as population or stock market prices.

Text clouds

[[File:State of the union word clouds.png|thumb|Word clouds of the 2002 State of the Union Address by U.S. President Bush and the 2011 State of the Union Address by President Obama{{cite web |title=TagCrowd visualization: State of the Union |url=http://www.tagcrowd.com/blog/2011/03/05/state-of-the-union-2002-vs-2011/ |last=Steinbock |first=Daniel |date=5 March 2011 |access-date=2011-03-05 |url-status=live |archive-url=https://web.archive.org/web/20110411071238/http://www.tagcrowd.com/blog/2011/03/05/state-of-the-union-2002-vs-2011/ |archive-date=2011-04-11 }}]]


A text cloud or word cloud is a visualization of word frequency in a given text as a weighted list.{{cite web|title=Text Clouds: A New Form of Tag Cloud?|url=http://www.joelamantia.com/blog/archives/tag_clouds/text_clouds_a_new_form_of_tag_cloud.html|last=Lamantia|first=Joe|access-date=2008-09-11|url-status=bot: unknown|archive-url=https://web.archive.org/web/20080910235655/http://www.joelamantia.com/blog/archives/tag_clouds/text_clouds_a_new_form_of_tag_cloud.html|archive-date=2008-09-10}} The technique has recently{{When|date=March 2020}} been popularly used to visualize the topical content of political speeches.{{cite web |title=US Presidential Speeches Tag Cloud |url=http://chir.ag/phernalia/preztags/ |last=Mehta |first=Chirag |access-date=2008-09-11 |url-status=live |archive-url=https://web.archive.org/web/20071019035301/http://chir.ag/phernalia/preztags/ |archive-date=2007-10-19 }}

Collocate clouds

Extending the principles of a text cloud, a collocate cloud provides a more focused view of a document or corpus. Instead of summarising an entire document, the collocate cloud examines the usage of a particular word. The resulting cloud contains the words which are often used in conjunction with the search word. These collocates are formatted to show frequency (as size) as well as collocational strength (as brightness). This provides interactive ways to browse and explore language.{{cite web |title=Collocate cloud |url=http://www.scottishcorpus.ac.uk/corpus/search/collocatecloud.php | access-date=2008-12-05}}
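The idea can be sketched as follows. The fixed word window and the particular strength measure (the share of a word's occurrences that fall near the search word) are simplifying assumptions for this example; corpus tools generally use more refined association statistics, and the function name is hypothetical.

```python
from collections import Counter

def collocate_cloud(tokens, target, window=2):
    """Map each collocate of `target` to (frequency near target, strength).

    Frequency would drive font size and strength would drive brightness.
    Strength here is the fraction of the word's occurrences that lie
    within `window` positions of the target (an illustrative measure).
    """
    near_positions = set()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            near_positions.update(range(lo, hi))
    near = Counter(tokens[j] for j in near_positions if tokens[j] != target)
    totals = Counter(tokens)
    return {word: (n, n / totals[word]) for word, n in near.items()}

tokens = "strong tea and strong coffee but weak tea with milk and sugar".split()
cloud = collocate_cloud(tokens, "tea", window=1)
```

In this toy text, "weak" appears only next to "tea" and so gets full strength, while "strong" appears near "tea" in only one of its two occurrences.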

Perception

Tag clouds have been the subject of investigation in several usability studies. The following summary is based on an overview of research results given by Lohmann et al.:

Tag size: Large tags attract more user attention than small tags (effect influenced by further properties, e.g., number of characters, position, neighboring tags).
Scanning: Users scan rather than read tag clouds.
Centering: Tags in the middle of the cloud attract more user attention than tags near the borders (effect influenced by layout).
Position: The upper left quadrant receives more user attention than the others (Western reading habits).
Exploration: Tag clouds provide suboptimal support when searching for specific tags (if these do not have a very large font size).

Felix et al.{{cite journal |title=Taking Word Clouds Apart: An Empirical Investigation of the Design Space for Keyword Summaries. |journal=IEEE Transactions on Visualization and Computer Graphics |volume=24 |issue=1 |pages=657–666 |date=Jan 2018 |doi=10.1109/TVCG.2017.2746018 |pmid=28866593 |last1=Felix |first1=Cristian |last2=Franconeri |first2=Steven |last3=Bertini |first3=Enrico |s2cid=6570943 }} compared human reading performance between traditional tag clouds, which map numeric values to font size, and alternative designs that use, for example, color or additional shapes such as circles and bars. They also compared how different arrangements of the words affect performance.

Using an additional bar or circle instead of font size increases accuracy when reading the numeric value.
However, users can find a specific word more quickly when no additional mark is used.
Performance depends on the task: simple tasks such as finding a word are highly affected by the design choice, whereas the effect on tasks such as identifying the topic of a tag cloud is much smaller.

Creation


In principle, the font size of a tag in a tag cloud is determined by its incidence. For a word cloud of categories like weblogs, frequency, for example, corresponds to the number of weblog entries that are assigned to a category. For smaller frequencies one can specify font sizes directly, from one up to the maximum font size. For larger values, scaling is needed. In a linear normalization, the weight $t_i$ of a descriptor is mapped to a size scale of 1 through $f_{\max}$, where $t_{\min}$ and $t_{\max}$ specify the range of available weights.

: $s_i = \left \lceil \frac{f_{\max}\cdot(t_i - t_{\min})}{t_{\max}-t_{\min}} \right \rceil$ for $t_i > t_{\min}$ ; else $s_{i}=1$

:* $s_i$ : display fontsize

:* $f_{\max}$ : max. fontsize

:* $t_i$ : count

:* $t_{\min}$ : min. count

:* $t_{\max}$ : max. count
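The linear normalization above can be written as a short function. The function name and the sample counts are made up for illustration, and the guard for the case where all counts are equal is an addition not covered by the formula.

```python
import math

def linear_font_size(count, t_min, t_max, f_max):
    """Map a tag count to a font-size step in 1..f_max using linear normalization."""
    # Tags at the minimum count get size 1; the t_max == t_min guard avoids division by zero
    if count <= t_min or t_max == t_min:
        return 1
    return math.ceil(f_max * (count - t_min) / (t_max - t_min))

counts = {"python": 120, "css": 45, "html": 10, "svg": 3}
t_min, t_max = min(counts.values()), max(counts.values())
sizes = {tag: linear_font_size(c, t_min, t_max, f_max=5) for tag, c in counts.items()}
```

With these counts the result is size 5 for "python", 2 for "css", and 1 for both "html" and "svg", which shows how a single dominant tag compresses the rest of the scale.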

Since the number of indexed items per descriptor is usually distributed according to a power law,{{cite arXiv |eprint=cs/0604036|last1=Voss|first1=Jakob|title=Collaborative thesaurus tagging the Wikipedia way|year=2006}} for larger ranges of values, a logarithmic representation makes sense.{{cite web |url=http://www.echochamberproject.com/node/247 |title=Kentbyte: Tag Cloud Font Distribution Algorithm. June 2005 |publisher=Echochamberproject.com |access-date=2013-07-27 |url-status=live |archive-url=https://web.archive.org/web/20131002095156/http://www.echochamberproject.com/node/247 |archive-date=2013-10-02 }}
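One way to apply a logarithmic representation is to use the same mapping as in the linear case but on the logarithms of the counts. The sketch below assumes all counts are at least 1; the function name and sample counts are hypothetical, and other logarithmic schemes are possible.

```python
import math

def log_font_size(count, t_min, t_max, f_max):
    """Map a tag count to a font-size step in 1..f_max on a logarithmic scale.

    Assumes count, t_min and t_max are all >= 1.
    """
    if count <= t_min or t_max == t_min:
        return 1
    scaled = (math.log(count) - math.log(t_min)) / (math.log(t_max) - math.log(t_min))
    return max(1, math.ceil(f_max * scaled))

counts = {"python": 120, "css": 45, "html": 10, "svg": 3}
t_min, t_max = min(counts.values()), max(counts.values())
log_sizes = {tag: log_font_size(c, t_min, t_max, f_max=5) for tag, c in counts.items()}
```

For these counts the sizes come out as 5, 4, 2 and 1, so the mid-range tags are spread over more of the available steps than a linear mapping of the same data would give.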

Implementations of tag clouds also include text parsing and filtering out unhelpful tags such as common words, numbers, and punctuation.
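A minimal sketch of such preprocessing is shown below. The tiny stop-word set and the tokenizing pattern are illustrative assumptions; practical implementations typically rely on much larger, language-specific stop-word lists.

```python
import re
from collections import Counter

# Illustrative stop-word set, not a standard list
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on"}

def word_weights(text, stopwords=STOPWORDS, min_length=2):
    """Count words in `text`, dropping numbers, punctuation, and common words."""
    # Keep only runs of letters and apostrophes; digits and punctuation are discarded
    tokens = re.findall(r"[a-z']+", text.lower())
    words = (t.strip("'") for t in tokens)
    return Counter(w for w in words if len(w) >= min_length and w not in stopwords)

weights = word_weights("The cloud of tags: tags, tags and 42 more tags in the cloud.")
```

The resulting counts can then be passed to a font-size mapping to produce the cloud.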

There are also websites creating artificially or randomly weighted tag clouds, for advertising, or for humorous results.


External links

[https://web.archive.org/web/20160722004350/http://joelamantia.com/ideas/tag-clouds-evolve-understanding-tag-clouds Understanding Tag Clouds] – an information design analysis of tag clouds
[https://web.archive.org/web/20131119070111/http://www.onlamp.com/pub/a/onlamp/2006/06/08/designing-tag-clouds.html Design tips for building tag clouds] – software development guide from O'Reilly's ONLamp
