Comm
{{Short description|Standard UNIX utility for comparing files}}
{{Other uses}}
{{for|the Portuguese Order of Merit|ComM}}
{{more footnotes|date=January 2013}}
{{lowercase title}}
{{Infobox software
| name = comm
| logo =
| screenshot = Comm-example.png
| screenshot size =
| caption = Example usage of the comm command
| author = Lee E. McMahon
| developer = AT&T Bell Laboratories, Richard Stallman, David MacKenzie
| released = {{Start date and age|1973|11}}
| latest release version =
| latest release date =
| programming language = C
| operating system = Unix, Unix-like, Plan 9, Inferno
| platform = Cross-platform
| genre = Command
| license = coreutils: GPLv3+
Plan 9: MIT License
| website =
}}
The {{mono|comm}} command in the Unix family of computer operating systems is a utility that is used to compare two files for common and distinct lines. {{Mono|comm}} is specified in the POSIX standard. It has been widely available on Unix-like operating systems since the mid to late 1980s.
History
Written by Lee E. McMahon, {{Mono|comm}} first appeared in Version 4 Unix.{{cite tech report
| first1 = M. D.
| last1 = McIlroy
| authorlink1 = Doug McIlroy
| year = 1987
| url = https://www.cs.dartmouth.edu/~doug/reader.pdf
| title = A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 |series=CSTR
| number = 139
| institution = Bell Labs}}
The version of {{mono|comm}} bundled in GNU coreutils was written by Richard Stallman and David MacKenzie.{{Cite web|url=https://linux.die.net/man/1/comm|title = Comm(1): Compare two sorted files line by line - Linux man page}}
Usage
{{Mono|comm}} reads two files as input, regarded as lines of text, and writes a single output containing three columns. The first two columns contain lines unique to the first and second file, respectively. The third column contains lines common to both. This functionality is similar to that of {{Mono|diff}}.
Columns are typically distinguished with the tab character.
For efficiency, standard implementations of {{Mono|comm}} expect both input files to be sorted lexically in the same collation order. The {{Mono|sort}} command can be used for this purpose.
The {{Mono|comm}} algorithm makes use of the collating sequence of the current locale. If the lines in the files are not both collated in accordance with the current locale, the result is undefined.
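As a minimal sketch of the sorting requirement, the following sorts two unsorted inputs and compares them with the same locale in force for both commands. The scratch directory and the file names are assumptions made for illustration, and {{Mono|mktemp -d}} is assumed to be available, as it is on most Unix-like systems.

```shell
# Work in a scratch directory so the example leaves nothing behind.
tmp=$(mktemp -d)
printf 'eggplant\napple\nbanana\n' > "$tmp/foo"
printf 'zucchini\nbanana\napple\nbanana\n' > "$tmp/bar"

# Sort both inputs under the locale that comm will also run under.
LC_ALL=C sort "$tmp/foo" > "$tmp/foo.sorted"
LC_ALL=C sort "$tmp/bar" > "$tmp/bar.sorted"

# Column 1: only in foo; column 2: only in bar; column 3: in both.
result=$(LC_ALL=C comm "$tmp/foo.sorted" "$tmp/bar.sorted")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Fixing {{Mono|LC_ALL}} for both {{Mono|sort}} and {{Mono|comm}} is one way to guarantee that the two commands agree on the collating sequence.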
Return code
Unlike {{Mono|diff}}, the return code from {{Mono|comm}} has no logical significance concerning the relationship of the two files. A return code of 0 indicates success; a return code greater than 0 indicates that an error occurred during processing.
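The difference can be observed from a script. The sketch below is illustrative only: the file names are made up, and the variable names are not part of either tool.

```shell
tmp=$(mktemp -d)
printf 'apple\n' > "$tmp/a"
printf 'banana\n' > "$tmp/b"

# The two files share no lines, yet comm still reports success.
differing_status=0
comm "$tmp/a" "$tmp/b" > /dev/null || differing_status=$?

# A missing input file is a processing error, so the status is non-zero.
error_status=0
comm "$tmp/a" "$tmp/missing" > /dev/null 2>&1 || error_status=$?

# diff, by contrast, signals "files differ" through exit status 1.
diff_status=0
diff "$tmp/a" "$tmp/b" > /dev/null || diff_status=$?

printf 'comm (files differ): %s\n' "$differing_status"
printf 'comm (missing file): %s\n' "$error_status"
printf 'diff (files differ): %s\n' "$diff_status"

rm -rf "$tmp"
```

A script that needs to know whether two files differ therefore has to inspect the output of {{Mono|comm}} rather than its exit status.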
Example
$ cat foo
apple
banana
eggplant
$ cat bar
apple
banana
banana
zucchini
$ comm foo bar
		apple
		banana
	banana
eggplant
	zucchini
This shows that both files have one banana, but only bar has a second banana.
In more detail, the output has the appearance that follows. Note that the column is determined by the number of leading tab characters. \t represents a tab character and \n represents a newline.
{| class="wikitable" style="text-align:center"
! style="min-width:20px" | !! style="min-width:20px" | 0 !! style="min-width:20px" | 1 !! style="min-width:20px" | 2 !! style="min-width:20px" | 3 !! style="min-width:20px" | 4 !! style="min-width:20px" | 5 !! style="min-width:20px" | 6 !! style="min-width:20px" | 7 !! style="min-width:20px" | 8 !! style="min-width:20px" | 9
|-
! 0
| \t || \t || a || p || p || l || e || \n
|-
! 1
| \t || \t || b || a || n || a || n || a || \n
|-
! 2
| \t || b || a || n || a || n || a || \n
|-
! 3
| e || g || g || p || l || a || n || t || \n
|-
! 4
| \t || z || u || c || c || h || i || n || i || \n
|}
Comparison to diff
In general terms, {{Mono|diff}} is a more powerful utility than {{Mono|comm}}. The simpler {{Mono|comm}} is best suited for use in scripts.
The primary distinction between {{Mono|comm}} and {{Mono|diff}} is that {{Mono|comm}} discards information about the order of the lines prior to sorting.
A minor difference between {{Mono|comm}} and {{Mono|diff}} is that {{Mono|comm}} will not try to indicate that a line has "changed" between the two files; each line is shown in exactly one of the "from file #1", "from file #2", or "in both" columns. This can be useful if one wishes two lines to be considered different even if they only have subtle differences.
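A small illustration of this behaviour, with made-up file names and contents: two one-line files that differ by a single letter are reported as two unrelated lines, one per column, rather than as one changed line.

```shell
tmp=$(mktemp -d)
printf 'color\n' > "$tmp/us"
printf 'colour\n' > "$tmp/uk"

# comm does not pair up near-matches: each spelling lands in its own column.
columns=$(comm "$tmp/us" "$tmp/uk")
printf '%s\n' "$columns"

rm -rf "$tmp"
```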
Other options
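{{Mono|comm}} has command-line options to suppress any of the three columns: {{Mono|-1}}, {{Mono|-2}}, and {{Mono|-3}} suppress the first, second, and third column, respectively. This is useful for scripting.
There is also an option to read one file (but not both) from standard input, by giving {{Mono|-}} in place of the file name.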
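The following sketch shows typical scripting uses of the POSIX column-suppression flags {{Mono|-1}}, {{Mono|-2}}, and {{Mono|-3}}, and of {{Mono|-}} as a stand-in for standard input. The file names reuse the earlier example, and the variable names are chosen only for illustration.

```shell
tmp=$(mktemp -d)
printf 'apple\nbanana\neggplant\n' > "$tmp/foo"
printf 'apple\nbanana\nbanana\nzucchini\n' > "$tmp/bar"

# Suppress columns 1 and 2: only the lines common to both files remain.
common=$(comm -12 "$tmp/foo" "$tmp/bar")

# Suppress columns 2 and 3: only the lines unique to the first file remain.
only_foo=$(comm -23 "$tmp/foo" "$tmp/bar")

# "-" reads the second input from standard input, here fed by sort.
only_bar=$(sort "$tmp/bar" | comm -13 "$tmp/foo" -)

printf 'common:\n%s\n' "$common"
printf 'only in foo:\n%s\n' "$only_foo"
printf 'only in bar:\n%s\n' "$only_bar"

rm -rf "$tmp"
```

When only one column is left, its lines carry no leading tabs, so the result can be passed directly to other line-oriented tools.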
Limits
Up to a full line must be buffered from each input file during line comparison, before the next output line is written.
Some implementations read lines with the function {{Mono|readlinebuffer()}}, which does not impose any line length limit as long as system memory suffices.
Other implementations read lines with the function {{Mono|fgets()}}, which requires a fixed-size buffer. For these implementations, the buffer is often sized according to the POSIX macro {{Mono|LINE_MAX}}.
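The one-line-per-file buffering described at the start of this section can be sketched as a merge loop. This is an illustration written for this article, not the code of any real implementation: {{Mono|comm_sketch}} and its variables are hypothetical names, both inputs are assumed to be sorted and to end with a newline, and lines are compared with {{Mono|expr}}, which uses the collating sequence of the current locale.

```shell
# comm_sketch FILE1 FILE2 (hypothetical helper, for illustration only)
comm_sketch() {
    exec 3< "$1" 4< "$2"
    # line1 and line2 are the only buffered lines: one per input file.
    if IFS= read -r line1 <&3; then more1=yes; else more1=no; fi
    if IFS= read -r line2 <&4; then more2=yes; else more2=no; fi
    while [ "$more1" = yes ] || [ "$more2" = yes ]; do
        # Decide which buffered line sorts first, or whether they match.
        if [ "$more2" = no ]; then
            side=1
        elif [ "$more1" = no ]; then
            side=2
        elif [ "$line1" = "$line2" ]; then
            side=both
        elif expr "x$line1" \< "x$line2" > /dev/null; then
            side=1
        else
            side=2
        fi
        # The number of leading tabs selects the output column.
        case $side in
            1)    printf '%s\n' "$line1" ;;
            2)    printf '\t%s\n' "$line2" ;;
            both) printf '\t\t%s\n' "$line1" ;;
        esac
        # Refill only the buffer(s) whose line was just written.
        if [ "$side" != 2 ]; then
            if IFS= read -r line1 <&3; then more1=yes; else more1=no; fi
        fi
        if [ "$side" != 1 ]; then
            if IFS= read -r line2 <&4; then more2=yes; else more2=no; fi
        fi
    done
    exec 3<&- 4<&-
}

tmp=$(mktemp -d)
printf 'apple\nbanana\neggplant\n' > "$tmp/foo"
printf 'apple\nbanana\nbanana\nzucchini\n' > "$tmp/bar"
sketch_out=$(comm_sketch "$tmp/foo" "$tmp/bar")
printf '%s\n' "$sketch_out"
rm -rf "$tmp"
```

Because each step holds only the current line of each file, memory use depends on the longest line rather than on the size of the files.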
See also
- Comparison of file comparison tools
- List of Unix commands
- cmp – character-oriented file comparison
- cut – splitting column-oriented files
References
{{Reflist}}
External links
{{Wikibooks|Guide to Unix|Commands}}
- {{man|cu|comm|SUS|select or reject lines common to two files}}
- {{man|1|comm|Plan 9}}
- {{man|1|comm|Inferno}}
{{Unix commands}}
{{Plan 9 commands}}
{{Core Utilities commands}}
Category:Free file comparison tools