Sort (Unix)

{{short description|Standard UNIX utility}}

{{lowercase title}}

{{Infobox software

| name = sort

| logo =

| screenshot = Sortunix.png

| screenshot size =

| caption = The {{code|sort}} command

| author = Ken Thompson (AT&T Bell Laboratories)

| developer = Various open-source and commercial developers

| released = {{Start date and age|1971|11|3}}

| latest release version =

| latest release date =

| programming language = C

| operating system = Multics, Unix, Unix-like, V, Plan 9, Inferno, MSX-DOS, IBM i

| platform = Cross-platform

| genre = Command

| license = coreutils: GPLv3+
Plan 9: MIT License

| website =

}}

In computing, sort is a standard command-line program of Unix and Unix-like operating systems that prints the lines of its input, or of the concatenation of all files listed in its argument list, in sorted order. Sorting is based on one or more sort keys extracted from each line of input. By default, the entire line is taken as the sort key, and blank space is the default field separator. The command supports a number of command-line options that vary by implementation; for instance, the "-r" flag reverses the sort order. Sort ordering is affected by the environment's locale settings.
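The effect of the locale can be illustrated by forcing the POSIX "C" locale, in which lines are compared byte by byte. The following is a minimal sketch with made-up input; the order produced under other locales depends on the system's collation tables and is not shown.

```shell
# Force the "C" locale so that lines are compared by byte value
# (uppercase ASCII letters sort before lowercase ones).
sorted_c=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)
printf '%s\n' "$sorted_c"
# prints A, B, a, b (one per line)
```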

History

A {{code|sort}} command that invokes a general sort facility was first implemented within Multics.{{Cite web|url=https://www.multicians.org/multics-commands.html|title=Multics Commands|website=www.multicians.org}} Later, it appeared in Version 1 Unix. This version was originally written by Ken Thompson at AT&T Bell Laboratories. By Version 4 Thompson had modified it to use pipes, but sort retained an option to name the output file because it was used to sort a file in place. In Version 5, Thompson invented "-" to represent standard input.{{cite tech report |first1=M. D. |last1=McIlroy |author-link1=Doug McIlroy |year=1987 |url=https://www.cs.dartmouth.edu/~doug/reader.pdf |title=A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 |series=CSTR |number=139 |institution=Bell Labs}}

The version of {{Mono|sort}} bundled in GNU coreutils was written by Mike Haertel and Paul Eggert.{{Cite web|url=https://linux.die.net/man/1/sort|title=sort(1): sort lines of text files - Linux man page|website=linux.die.net}} This implementation employs the merge sort algorithm.

Similar commands are available on many other operating systems. For example, a {{Mono|sort}} command is part of ASCII's MSX-DOS2 Tools for MSX-DOS version 2.{{Cite web|url=https://archive.org/details/MSXDOS2TOOLS|title=MSX-DOS2 Tools User's Manual - MSX-DOS2 TOOLS ユーザーズマニュアル|date=April 1, 1993|via=Internet Archive}}

The {{Mono|sort}} command has also been ported to the IBM i operating system.{{cite web |title=IBM System i Version 7.2 Programming Qshell |language=en |author=IBM |website=IBM |author-link=IBM |url=https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_74/rzahz/rzahzpdf.pdf?view=kc |access-date=2020-09-05 }}

Syntax

sort [OPTION]... [FILE]...

With no FILE, or when FILE is -, the command reads from standard input.
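As an illustration of the "-" operand, the sketch below sorts a piped line together with the lines of a file. The file name and its contents are hypothetical and are created only for the example.

```shell
# Hypothetical sample file, created only for this illustration.
tmpdir=$(mktemp -d)
printf 'pear\napple\n' > "$tmpdir/fruits"

# "-" stands for standard input, so the piped line is sorted
# together with the lines read from the file.
combined=$(printf 'banana\n' | LC_ALL=C sort - "$tmpdir/fruits")
printf '%s\n' "$combined"
# prints apple, banana, pear (one per line)

rm -r "$tmpdir"
```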

=Parameters=

{| class="wikitable" border="6"
! Name !! Description !! Unix !! Plan 9 !! Inferno !! FreeBSD !! Linux !! MSX-DOS !! IBM i
|-
| {{tt|-b}}, {{tt|--ignore-leading-blanks}} || Ignores leading blanks. || {{Yes}} || {{Yes}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{Yes}}
|-
| {{tt|-c}} || Check that input file is sorted. || {{No}} || {{Yes}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{Yes}}
|-
| {{tt|-C}} || Like -c, but does not report the first bad line. || {{No}} || {{No}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|-
| {{tt|-d}}, {{tt|--dictionary-order}} || Considers only blanks and alphanumeric characters. || {{Yes}} || {{Yes}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{Yes}}
|-
| {{tt|-f}}, {{tt|--ignore-case}} || Fold lower case to upper case characters. || {{Yes}} || {{Yes}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{Yes}}
|-
| {{tt|-g}}, {{tt|--general-numeric-sort}}, {{nowrap|{{tt|1=--sort=general-numeric}}}} || Compares according to general numerical value. || {{Yes}} || {{Yes}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|-
| {{tt|-h}}, {{tt|--human-numeric-sort}}, {{tt|1=--sort=human-numeric}} || Compare human readable numbers (e.g., 2K 1G). || {{Yes}} || {{No}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|-
| {{tt|-i}}, {{tt|--ignore-nonprinting}} || Considers only printable characters. || {{Yes}} || {{Yes}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{Yes}}
|-
| {{tt|-k}}, {{tt|1=--key=}}POS1{{tt|[,}}POS2{{tt|]}} || Start a key at POS1 (origin 1), end it at POS2 (default end of line) || {{No}} || {{No}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|-
| {{tt|-m}} || Merge only; input files are assumed to be presorted. || {{No}} || {{Yes}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{Yes}}
|-
| {{tt|-M}}, {{tt|--month-sort}}, {{tt|1=--sort=month}} || Compares (unknown) < 'JAN' < ... < 'DEC'. || {{Yes}} || {{Yes}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|-
| {{tt|-n}}, {{tt|--numeric-sort}}, {{tt|1=--sort=numeric}} || Compares according to string numerical value. || {{Yes}} || {{Yes}} || {{Yes}} || {{Yes}} || {{Yes}} || {{No}} || {{Yes}}
|-
| {{tt|-o}} OUTPUT || Uses OUTPUT file instead of standard output. || {{No}} || {{Yes}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{Yes}}
|-
| {{tt|-r}}, {{tt|--reverse}} || Reverses the result of comparisons. || {{Yes}} || {{Yes}} || {{Yes}} || {{Yes}} || {{Yes}} || {{No}} || {{Yes}}
|-
| {{tt|-R}}, {{tt|--random-sort}}, {{tt|1=--sort=random}} || Shuffles, but groups identical keys. See also: shuf || {{Yes}} || {{No}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|-
| {{tt|-s}} || Stabilizes sort by disabling last-resort comparison. || {{No}} || {{No}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|-
| {{tt|-S}} size, {{tt|1=--buffer-size=}}size || Use size for the maximum size of the memory buffer. || {{No}} || {{No}} || {{No}} || {{Yes}} || {{No}} || {{No}} || {{No}}
|-
| {{tt|-tx}} || 'Tab character' separating fields is x. || {{No}} || {{Yes}} || {{No}} || {{No}} || {{Yes}} || {{No}} || {{Yes}}
|-
| {{tt|-t}} char, {{tt|1=--field-separator=}}char || Uses char instead of non-blank to blank transition. || {{No}} || {{No}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|-
| {{tt|-T}} dir, {{tt|1=--temporary-directory=}}dir || Uses dir for temporaries. || {{No}} || {{Yes}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|-
| {{tt|-u}}, {{tt|--unique}} || Unique processing to suppress all but one in each set of lines having equal keys. || {{No}} || {{Yes}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{Yes}}
|-
| {{tt|-V}}, {{tt|--version-sort}} || Natural sort of (version) numbers within text || {{No}} || {{No}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|-
| {{tt|-w}} || Like -i, but ignore only tabs and spaces. || {{No}} || {{Yes}} || {{No}} || {{No}} || {{No}} || {{No}} || {{No}}
|-
| {{tt|-z}}, {{tt|--zero-terminated}} || End lines with 0 byte, not newline || {{No}} || {{No}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|-
| {{tt|--help}} || Display help and exit || {{No}} || {{No}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|-
| {{tt|--version}} || Output version information and exit || {{No}} || {{No}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|-
| {{tt|/R}} || Reverses the result of comparisons. || {{No}} || {{No}} || {{No}} || {{No}} || {{No}} || {{Yes}} || {{No}}
|-
| {{tt|/S}} || Specify the number of digits to determine how many digits of each line should be judged. || {{No}} || {{No}} || {{No}} || {{No}} || {{No}} || {{Yes}} || {{No}}
|-
| {{tt|/A}} || Sort by ASCII code. || {{No}} || {{No}} || {{No}} || {{No}} || {{No}} || {{Yes}} || {{No}}
|-
| {{tt|/H}} || Include hidden files when using wild cards. || {{No}} || {{No}} || {{No}} || {{No}} || {{No}} || {{Yes}} || {{No}}
|}
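The -u and -c options listed above can be sketched as follows, on implementations that support them. The input is invented for the illustration, and the variable names are arbitrary.

```shell
# Sample data invented for this sketch.
data=$(printf 'b\na\nb\n')

# -u keeps one line from each set of lines with equal keys.
unique=$(printf '%s\n' "$data" | LC_ALL=C sort -u)
printf '%s\n' "$unique"
# prints a, b (one per line)

# -c only checks the order; the exit status is non-zero for unsorted input.
if printf '%s\n' "$data" | LC_ALL=C sort -c 2>/dev/null; then
    check=sorted
else
    check=unsorted
fi
printf '%s\n' "$check"
# prints unsorted
```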

Examples

=Sort a file in alphabetical order=

$ cat phonebook

Smith, Brett 555-4321

Doe, John 555-1234

Doe, Jane 555-3214

Avery, Cory 555-4132

Fogarty, Suzie 555-2314

$ sort phonebook

Avery, Cory 555-4132

Doe, Jane 555-3214

Doe, John 555-1234

Fogarty, Suzie 555-2314

Smith, Brett 555-4321

=Sort by number=

The -n option makes the program sort according to numerical value. The {{mono|du}} command produces output that starts with a number, the file size, so its output can be piped to {{mono|sort}} to produce a list of files sorted by (ascending) file size:

$ du /bin/* | sort -n

4 /bin/domainname

24 /bin/ls

102 /bin/sh

304 /bin/csh

The {{mono|find}} command with the {{mono|-ls}} option prints file sizes in the 7th field, so a list of the {{mono|LaTeX}} files sorted by file size is produced by:

$ find . -name "*.tex" -ls | sort -k 7n
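The difference between the default comparison and -n can be seen on a small, made-up list of numbers. This sketch forces the "C" locale so that the default comparison is byte by byte.

```shell
# Without -n the lines are compared as text, so "100" sorts before "9".
lexical=$(printf '10\n9\n100\n' | LC_ALL=C sort)
printf '%s\n' "$lexical"
# prints 10, 100, 9 (one per line)

# With -n the lines are compared by numeric value.
numeric=$(printf '10\n9\n100\n' | LC_ALL=C sort -n)
printf '%s\n' "$numeric"
# prints 9, 10, 100 (one per line)
```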

=Columns or fields=

Use the -k option to sort on a certain column. For example, use "-k 2" to sort on the second column. In old versions of sort, the +1 option made the program sort on the second column of data (+2 for the third, etc.). This usage is deprecated.

$ cat zipcode

Adam 12345

Bob 34567

Joe 56789

Sam 45678

Wendy 23456

$ sort -k 2n zipcode

Adam 12345

Wendy 23456

Bob 34567

Sam 45678

Joe 56789

=Sort on multiple fields=

The -k m,n option lets you sort on a key that is potentially composed of multiple fields (start at column m, end at column n):

$ cat quota

fred 2000

bob 1000

an 1000

chad 1000

don 1500

eric 500

$ sort -k2,2n -k1,1 quota

eric 500

an 1000

bob 1000

chad 1000

don 1500

fred 2000

Here the first sort is done using column 2. -k2,2n specifies sorting on the key starting and ending with column 2, and sorting numerically. If -k2 is used instead, the sort key would begin at column 2 and extend to the end of the line, spanning all the fields in between. -k1,1 dictates breaking ties using the value in column 1, sorting alphabetically by default. Note that an, bob, and chad have the same quota and are sorted alphabetically in the final output.
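The difference between -k2,2 and -k2 only becomes visible when a line has more than two columns. The following sketch uses an invented three-column input in which the second column is identical on both lines; the "C" locale is forced so that the comparison is byte by byte.

```shell
# Made-up three-column input; the second column is the same on both lines.
input=$(printf 'a 1 z\nb 1 y\n')

# Key limited to column 2; the tie is broken by column 1.
bounded=$(printf '%s\n' "$input" | LC_ALL=C sort -k2,2 -k1,1)
printf '%s\n' "$bounded"
# prints "a 1 z" then "b 1 y"

# Key runs from column 2 to the end of the line, so column 3 takes part
# in the comparison and "y" sorts before "z".
unbounded=$(printf '%s\n' "$input" | LC_ALL=C sort -k2 -k1,1)
printf '%s\n' "$unbounded"
# prints "b 1 y" then "a 1 z"
```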

=Sorting a pipe delimited file=

$ sort -t'|' -k2,2 -k1,1 zipcode

Adam|12345

Wendy|23456

Bob|34567

Sam|45678

Joe|56789

=Sorting a tab delimited file=

Sorting a file with tab-separated values requires a tab character to be specified as the column delimiter. This illustration uses the shell's dollar-quote notation{{cite web|title=The GNU Bash Reference Manual, for Bash, Version 4.2: Section 3.1.2.4 ANSI-C Quoting|url=https://www.gnu.org/software/bash/manual/bashref.html#ANSI_002dC-Quoting|access-date=1 February 2013|date=28 December 2010|publisher=Free Software Foundation, Inc.|quote=Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard.}}{{cite web|title=KornShell FAQ|url=http://www2.research.att.com/~astopen/download/ksh/faq.html|access-date=3 March 2015|archive-url=https://web.archive.org/web/20130527195150/http://www2.research.att.com/~gsf/download/ksh/faq.html|archive-date=2013-05-27|url-status=live|first1=Glenn S.|last1=Fowler|first2=David G.|last2=Korn|author-link2=David Korn (computer scientist)|first3=Kiem-Phong|last3=Vo|quote=The $'...' string literal syntax was added to ksh93 to solve the problem of entering special characters in scripts. It uses ANSI-C rules to translate the string between the '...'.}} to specify the tab as a C escape sequence.

$ sort -k2,2 -t $'\t' phonebook

Doe, John 555-1234

Fogarty, Suzie 555-2314

Doe, Jane 555-3214

Avery, Cory 555-4132

Smith, Brett 555-4321
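In shells that lack the dollar-quote notation, one possible alternative is to capture a tab produced by printf in a variable. This is a sketch with invented sample data; the variable names are arbitrary.

```shell
# Store a literal tab in a variable; command substitution strips
# only trailing newlines, so the tab is preserved.
tab=$(printf '\t')

# Hypothetical tab-separated input, sorted on the second field.
by_number=$(printf 'Doe, John\t555-1234\nAvery, Cory\t555-0000\n' |
    LC_ALL=C sort -t "$tab" -k2,2)
printf '%s\n' "$by_number"
# the "Avery, Cory" line (555-0000) is printed first
```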

=Sort in reverse=

The -r option just reverses the order of the sort:

$ sort -rk 2n zipcode

Joe 56789

Sam 45678

Bob 34567

Wendy 23456

Adam 12345

=Sort in random order=

The GNU implementation has a -R (--random-sort) option based on hashing the sort keys. This is not a full random shuffle, because lines with identical keys are always placed together. A true random permutation of the lines is provided by the GNU coreutils utility shuf.
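The grouping behaviour can be checked with a short sketch, assuming the GNU versions of sort and shuf are installed. The input is made up, and because the output order is random only line counts are inspected.

```shell
# Made-up input with two distinct lines, each repeated.
input=$(printf 'a\nb\na\nb\na\n')

# sort -R orders lines by a random hash of the key, so equal lines stay
# adjacent; collapsing adjacent duplicates with uniq leaves two lines.
groups=$(printf '%s\n' "$input" | sort -R | uniq | wc -l)

# shuf permutes the lines without grouping them; every line is kept.
shuffled_count=$(printf '%s\n' "$input" | shuf | wc -l)

printf '%s %s\n' "$groups" "$shuffled_count"
```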

=Sort by version=

The GNU implementation has a -V (--version-sort) option, which performs a natural sort of (version) numbers within text. Two text strings that are to be compared are split into blocks of letters and blocks of digits. Blocks of letters are compared alpha-numerically, and blocks of digits are compared numerically (i.e., skipping leading zeros, more digits means larger, otherwise the leftmost digits that differ determine the result). Blocks are compared left to right, and the first pair of blocks that differ decides which string is larger. This happens to work for IP addresses, Debian package version strings and similar tasks where numbers of variable length are embedded in strings.
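The following sketch, which assumes GNU sort, contrasts -V with the default comparison. The package names and addresses are invented for the illustration.

```shell
# Default comparison in the "C" locale: "1.10" sorts before "1.2"
# because the strings are compared byte by byte.
plain=$(printf 'pkg-1.10\npkg-1.9\npkg-1.2\n' | LC_ALL=C sort)
printf '%s\n' "$plain"
# prints pkg-1.10, pkg-1.2, pkg-1.9 (one per line)

# Version sort: digit blocks are compared numerically.
versions=$(printf 'pkg-1.10\npkg-1.9\npkg-1.2\n' | LC_ALL=C sort -V)
printf '%s\n' "$versions"
# prints pkg-1.2, pkg-1.9, pkg-1.10 (one per line)

# The same rule orders dotted IPv4 addresses by their numeric parts.
addresses=$(printf '10.0.0.2\n9.0.0.1\n10.0.0.10\n' | LC_ALL=C sort -V)
printf '%s\n' "$addresses"
# prints 9.0.0.1, 10.0.0.2, 10.0.0.10 (one per line)
```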

See also

References

{{Reflist}}

Further reading

  • {{Cite book|last1=Shotts (Jr)|first1=William E.|title=The Linux Command Line: A Complete Introduction|date=2012|publisher=No Starch Press|isbn=978-1593273897}}
  • {{Cite book|author-last=McElhearn|author-first=Kirk|title=The Mac OS X Command Line: Unix Under the Hood|date=2006|publisher=John Wiley & Sons|isbn=978-0470113851}}