Caverphone

Caverphone, within linguistics and computing, is a phonetic matching algorithm{{cite book|last1=Milette|first1=Greg|last2=Stroud|first2=Adam|title=Professional Android Sensor Programming|url=https://books.google.com/books?id=dZjo-254FucC&pg=PT421|accessdate=19 February 2013|date=2012-05-18|publisher=John Wiley & Sons|isbn=9781118240458|pages=421–}}{{cite journal|last1 = Phua | first1 = Clifton | first2 = Vincent | last2 = Lee | first3 = Kate | last3 = Smith |year=2006|title=The Personal Name Problem And a Recommended Data Mining Solution|journal=Encyclopedia of Data Warehousing and Mining|citeseerx = 10.1.1.127.5111 }} designed to match English names by the way they sound. It was originally built to process a custom dataset compiled between 1893 and 1938 in southern Dunedin, New Zealand.{{cite web|url= https://xlinux.nist.gov/dads/HTML/caverphone.html | title= Caverphone | publisher= National Institute of Standards and Technology | accessdate= 2018-08-20}} It started from a concept similar to Metaphone and has since been developed to accommodate general English.

History

Caverphone was created by David Hood as part of the Caversham Project at the University of Otago in New Zealand in 2002, and was revised in 2004. It was created to assist in data matching between late 19th-century and early 20th-century electoral rolls, where the name only needed to be in a "commonly recognisable form". The algorithm was intended to be applied to names that could not easily be matched between electoral rolls, after the exact matches had been removed from the pool of potential matches. The algorithm is optimised for the accents present in the study area (the southern part of the city of Dunedin, New Zealand).

Procedure

=Caverphone 1.0=

The rules of the algorithm are applied consecutively to any particular name, as a series of replacements.

The algorithm is as follows:

  1. Convert to lowercase
  2. Remove anything not a-z
  3. If the name starts with
     1. cough, replace it by cou2f
     2. rough, replace it by rou2f
     3. tough, replace it by tou2f
     4. enough, replace it by enou2f
     5. gn, replace it by 2n
  4. If the name ends with
     1. mb, replace it by m2
  5. Replace
     1. cq with 2q
     2. ci with si
     3. ce with se
     4. cy with sy
     5. tch with 2ch
     6. c with k
     7. q with k
     8. x with k
     9. v with f
     10. dg with 2g
     11. tio with sio
     12. tia with sia
     13. d with t
     14. ph with fh
     15. b with p
     16. sh with s2
     17. z with s
     18. any initial vowel with an A
     19. all other vowels with a 3
     20. 3gh3 with 3kh3
     21. gh with 22
     22. g with k
     23. groups of the letter s with a S
     24. groups of the letter t with a T
     25. groups of the letter p with a P
     26. groups of the letter k with a K
     27. groups of the letter f with a F
     28. groups of the letter m with a M
     29. groups of the letter n with a N
     30. w3 with W3
     31. wy with Wy
     32. wh3 with Wh3
     33. why with Why
     34. w with 2
     35. any initial h with an A
     36. all other occurrences of h with a 2
     37. r3 with R3
     38. ry with Ry
     39. r with 2
     40. l3 with L3
     41. ly with Ly
     42. l with 2
     43. j with y
     44. y3 with Y3
     45. y with 2
  6. Remove all
     1. 2s
     2. 3s
  7. Put six 1s on the end
  8. Take the first six characters as the code
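
The rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `caverphone1` and the table names `PREFIXES_V1` and `CONSONANTS_V1` are hypothetical, "anything not a-z" is read as ASCII letters after lowercasing, and every rule is applied once, in the order listed.

```python
import re

# Hypothetical table names; the order follows the Caverphone 1.0 rule list.
PREFIXES_V1 = (("cough", "cou2f"), ("rough", "rou2f"), ("tough", "tou2f"),
               ("enough", "enou2f"), ("gn", "2n"))

# Plain substring replacements applied before the vowels are encoded.
CONSONANTS_V1 = (("cq", "2q"), ("ci", "si"), ("ce", "se"), ("cy", "sy"),
                 ("tch", "2ch"), ("c", "k"), ("q", "k"), ("x", "k"),
                 ("v", "f"), ("dg", "2g"), ("tio", "sio"), ("tia", "sia"),
                 ("d", "t"), ("ph", "fh"), ("b", "p"), ("sh", "s2"),
                 ("z", "s"))


def caverphone1(name: str) -> str:
    """Return a six-character code following the Caverphone 1.0 rules."""
    # Lowercase, then drop everything that is not an ASCII letter.
    s = re.sub(r"[^a-z]", "", name.lower())

    # Start-of-name rules; at most one of them can apply.
    for prefix, replacement in PREFIXES_V1:
        if s.startswith(prefix):
            s = replacement + s[len(prefix):]
            break

    # End-of-name rule.
    if s.endswith("mb"):
        s = s[:-2] + "m2"

    for old, new in CONSONANTS_V1:
        s = s.replace(old, new)

    # An initial vowel becomes A, every other vowel becomes 3.
    s = re.sub(r"^[aeiou]", "A", s)
    s = re.sub(r"[aeiou]", "3", s)

    for old, new in (("3gh3", "3kh3"), ("gh", "22"), ("g", "k")):
        s = s.replace(old, new)

    # Collapse runs of s, t, p, k, f, m, n into a single capital letter.
    for letter in "stpkfmn":
        s = re.sub(letter + "+", letter.upper(), s)

    for old, new in (("w3", "W3"), ("wy", "Wy"), ("wh3", "Wh3"),
                     ("why", "Why"), ("w", "2")):
        s = s.replace(old, new)

    # An initial h becomes A, every other h becomes 2.
    s = re.sub(r"^h", "A", s)
    s = s.replace("h", "2")

    for old, new in (("r3", "R3"), ("ry", "Ry"), ("r", "2"),
                     ("l3", "L3"), ("ly", "Ly"), ("l", "2"),
                     ("j", "y"), ("y3", "Y3"), ("y", "2")):
        s = s.replace(old, new)

    # Drop the placeholders, pad with six 1s, keep the first six characters.
    s = s.replace("2", "").replace("3", "")
    return (s + "111111")[:6]
```

With this sketch, `caverphone1("Lee")` returns `"L11111"` and `caverphone1("Thompson")` returns `"TMPSN1"`. Keeping the rules in ordered tables rather than one combined pattern mirrors the consecutive application the algorithm relies on, since later rules depend on the placeholders 2 and 3 introduced by earlier ones.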

=Caverphone 2.0=

  1. Start with a word
  2. Convert to lowercase
  3. Remove anything not in the standard alphabet (typically a-z){{notetag|This may vary if the set of letters includes characters such as æ, ā, or ø }}
  4. Remove final e
  5. If the name starts with
     1. cough make it cou2f
     2. rough make it rou2f
     3. tough make it tou2f
     4. enough make it enou2f
     5. trough make it trou2f
     6. gn make it 2n
  6. If the name ends with
     1. mb make it m2
  7. Replace
     1. cq with 2q
     2. ci with si
     3. ce with se
     4. cy with sy
     5. tch with 2ch
     6. c with k
     7. q with k
     8. x with k
     9. v with f
     10. dg with 2g
     11. tio with sio
     12. tia with sia
     13. d with t
     14. ph with fh
     15. b with p
     16. sh with s2
     17. z with s
     18. an initial vowel{{notetag|Vowels are normally a, e, i, o, u but depending on the data might include characters such as æ, ā, or ø}} with an A
     19. all other vowels with a 3
     20. j with y
     21. an initial y3 with Y3
     22. an initial y with A
     23. y with 3
     24. 3gh3 with 3kh3
     25. gh with 22
     26. g with k
     27. groups of the letter s with a S
     28. groups of the letter t with a T
     29. groups of the letter p with a P
     30. groups of the letter k with a K
     31. groups of the letter f with a F
     32. groups of the letter m with a M
     33. groups of the letter n with a N
     34. w3 with W3
     35. wh3 with Wh3
     36. if the name ends in w, the final w with 3
     37. w with 2
     38. an initial h with an A
     39. all other occurrences of h with a 2
     40. r3 with R3
     41. if the name ends in r, the final r with 3
     42. r with 2
     43. l3 with L3
     44. if the name ends in l, the final l with 3
     45. l with 2
  8. Remove all 2s
  9. If the name ends in 3, replace the final 3 with A
  10. Remove all 3s
  11. Put ten 1s on the end
  12. Take the first ten characters as the code
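
The revised rules can be sketched in Python in the same way. Again this is an illustration under stated assumptions rather than a definitive implementation: the names `caverphone2`, `PREFIXES_V2` and `CONSONANTS_V2` are hypothetical, the alphabet is taken to be ASCII a-z, and the vowels are taken to be a, e, i, o, u.

```python
import re

# Hypothetical table names; the order follows the Caverphone 2.0 rule list.
PREFIXES_V2 = (("cough", "cou2f"), ("rough", "rou2f"), ("tough", "tou2f"),
               ("enough", "enou2f"), ("trough", "trou2f"), ("gn", "2n"))

CONSONANTS_V2 = (("cq", "2q"), ("ci", "si"), ("ce", "se"), ("cy", "sy"),
                 ("tch", "2ch"), ("c", "k"), ("q", "k"), ("x", "k"),
                 ("v", "f"), ("dg", "2g"), ("tio", "sio"), ("tia", "sia"),
                 ("d", "t"), ("ph", "fh"), ("b", "p"), ("sh", "s2"),
                 ("z", "s"))


def caverphone2(name: str) -> str:
    """Return a ten-character code following the Caverphone 2.0 rules."""
    # Assumption: the "standard alphabet" is ASCII a-z.
    s = re.sub(r"[^a-z]", "", name.lower())

    # Remove a final e.
    if s.endswith("e"):
        s = s[:-1]

    # Start-of-name rules; at most one of them can apply.
    for prefix, replacement in PREFIXES_V2:
        if s.startswith(prefix):
            s = replacement + s[len(prefix):]
            break

    if s.endswith("mb"):
        s = s[:-2] + "m2"

    for old, new in CONSONANTS_V2:
        s = s.replace(old, new)

    # An initial vowel becomes A, every other vowel becomes 3.
    s = re.sub(r"^[aeiou]", "A", s)
    s = re.sub(r"[aeiou]", "3", s)

    # j and y handling, which moved ahead of the gh rules in version 2.0.
    s = s.replace("j", "y")
    s = re.sub(r"^y3", "Y3", s)
    s = re.sub(r"^y", "A", s)
    s = s.replace("y", "3")

    for old, new in (("3gh3", "3kh3"), ("gh", "22"), ("g", "k")):
        s = s.replace(old, new)

    # Collapse runs of s, t, p, k, f, m, n into a single capital letter.
    for letter in "stpkfmn":
        s = re.sub(letter + "+", letter.upper(), s)

    s = s.replace("w3", "W3").replace("wh3", "Wh3")
    s = re.sub(r"w$", "3", s)   # a final w becomes 3
    s = s.replace("w", "2")

    s = re.sub(r"^h", "A", s)
    s = s.replace("h", "2")

    s = s.replace("r3", "R3")
    s = re.sub(r"r$", "3", s)   # a final r becomes 3
    s = s.replace("r", "2")

    s = s.replace("l3", "L3")
    s = re.sub(r"l$", "3", s)   # a final l becomes 3
    s = s.replace("l", "2")

    # Drop 2s, keep a trailing vowel sound as A, then drop the remaining 3s.
    s = s.replace("2", "")
    s = re.sub(r"3$", "A", s)
    s = s.replace("3", "")

    # Pad with ten 1s and keep the first ten characters.
    return (s + "1" * 10)[:10]
```

With this sketch, `caverphone2("Lee")` returns `"LA11111111"` and `caverphone2("Thompson")` returns `"TMPSN11111"`. Data that uses letters such as æ, ā, or ø would need the character class and the vowel set widened accordingly.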

----

{{notefoot}}

Examples

=Caverphone 1.0=

Lee -> lee

lee -> l33

l33 -> L33

L33 -> L

L -> L111111

L111111 -> L11111

Thompson -> thompson

thompson -> th3mps3n

th3mps3n -> th3mpS3n

th3mpS3n -> Th3mpS3n

Th3mpS3n -> Th3mPS3n

Th3mPS3n -> Th3MPS3n

Th3MPS3n -> Th3MPS3N

Th3MPS3N -> T23MPS3N

T23MPS3N -> TMPSN

TMPSN -> TMPSN111111

TMPSN111111 -> TMPSN1

=Caverphone 2.0=

Lee -> lee

lee -> le

le -> l3

l3 -> L3

L3 -> LA

LA -> LA1111111111

LA1111111111 -> LA11111111

Thompson -> thompson

thompson -> th3mps3n

th3mps3n -> th3mpS3n

th3mpS3n -> Th3mpS3n

Th3mpS3n -> Th3mPS3n

Th3mPS3n -> Th3MPS3n

Th3MPS3n -> Th3MPS3N

Th3MPS3N -> T23MPS3N

T23MPS3N -> TMPSN

TMPSN -> TMPSN1111111111

TMPSN1111111111 -> TMPSN11111

References

{{Reflist}}