Hindi–Urdu transliteration

Hindi–Urdu (Devanagari: हिन्दी-उर्दू, Nastaliq: ہندی-اردو) (also known as Hindustani)^[1]^[2] is the lingua franca of modern-day Northern India and Pakistan (together classically known as Hindustan).^[3] Modern Standard Hindi is officially registered in India as a standard written using the Devanagari script, and Standard Urdu is officially registered in Pakistan as a standard written using an extended Perso-Arabic script.

Hindi–Urdu transliteration (or Hindustani transliteration) is essential for Hindustani speakers to read each other's text, and it is especially important because the language underlying the Hindi and Urdu registers is almost the same.^[4] Transliteration is theoretically possible because of the common Hindustani phonology underlying Hindi–Urdu. In the present day, Hindustani is seen as a unifying language,^[5] as initially proposed by Mahatma Gandhi to resolve the Hindi–Urdu controversy.^[6] ("Hindustani" is not to be confused with followers of Hinduism; 'Hindu' in Persian means 'Indo'.)

Technically, a direct one-to-one script mapping or rule-based lossless transliteration of Hindi–Urdu is not possible, mainly because Hindi is written in an abugida script and Urdu in an abjad script, and also because of other constraints such as several similar Perso-Arabic characters mapping onto a single Devanagari character.^[7] However, dictionary-based mapping attempts have yielded very high accuracy, providing near-perfect transliterations.^[8] For literary domains, transliteration alone will not suffice, as formal Hindi leans towards Sanskrit vocabulary whereas formal Urdu leans towards Persian and Arabic vocabulary; such cases need a system combining transliteration and translation.^[9]
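The many-to-one constraint can be illustrated with a small Python sketch. The dictionary is a hypothetical four-entry excerpt, not a complete scheme: it assumes the conventional Hindi spelling in which ز, ذ, ض and ظ are all written ज़, and the names URDU_TO_HINDI and invert are illustrative only.

```python
# Four Perso-Arabic letters that share the sound /z/ in Urdu.
# Assumption: ordinary Hindi spelling writes each of them as
# ja + nuqta (U+091C U+093C).
URDU_TO_HINDI = {
    "\u0632": "\u091C\u093C",  # ز
    "\u0630": "\u091C\u093C",  # ذ
    "\u0636": "\u091C\u093C",  # ض
    "\u0638": "\u091C\u093C",  # ظ
}


def invert(mapping):
    """Group source letters by the target string they map to."""
    inverse = {}
    for src, dst in mapping.items():
        inverse.setdefault(dst, []).append(src)
    return inverse


# All four Urdu letters are candidates when converting ja + nuqta back.
candidates = invert(URDU_TO_HINDI)["\u091C\u093C"]
print(len(candidates))
```

Because the inverse lookup returns several candidates for one Devanagari spelling, a Hindi-to-Urdu converter needs a dictionary or context to choose among them.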

In addition to Hindi-Urdu, there have been attempts to design Indo-Pakistani transliteration systems for digraphic languages like Sindhi (written in extended Perso-Arabic in Sindh of Pakistan and in Devanagari by Sindhis in partitioned India), Punjabi (written in Gurmukhi in East Punjab and Shahmukhi in West Punjab), Saraiki (written in extended-Shahmukhi script in Saraikistan and unofficially in Sindhi-Devanagari script in India) and Kashmiri (written in extended Perso-Arabic by Kashmiri Muslims and extended-Devanagari by Kashmiri Hindus).^[10]^[11]^[12]

Vowels

Hindustani vowels
IPA	Hindi		ISO 15919	Urdu^[13]				Approximate English equivalent
IPA	Initial	Final	ISO 15919	Final		Medial	Initial	Approximate English equivalent
ə^[14]	अ		a	ـہ	ـا	ـ◌َـ	اَ	about
aː	आ	ा	ā	ـا			آ	far
ɪ	इ	ि	i	ـی		ـ◌ِـ	اِ	still
iː	ई	ी	ī	ـی		◌ِـیـ	اِیـ	fee
ʊ	उ	ु	u	ـو		ـ◌ُـ	اُ	book
uː	ऊ	ू	ū	◌ُـو			اُو	moon
eː	ए	े	ē	ے		ـیـ	ایـ	mate
ɛː	ऐ	ै	ai	◌َـے		◌َـیـ	اَیـ	fairy
oː	ओ	ो	ō	ـو			او	force
ɔː	औ	ौ	au	◌َـو			اَو	lot (Received Pronunciation)
ʰ^[15]			h	ھ				(Aspirated sounds) cake
◌̃^[16]		ँ	m̐	ں		ـن٘ـ	ن٘	nasal vowel faun ([ãː, õː], etc.)
◌̃^[16]		ं	ṁ	ں		ـن٘ـ	ن٘	jungle
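Since the short-vowel diacritics shown in the Urdu columns are normally omitted in running text, a converter that produces Urdu may want to drop them. A minimal sketch, assuming the marks to remove are zabar (U+064E), pesh (U+064F), zer (U+0650) and jazm (U+0652); the function name is hypothetical, and tashdid (U+0651) is deliberately left in place:

```python
# Short-vowel and vowel-absence marks that ordinary Urdu text leaves out.
# Assumption: this set is illustrative, not an exhaustive list.
SHORT_VOWEL_MARKS = {
    "\u064E",  # zabar (short a)
    "\u064F",  # pesh (short u)
    "\u0650",  # zer (short i)
    "\u0652",  # jazm (no vowel)
}


def strip_short_vowels(text):
    """Return text with the short-vowel marks removed."""
    return "".join(ch for ch in text if ch not in SHORT_VOWEL_MARKS)


# First word of the sample text, fully vocalised: seen, zabar, alif, re, bari ye.
vocalised = "\u0633\u064E\u0627\u0631\u06D2"
plain = strip_short_vowels(vocalised)
print(plain)
```

The reverse step, restoring the omitted vowels, cannot be done by a lookup of this kind and depends on knowing the word.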

Consonants

Hindustani has a rich set of consonants, since its mixed vocabulary (rekhta) derives from Old Hindi (from Dehlavi) with loanwords from Persian (from Pahlavi) and Arabic. These three sources belong to three different language families: Indo-Aryan, Iranian and Semitic, respectively.

The following table provides an approximate one-to-one mapping for Hindi–Urdu consonants,^[17] intended mainly for computational purposes (lossless script conversion). Note that this direct script conversion will not yield correct spellings,^[18] but rather text that is readable for readers of both scripts. Hindi–Urdu transliteration schemes can also be used for Punjabi, for Gurmukhi (Eastern Punjabi) to Shahmukhi (Western Punjabi) conversion, since Shahmukhi is a superset of the Urdu alphabet (with 2 extra consonants) and the Gurmukhi script can be easily converted to the Devanagari script.

Hindustani consonants
Perso-Arabic	Roman	Devanagari	Comments
ک	k	क
کھ	kh	ख
ق	q	क़	The nuqta, in colloquial settings, is sometimes ignored in Hindi and written as क^[19]^[20]
خ	k͟h	ख़	The nuqta, in colloquial settings, is sometimes ignored in Hindi and written as ख^[19]^[20]
گ	g	ग
غ	g͟h	ग़	The nuqta, in colloquial settings, is sometimes ignored in Hindi and written as ग^[19]^[20]
گھ	gh	घ
چ	c	च
چھ	ch	छ
ج	j	ज
جھ	jh	झ
ز	z	ज़	The nuqta, in colloquial settings, is sometimes ignored in Hindi and written as ज^[19]^[20]
ذ	ẕ	ज़़	(Approximated in Devanagari for one-to-one map. Actually same sound as ज़)
ض	ẓ	ॹ	(Approximated in Devanagari for one-to-one map. Actually same sound as ज़)
ظ	z̤	ॹ़	(Approximated in Devanagari for one-to-one map. Actually same sound as ज़)
ژ	zh	झ़	Used in direct Farsi loan-words
ٹ	ṭ	ट
ٹھ	ṭh	ठ
ڈ	ḍ	ड
ڈھ	ḍh	ढ
ڑ	ṛ	ड़	Colloquially, ṛ is often confused with ḍ and vice versa
ڑھ	ṛh	ढ़	Colloquially, ṛh is often confused with ḍh and vice versa
ت	t	त
تھ	th	थ
ط	t̤	त़	(Approximated in Devanagari for one-to-one map. Actually same sound as त)
د	d	द
دھ	dh	ध
ن	n	न
پ	p	प
پھ	ph	फ
ف	f	फ़	The nuqta, in colloquial settings, is sometimes ignored in Hindi and written as फ^[19]^[20]
ب	b	ब
بھ	bh	भ
م	m	म
ی	y	य
ر	r	र
ل	l	ल
و	v	व	و is transcribed as /w/ for Arabic words and /v/ for Indo-Iranian words
و	w	व़
ش	sh	श
س	s	स
ص	ṣ	स़	(Approximated in Devanagari for one-to-one map. Actually same sound as स)
ث	s̱	स़़	(Approximated in Devanagari for one-to-one map. Actually same sound as स)
ہ	h	ह
ح	ḥ	ह़	(Approximated in Devanagari for one-to-one map. Actually same sound as ह)
ۃ	ẖ	ह॒	Used only for Arabic-derived words (approximated in Devanagari)
ھ	h	ह	ھ is generally only used for aspirated consonants. Any individual usage is generally considered an error and to be taken as ہ
ع	ʿ	ʿ	Variable consonant placeholder
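A direct conversion based on this table has to treat aspirates as two-letter units, because Urdu writes them as a base letter followed by ھ. The Python sketch below is one possible approach, not the method of any cited system: it uses a hypothetical eight-row excerpt of the table, greedy longest-match lookup, and a helper that drops the nuqta as colloquial Hindi spelling often does. All names are illustrative.

```python
import unicodedata

# Hypothetical excerpt of the table above: Perso-Arabic -> Devanagari.
# Aspirates are two-letter sequences ending in do-chashmi he (U+06BE).
CONSONANTS = {
    "\u06A9": "\u0915",        # ک  -> ka
    "\u06A9\u06BE": "\u0916",  # کھ -> kha
    "\u0642": "\u0915\u093C",  # ق  -> ka + nuqta
    "\u062E": "\u0916\u093C",  # خ  -> kha + nuqta
    "\u06AF": "\u0917",        # گ  -> ga
    "\u06AF\u06BE": "\u0918",  # گھ -> gha
    "\u0628": "\u092C",        # ب  -> ba
    "\u0628\u06BE": "\u092D",  # بھ -> bha
}

NUQTA = "\u093C"


def convert_consonants(text):
    """Greedy longest-match conversion; unknown characters pass through."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in CONSONANTS:
            out.append(CONSONANTS[pair])
            i += 2
        elif text[i] in CONSONANTS:
            out.append(CONSONANTS[text[i]])
            i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def drop_nuqta(text):
    """Remove the nuqta, including from precomposed letters such as U+0958.

    NFD decomposition splits precomposed nuqta letters into base + nuqta
    (and also decomposes any other precomposed characters in the text).
    """
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.replace(NUQTA, "")


aspirate = convert_consonants("\u06A9\u06BE")   # one letter, not two
colloquial = drop_nuqta(convert_consonants("\u0642"))
```

The output is a letter-by-letter conversion only; it does not insert vowel signs or conjuncts, so it yields readable text rather than correct spellings.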

Sanskrit consonants

The following consonants are mostly used in words that are directly borrowed or adapted from Sanskrit.

Perso-Arabic	Roman	Devanagari	Remarks
ن٘	ṅ	ङ
ݩ	ñ	ञ	ݩ was introduced to write Gojri^[21]
ݨ	ṇ	ण	ݨ was introduced to write Shahmukhi^[21]
لؕ	ḷ	ळ	Rarely used in Shahmukhi
ݜ	ṣh	ष	ݜ was introduced to write Shina^[21]
ڔّ	r̥	ऋ

Implosive consonants

These consonants are found mostly in languages such as Sindhi and Saraiki.

Perso-Arabic	Roman	Devanagari
ڳ	g̤	ॻ
ڄ	j̈	ॼ
ݙ/ڏ	d̤	ॾ
ٻ	ḇ	ॿ

Numerals

Usage	Numeral System	Digits
Urdu	Eastern Arabic	۰	۱	۲	۳	۴	۵	۶	۷	۸	۹
International	Hindu-Arabic	0	1	2	3	4	5	6	7	8	9
Hindi	Modern Devanagari	०	१	२	३	४	५	६	७	८	९
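Digits, unlike letters, convert one-to-one. A minimal Python sketch, assuming the Urdu row uses the Extended Arabic-Indic digits (U+06F0 to U+06F9) and the Hindi row the Devanagari digits (U+0966 to U+096F); the table names are hypothetical:

```python
# Build the three digit rows from their Unicode ranges.
URDU_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
DEVANAGARI_DIGITS = "".join(chr(0x0966 + i) for i in range(10))
ASCII_DIGITS = "0123456789"

# Translation tables for str.translate.
URDU_TO_DEVANAGARI = str.maketrans(URDU_DIGITS, DEVANAGARI_DIGITS)
DEVANAGARI_TO_URDU = str.maketrans(DEVANAGARI_DIGITS, URDU_DIGITS)
URDU_TO_ASCII = str.maketrans(URDU_DIGITS, ASCII_DIGITS)

urdu_year = "\u06F1\u06F9\u06F4\u06F7"  # 1947 in Urdu digits
hindi_year = urdu_year.translate(URDU_TO_DEVANAGARI)
ascii_year = urdu_year.translate(URDU_TO_ASCII)
print(ascii_year)
```

Text that uses the Arabic-Indic digits (U+0660 to U+0669) instead would need its own table, since those are separate code points.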

Punctuation and symbols

Script	Period	Question Mark	Comma	Semi-colon	Slash	Percent	End of verse
Perso-Arabic	۔	؟	،	؛	؍	٪	۝
Modern Devanagari	।	?	,	;	/	%	॥
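The punctuation rows can be converted with a character table as well. The Python sketch below assumes the code points listed in the comments correspond to the symbols in the table; the dictionary and table names are hypothetical.

```python
# Perso-Arabic punctuation -> Devanagari or ASCII equivalents.
URDU_TO_DEVANAGARI_PUNCT = {
    "\u06D4": "\u0964",  # Urdu full stop   -> danda
    "\u061F": "?",       # Arabic question mark
    "\u060C": ",",       # Arabic comma
    "\u061B": ";",       # Arabic semicolon
    "\u060D": "/",       # Arabic date separator, used as a slash
    "\u066A": "%",       # Arabic percent sign
    "\u06DD": "\u0965",  # end-of-verse sign -> double danda
}

TO_DEVANAGARI = str.maketrans(URDU_TO_DEVANAGARI_PUNCT)
# The mapping is one-to-one, so it can simply be inverted.
TO_URDU = str.maketrans({v: k for k, v in URDU_TO_DEVANAGARI_PUNCT.items()})

converted = "\u06D4\u061F".translate(TO_DEVANAGARI)
restored = converted.translate(TO_URDU)
```

Note that the reverse table rewrites every ASCII comma, semicolon, slash, question mark and percent sign it meets, so it is best applied only to text already known to be Hindi.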

Sample text

The following is an excerpt from the Hindustani poem Tarānah-e-Hindi written by Muhammad Iqbal.

Perso-Arabic	Devanagari	Roman	English translation
سَارے جَہَاں سے اَچّھَا، ہِنْدُوسِتَاں ہَمَارَا۔ ہَمْ بُلْبُلیں ہَیں اِسْکِی، یَہْ گُلْسِتَاں ہَمَارَا۔۔	सारे जहाँ से अच्छा, हिन्दुसिताँ हमारा। हम बुलबुलें हैं इसकी, यह गुलसिताँ हमारा॥	sāre jahā̃ se acchā, hindusitā̃ hamārā. ham bulbulẽ haĩ iskī, yah gulsitā̃ hamārā..	Better than the entire world, is our India. We are its nightingales, and it (is) our garden abode.

References

[1] "About Hindi-Urdu". North Carolina State University. Archived from the original on 15 August 2009. Retrieved 9 August 2009.
[2] Ray, Aniruddha (2011). The Varied Facets of History: Essays in Honour of Aniruddha Ray. Primus Books. ISBN 978-93-80607-16-0. There was the Hindustani Dictionary of Fallon published in 1879; and two years later (1881), John J. Platts produced his Dictionary of Urdu, Classical Hindi and English, which implied that Hindi and Urdu were literary forms of a single language. More recently, Christopher R. King in his One Language, Two Scripts (1994) has presented the late history of the single spoken language in two forms, with the clarity and detail that the subject deserves.
[3] Ashmore, Harry S. (1961). Encyclopaedia Britannica: a new survey of universal knowledge, Volume 11. Encyclopædia Britannica. p. 579. The everyday speech of well over 50,000,000 persons of all communities in the north of India and in West Pakistan is the expression of a common language, Hindustani.
[4] Lehal, Gurpreet Singh; Saini, Tejinder Singh (December 2012). "Development of a Complete Urdu-Hindi Transliteration System". Proceedings of COLING 2012: Posters. Mumbai, India: The COLING 2012 Organizing Committee: 643–652.
[5] Lunn, David; Dawn.com (28 January 2019). "Urdu and Hindi could be one language called Hindustani. Will the politics of language allow it?". Scroll.in. Retrieved 2021-04-08.
[6] "After experiments with Hindi as national language, how Gandhi changed his mind". Prabhu Mallikarjunan. The Feral. 3 October 2019.
[7] Visweswariah, Karthik; Chenthamarakshan, Vijil; Kambhatla, Nandakishore (August 2010). "Urdu and Hindi: Translation and sharing of linguistic resources". Coling 2010: Posters. Coling 2010 Organizing Committee: 1283–1291.
[8] Lehal, Gurpreet Singh; Saini, Tejinder Singh (2010). "A Hindi to Urdu Transliteration System" (PDF).
[9] Durrani, Nadir; Sajjad, Hassan; Fraser, Alexander; Schmid, Helmut (July 2010). "Hindi-to-Urdu Machine Translation through Transliteration". Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden: Association for Computational Linguistics: 465–474.
[10] "Perso-Arabic To Indic Script Transliteration". sangam.learnpunjabi.org. Retrieved 2021-04-07.
[11] "Saraiki - Devanagari Machine Transliteration System - SDMTS". www.sanlp.org. Retrieved 2021-08-09.
[12] Lawaye, Aadil; Kak, Aadil; Mehdi, Nali (January 2010). "Building a Cross Script Kashmiri Converter: Issues and Solutions". Proceedings of Oriental COCOSDA.
[13] Diacritics in Urdu are normally not written and usually implied and interpreted based on the context of the sentence
[14] [ɛ] occurs as a conditioned allophone of /ə/ near an /ɦ/ surrounded on both sides by schwas. Usually, the second schwa undergoes syncopation, and the resultant is just an [ɛ] preceding an /ɦ/. Hindi does not have a letter to represent ə as it is usually implied
[15] Hindi has individual letters for aspirated consonants whereas Urdu has a specific letter to represent an aspirated consonant
[16] No words in Hindustani can begin with a nasalised letter/diacritic. In Urdu the initial form (letter) for representing a nasalised word is: ن٘ (nūn + small nūn ghunna diacritic)
[17] NC, Gokul (2021-05-07), GokulNC/Indic-PersoArabic-Script-Converter, retrieved 2021-05-28
[18] Ahmed, Nisar. "An efficient Hindi-Urdu Transliteration System" (PDF). 5th International Multidisciplinary Conference, 29-31 October, ICBS, Lahore.
[19] Shapiro, Michael C. (1989). A Primer of Modern Standard Hindi. Motilal Banarsidass Publ. p. 20. ISBN 978-81-208-0508-8. In addition to the basic consonantal sounds discussed in sections 3.1 and 3.2, many speakers use any or all five additional consonants (क़ ḳ, ख़ ḳh,ग़ ġ, ज़ z, फ़ f) in words of foreign origin (primarily from Persian, Arabic, English, and Portuguese). The last two of these, ज़ z and फ़ f, are the initial sounds in English zig and fig respectively. The consonant क़ ḳ is a voiceless uvular stop, somewhat like k, but pronounced further back in the mouth. ख़ ḳh is a voiceless fricative similar in pronunciation to the final sound of the German ach. ग़ ġ is generally pronounced as a voiceless uvular fricative, although it is occasionally heard as a stop rather than a fricative. In devanāgari each of these five sounds is represented by the use of a subscript dot under one of the basic consonant signs. In practice, however, the dot is often omitted, leaving it to the reader to render the correct pronunciation on the basis of his prior knowledge of the language.
[20] Pandey, Dipti; Mondal, Tapabrata; Agrawal, S. S.; Bangalore, Srinivas (2013). "Development and suitability of Indian languages speech database for building watson based ASR system". 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE). p. 3. doi:10.1109/ICSDA.2013.6709861. ISBN 978-1-4799-2378-6. S2CID 26461938. Only in Hindi 10 Phonemes व /v/ क़ /q/ ञ /ɲ/ य /j/ ष /ʂ/ ख़ /x/ ग़ /ɣ/ ज़ /z/ झ़ /ʒ/ फ़ /f/
[21] "Proposal for extensions to the Arabic block" (PDF).
