Talk:Associative array
This is the talk page for discussing improvements to the Associative array article. This is not a forum for general discussion of the article's subject. |
Article policies
|
Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL |
Archives: 1Auto-archiving period: 2 years |
This level-5 vital article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||||||
|
The Augmented map article was blanked on 2024-01-16 and that title now redirects to Associative array. The contents of the former article are available in the redirect's history; for the discussion at that location, see the redirect's talk page. |
appending
editSomeone should describe how to insert elements into a hash table. — Preceding unsigned comment added by 71.141.127.226 (talk) 22:39, 10 September 2006 (UTC)
Naming discussion (assoc array vs. dict vs. hash etc)
editSo far as I can tell, Wikipedia has decided/ordained that the canonical name for the abstract data type dictionary/map/associative array is "associative array". In my opinion, this decision should be revisited. Has this been discussed before, and if so, can someone point me to it?
In any case, here's the list of pros and cons I see for the current term:
pros:
- Unambiguous. There are no other meanings for "associative array", so it doesn't need to be clarified when mentioned in other articles.
cons:
- The only language/community that uses the term "associative array" is the PHP community.
- Each of the terms "dictionary" and "map" are used far more widely than "associative array", both in the software engineering and computer science fields.
- The term is misleading in that arrays are a completely different kind of data structure. Novices tend to gloss over this distinction because in PHP and Javascript you can use arrays with arbitrary keys, so they are really dictionaries.
I propose that we rename this article to "Map (computer science)" and redirect "Associative Array" to that.
Just to list the alternatives, right off the bat we can say that "Hashtable" is not quite right, because it implies a particular implementation of the abstract data type.
The next runner up is "Dictionary (computer science)", which I think is a very strong candidate, since it seems to see the most use in programming language standard libraries (check the list of terms used in language standard libraries in the article). However, I don't think it sees much use in the computer science field, and while I personally prefer the name "dictionary" in my own discussions, I feel like "map" more correctly refers to the abstract data type and less to the specific task of mapping string keys to values.
How do other people feel about this?
Rkleckner (talk) 18:16, 14 April 2010 (UTC)
- Wikipedia is of two minds on the topic, since the comparison article is Comparison of programming languages (mapping). For my part, I think "mapping" is an excellent term, and would support a rename to "Mapping data type" (by parallel with Array data type).—chaos5023 (talk) 18:19, 14 April 2010 (UTC)
- "Map" is a term with too many meanings already, even in CS alone. Consider memory mapping (mmap), character map, bitmap, mipmap, key map (xkbmap), and so on. Most of those have nothing to do with the data structure. "Associative array" makes it absolutely clear we're talking about a specific data structure; while we could talk about another data structure to implement the map of something (example: an actual geographic map). Niczar ⏎ 18:57, 14 April 2010 (UTC)
- It's very true that associative array is totally unambiguous. However, I don't think it is the most popular, especially in the field of computer science. While it may be best for the purposes of Wikipedia to use the unambiguous term, it promotes this term, which I believe is confusing and inhibits understanding. I'm more conflicted about it now than when I started, so I'm just going to put in my two cents and let the rest of you decide what to do. Rkleckner (talk) 19:08, 14 April 2010 (UTC)
- mmap maps virtual addresses onto something else, charmaps map integers to glyphs, keymaps map keys to signals. "map" is not ambiguous at all, it's a well understood frequently used term in computer science. 99.224.97.6 (talk) —Preceding undated comment added 06:36, 23 May 2011 (UTC).
- FWIW, though, "associative array" has seen a lot more usage than just in PHP; the term has a lot of history. In all likelihood it's the name of the article (and is used in PHP docs) because it's probably the oldest term for the concept. —chaos5023 (talk) 18:22, 14 April 2010 (UTC)
- I think the term "association list" has a lot of history, but I don't think "associative array" has as much. I take back what I said about it being restricted to the PHP community, though. there is more usage of the term, for example in the D programming language. Rkleckner (talk) 19:08, 14 April 2010 (UTC)
- I'd like to see some actual statistics correlated with popularity before making such changes. --Cybercobra (talk) 18:49, 14 April 2010 (UTC)
- I'm pretty happy with "associative array". It is an old, standard term. Gafter (talk) 00:04, 15 April 2010 (UTC)
- The problem with "associative array" is that many dictionary implementations (for example trees) have nothing to do with arrays.
- Dictionary is a more classic terminology that is more abstract and is used a lot in data structure books. Sabioverde (talk) 05:21, 20 August 2022 (UTC)
- Dictionary is also used in CS in the sense "data dictionary", which is distinct from an ADT. 165.146.42.194 (talk) 11:54, 15 April 2010 (UTC)
- You are incorrect saying that "Dictionary" is not very used in the computer science field.
- Rather than looking at how languages name Dictionaries. It's best to look at Data Structures Books. Such as the classic Lewis and Denenberg (1991) which consistently refers to Dictionaries by this name (see page 175 for example).
- Another example is the 1983 book "Data Structures and Algorithms" by Alfred Aho. For example in chapter 4.6.
- The very much used book "Introduction to Algorithms" by Thomas Cormen, et al. used in Universities such as MIT (see OpenCourseware for example). Uses the term "dictionary" as well. For example in the third edition of this book on page 229 at the beginning of Part 3.
- So "dictionary" is widely used in DATA STRUCTURES and Algorithm books to explain the concept of dictionary operations. Even in very widely used books such a Cormen, et al as late as 2009.
- I'm not sure why the Wikipedia discussions focus on what languages like PHP, Python, or Java call it, because it's largely irrelevant. We're talking about Data Structures and Abstract Data Types.
- And Dictionary is the proper name for this Abstract Data Type which is implemented with Data Structures such as linked lists (see Denenberg), trees of various sorts, hash tables, skip lists and many other data structures. Sabioverde (talk) 03:43, 10 September 2022 (UTC)
Consider renaming this article to "Map (abstract data type)"
edit- This will be consistent and symmetric with the naming of other composite ADT articles:
- 'map' is more popular term than 'associative array', both in computer science and software engineering:
- the most popular languages (C++, Java, JavaScript) use this term
- many less popular languages use this term as well (Go, Scala, Haskell, Clojure)
- 'dictionary' (C#, Python, Tcl) or 'hash' (Perl, Ruby) terms are also used
- 'associative array' is the least popular term (PHP)
- Even this article links to multimap and bidirectional map saying that they 'generalize an associative array'. It would be much more proper and natural to say that multimap generalizes map, rather than associative array.
-- Piotr Jurkiewicz (talk) 17:15, 3 June 2016 (UTC)
- I did some academic literature search to find usages since 2018 of "associative array", "dictionary", "map" in combination with "data structure":
- The search results for "map" were the most polluted, overlapping with map in the geography sense.
- [3], giving a completely new definition for associative array as row x column -> value. It seems Kepler has redefined the term in database theory.
- [4] gives the traditional definition for "associative array". And their usage is a key-value .dat file. OK. But they cite Kademlia for prior work. WTF, a DHT has completely distinct usages from a key-value CSV. Unreliable source.
- [5] uses all three terms in one sentence in the intro (mentioning that Python uses 'dictionary'), but settles on the term "2D associative arrays" for storing the values of
ins(x,y)
. AFAICT this isn't a normative use of the term, but it actually agrees with Kepler's usage. - [6] shows that new (academic) languages are copying Python's use of 'dictionary'
- [7] A handbook on data structures [8], originally published in 2004, using 'dictionary'. It's a bit suspicious that many of the authors have Indian names, but given the number of citations it seems quite reliable.
- [9]. A new data structure, for "dynamic dictionaries for multisets". The paper is a bit dense but it seems the operations are close to the standard definition.
- [10], apparently some people use the term "dictionary" in machines learning to mean matrix. But this is excluded by searching for "data structure" with quotes.
- [11] another dictionary data structure reference
- [12] Similarly the dictionary of data structures uses 'dictionary'.
- tl;dr is that academic usage has settled on 'dictionary' and the other terms mean other things now. So now reviewing WP:CRITERIA:
Criteria | Map (computer science) | Dictionary (data structure) | Associative array |
---|---|---|---|
Recognizability | Medium (would be Low except for C++) | High | Medium |
Naturalness | Medium | High | Low |
Precision | Low | High | Medium |
Concision | 22 | 27 | 17 |
Consistency | Low | Medium | Low |
- The consistency could be solved by renaming to XX (abstract data type). But this is long, and abstract is redundant. IMO all the pages should be renamed to XX (data type). So my vote is for Dictionary (data type). --Mathnerd314159 (talk) 23:24, 26 January 2022 (UTC)
- The use of "Dictionary" LONG predates the Python language. For example in Lewis and Denenberg's book "Data Structures and their algorithms" on page 175 the title of Chapter 6.1 is "SETS AND DICTIONARIES AS ABSTRACT DATA TYPES". This book is from 1991 and was used in major Universities in the United States in the 90s.
- It defines the "Dictionary" as an ADT by defining the set of operations it has to implement. I quote their definition here:
- "A set abstract data type with just the operations MakeEmptySet, IsEmptySet, Insert, Delete, and LookUp is called a dictionary. We begin by examining implementations of the dictionary abstract data type, noting occasionally when the implementation permits efficient implementation of other set operations. In Chapter 9 we return to the question of representations specifically designed to
- support other set operations. "
- The current article makes the following mistakes:
- - Using "Associative Array" when Dictionaries are not necessarily arrays at all. They can be trees, hash tables, arrays, lists, or whatever implementation we want to give it. For example Lewis and Denenberg suggest using Linked Lists as an implementation of Dictionaries (and Sets as well), by simply moving the last used item to the front.
- - Confusing ADTs with Data Structures. ADTs define the set of operations and is a terminology traditionally used back in the Pascal era when we defined new Data Types that weren't supported directly by the language. Data Structures are the implementation of ADTs. So a Dictionary isn't a data structure.
- We should refer to the proper terminology used in Data Structures books. We also have a precedent with languages such as Smalltalk 80 which referred to the Abstract Type as a Dictionary. (see "Smalltalk-80 the Language and its implementation" published in 1983 by Goldberg and Robson https://dl.acm.org/doi/10.5555/273 for example on page 67) Sabioverde (talk) 03:08, 10 September 2022 (UTC)
- The reason "abstract" is added is because of a longstanding explanation of classes of data structures that implement certain operations. One could say that all implementations of the Dictionary ADT are the same "data type" but that would be implementation dependent as well. Data types are specifically defined types in languages. Abstract Data Types are more abstract notions. Refer to Lewis and Denenberg (cited above) or your classic Pascal book for an explanation on the distinction. We should not deviate from long standing definitions.
- Here's another use of "dictionary" in a course catalog from the University of Washington: https://courses.cs.washington.edu/courses/cse326/ Sabioverde (talk) 03:18, 10 September 2022 (UTC)
Requested move 26 January 2022
edit- The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review after discussing it on the closer's talk page. No further edits should be made to this discussion.
The result of the move request was: (non-admin closure) NO CONSENSUS User:力 (powera, π, ν) 23:46, 5 February 2022 (UTC)
There is quite a bit of discussion here, but nothing resembling consensus for any particular move -- and the discussion is too broad for a re-list to solve this. This should be discussed on a WikiProject before there is a re-nomination, but no prejudice against a re-nomination in March. User:力 (powera, π, ν) 23:46, 5 February 2022 (UTC)
- Associative array → Dictionary (data type)
- Collection (abstract data type) → Collection (data type)
- Container (abstract data type) → Container (data type)
- List (abstract data type) → List (data type)
- Set (abstract data type) → Set (data type)
- Graph (abstract data type) → Graph (data type)
- Tree (data structure) → Tree (data type)
- Stack (abstract data type) → Stack (data type)
- Queue (abstract data type) → Queue (data type)
– See my reasoning above for renaming associative array to dictionary. As for the rest, I think that leaving out "abstract" is more WP:CONCISE. Also, the word "abstract" adds confusion when the concrete data structure / syntax in many languages is also called list, set, etc. and this concrete implementation is discussed in the article, e.g. the list in Set_(abstract_data_type)#Language_support Mathnerd314159 (talk) 23:59, 26 January 2022 (UTC)
Discussion
editI tend to support this but it may not work well for stack
s... I suggest we pursue global RFC for this. I also would like to refer you to Abstract data type article to get sense of rationale behind titles. Courtesy pinging @TuukkaH:. He had pretty good summary on differences here: Talk:Data type#Topic of this article. AXONOV (talk) ⚑ 09:27, 27 January 2022 (UTC)
- Stack (data type) is just a redirect to Stack (abstract data type), it should be fine to use the shorter name. That redirect was actually one of the reasons I proposed the move. As far as ADT vs data type, Data type#Abstract data types defines ADTs as a subset of data types, so the move seems correct from a hierarchy perspective. As far as "global RFC", what do you mean? It is already listed at Wikipedia:Requested_moves#January_26,_2022, this seems to be the maximum level of visibility Wikipedia has for proposed moves. --Mathnerd314159 (talk) 19:21, 28 January 2022 (UTC)
- Agree The "abstract" in "abstract data type" doesn't add anything, and thus is unnecessary. I'm agnostic about "associative array" vs. "dictionary". --Macrakis (talk) 22:10, 30 January 2022 (UTC)
- Disagree. It gives a abstract description of the data type ideas regardless of the language in use. AXONOV (talk) ⚑ 16:43, 5 February 2022 (UTC)
Tree data structure versus Tree data type
editI disagree with the rename of Tree (data structure) to Tree (data type). The terms "tree data structure" and "tree data type" denote different notions. The difference is described in Tree_(data_structure)#Data_type_versus_data_structure and Rose_tree#Tree_data_type. The latter page states the following relationship between the terms:
- A tree data type is a set of values of tree data structures.
Moreover, the page also provides a diagrammatization of the difference between a "tree data structure" and a "tree data value".Hundblue (talk) 17:14, 30 January 2022 (UTC)
Dictionary: data structure / data type / abstract data type
editI think the 3 terms denote distinct notions.
- A single dictionary (data structure), or just a dictionary, is a representation of a finite map posessing its own identity. Two distinct dictionaries can represent the same finite map - in that case, the dictionaries have the same value (namely the finite map).
- A single dictionary (data type) is a set of dictionaries (or something that represents such a set) which are then called instances of that type.
- A single dictionary (abstract data type) is a set of dictionary values. In context of functional programming where there are only values (i.e. two distinct entities cannot have the same value), the "abstract" adjective is redundant.
d1 = {"a":5}
d2 = {"a":5}
Consider the JavaScript / Ruby / Python code shown in the box on the right in which dictionaries d1
and d2
are created. (The respective terminology for "dictionary" is "object" / "hash" / "dict".) The following is satisfied:
-
d1
andd2
are not the same, since applyingd1["a"] = 7
will affect onlyd1
. -
d1
andd2
have the same value. In Ruby and Python, this can be verified by the value comparison operator==
(i.e., in these two languages,d1 == d2
evaluates totrue
orTrue
).
The dictionaries d1
and d2
are not data types - instead, they are instances of a single built-in dictionary type (named respectively Object
/ Hash
/ dict
– "the dictionary").
Conclusion: Instead of the rename Associative array → Dictionary (data type) (which I disagree with) I suggest Dictionary (data structure). --Hundblue (talk) 22:31, 1 February 2022 (UTC)
- You are failing to distinguish a data type (the API: what you do with it) from a data structure (how you implement it and how fast that implementation runs). This article is mostly about the data type. There is no single dictionary data structure; there are many structures with different names that implement that data type, some of which are listed in the "implementation" section of this article. So your suggestion of Dictionary (data structure) is one that I think is incorrect and should not be used, because it incorrectly suggests that there is a single data structure described by this article, when really it is the data type that is described here. —David Eppstein (talk) 22:39, 1 February 2022 (UTC)
I find it questionable whether the article (in the current state) is mostly about the (API-defined) data type. There is a #Properties subsection that provides an adaptation of the "Formal Definition" from [13]. (Interestingly, the page is subtitled "(data structure)".) The definition does not state explicitly if multiple empty dictionaries are possible, but since there is a magical new operation which "creates a new empty dictionary" one can perhaps infer that there are (infinitely?) many empty dictionaries. Assuming that, there is a problem with the equality sign (=) in the definition. Is new() = new() satisfied (as suggested by the remove(k, new()) = new() condition)? Is insert(k, v, D) the same dictionary as insert(k, v, remove(k, D))?
In my opinion, a more rigorous alternative to the above can be obtained by the following definition:
- A dictionary is a structure (X, K, V, x, R) where X is a set of nodes, K is a set of keys, V is a set of values, x is a distinguished node and R is a relation between keys and values (a subset of K × V) that is finite and functional.
Alternatively, one can first define a dictionary context 𝒞 = (X, K, V) and then define a 𝒞-dictionary to be a pair (x, R) with the same constraints as above. The set (denote it 𝒟) of all 𝒞-dictionaries makes up for a "dictionary type". This set 𝒟 is not itself a dictionary. One can define "API" operations:
- remove : K × 𝒟 → 𝒟 by remove(k, (x, R)) = (x, R ∖ {k} × V),
- insert : K × V × 𝒟 → 𝒟 by insert(k, v, (x, R)) = (x, (R ∖ {k} × V) ∪ {(k, v)})),
- lookup : K × 𝒟 ↷ V by lookup(k, (x, R)) = v iff (k, v) ∈ R.
The operations remove and insert preserve "identity" (only the map is changed, not the node) like corresponding operations in JavaScript / Ruby / Python.
Note that there is a distinction between a dictionary (an element of 𝒟) and a node (an element of X). That is, in the JavaScript / Ruby / Python code d1 = {"a":5}
, d1
is not a dictionary but a node. After the insert operation d1["a"] = 7
, the d1
node is the same like before the operation. What has been changed is the associated map (the R-component). That is, in JavaScript / Ruby / Python, there is a global association of nodes to finite maps. Such association can be expressed as a set 𝒜 of source-name-target triples, see Nested dictionary.
The above text should document that at least in the JavaScript / Ruby / Python environments, there is a single common abstraction of the dictionary concept. In this abstraction, a dictionary is a mathematical structure, just like e.g. lattice in order theory. Thus, there is a reason to combine the words "dictionary" and "structure". The additional term "data" might be used for disambiguation. But I understand that the whole term "dictionary data structure" might already have established another meaning(s) - that of "implementation" structures of the dictionary concept. For this reason, I suggest to keep the current page name Associative array - which is neutral w.r.t. data-structure vs. data-type issues.
I disagree with the rename Dictionary (data type) since it incorrectly suggests that the page provides a valid definition of the term "dictionary data type". It presently does not. --Hundblue (talk) 13:31, 2 February 2022 (UTC)
- The things you are talking about are basically instances of the specific data type. I also would like to see comparison of popularity between Dictionary and Associative arrays in actual sources. I don't think that dictionary usage is prevailing. I also have to agree with David Eppstein's opinion above. AXONOV (talk) ⚑ 16:42, 5 February 2022 (UTC)
As for the notion of an instance, the "things" I am talking about are (direct) instances of Object
/ Hash
/ dict
in JavaSript / Ruby / Python, respectively. (There are even introspection methods or operators instanceof
/instance_of?
/isinstance
that support the term "instance".) These instances are accordingly termed objects / hashes / dictionaries. Let us use the Python term uniformly a call these objects dictionaries. As already has been demonstrated above, there is a common syntax and common characteristics of dictionaries in JavaScript/Ruby/Python. In particular:
- There can be more than one empty dictionary.
In the code
x = {}; y = {}
, bothx
andy
are empty dictionaries, butx
is not the same dictionary asy
. - There can be dictionaries with circularities, including cases similar to Quine atoms. For example,
x = {}; x["a"] = x
is allowed. After executing this code,x
is still a dictionary – it does not cease to be an instance ofObject
/Hash
/dict
.
Now the problem with the "dictionary data type" term is that the page (named Associative array at present) does not provide a definition (or description) of such a data type (especially when viewed as a set of "instances") so that dictionaries from JavaScript/Ruby/Python would be compliant with such a definition. That is, there is no valid definition of "dictionary data type" such that dictionaries from these 3 languages would appear to be instances of that data type, in particular w.r.t. (1) and (2). --Hundblue (talk) 23:08, 5 February 2022 (UTC)
Renaming discussion on WikiProject Computer science
edit(From above) This should be discussed on a WikiProject before there is a re-nomination
: I started a talk page for renaming this page on Wikipedia talk:WikiProject Computer science#Rename Associative array -> Dictionary (abstract data type)