GADDAG Documentation¶
GADDAG is a Python wrapper around cGADDAG.
A GADDAG data structure provides rapid word lookups for prefixes, suffixes and substrings, making it ideal for use in applications such as move generation in word games such as Scrabble.
GADDAG currently only supports the ASCII alphabet.
Usage¶
>>> import gaddag
>>> words = ["foo", "bar", "foobar", "baz"]
>>> gdg = gaddag.GADDAG(words)
>>> "foo" in gdg
True
>>> "bor" in gdg
False
>>> gdg.contains("ba")
['bar', 'foobar', 'baz']
Words stored in the GADDAG can be iterated over:
>>> for word in gdg:
... print(word)
...
foobar
foo
baz
bar
Searching¶
More advanced searching features are available, such as finding words starting with, ending with, or containing a given substring:
>>> gdg.starts_with("foo")
['foobar', 'foo']
>>> gdg.ends_with("bar")
['foobar', 'bar']
>>> gdg.contains("ba")
['bar', 'foobar', 'baz']
Low level access¶
Low level access is provided, which allows for the GADDAG to be traversed manually.
The root node of the GADDAG can be accessed:
>>> root_node = gdg.root
>>> root_node.edges
['a', 'b', 'f', 'o', 'r', 'z']
Edges can be followed:
>>> a_node = root_node["a"]
>>> a_node.edges
['b']
Following an edge which does not exist throws KeyError:
>>> a_node["f"]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../gaddag/node.py", line 73, in __getitem__
raise KeyError(char)
KeyError: 'f'
Following edges in succession can become tedious, so a method is provided for ease of use:
>>> root_node.follow("ab") == root_node["a"]["b"]
True
The “+” character is the break character, which signifies that the start of the word has been reached and that the subsequent edges (if any) now form the suffix. A node’s letter set shows the characters which will end a word:
>>> break_node = root_node.follow("ab+")
>>> break_node.letter_set
['r', 'z']
The words “bar” and “baz” have been found.
Using these features, GADDAG can be used for more specialised applications, such as move generation in word games.
API¶
-
gaddag.
load
(path)¶ Load a GADDAG from file.
- Args:
- path: path to saved GADDAG to be loaded.
- Returns:
- Loaded GADDAG.
-
class
gaddag.
GADDAG
(words=None)¶ A data structure allowing for extremely fast prefix, suffix and substring searching of words.
-
add_word
(word)¶ Add a word to the GADDAG.
- Args:
- word: A word to be added to the GADDAG.
-
capacity
¶ Returns the current capacity of the GADDAG.
-
contains
(sub)¶ Find all words containing a substring.
- Args:
- sub: A substring to be searched for.
- Returns:
- A list of all words found.
-
ends_with
(suffix)¶ Find all words ending with a suffix.
- Args:
- suffix: A suffix to be searched for.
- Returns:
- A list of all words found.
-
load
(path)¶ Load a GADDAG from file, replacing the words currently in this GADDAG.
- Args:
- path: path to saved GADDAG to be loaded.
-
num_edges
¶ Returns the number of edges in the GADDAG.
-
num_nodes
¶ Returns the number of nodes in the GADDAG.
-
num_words
¶ Returns the number of words in the GADDAG.
-
root
¶ Returns the root node of the GADDAG.
-
save
(path, compressed=True, exist_ok=False)¶ Save the GADDAG to file.
- Args:
- path: path to save the GADDAG to. compressed: compress the saved GADDAG using gzip. exist_ok: overwrite existing file at path.
-
starts_with
(prefix)¶ Find all words starting with a prefix.
- Args:
- prefix: A prefix to be searched for.
- Returns:
- A list of all words found.
-
-
class
gaddag.
Node
(gdg, node)¶ A node in a GADDAG.
-
edges
¶ Return the edge characters of this node.
-
follow
(chars)¶ Traverse the GADDAG to the node at the end of the given characters.
- Args:
- chars: An string of characters to traverse in the GADDAG.
- Returns:
- The Node which is found by traversing the tree.
-
is_end
(char)¶ Return True if this char is part of this node’s letter set, False otherwise.
-
letter_set
¶ Return the letter set of this node.
-