From any Linux box, go to /homes/gws/zives/tukwila.
From bash:
Run Tukwila.exe. All DLLs should be in the same directory.
The GUI can be run with:
./TukwilaGUI [{hostname}]
where {hostname} defaults to localhost and is the name of the machine running the execution engine.
To test your connection, open "Maya1.xq" or "GS.xq" and run it.
The API is documented in InvokeXQuery.html.
Go to http://{hostname}:7777 to see the system status.
The memory management system is divided into several components. The core component is CMemoryManager, which is provided with a pool size (currently specified in tukwila_config.xml) and grants memory to the other managers in terms of extents, via the requestExtent() and freeExtent() calls.
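The extent-granting pattern described above can be sketched as follows. This is a minimal illustration and not the actual CMemoryManager: the class name ExtentPool, the Extent struct, the fixed extent size, and the freeCount() helper are all assumptions made for the example. Only the requestExtent() and freeExtent() call names come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an extent: a contiguous slice of the pool.
struct Extent {
    std::size_t offset;  // byte offset into the pool
    std::size_t size;    // extent length in bytes
    bool valid;          // false when the request could not be satisfied
};

// Illustrative extent allocator (assumed design, fixed-size extents).
class ExtentPool {
public:
    ExtentPool(std::size_t poolSize, std::size_t extentSize)
        : extentSize_(extentSize) {
        assert(extentSize > 0);
        inUse_.assign(poolSize / extentSize, false);
    }

    // Grant the first free extent, or an invalid extent if the pool is exhausted.
    Extent requestExtent() {
        for (std::size_t i = 0; i < inUse_.size(); ++i) {
            if (!inUse_[i]) {
                inUse_[i] = true;
                return Extent{i * extentSize_, extentSize_, true};
            }
        }
        return Extent{0, 0, false};
    }

    // Return an extent to the pool so a later request can reuse it.
    void freeExtent(const Extent& e) {
        assert(e.valid && e.offset % extentSize_ == 0);
        std::size_t i = e.offset / extentSize_;
        assert(i < inUse_.size() && inUse_[i]);
        inUse_[i] = false;
    }

    // Number of extents currently available.
    std::size_t freeCount() const {
        std::size_t n = 0;
        for (bool used : inUse_) {
            if (!used) ++n;
        }
        return n;
    }

private:
    std::size_t extentSize_;
    std::vector<bool> inUse_;  // one flag per extent slot
};
```

In this sketch a manager such as a buffer or page manager would hold on to the Extent values it was granted and hand them back through freeExtent() when it shuts down.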
Extents are used by the following components:
CBufferManager, which creates the buffer
pool and grants portions of the pool to implementors of the IRelation interface,
currently descendants of CRelationBase:
CRelationBase's descendants are assumed to have a single reader and a single writer. This could be changed by switching from CMutex to CSem, but CSem is currently untested. Internally, there is a RelationManager that creates and tracks mappings between relation IDs and paths.
CHashTable, which directly allocates a
hash table from the memory manager and allows explicit control of
buckets. There are functions to find tuples, replace them, etc.
Hash tables are not thread-safe; the assumption is that all reads and writes will be synchronous.
CPageManager, which requests a group of
pages from the buffer manager and allows for indirect access to them. Basically,
the role of the PageManager is to reserve a fixed portion of the buffer pool
for use by an operator (instead of relying on the default LRU replacement of
the pool).
Typically, CPageManager itself isn't directly used -- instead, its descendant, CTupleManager, is. CTupleManager allows us to directly read and write tuples to/from pages that were allocated in a block. A particular user of CTupleManager is the external sort, implemented in CTupleSorter.
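The idea of reserving part of the pool for one operator, rather than leaving every page subject to LRU replacement, can be illustrated with a small sketch. Everything here is hypothetical (the FramePool class and its reserve(), touch(), victim(), and release() methods are invented for the example) and it does not reflect how CPageManager or CBufferManager are actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <set>

// Illustrative pool of page frames: unreserved frames are evicted in LRU
// order, while reserved frames are never chosen as victims.
class FramePool {
public:
    explicit FramePool(std::size_t frameCount) {
        for (std::size_t i = 0; i < frameCount; ++i) lru_.push_back(i);
    }

    // Take n frames out of LRU circulation for one operator's exclusive use.
    // Returns false (and reserves nothing) if fewer than n frames are unreserved.
    bool reserve(std::size_t n, std::set<std::size_t>& out) {
        if (n > lru_.size()) return false;
        for (std::size_t i = 0; i < n; ++i) {
            out.insert(lru_.front());
            reserved_.insert(lru_.front());
            lru_.pop_front();
        }
        return true;
    }

    // Mark an unreserved frame as most recently used.
    void touch(std::size_t frame) {
        assert(reserved_.count(frame) == 0);
        lru_.remove(frame);
        lru_.push_back(frame);
    }

    // Pick the least recently used unreserved frame as the eviction victim.
    std::size_t victim() const {
        assert(!lru_.empty());
        return lru_.front();
    }

    // Hand reserved frames back to the shared LRU pool.
    void release(const std::set<std::size_t>& frames) {
        for (std::size_t f : frames) {
            if (reserved_.erase(f) > 0) lru_.push_back(f);
        }
    }

private:
    std::list<std::size_t> lru_;      // front = least recently used
    std::set<std::size_t> reserved_;  // frames exempt from eviction
};
```

The point of the design is that an operator such as an external sort can count on its reserved frames staying resident, whatever access pattern the rest of the system produces.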
IOperand is the basic entity of manipulation,
and it can be a computed expression or a primitive data value. Both
algebraic operators and data values should subclass from this interface.
Operands can be queried as to their return type, null-ness, and so on.
Note that null-ness is not preserved by an entity when it is written (to disk or to memory). Instead, for efficiency reasons the tuple container should record and maintain null-ness of its attributes. (This way each null only takes 1 bit of storage.)
IValue is the basic data value, and it has a
sub-interface template called IPrimitiveType<T> that is used for
primitive scalar types.
Specific classes include:
Values generally implement:
CTuple is the basic tuple. It is currently not a subclass of IValue,
but this could easily be changed. The big question is what operators
would return tuples (e.g., instead of relations), and how scalar-based
operators would handle things when given a tuple.
Tuples are responsible for knowing which of their attributes are null, and they store this information in memory and read it back from there. They call the appropriate attribute containers to make sure null-ness is carried down to each attribute and can be read back from it.
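A tuple-level record of null-ness at one bit per attribute could look like the sketch below. The NullBitmap class and its method names are assumptions for illustration only; the source does not show how CTuple actually lays out this information.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: tracks null-ness for a tuple's attributes, one bit each.
class NullBitmap {
public:
    explicit NullBitmap(std::size_t attributeCount)
        : count_(attributeCount), bits_((attributeCount + 7) / 8, 0) {}

    // Set or clear the null flag for one attribute.
    void setNull(std::size_t attr, bool isNull) {
        assert(attr < count_);
        std::uint8_t mask = static_cast<std::uint8_t>(1u << (attr % 8));
        if (isNull) {
            bits_[attr / 8] |= mask;
        } else {
            bits_[attr / 8] &= static_cast<std::uint8_t>(~mask);
        }
    }

    // Report whether the attribute is currently flagged as null.
    bool isNull(std::size_t attr) const {
        assert(attr < count_);
        return ((bits_[attr / 8] >> (attr % 8)) & 1u) != 0;
    }

    // Bytes needed to persist the bitmap alongside the tuple data.
    std::size_t byteSize() const { return bits_.size(); }

private:
    std::size_t count_;
    std::vector<std::uint8_t> bits_;
};
```

With this layout a ten-attribute tuple needs two bytes of null information, regardless of how wide the attribute values themselves are.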
IOperator is a child of IOperand,
and thus every operator can return an operand that can be used in an expression
(and similarly, every IValue can be an operand). There are three
"interface"-style classes that define the arity of an operator:
Additionally, there is an IBoolean interface for classes that return booleans (namely, boolean operators and comparisons).
Subclasses of IOperator include:
IQueryOperator is the interface for all query
operators. Typically, operators actually subclass one of its
descendants, which are named according to the number of child operators:
IQueryOperator is not currently a subclass of IOperator because it isn't clear that we want to interchange between tables and scalars. However, it is possible to extend in this way (and in fact, there are datatypes such as GROUP that are reserved for such a feature). The (standard) order of iterator execution goes:
Tuple creation and deletion is done by the parent operator, after it has read schema information from the child. setNewTuple() establishes the necessary bindings between the output tuple and the child tuple. Current query operators include:
The optimizer works on PlanNodes, which are higher-level representations than standard query operators. In particular, a PlanNode encapsulates the following operations:
The main class is OptimizePlan. The actual optimizer is split among ExecPlan.cpp (actual operator generation) and Optimizer.cpp (PlanNode optimization). Each re-optimization creates a new dynamic programming enumeration; after optimization, only the actual nodes in the final plan are preserved.
Important methods in OptimizePlan are:
AQPDaemon.cpp contains the server daemon (which now handles web requests, AQP/CQP requests, and also XML query plans).
Currently the daemon listens by default on port 7777. Queries are triggered via SOAP over HTTP. Status information can be obtained via plain HTTP requests (GET / will return the root status page). Queries can be monitored for performance while they execute; by default, the dot package is used to render query plans as graphs in GIF or PNG.
Query.cpp is the main routine and includes keyboard I/O control (which will probably be deprecated and replaced by web forms for reconfiguring the system).
The pre-optimizer is a Java component that is mostly responsible for parsing and language-level rewrites. It is not 100% compliant with the current XQuery specs, and in fact it does not even support multiple lexing modes (so some words are reserved).
The rewrites performed by the pre-optimizer mostly involve the following:
- Remove [] filter expressions from XPaths (note that the current version may be buggy in some cases)
- Flatten source mappings and conjunctive predicates (may not work with disjunction)
- Do analysis of conjuncts to add transitive relationships (e.g., $a < $b, $b = "abc" => $a < "abc")
  (Currently ignores expressions that do arithmetic, e.g. $a = $b + 10)
- Separate query blocks (may not work with nested FORs -- only nested queries in RETURN)
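The transitive-conjunct rewrite in the list above can be sketched roughly as follows. The pre-optimizer itself is a Java component; this example is written in C++ only to match the rest of the code base, and the Predicate struct and deriveTransitive() function are hypothetical names rather than part of Tukwila. The sketch makes a single substitution pass and, like the rewrite described above, does not attempt to handle arithmetic expressions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical predicate form: lhs is a variable such as "$a";
// rhs is another variable or a quoted constant.
struct Predicate {
    std::string lhs;
    std::string op;   // "<", "=", ">", ...
    std::string rhs;
};

static bool isVariable(const std::string& s) {
    return !s.empty() && s[0] == '$';
}

// For every "$v = constant" conjunct, derive "$x op constant"
// from each "$x op $v" conjunct. Returns only the newly derived predicates.
std::vector<Predicate> deriveTransitive(const std::vector<Predicate>& conjuncts) {
    // Collect variables that are bound to a constant by an equality.
    std::map<std::string, std::string> constantOf;
    for (const Predicate& p : conjuncts) {
        if (p.op == "=" && isVariable(p.lhs) && !isVariable(p.rhs)) {
            constantOf[p.lhs] = p.rhs;
        }
    }
    // Substitute the constant wherever such a variable appears on the right.
    std::vector<Predicate> derived;
    for (const Predicate& p : conjuncts) {
        if (!isVariable(p.rhs)) continue;
        auto it = constantOf.find(p.rhs);
        if (it != constantOf.end()) {
            derived.push_back(Predicate{p.lhs, p.op, it->second});
        }
    }
    return derived;
}
```

Given the conjuncts $a < $b and $b = "abc", this returns the single predicate $a < "abc", which matches the example in the list.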