.. -*- coding: utf-8 -*- =================================================== Specification "Relations Query Language" (Hercules) =================================================== Introduction ============ Goals RQL --------- The goal is to have a language emphasizing the way of browsing relations. As such, attributes will be regarded as cases of special relations (in terms of implementation, the user language not to see virtually no difference between an attribute and a relation). RQL is inspired by SQL but is the highest level. A knowledge of the `CubicWeb` schema defining the application is necessary. Comparison with existing languages ---------------------------------- SQL ``` RQL builds on the features of SQL but is at a higher level (the current implementation of RQL generates SQL). For that it is limited to the way of browsing relations and introduces variables. The user does not need to know the model underlying SQL, but the `CubicWeb` scheam defining the application. Versa ````` Should I look in more detail, but here is already some ideas for the moment ... Versa_ is the language most similar to what we wanted to do, but the model underlying data being RDF, there is some number of things such as namespaces or handling of the RDF types which does not interest us. On the functionality level, Versa_ is very comprehensive including through many functions of conversion and basic types manipulation, which may need to be guided at one time or another. Finally, the syntax is a little esoteric. See also `````````` RDFQL_ The different types of queries ------------------------------ Search ( `Any`) This type of query can extract entities and attributes of entities. Inserting entities ( `INSERT`) This type of query is used to insert new entities in the database. It will also create direct relationships entities newly created. Update entities, relations creation( `SET`) This type of query updates existing entities in the database, or create relations between existing entities. Deletion of entities or relationship ( `DELETE`) This type of query allows for the removal of entities and relations existing in the database. Examples ======== (see the tutorial: ref: `tutorielRQL` for more examples) Search Query ------------ [ `DISTINCT`] V1 (V2) \ * [ `GROUPBY` V1 (V2) \*] [ `ORDERBY` ] [ `WHERE` ] [ `LIMIT` ] [ `OFFSET` ] :entity type: Type of selected variables. The special type `Any` is equivalent to not specify a type. :restriction: list of relations to go through whic follow the pattern `V1 relation V2 | ` :orderterms: Definition of the selection order: variable or column number followed by sorting method ( `ASC`, `DESC`), ASC is the default. :note for grouped queries: For grouped queries (e.g., a clause `GROUPBY`), all selected variables must be aggregated or grouped. - *Search for the object of identifier 53* :: Any WHERE X X eid 53 - *Search material such as comics, owned by syt and available* :: WHERE X Document X occurence_of F, F class C, C name 'Comics' X owned_by U, U login 'syt' X available true - *Looking for people working for eurocopter interested in training* :: Person P WHERE P work_for P, S name 'Eurocopter' P interested_by T, T name 'training' - *Search note less than 10 days old written by jphc or ocy* :: Note N WHERE N written_on D, D day> (today -10), N written_by P, P name 'jphc' or P name 'ocy' - *Looking for people interested in training or living in Paris* :: Person P WHERE (P interested_by T, T name 'training') or (P city 'Paris') - *The name and surname of all people* :: Any N, P WHERE X is Person, X name N, X first_name P Note that the selection of several entities generally force the use of "Any" because the type specification applies otherwise to all the selected variables. We could write here :: String N, P WHERE X is Person, X name N, X first_name P Insertion query --------------- `INSERT` V1 (, V2) \ * `:` [ `WHERE` ] : assignments: list of relations to assign in the form `V1 relationship V2 | ` The restriction can define variables used in assignments. Caution, if a restriction is specified, the insertion is done for *each line results returned by the restriction*. - *Insert a new person named 'foo'* :: INSERT Person X: X name 'widget' - *Insert a new person named 'foo', another called 'nice' and a 'friend' relation between them* :: INSERT Person X, Person Y: X name 'foo', Y name 'nice', X friend Y - *Insert a new person named 'foo' and a 'friend' relation with an existing person called 'nice'* :: INSERT Person X: X name 'foo', X friend Y WHERE name 'nice' Update and relation creation queries ------------------------------------ `SET` [ `WHERE` ] Caution, if a restriction is specified, the update is done *for each line results returned by the restriction*. - *Renaming of the person named 'foo' to 'bar' with the first name changed* :: SET X name 'bar', X first_name 'original' where X is Person X name 'foo' - *Insert a relation of type 'know' between objects linked by the relation of type 'friend'* :: SET X know Y WHERE X friend Y Deletion query -------------- `DELETE` ( V) | (V1 relation v2 ),... [ `WHERE` ] Caution, if a restriction is specified, the deletion is made *for each line results returned by the restriction*. - *Deletion of the person named 'foo'* :: DELETE Person X WHERE X name 'foo' - *Removal of all relations of type 'friend' from the person named 'foo'* :: DELETE X friend Y WHERE X is Person, X name 'foo' Language definition =================== Reserved keywords ----------------- The keywords are not case sensitive. :: DISTINCT, INSERT, SET, DELETE, WHERE, AND, OR, NOT IN, LIKE, ILIKE, TRUE, FALSE, NULL, TODAY, NOW GROUPBY, ORDERBY, ASC, DESC Variables and Typing -------------------- With RQL, we do not distinguish between entities and attributes. The value of an attribute is considered an entity of a particular type (see below), linked to one (real) entity by a relation called the name of the attribute. Entities and values to browse and/or select are represented in the query by *variables* that must be written in capital letters. There is a special type **Any**, referring to a non specific type. We can restrict the possible types for a variable using the special relation **is**. The possible type(s) for each variable is derived from the schema according to the constraints expressed above and thanks to the relations between each variable. Built-in types `````````````` The base types supported are string (between double or single quotes), integers or floats (the separator is the'.'), dates and boolean. We expect to receive a schema in which types String, Int, Float, Date and Boolean are defined. * `String` (literal: between double or single quotes). * `Int`, `Float` (separator being'.'). * `Date`, `Datetime`, `Time` (literal: string YYYY/MM/DD [hh:mm] or keywords `TODAY` and `NOW`). * `Boolean` (keywords `TRUE` and `FALSE`). * `Keyword` NULL. Operators --------- Logical Operators ``````````````````` :: AND, OR, ',' ',' is equivalent to 'AND' but with the smallest among the priority of logical operators (see :ref:`PriorityOperators`). Mathematical Operators `````````````````````` :: +, -, *, / Comparison operators ```````````````````` :: =, <, <=, >=, > = ~, IN, LIKE, ILIKE * The operator `=` is the default operator. * The operator `LIKE` equivalent to `~=` can be used with the special character `%` in a string to indicate that the chain must start or finish by a prefix/suffix: :: Any X WHERE X name =~ 'Th%' Any X WHERE X name LIKE '%lt' * The operator `ILIKE` is a case-insensitive version of `LIKE`. * The operator `IN` provides a list of possible values: :: Any X WHERE X name IN ( 'chauvat', 'fayolle', 'di mascio', 'thenault') XXX nico: A trick <> 'bar' would not it be more convenient than NOT A trick 'bar'? .. _PriorityOperators: Operators priority `````````````````` 1. '*', '/' 2. '+', '-' 3. 'and' 4. 'or' 5. ',' Advanced Features ----------------- Functions aggregates ```````````````````` :: COUNT, MIN, MAX, AVG, SUM Functions on string ``````````````````` :: UPPER, LOWER Optional relations `````````````````` * They allow you to select entities related or not to another. * You must use the `?` behind the variable to specify that the relation toward it is optional: - Anomalies of a project attached or not to a version :: Any X, V WHERE X concerns P, P eid 42, X corrected_in V? - All cards and the project they document if necessary :: Any C, P WHERE C is Card, P? documented_by C BNF grammar ----------- The terminal elements are in capital letters, non-terminal in lowercase. The value of the terminal elements (between quotes) is a Python regular expression. :: statement:: = (select | delete | insert | update) ';' # select specific rules select ::= 'DISTINCT'? E_TYPE selected_terms restriction? group? sort? selected_terms ::= expression ( ',' expression)* group ::= 'GROUPBY' VARIABLE ( ',' VARIABLE)* sort ::= 'ORDERBY' sort_term ( ',' sort_term)* sort_term ::= VARIABLE sort_method =? sort_method ::= 'ASC' | 'DESC' # delete specific rules delete ::= 'DELETE' (variables_declaration | relations_declaration) restriction? # insert specific rules insert ::= 'INSERT' variables_declaration ( ':' relations_declaration)? restriction? # update specific rules update ::= 'SET' relations_declaration restriction # common rules variables_declaration ::= E_TYPE VARIABLE (',' E_TYPE VARIABLE)* relations_declaration ::= simple_relation (',' simple_relation)* simple_relation ::= VARIABLE R_TYPE expression restriction ::= 'WHERE' relations relations ::= relation (LOGIC_OP relation)* | '(' relations')' relation ::= 'NOT'? VARIABLE R_TYPE COMP_OP? expression | 'NOT'? R_TYPE VARIABLE 'IN' '(' expression (',' expression)* ')' expression ::= var_or_func_or_const (MATH_OP var_or_func_or_const) * | '(' expression ')' var_or_func_or_const ::= VARIABLE | function | constant function ::= FUNCTION '(' expression ( ',' expression) * ')' constant ::= KEYWORD | STRING | FLOAT | INT # tokens LOGIC_OP ::= ',' | 'GOLD' | 'AND' MATH_OP ::= '+' | '-' | '/' | '*' COMP_OP ::= '>' | '>=' | '=' | '<=' | '<' | '~=' | 'LIKE' | 'ILIKE' FUNCTION ::= 'MIN' | 'MAX' | 'SUM' | 'AVG' | 'COUNT' | 'upper' | 'LOWER' VARIABLE ::= '[A-Z][A-Z0-9]*' E_TYPE ::= '[A-Z]\w*' R_TYPE ::= '[a-z_]+' KEYWORD ::= 'TRUE' | 'FALSE' | 'NULL' | 'TODAY' | 'NOW' STRING ::= "'([^'\]|\\.)*'" |'"([^\"]|\\.)*\"' FLOAT ::= '\d+\.\d*' INT ::= '\d+' Remarks ------- Sorting and groups `````````````````` - For grouped queries (e.g. with a GROUPBY clause), all selected variables should be grouped. - To group and/or sort by attributes, we can do: "X,L user U, U login L GROUPBY L, X ORDERBY L" - If the sorting method (SORT_METHOD) is not specified, then the sorting is ascendant. Negation ```````` * A query such as `Document X WHERE NOT X owned_by U` means "the documents have no relation `owned_by`". * But the query `Document X WHERE NOT X owned_by U, U login "syt"` means "the documents have no relation `owned_by` with the user syt". They may have a relation "owned_by" with another user. Identity ```````` You can use the special relation `identity` in a query to add an identity constraint between two variables. This is equivalent to ``is`` in python:: Any A WHERE A comments B, A identity B return all objects that comment themselves. The relation `identity` is especially useful when defining the rules for securities with `RQLExpressions`. Implementation ============== Internal representation (syntactic tree) ---------------------------------------- The tree research does not contain the selected variables (e.g. there is only what follows "WHERE"). The insertion tree does not contain the variables inserted or relations defined on these variables (e.g. there is only what follows "WHERE"). The removal tree does not contain the deleted variables and relations (e.g. there is only what follows the "WHERE"). The update tree does not contain the variables and relations updated (e.g. there is only what follows the "WHERE"). :: Select ((Relationship | And | Gold)?, Group?, Sort?) Insert (Relations | And | Gold)? Delete (Relationship | And | Gold)? Update (Relations | And | Gold)? And ((Relationship | And | Gold), (Relationship | And | Gold)) Or ((Relationship | And | Gold), (Relationship | And | Gold)) Relationship ((VariableRef, Comparison)) Comparison ((Function | MathExpression | Keyword | Constant | VariableRef) +) Function (()) MathExpression ((MathExpression | Keyword | Constant | VariableRef), (MathExpression | Keyword | Constant | VariableRef)) Group (VariableRef +) Sort (SortTerm +) SortTerm (VariableRef +) VariableRef () Variable () Keyword () Constant () Remarks ------- - The current implementation does not support linking two relations of type 'is' with a OR. I do not think that the negation is supported on this type of relation (XXX FIXME to be confirmed). - Relations defining the variables must be left to those using them. For example:: Point P where P abs X, P ord Y, P value X+Y is valid, but:: Point P where P abs X, P value X+Y, P ord Y is not. Conclusion ========== Limitations ----------- It lacks at the moment: - COALESCE - restrictions on groups (HAVING) and certainly other things ... A disadvantage is that to use this language we must know the format used (with real relation names and entities, not those viewing in the user interface). On the other hand, we can not really bypass that, and it is the job of a user interface to hide the RQL. Topics ------ It would be convenient to express the schema matching relations (non-recursive rules):: Document class Type <-> Document occurence_of Fiche class Type Sheet class Type <-> Form collection Collection class Type Therefore 1. becomes:: Document X where X class C, C name 'Cartoon' X owned_by U, U login 'syt' X available true I'm not sure that we should handle this at RQL level ... There should also be a special relation 'anonymous'. .. _Versa: Http://uche.ogbuji.net/tech/rdf/versa/ .. _RDFQL: Http://www.w3.org/TandS/QL/QL98/pp/rdfquery.html