



Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
kdb q common interview questions
Typology: Exams
1 / 6
This page cannot be seen from the preview
Don't miss anything!




Sorted attribute - `#s
Where: Applied to lists/columns that are sorted in ascending order. There is no memory overhead.
Why: Allows for a binary search. The list or column is divided in two and check done to see in which half. It then splits that half again and this process continues until the data we want is found.
Unique attribute - `#u
Where: Applied to a group of distinct items such as the key column of a keyed tale or the domain of a dictionary. It creates an internal hash map which uses a large amount of memory.
Why: Allows to speed up searches such as distinct and allows q to exit of some comparisons early.
Parted attribute - `#p
Where: Used on lists where the items are contiguous but not necessarily in ascending or descending order. It is usually applied to the sym column stored on data stored in HDBs. There is a small memory overhead.
Why: It creates an internal dictionary mapping each output to the position of its first occurrence. Therefore, when querying data on disc, once the first instance is found, data retrieval from there on will be relatively quick with minimal skipping of the disc head.
Grouped attribute - `#g
Where: Used on lists where there is no apparent structure, by creating and maintaining a dictionary which maps each output to a list of the positions of all its occurrences.
Why: This greatly speeds up select queries with where clauses. For this reason, the grouped attribute is mostly used on columns with multiple occurrences of the same item such as the sym column. The trade-off is that a very large overhead of space is required.
Q. Rank the attributes in terms of memory overhead
A.
, - Simple join: (t1,t2)
When using , on tables the column names must match. The column types however do not. The results when using , on unkeyed tables is similar to using insert, and similar to using upsert on keyed tables, where it will replace any the key in the left argument with the same key in the right argument.
ij – inner join (t1 ij t2)
This joins the columns of the source table (t1) which have an entry in the key column of the lookup table t2. Therefore t2 must be keyed & its key columns must be columns of t1.
lj – left join (t1 lj t2) This joins the columns of the source table (t1) along the matching key columns of the lookup table (t2). In contrast to an inner join, a left join will return a record for every entry of t1 regardless of whether that entry appears in t2 or not.
The difference between a left join and an inner join is that with a left join every record of the column(s) in t will be returned regardless of whether that entry matches the keyed columns of t2, but with an inner join, the non-matching entries will be left out.
uj – union join ( t1 uj t2) This joins two tables vertically. Neither of the tables need to be keyed and columns of the same name need not have the same type. Also the two tables need not have the same number of columns. This could be used to join a trade and quote table and then sort ascending by time to show what trades and quotes occurred.
aj – asof join aj[symtime;trade;quote]
This join is for joining tables with reference to time. It is mainly used for getting the prevailing quote at the time of a trade. It is important the 1st^ argument are the columns to search on and that these columns are in the correct order, otherwise there will be a serious performance hit. The sym column must appear before the time column, as it is the sym column we must search on first.
In order to get the prevailing quote, if that quote was from the previous day, you would run an update statement on the fly and join the date and time columns to form a timestamp.
Functional update:
e.g.
![ table; whereConditions; groupBy; updatedColumns]
Functional delete: e.g.
![t;c;0b;a]
The aggregate argument a, is a simple list of symbols referring to columns to be removed. The where clause argument c, specifies the rows to be removed and appears in same format as in functional select. Either a or c must be present but not both.
Q. Why would you use functional statements in place of qsql queries?
A. It allows the user to dynamically select columns and build where clauses & avoiding overly complicated and long qSql statements.
Q. What does fby do? A. function-by: It applies an aggregate function to each member of the group selected. It saves you having to create an intermediary table and doing a left join.
Q. How to you use fby with more than one column?
A. select from t where price>(avg;price) fby ([]sym ; size)
e.g
We want to see each sym in the table whose price exceeds the average price for that sym group:
select from t where price>(avg;price) fby sym – will return a table showing all the syms whose price exceeds the avg price for THAT sym - not just the whole table.
Select by sym from t where price >avg price – will return the syms whose price exceeds the average price of the whole table.
Q. What is a compound column? A. A compound column is a column which contains lists instead of a single entry.
Q. How will this appear on disk?
A. There will be two columns associated with the compound column – a name file and a name# file. The # file will store the flattened values of the column. The name file will hold the count of each list in the column.
Q. How would you query a string column in a HDB? A. As a string can’t be queried with match, equality or ‘in’ you must either use ‘like’ or adverbs when querying string columns. e.g. select from t where id like “abc” or select from t where id ~: “abc”
Q. Why is a compound column split up into two separate files?
A. To speed up queries. Instead of scanning a large general list, q instead scans an index file. The trade-off for speed however, is memory.
Q. How would you set the q timer?
A. Either on the command line with –t or in a q session with \t. Then define whatever logic you need to happen on every tick in .z.ts
Q. How would you set the timer to run different logic at different intervals?
A. Create a table containing the logic to be ran along with a start and end time. Define .z.ts to only run the jobs whose start time has passed but not exceeded the end-time. You could also include a pre-defined interval on which to run each job and increment the start time accordingly after each run.
Q. What arguments does .z.ts take?
A. The current timestamp
Q. What’s the difference between scan and over? A. Over will operate on a list so that items of the 2nd^ argument are applied iteratively to the 1st. It will return a final result. Scan operates the same as over except that it returns the intermediary results also.
Q. Give an example of a function using scan and over? A. myCount: +/ will get the count of a list = 6 // myRunCount +\ will return the running count of the same list. Calculate first N items of Fibbonacci sequence fib: {x,sum -2#x}/