I've mentioned that I was working on an 'advanced' means of querying the database in several threads. I now have something to look at and test, although it doesn't actually do any searching yet.
The user interface for creating ad-hoc queries is just about done and I'd like your feedback.
http://www.brickset.com/search/advanced/You can create queries and see the resultant SQL that will, eventually, be executed (using a r/o database account, I hasten to add, to prevent SQL injection vulnerabilities)
You can specify queries to do things like
"show me all polybag sets released since 2007 that I don't own, order by year released, then number".
"Show me all sets with between 5 and 10 minifigs, order by number of minifigs, descending"
"Show me all sets in the themes Aquazone and Aquaraiders that cost less than $10, and order by year released"
"Show me all sets with an X in the name that were available from shop.LEGO.com in 2008" (useful, eh!)
that sort of thing, which can't of course be done at the moment.
Not every query will be possible because, to keep things simple, you can't specify how the terms should be bracketed. It assumes that you want OR terms bracketed together which in 95% of cases I believe you will do.
So, what you're testing is the UI for specifying the query and whether the SQL generated is what you're expecting. When that's done, I'll move on to saving and executing the query, then editing and sharing them with others.
There's a couple of things still to do, including providing guidance as to what type of data is valid in the text box (date, number, anything) and date validation.
LMKWYT...
Comments
1. Remove the second sort from default visibility and add "+" to add multiple sort levels. It may be confusing to see "select..." in the second row when its not required that you actually select anything.
2. The "remove" criteria is missing from the last entry line.
3. If you can't code the SQL for case insensitivity, you may want to add a disclaimer to users that they woiuld need an OR to check case variations (e.g. "Lego" vs "lego").
Looking forward to seeing it run!
1. Good idea
2. It's supposed to be, otherwise you could remove everything and not be able to add anything more.
3. The SQL Server is case-insensitive anyway, but it might be worth adding a note to that effect in the intro text.
Despite the wonderful number of queries available, I'd like to ask about another :o) Is it possible to add in the existing Amazon Watch % discount queries?
-You can now delete all entries, except the first
-Multiple sort level suggestion above implemented
-'help text' appears to tell you what sort of input is expected
-strings, numbers and dates should now be validated.
Have another play and LMKWYT...
http://www.brickset.com/search/advanced/
Thanks
WHERE (SetName LIKE '%bat%' AND YearFrom < 2010 OR Pieces >= 1000) ORDER BY YearFrom ASC
Shouldn't it be:
WHERE
(SetName LIKE '%bat%' AND (YearFrom < 2010 OR Pieces >= 1000) ORDER BY YearFrom ASChttp://www.brickset.com/search/advanced/
LMKWYT...
I'll work on the execution bit when back from AFOLCON next week.
I think it's pretty much complete:
- You can create, name and save queries
- Queries are now actually executed and results are shown in the normal way.
Things still to do
- The run button on the query builder page doesn't do anything yet, you need to save the query then return to your query list to execute it. As a result, you must be logged in to run queries, and you can't test it as you build it easily.
- A means to mark your queries as sharable and a means for admins to highlight universally useful ones, the list of which will ultimately replace the 'data mining' page.
Anyway, have a play and LMK if it works as you expect it to.
When entering UK price per piece, is this in pence or pounds e.g. 0.07 or 7? I'm getting weird results and can't figure out if it's me or the scripts :O)
SELECT * FROM SETS
WHERE (UKPricePerPiece < 10) AND (YearFrom > 2010)
ORDER BY UKPricePerPiece DESC
returns 174 matching records, but nothing displayed apart from the pagination.
SELECT * FROM SETS
WHERE (UKPricePerPiece < 0.10) AND (YearFrom > 2010)
ORDER BY UKPricePerPiece DESC
returns 8 sets that are free.
I should be able to sort it though...
I'll continue to investigate...
http://www.brickset.com/search/advanced/run/?q=18
which is
SELECT * FROM Sets
WHERE (UKPricePerPiece < 10) AND (UKPricePerPiece > 0) AND (YearFrom >= 2010)
ORDER BY UKPricePerPiece ASC
One idea though, when doing 'equals' for string fields, maybe do a like without the % characters - this will force a case insensitive compare (= may do this, but it's implementation dependent IIRC).
eg,
SELECT * FROM Sets
WHERE (SetName LIKE 'test')
If so, then <> would also need changing to NOT LIKE.
Looks like you're also assuming that OR and AND are always parsed as (A OR B) AND C - ie, it doesn't appear possible to do A AND (B OR C). Not sure how to offer this functionality to the user though and it might be confusing to new users. Maybe have the option to edit the generated SQL (maybe just the WHERE clause portion to prevent people trying to query things they shouldn't) by hand?
It also seems to me that you can see everyone's queries by simply guessing the query number:
http://www.brickset.com/search/advanced/run/?q=20
You can execute any query by guessing a number, but I don't think that's an issue, unless I'm missing something? Even if you include 'I own/want' in it, it applies to the person running the query rather than the person who created it.
Almost there now, I think...
All 'data mining' queries have been recreated using this, and in time I plan to add more general purpse ones to the list, that I, and others, create.
Any problems, LMK...
You should add that!
So, you could build something to show all sets you own, ordered by 'order added to your collection' descending to see those you'd added recently.
The 'Collections' table that holds who owns and wants what currently contains 5.9 million records. Each row consists of 4 integers and 2 bits (total 18). If I were to add a datetime field to record the date items were added it would increase the row length by 8, almost a 50% increase, which is obviously undesirable on such a large table, for a feature that would receive relelatively little use.
Maybe I can do something in the ACM table, though...
HTH, a bit...
I'll add it to the to-do list :)
Some extensions I would like to see (I know I've mentioned some in private, but I thought it might be handy to have this list, and to let others comment and/or expand on these ideas):
1. Querying against a Bricklist
2. Querying against elements, parts, colors
3. Subordinate clauses
4. "Not"
5. The ability to rearrange clauses when editing a query
A few sample uses:
Everything in list L that I don't already own
All sets containing both element e1 and e2
All sets in theme t containing part design p, ordered by quantity of p, decreasing
All sets that aren't drawn from theme t, that provide parts of color c, ordered by quantity of said parts, decreasing
Keep up the awesome work!
All sets that are not duplo, containing parts introduced since 2011 (excluding minifigs and minifig accessories)
(If colors were organized into groups, as is done on Basebrick, and you could query against part design groups as well, then you could also ask...)
All sets containing pearlescent slopes, ordered by quantity of those parts, decreasing
Having queries that yield lists of parts rather than lists of sets would also be greatly useful.
If the Ø designations were actual numerical attributes of parts, rather than ad hoc text that sometimes uses nn.nn and other times nn,nn notation, then one could query against those as well.
Then we could discover things like:
All parts of Ønn.nn in color c
If you wanted to modify that superset, you then have only to edit one query, and all the dependent ones automatically inherit the modification. Voila! The power of abstraction!