Open Side Menu Go to the Top
Register
Programming homework and newbie help thread Programming homework and newbie help thread

04-10-2016 , 01:30 PM
Unfortunately, I've never seen the PT4 hand structure, but I wouldn't figure it to be amazing.

Subselects can be slow, but there are strategies around that issue. The idea is to only grab the columns you need for the final result, that way you can use WHERE EXISTS. For all the columns you need, you can simply convert said subselect into a normal join.

There are other strategies that work well, such as partial indexes and windowing (windowing with take care of many subselect headaches for you). In general, stay away from using views unless you really need every single field generated by it. In a way, a subselect is a view, and unfortunately, there is no query-hinting on PostgreSQL and the offset 0 hack doesn't seem to work very well, IME.

All of this assumes you are tuning up your db and not blowing out your memory, but you should be able to get things down to sub 300ms very little effort. Use explain analyze and make sure you aren't sorting on disk. If you are, bump your memory and especially work_mem settings.

If you really need things to drop below 50ms, then start looking into loose index scans and other esoterica, but here you find rabbit holes.
Programming homework and newbie help thread Quote
04-10-2016 , 03:31 PM
I think maybe the proscription against "using string techniques for forming SQL queries" is being overblown. To me, it's totally fine to use string techniques to form the bones of a query, and then use bind variables to fill in the values. You just need to be aware of which variables you have control over, and which you do not. For example, it's kind of common to do something like

Code:
params = [val1]
query = "select count(*) from mytable where mycondition < %s"
if use_other_condition:
    query += " and other_condititon = %s"
    params.append[val2]
cursor.execute(query, params)
The %s is the general syntax that python's mysql and postgres libraries use. They substitute values in from the params list, safely. It is not possible to compromise the database in this way.

I think the proscription is against things of this type
Code:
query = "select count(*) from mytable where mycondition < '%s'" % (val1, )
which literally inserts the value into the query, without regard for quoting problems you might encounter, or, even worse
Code:
query = "select count(*) from %s" % (table_name, )
(where you, as the programmer, do not have control over table_name)

It is extremely common practice to build sql queries with strings, and it's definitely possible to do it in a completely safe manner. It's the building block of most ORM systems among other things.
Programming homework and newbie help thread Quote
04-10-2016 , 04:51 PM
http://initd.org/psycopg/docs/usage....ery-parameters

That's correct, but I'm wondering if coon74 is talking about doing full string interpolation, which what I was warning against, as it seemed to be what he was complaining about.

If it matters any, you can do the exact same thing with PL/pgSQL. It looks a bit like this:

Code:
format("select count(*) from %1$I where %2$I > $3$L", table_name, col_name, new_value);
There's a whole bunch of other things you can do, but PostgreSQL also offers quote_ident() and quote_literal() for sanitizing strings before they are concatenated, which I'd guess would be the correct way to deal with table and column names, though I've never used them directly in psycopg2.
Programming homework and newbie help thread Quote
04-10-2016 , 05:03 PM
Quote:
Originally Posted by daveT
http://initd.org/psycopg/docs/usage....ery-parameters

That's correct, but I'm wondering if coon74 is talking about doing full string interpolation, which what I was warning against, as it seemed to be what he was complaining about.
Yes, I was worried about full string interpolation, but the correct code is also easy! Thanks to both!

Quote:
Originally Posted by daveT
There are other strategies that work well, such as partial indexes and windowing


This must solve so many problems with PT4
Programming homework and newbie help thread Quote
04-10-2016 , 05:24 PM
I have just started using postgres windowing, and it is indeed useful, but I *think* windowing only works on row numbers essentially. Like you can use it with partition to get a moving average from, say, -10 rows ago to this row. If you want sample based windowing, that's fine. In my case, I wanted date-based windowing, and there would not be a record on every date. I "solved" it by left joining with a generated date sequence, assuring that 1 row = 1 day, but I'd love to know if there is a more elegant solution.
Programming homework and newbie help thread Quote
04-10-2016 , 05:41 PM
Of what type are the dates in your DB? Timestamps, text, something else?

You want the window frame to consist of all the records from within 10 astronomical days of the row being processed, right? Or do you want records from the previous 10 calendar days?

Last edited by coon74; 04-10-2016 at 05:46 PM.
Programming homework and newbie help thread Quote
04-10-2016 , 05:49 PM
Quote:
Originally Posted by coon74
Of what type are the dates in your DB? Timestamps, text, something else?

You want the window frame to consist of all the records from within 10 astronomical days of the query moment, right? Or do you want records from the last 10 calendar days?
They're plain dates (not datetimes). So pretend I have a table with 2 columns
mydate - date
myval - float
I want, say, a result set where each row is
date, sum of myval from (date - 10 days) to date

It can also be done by self-joining but I think that might perform very terribly (I haven't actually tried it, I've just been experimenting in my so called "free time")

Not that it matters for this discussion, but I'm actually using Amazon Redshift which has some modified rules for postgres. For example, you can't actually join with a generated table, so I literally make a tmp table first (of the generated dates) and join with that.
Programming homework and newbie help thread Quote
04-10-2016 , 06:21 PM
If I'm not mistaken, you can first compute the sums of values within each day (dailysum in the below code) by means of grouping, to obtain an auxiliary table whose rows are (date,dailysum), and then aggregate the daily sums over the last 10 days using the window function lag().
Code:
SELECT datealias, lag(dailysum,9) + ... + lag(dailysum,1) + dailysum AS movingsum
OVER (ORDER BY datealias)
FROM (
SELECT date AS datealias, sum(myvalue) AS dailysum
FROM mytable
GROUP BY date)
AS auxiliarytable
(I'm not sure is there's a need to alias date AS datealias, it might depend on the DBMS, I want to play it safe.)

But then you'll only get a table consisting of rows (datealias,movingsum) i.e. consisting only of dates and moving sums corresponding to those dates...

As for missing dates, you can insert fictitional entries with value==0 so that no date is skipped.

Last edited by coon74; 04-10-2016 at 06:49 PM. Reason: I have no idea what should be aliased and what shouldn't, but I hope you've got the main point.
Programming homework and newbie help thread Quote
04-10-2016 , 06:53 PM
It doesn't work unless there are no missing days, because a lag of 9 isn't guaranteed to be 10 days ago. So lag(dailysum,9) could include a day from 10 years ago (also, you don't need to do the addition you did there, you can use "partition over" to get the same result)
Programming homework and newbie help thread Quote
04-10-2016 , 08:21 PM
It's a bit clumsy, but you can check whether the date written in the previous row is within 10 days of the date of the current row, and add the value only if that date is within the range, otherwise add 0. Note that lag (mydate, 1) is the date of the previous row, lag (mydate, 2) is the date of the next to previous etc.
Code:
SELECT
date,
myvalue +
(CASE WHEN (lag (mydate, 1) > mydate - 11)
THEN lag (myvalue, 1)
ELSE 0 END) + ... +
(CASE WHEN (lag (mydate, 10) > mydate - 11)
THEN lag (myvalue, 10)
ELSE 0 END)
OVER (ORDER BY mydate)
FROM mytable
The above code is likely suboptimal in terms of time because it makes all the 10 condition checks all the time; it can be rewritten in the following, even clumsier way, with nested CASE statements :
Code:
SELECT
date,
myvalue +
(CASE WHEN (lag (mydate, 1) > mydate - 11)
THEN lag (myvalue, 1) +
    (CASE WHEN (lag (mydate, 2) > mydate - 11)
    THEN lag (myvalue, 2) +
        (CASE WHEN (lag (mydate, 3) > mydate - 11)
        THEN lag (myvalue, 3) + ...
        ELSE 0 END
        )
    ELSE 0 END
    )
ELSE 0 END
)
OVER (ORDER BY mydate)
FROM mytable
That's if the dates are distinct in your rows; otherwise group by date and run the windowing on the resulting table of daily sums.

Or, as said in an edit above, just insert rows with the missing dates and myvalue==0 beforehand into your initial table.

Last edited by coon74; 04-10-2016 at 08:30 PM.
Programming homework and newbie help thread Quote
04-10-2016 , 08:22 PM
I think left outer joining with a list of all dates in the necessary period is probably a loooot simpler
Programming homework and newbie help thread Quote
04-10-2016 , 08:26 PM
Question about a simple program I was tinkering with while going through Learn Python the Hard way.

After doing one of the examples, I was tinkering with the program. One small part of it won't work correctly. I've been staring at it and just can't figure it out for the life of me.

print "You enter a dark room with many doors. Choose your door number."

door = raw_input("> ")

if door == "1":
some code

elif door == "2":
some code

elif "3" <= door <= "10":
some code

elif "11" <= door <= "20":
some code

else:
print blah blah blah

So when I test this, I can not get the "elif '3' <= door <= '10':" block to execute. I don't see why. I don't see how it's different than the next elif statement that works fine. When I change it to, for instance, "elif door == '3':," that works. As it is now, when I enter a number 3-10, the else branch executes. I took out all the other code to make the post shorter, but other than that this is a direct copy/paste from my TextWrangler.
Programming homework and newbie help thread Quote
04-10-2016 , 08:35 PM
Your input is a string, and your if statements are operating against strings. The string "10" is less than the string "3", because of how strings are ordered, so it's impossible for door to be in between them. In fact, since they're string, consider something that's more intuitive. Say a=0, b=1, c=2 and so forth. That statment would be

elif "d" < door < "ba"

Clearly that can never be true, right? There's no word in the dictionary that is after "d" but before "ba"

Instead, convert your input to an int (check for exceptions, if you say door = int(door) it will throw a ValueError if it can not convert to an int)
Programming homework and newbie help thread Quote
04-10-2016 , 08:36 PM
The reason the 11 / 20 case works is because they have the same number of characters. In the alpha example I have about that would be

elif "bb" <= door <= "ca"
which is possible to fulfil.
Programming homework and newbie help thread Quote
04-10-2016 , 08:41 PM
Thank you. I had initially thought something like that, but the 11 - 20 thing was tripping me up.
Programming homework and newbie help thread Quote
04-10-2016 , 08:59 PM
Wow, I didn't know that Python reads 'a < b < c' as '(a < b) and (b < c)', that looked like a frivolity to my eyes that are more used to C/C++

@RustyBrooks: left outer joining does indeed look simpler, but when contemplating such problems, I tend to first seek solutions that involve manipulations with select lists only because I'd love the solutions to be applicable to PT4's custom stat generator that doesn't allow any other manipulations

Last edited by coon74; 04-10-2016 at 09:18 PM.
Programming homework and newbie help thread Quote
04-10-2016 , 09:13 PM
So first, actually any construction made of the same operator in python gets this treatment, which is the solution to the riddle of why
False is False is False
evaluates to True (because it's "(False is False) and (False is False)")

But second, what other possible meaning could a <= b <= c mean (except that b is in the range [a, c]
Programming homework and newbie help thread Quote
04-10-2016 , 09:32 PM
In C, where there's no built-in boolean type*, a <= b <= c is evaluated naively left-to-right like a pair of binary operators with int results: a <= b is evaluated first, returning either 0 (False) or 1 (True), then, depending on the result, either 0 <= c or 1 <= c is evaluated.

* For the purposes of if, while etc. operators, zero is treated like 'False' and nonzero is treated like 'True', so

Code:
if 0
return 2;
else return 3;
returns 3, while

Code:
if 2
return 4;
else return 5;
returns 4.

Furthermore,

Code:
int a = 0;

if a == 0
return 0;
is syntactically incorrect - the correct code is either
Code:
if (!a)
return 0;
or

Code:
if (a == 0)
return 0;
otherwise the compiler reads 'if a' and expects to see an operator right after that, but seems the '==' instead and has a hiccup; only taking the logical expression 'a == 0' into parentheses prompts the compiler to treat it in its entirety as the condition of the if-operator.

Last edited by coon74; 04-10-2016 at 10:00 PM.
Programming homework and newbie help thread Quote
04-10-2016 , 09:38 PM
Sure, but in C, a <= b <= c is essentially a non-sequitur, almost no one who writes it means to write it. So if you start from the belief that the line of code was written on purpose with a clear semantic meaning, it basically can't be the C method of evaluation.

That is to say, if it was embedded in a piece of writing, if someone said "a <= b <= c" then I think most people would interpret that as "b is between a and c"
Programming homework and newbie help thread Quote
04-10-2016 , 09:48 PM
Yes, of course, if "a <= b <= c" is syntactically correct in a language, then it likely means the same as humans mean by writing it, but when encountering this expression in a piece of code, my 'internal compiler' still instinctively raises a red flag
Programming homework and newbie help thread Quote
04-11-2016 , 12:08 PM
Quote:
Originally Posted by coon74
Yes, of course, if "a <= b <= c" is semantically natural in a mainstream language,..
FMP before RustyBrooks slaps me for an incorrect statement
_______________________________

What do people here think about IronPython as a method to 'have the best of two worlds' - make use of the simplicity (?) of both Python (relatively to C#) and Visual Studio? Or is pyQt or wxPython just as easy as VS for creation of a primitive Windows GUI (a window with text, buttons and forms only)?

It's merely the 5th mention of IronPython on this forum, so I don't think the topic is covered well enough.
Programming homework and newbie help thread Quote
04-11-2016 , 02:40 PM
Coming from C, a basic Java scope question I guess. Why don't I need to return the array to main?

Code:
public class bucky {
	public static void main (String args[]){
	int bucky[] = {3,2,4};
	change (bucky);
		
	for (int y:bucky)
		System.out.println(y);
	
    }
	
	public static void change(int x[]){
		for (int count = 0; count < x.length; count++)
			x[count]+=5;
	}
		
		
}
'cause in the same class would be my first guess...
Programming homework and newbie help thread Quote
04-11-2016 , 03:43 PM
I don't think it's an issue of scope. In java everything is essentially a pointer to an object. So the array you're modifying in change() is a pointer to bucky, and changes to x are changes to bucky. (This is the same in C, fwiw, because arrays are just pointers to memory)
Programming homework and newbie help thread Quote
04-11-2016 , 03:55 PM
Yeah, it's being a a few years since I did any C, but yeah you can return a pointer to the array which is what mixed me up.

But I think I have my answer to my general question, which is the variable is public and can be modified by any method in any class and if the variables are private, they can only be accessed by methods in their own class, i believe?

Last edited by mackeleven; 04-11-2016 at 04:01 PM.
Programming homework and newbie help thread Quote
04-11-2016 , 04:08 PM
Quote:
Originally Posted by mackeleven
Yeah, it's being a a few years since I did any C, but yeah you can return a pointer to the array which is what mixed me up.

But I think I have my answer to my general question, which is the variable is public and can be modified by any method in any class and if the variables are private, they can only be accessed by methods in their own class, i believe?

In javascript, variables are (essentially) pointers to objects. So when you call a function with an object, you are not passing a copy of the object, you are passing a pointer to the object. If you change a class member of the object in your function, it will be changed in the calling frame also.

It has nothing to do with scope, ownership, public/private etc. It's basically almost exactly like passing pointers in C.
Programming homework and newbie help thread Quote

      
m