Open Side Menu Go to the Top
Register
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** ** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **

09-08-2017 , 05:46 PM
I think of that kind of question as an opportunity to explain how, in terms of thought process and making use of your accumulated experience and wisdom, you worked out some interesting and difficult problem.

So it doesn't matter if it was actually the hardest or not, or the quality of the code base, or whatever, it's just whether or not it's a good yarn that shows off your abilities.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 06:00 PM
If I took that literally, I'd be explaining for 2 hours and there's zero way they'd get what I said anyways, lol.

I was asked this yesterday... you know, every system has it's own little challenges. A query that totally fails to execute is different than writing a query that has to be returned in 15ms. I feel like they wanted to hear about some crazy bat**** query I had to solve, details schmetails to the max, and I wouldn't have one on mind. The question sounds like judging based on some masochism that I really don't get, nor would I hold someone in higher regard for masochistic code debugging stories.

I truly don't recall details of anything at all. I just know what to do, and even so, I don't think any details would help at the end.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 06:08 PM
My recommendation is to not treat the question literally. Like every other interview question, the person is trying to figure out whether a) you're competent to do the job and b) whether they'd like to work with you. Asking about the "hardest bug" translates to asking you to say something that makes you sound competent. Telling an engaging story makes you seem fun to work with.

I sympathize with the problem of forgetting details though. I have that problem. But like if I were interviewing for some C programming gig maybe I'd talk about a time I fixed a problem in an app by using strace logs to determine that a program was crashing because it was calling the brk() system call incorrectly on newer linux kernels, and then tracing that back to a config option in a build for Mono. I remember it because it was bizarre. It sounds suitably impressive. It wasn't actually the hardest thing I've ever done but it makes a good story that can be told in a couple minutes.

If the question was about queries I'd probably try to come up with something that involved working out just the right way to build a sub-query or a temp table or something in order to make a query work efficiently. I dunno. But remember the point is just to make it sound like you know what you're talking about.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 06:27 PM
Okay, fine... I'll try this. Which one is the best?

1- Write a PL function to convert a 3M row table to arrays to get a linear graph executed in under 20ms.

2- Manually writing loose Indexes.

3- Write functions and triggers to replicate FK constraint behavior because whoever wrote the database didn't know what PKs are, and I couldn't "hack core."

4- Unwind piles and piles of views that did the exact same thing as each view it was sub-calling to generate a simple window query.

5- Removing hand-rolled ORMs that left join all the things and concatenate strings.

6- I hate recursive CTEs.

Eye-glazing queries are fairly simple to work on, really. It's a fairly uninteresting topic.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 06:31 PM
I think (1) and (2) are probably best. The rest are all fixing people doing horrible things, but hopefully not fixes you hope to have to make at your New! Amazing! job. (1) and (2) make you sound like a boss.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 06:50 PM
It's contracting. Even the most painful stuff is temporary, lol.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 09:07 PM
daveT,

I think you read way too much into nearly everything and also take a supremely negative interpretation of it all. Maybe try and consider the most optimistic reasons for people to do the things they do and you may enjoy the interactions with other people more?
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 10:00 PM
It's not even just about the negativity (although thats a real thing).

Being able to communicate things you worked on / problems you hit / solutions you came up with is a real part of our jobs. It's not just a fake interview skill. Dave doesn't even seem to realize that this is an important part of the job and is something he should work on if its true that "If I took that literally, I'd be explaining for 2 hours and there's zero way they'd get what I said anyways, lol."
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 10:45 PM
dave is giving us a good example of why interview questions are the way they are.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 11:29 PM
Mysql is driving me ****ing nuts.

I have a table, doesn't really matter what's in it, but obviously, there's a primary key and there's an index on that key. OK.

select count(*) from mytable
takes like 150ms. There are a few hundred thousand rows in the table but this should be instantaneous - like 1ms or less.

If I do an explain on my query I get that it's using an index. Not the one I'd think I'd choose but if I hint it to use a different index I get the same response speed, so, I guess it doesn't matter

The explain, though, says there are 266385 rows in the index. It returns in <1ms

The query says there are 271390 and takes a minimum of 150ms

I can't explain the discrepancy and I can't explain why it takes so ****ing long.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 11:31 PM
Haha so I thought I'd try this:
select count(*) from dash_samples where dash_sample_id is not null;

I mean it's nonsense, dash_sample_id is a not-null column. In fact the explain plan says "impossible where clause"

This one takes 30ms or so. If I use another nonsense where clause like 1=1 it also finishes in 30ms
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 11:36 PM
WTF now if I just do the plain count(*) query, it's also 30ms. Kill me.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 11:42 PM
Well, I solved it, maybe, but holy ****.

mysql apparently stores it's primary key entries alongside each column in a table instead of seperately. So, uh, this might explain a lot of problems I've had over the last few months.

If I make a new key, basically a duplicate of the primary key, then count(dash_sample_id) runs in less than 1ms. count(*) still takes 30ms.

What kind of bull**** is this

ETA: no I'm wrong, adding the index did not help, I was looking at output from explain again, which is still less than 1ms.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 11:52 PM
I recall you had to do count(something) instead of count(*) for performance reasons many years ago, maybe it is still the case.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 11:55 PM
Quote:
Originally Posted by _dave_
I recall you had to do count(something) instead of count(*) for performance reasons many years ago, maybe it is still the case.
Nah it optimizes that away as long as they're not null columns. I've tried count(*) count(1) count(id) all the same.

I am not getting the 150ms times any more, I have no idea why.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-08-2017 , 11:57 PM
Frankly I'm angling for switching from mysql. I'll probably try aurora in the near future.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-09-2017 , 12:11 AM
Quote:
Originally Posted by jjshabado
It's not even just about the negativity (although thats a real thing).

Being able to communicate things you worked on / problems you hit / solutions you came up with is a real part of our jobs. It's not just a fake interview skill. Dave doesn't even seem to realize that this is an important part of the job and is something he should work on if its true that "If I took that literally, I'd be explaining for 2 hours and there's zero way they'd get what I said anyways, lol."
I obviously don't say stuff like that in an interview.

If it's going to lead into a discussion on debugging, that's one thing, but when it's a question in writing, the question itself doesn't seem that good and comes across as strange to me. What exactly are they asking me to write for them? Mind that this another company looking for a beast SQL guru and I'm number 15 in a line of people who cried uncle.

I may have been asked this particular question once or twice in an interview. I never had my answers challenged in an interview. I've been whiteboarded 3 or 4 times. I've never had my passions asked for...

I'm legit shocked that these are placed as "common interview questions." It isn't like I don't go through the 1 to 2 hour interviews. I haven't for a long time now, and I'm not missing it.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-09-2017 , 12:18 AM
Rusty, it sounds like your frustration is that you don't know that explain and explain analyze are two very different things.

Explain is merely estimating, and that is only a measure of the time the query planner takes to plan a query.

Explain analyze means the same thing as explain AND it is a true measure of how long it took to execute the query in question.

This difference is why you can safely run explain over an update / delete / insert query while you need to run explain analyze in a transaction and rollback.

So, with that knowledge, the query planner is probably doing a select from the information schema. One of these table holds an estimated row count. This isn't seen because explain isn't going to do show you select from information schema.

I keep saying that your memory configs are likely wrong (this assuming you are on dev and not prod), and I consider this slow select count(*) another indicator. it is also possible that you have someone pushing data into the table while you are trying a select, and you are merely in a wait. I would guess there is a way to freeze a table for the queries you are running.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-09-2017 , 12:41 AM
Quote:
Originally Posted by daveT
I obviously don't say stuff like that in an interview.
Its really not obvious to me and I'm pretty skeptical that you effectively hide the attitude you clearly show here. Maybe I'm wrong, but I really doubt it.

But my point wasn't even about the interview answer. It's that you said to us you couldn't do it and you seemed to think that was reasonable/common.


Quote:
Originally Posted by daveT
I'm legit shocked that these are placed as "common interview questions." It isn't like I don't go through the 1 to 2 hour interviews. I haven't for a long time now, and I'm not missing it.
I don't know what you're saying here.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-09-2017 , 08:25 AM
@daveT - that is actually a fair interview question. FWIW you've received some excellent advice and insight in this thread, heed it.

Edit: daveT, you are going to have to conform to the software development process used at any place you work. It seems that you are reluctant to do that to me even if they are paying you to.

Last edited by adios; 09-09-2017 at 08:33 AM.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-09-2017 , 08:43 AM
Quote:
Originally Posted by daveT
Rusty, it sounds like your frustration is that you don't know that explain and explain analyze are two very different things.

Explain is merely estimating, and that is only a measure of the time the query planner takes to plan a query.
Not really. Explain will tell you how many rows are contained in the index. Clearly it's an estimation since the number doesn't match count(*). Although, having thought about it, I can't think of any reason you shouldn't be able to get an exact count of the number of elements in an index without having to count them.

Quote:
Explain analyze means the same thing as explain AND it is a true measure of how long it took to execute the query in question.
That's postgres. mysql does not have a real equivalent, AFAIK, to explain analyze.

Quote:
I keep saying that your memory configs are likely wrong (this assuming you are on dev and not prod), and I consider this slow select count(*) another indicator. it is also possible that you have someone pushing data into the table while you are trying a select, and you are merely in a wait. I would guess there is a way to freeze a table for the queries you are running.
The dev environment is on my local machine. There is no data being pushed into it. Most of the config is stock. I agree that I don't have it optimized.

But there is no reason for 150ms to count(*) on a table of this size. I don't understand why it started behaving better after screwing with it but whatever, even 30ms is pretty poor.

The count(*) slowness is a fairly well known problem in mysql with innodb tables. Changing it to myisam made the count(*) under 1ms, although if you add a where it goes back 30ms or so, which supports the notion that count(*) on innodb tables does something like an index scan
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-09-2017 , 05:20 PM
Quote:
Originally Posted by RustyBrooks
Not really. Explain will tell you how many rows are contained in the index. Clearly it's an estimation since the number doesn't match count(*). Although, having thought about it, I can't think of any reason you shouldn't be able to get an exact count of the number of elements in an index without having to count them.



That's postgres. mysql does not have a real equivalent, AFAIK, to explain analyze.



The dev environment is on my local machine. There is no data being pushed into it. Most of the config is stock. I agree that I don't have it optimized.

But there is no reason for 150ms to count(*) on a table of this size. I don't understand why it started behaving better after screwing with it but whatever, even 30ms is pretty poor.

The count(*) slowness is a fairly well known problem in mysql with innodb tables. Changing it to myisam made the count(*) under 1ms, although if you add a where it goes back 30ms or so, which supports the notion that count(*) on innodb tables does something like an index scan
Ah yes. I did get confused about the explain thing... I'm not the largest fan of MySQL either...

I think the first thing I pointed out was that MySQL has had an unfixed memory leak bug for the past 10 years, and it may be possible that this count(*) is a manifestation of that.

Not sure what your precise query is, but a count(*) without a where clause requires a full table scan, and this not a place for indexes to be used. A count(*) with a where clause likely will use an index.

IRT size_of index: indexes only have a mb size, not a precise number size. This is true in Postgres, and this query from MySQL seems to suggest the same:

Code:
select * from information_schema.INNODB_SYS_INDEXES;
I'd also point out that a used index is going to have that ~255 space. An unused index will be at zero (and if old, should be removed). I haven't looked too much, but I'm guessing that "n_fields" is the hard amount of indexes is needed to cover the whole table. If that is the case, a row count for an index is probably 255 / 4 (size of int, size of word (assembly word, not english), and so on), then multiply by n_fields to get a guess.

***

If you aren't too concerned about maximum precision at all times, a trick may be to grab the estimated row size from the information_schema. I've shown results from the estimation and a real table I have. This is from a client I work with about one week a month, and MySQL is never running at other times.

Code:
select table_rows 
from information_schema.tables
where table_schema = 'obfuscated_schema_name' 
and table_name = 'obfuscated_table_name';
-- => 1500

select count(*) from obfuscated_table_name;
-- => 1523
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-09-2017 , 07:12 PM
A count(*) should not require a table scan - at worst it should require an index scan, right? Because the primary key index should have the same # of entries that the table has. An index scan is no picnic but it should be way better than a table scan, especially if the rows are large.

I've spent a lot of today trying to optimize some queries and I've found some manual optimizations that I really feel like mysql should do for you. For example, consider

Code:
select * 
from a
join b using (id)
where condition_on_a
limit 10
This is slow as balls because a and b are both huge. But if I change this to

Code:
select * 
from (select * from a where condition_on_a limit 10)  
join b using (id)
It magically becomes very fast. Which makes sense, the join now only has to join 10 rows, and the join is expensive, even with an index.

Which makes it seem like mysql is joining first, then applying where clauses, even though it could apply where clauses first and then join? Maybe this isn't a standard thing?
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-09-2017 , 07:14 PM
Actually as soon as I was done typing I see the problem, probably. This only works if you don't care which 10 rows you get from a, or if you sort the subquery in a first and then limit, which is also probably slow.

(I actually personally kinda don't care which ones you get from a, but often people will I think)

ETA: sorting the subquery, then limiting, then joining still comes out about 10x faster for my problem query. I actually have to sort the whole result again, but ok.

Last edited by RustyBrooks; 09-09-2017 at 07:28 PM.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
09-09-2017 , 07:38 PM
Quote:
Originally Posted by daveT
If I took that literally, I'd be explaining for 2 hours and there's zero way they'd get what I said anyways, lol.

I was asked this yesterday... you know, every system has it's own little challenges. A query that totally fails to execute is different than writing a query that has to be returned in 15ms. I feel like they wanted to hear about some crazy bat**** query I had to solve, details schmetails to the max, and I wouldn't have one on mind. The question sounds like judging based on some masochism that I really don't get, nor would I hold someone in higher regard for masochistic code debugging stories.

I truly don't recall details of anything at all. I just know what to do, and even so, I don't think any details would help at the end.
This post gives a pretty good indication why you don't do as well as you may think on interviews.

"I'd be explaining for 2 hours and there's zero way they'd get what I said anyways, lol."

That's pretty arrogant of you to think other technical people couldn't get the gist of what the difficulty was, even if they have no domain knowledge. That would be a failure in your communication.

"The question sounds like judging based on some masochism that I really don't get, nor would I hold someone in higher regard for masochistic code debugging stories."

Again, you seem to be building some narrative in your head about the interviewer's intentions that confirms you're existing negative biases towards the interview question. This seems like a pretty straightforward question. Debugging and optimizing stuff that's 'kinda slow' is common problem, and it has challenges. Again, not sure why you think everyone is out to get you and prove they are smarter than you.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote

      
m