** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **

09-09-2017 , 08:07 PM
Quote:
Originally Posted by RustyBrooks
A count(*) should not require a table scan - at worst it should require an index scan, right? Because the primary key index should have the same # of entries that the table has.
An index is a one-to-many address pointer. To get a count, the table has to be scanned to achieve one-to-one.

An address pointer is going to help if you are isolating a subset of the entire table, and thus it should use the index to scan and then count the returned rows.

If you think about it, there is no real difference between counting all the addresses of all indexes and counting all rows of a table. Now, if indexes had a flag like "full" where "full" equals 100 rows or whatever, then that would be faster, but indexes don't have a set size, nor is that set N stored anywhere that I'm aware of.

This relates to the queries you posted: the small table sort of works like an index to the large table.
09-09-2017 , 08:42 PM
How many of each primary key would you expect to find in a table? (Hint: 1.)

But even if that wasn't the case, you can scan any single index and get a count for the number of rows in the table. There has to be a single entry in the index for each row in the table (excluding some fairly new features in some databases like partial indexes)

Now, if the index has low selectivity, then the index will have fewer than N "nodes" where N is the number of rows in the table, but each node will have pointers to the rows that match that value. So you can walk the nodes + their associated rows and you should be able to get the number of rows in the table from that.
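If anyone wants to poke at this locally, here's a rough SQLite-from-Python sketch (not MySQL, and the table/index names are made up), showing the planner satisfying a bare count(*) from an index rather than the table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
con.execute("CREATE INDEX idx_val ON t (val)")  # non-unique secondary index
con.executemany("INSERT INTO t (val) VALUES (?)", [("x",), ("y",)] * 50)

# count(*) with no WHERE clause: any index that has one entry per row
# can answer it, so the planner scans an index instead of the heap.
plan = con.execute("EXPLAIN QUERY PLAN SELECT count(*) FROM t").fetchall()[0]
print(plan[3])  # plan detail, e.g. a scan using a covering index
print(con.execute("SELECT count(*) FROM t").fetchone()[0])  # 100
```

MySQL's planner output looks different (EXPLAIN rows instead of EXPLAIN QUERY PLAN), but the idea is the same.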

I have no idea what you're going on about re: having to scan every index. Keep in mind we're talking about a count(*) with no where clause, i.e. "how many rows are in the table"

I understand why my example is faster, but I feel that the database should do it for me. BTW, at some point, if you increase the limit X to a high enough value, it gets slower again, presumably because it has to make a temporary table for my values. But there's no reason it couldn't optimize the query for me to limit the first table based on applicable where clauses.
09-09-2017 , 09:18 PM
Why would it make sense to scan every entry in an index -vs- every entry in a table column? What's the difference?

http://use-the-index-luke.com/sql/anatomy
09-09-2017 , 10:14 PM
Quote:
Originally Posted by daveT
Why would it make sense to scan every entry in an index -vs- every entry in a table column? What's the difference?
It depends on how data in a table is stored, but I think it's typical for rows to be stored sequentially in blocks. Sometimes rows have variable sizes. This means to count the rows by going through each one, you'd have to make a series of disk reads over a large section of disk - probably many different parts of the disk.

The index should be very small. In the case of things like primary keys the size of each index entry should be fixed. Whether the index is in memory or on disk, stepping through all of it should be faster than doing the equivalent on a table.

If the table fits in RAM and is entirely cached there, then there is probably not that much difference.

There are many cases in mysql where it does index scans instead of table scans. If they were equivalent, I don't think they'd do that. For example, if you do something like

select * from mytable where mycol like '%search%'

and mycol has an index on it, it will do a full index scan, instead of a full table scan.
09-10-2017 , 01:35 AM
Using a VMware Player image that I can't take snapshots on. Wondering if git would be a good poor man's snapshot.
09-10-2017 , 02:20 AM
Quote:
Originally Posted by Noodle Wazlib
Using a VMware Player image that I can't take snapshots on. Wondering if git would be a good poor man's snapshot.
Just backup the images to another drive like you would backup any other file.
09-10-2017 , 09:21 PM
Quote:
Originally Posted by RustyBrooks
It depends on how data in a table is stored, but I think it's typical for rows to be stored sequentially in blocks. Sometimes rows have variable sizes. This means to count the rows by going through each one, you'd have to make a series of disk reads over a large section of disk - probably many different parts of the disk.
Not sure if you are talking about tables or indexes here, but if tables, the only sequence that is stored is the PK; as you know, "select col" doesn't return a sorted order.

An index is "sorted" in the sense that it is in blocks. In common parlance, "index" refers to a b-tree (balanced tree), which is exactly what it sounds like in CS terms.

IRT reading from memory or disk, this depends on two factors:

1- the buffer size of the index -vs- the max size of the index
2- the cache buffer -vs- cache sizes.

A table scan / sort / lookup will attempt to hold as much of the index / table in memory as it can, but if there is not enough space allocated, then it will revert to disk IO.

I should also mention that if your times are all over the place, drastically shifting throughout the day, you should probably flush out your cache and test again. In general, you don't want to take the first 2 or 3 times, as you are trying to capture the real-use case, which does involve caching. A new query needs a bit of time to warm up.
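That warm-up-then-measure routine is easy to script. A minimal sketch (hypothetical helper name, SQLite standing in for any database): discard the first couple of cold runs, then take the median of the rest.

```python
import sqlite3
import statistics
import time

def timed_runs(con, sql, warmup=2, runs=5):
    # Throw away the first `warmup` executions (cold cache), then time
    # `runs` more and report the median so one outlier can't skew it.
    for _ in range(warmup):
        con.execute(sql).fetchall()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        con.execute(sql).fetchall()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t (x) VALUES (?)", [(i,) for i in range(10000)])
print(timed_runs(con, "SELECT count(*) FROM t"))  # median warm time, seconds
```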

Quote:
The index should be very small.
On its face, an index is a horrible idea. It requires a lot of RAM to hold data, requires maintenance, slows down writes, and so on. The benefit of an index is the read speed, and this cost / benefit is the entire crux of tuning a database for indexes: deciding which indexes, how, and so on. If you have an index that is never used, then it does nothing but add write time, etc.

Indexes require a lot more RAM than you may intuit. I was going to say allocating 50% of your memory isn't totally out of bounds, but here MySQL suggests up to 70%.

http://mysql.rjweb.org/doc.php/memory

Quote:
In the case of things like primary keys the size of each index entry should be fixed.
The query I posted above shows otherwise. I haven't tuned my MySQL database, but all indexes are sitting between 250 and 260. I'd guess yours is in a similar (sane) range.

Quote:
Whether the index is in memory or on disk, stepping through all of it should be faster than doing the equivalent on a table.
I'm not sure about that one. I don't recall anything that says counting N nodes in a non-perfect tree is faster than counting N items in an array, for example.

If I'm wrong on that, I'd definitely like to know about it.

Quote:
There are many cases in mysql where it does index scans instead of table scans. If they were equivalent, I don't think they'd do that. For example, if you do something like

select * from mytable where mycol like '%search%'

and mycol has an index on it, it will do a full index scan, instead of a full table scan.
This looks strange to me. Are you sure you are doing partial-string searches here?

A b-tree index is a series of scalar nodes. There is no mechanism (that I'm aware of) that says a b-tree is going to have nodes 'bone', 'ball', 'bat' and have nodes 'one', 'all', 'at' back-referencing to the larger words.

According to the docs, an index can be built on just the first few letters of a string column, as long as the prefix length is specified.

https://dev.mysql.com/doc/refman/5.7...ate-index.html

However, I think FTS (full-text search) is better left to the correct index types.
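For reference, the MySQL syntax for that is a prefix index, e.g. `CREATE INDEX idx ON t (col(10))`. SQLite has no prefix indexes, but an expression index over the leading characters behaves similarly; a sketch (hypothetical names, needs SQLite >= 3.9 for expression indexes):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE words (w TEXT)")
# Index only the first three characters of each value, roughly what a
# MySQL prefix index like (w(3)) would store.
con.execute("CREATE INDEX idx_w3 ON words (substr(w, 1, 3))")
con.executemany("INSERT INTO words VALUES (?)",
                [("bone",), ("ball",), ("bat",), ("cat",)])

# The query must repeat the exact indexed expression for the planner
# to use the index instead of scanning the table.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM words WHERE substr(w, 1, 3) = 'bal'"
).fetchall()[0]
print(plan[3])  # plan detail: a search via the expression index
rows = con.execute(
    "SELECT w FROM words WHERE substr(w, 1, 3) = 'bal'"
).fetchall()
```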
09-10-2017 , 10:17 PM
Dave,

I'm not going to answer this in detail, but, either you have not understood most of what I said, or you don't understand how indexes work.

Like, just take my last point first. When doing a full glob search like %foo% on an indexed column, I said it would do a full index search. By this I obviously do not mean that "foo" is going to be an entry in the btree, but rather that it's cheaper to scan every entry in the index to see if it matches %foo% than it would be to search that column in the table for the same result.
09-10-2017 , 10:29 PM
It's possible I'm not understanding you fully, but regardless, all the indexes in the world won't change a thing until you have enough memory for indexes and cache.
09-11-2017 , 02:53 AM
Quote:
Originally Posted by daveT
Why would it make sense to scan every entry in an index -vs- every entry in a table column? What's the difference?

http://use-the-index-luke.com/sql/anatomy
I think Rusty's point was a bit lost: Index scans are faster than scans on the entire table because you have the relevant information located in far fewer disk blocks, even if the index information itself isn't useful to the query.

e.g.:
When trying to find all pages in a (physical) book that start with a three-letter word, would you rather go through the book page by page, or work with an alphabetically sorted index of the first word on each page? The index (alphabetical order) won't help directly, but having the relevant data all in one place does. You still need to read the same # of words, but they are more easily accessible.
09-11-2017 , 10:09 AM
Quote:
Originally Posted by daveT
It's possible I'm not understanding you fully, but regardless, all the indexes in the world won't change a thing until you have enough memory for indexes and cache.
I should be able to fit my entire database, indexes and all, into RAM 3x over
09-11-2017 , 10:22 AM
If I have a hobby project on github that I'm working on at home and I clone it on my work machine is there any chance I'm accidentally giving the copyright to my employer?

What about if I push changes to the repo from my work machine (during work hours). Does that change anything?

edit:
Yes, this was a storyline on Silicon Valley.
09-11-2017 , 10:33 AM
You have to check your contract. It totally depends. It wouldn't matter for any contract I've signed (as in my employer doesn't get rights to things I work on not related to my job).
09-11-2017 , 10:37 AM
Under my contract it would be unlikely to be enforced, but technically, if I use their machines to create things, they could say it's theirs.

So I just use my personal mbp for personal stuff and the work one for work stuff to not risk it.
09-11-2017 , 11:06 AM
Thanks. I guess I'll ask my supervisor and read my contract.
09-11-2017 , 12:23 PM
Quick caveat when moving to Aurora: it uses repeatable read by default. This caused us huge problems when we migrated from RDS. We had some reporting tools hooked up to a replica cluster. Those tools run some very slow queries, and as a result we got constant replication failures.
09-11-2017 , 01:28 PM
First full week of having analytics installed, 48k page views. Neat.
09-11-2017 , 01:36 PM
Quote:
Originally Posted by Wolfram
Thanks. I guess I'll ask my supervisor and read my contract.
It's a good reason for having a work computer and a personal one, even though lugging them both around is a pain.
09-11-2017 , 01:49 PM
I have two Mac 13" pros. One has a sticker one doesn't. Just make sure to double-check you're on the right one before porn time kids!
09-11-2017 , 01:58 PM
Okay,

So, IRT the index conversation: I posted links to very good resources for everything I said. From the MySQL page:

For string columns, indexes can be created that use only the leading part of column value

Either I misinterpreted that, or the query shown isn't behaving as I expect, or mysql docs are wrong.

IRT tuning databases and indexes: MySQL / Postgres / Oracle / whatever does not default to maxing out your machine. This is partly why your indexes (and queries) are acting weird, misfiring, or going slower than expected. This is why I linked to mysql-tuner, or whatever that is called, upthread.

Tuning indexes is easily the most difficult, tedious, and time-consuming thing you can ever do with a database. You really can't appreciate this fact until you've done it. You need a book to do it, no online resources I'm aware of are really that good.

You have to tune your database first, then you tune your indexes. Trying to measure indexes before tuning the database is putting the cart so far before the horse that the cart has already fallen off the cliff and is now dragging the horse down with it.

For now, I'm just done with this conversation.
09-11-2017 , 02:41 PM
Quote:
Originally Posted by daveT
Okay,

So, IRT the index conversation: I posted links to very good resources for everything I said. From the MySQL page:

For string columns, indexes can be created that use only the leading part of column value

Either I misinterpreted that, or the query shown isn't behaving as I expect, or mysql docs are wrong.
If you want to use an index to really speed things up, then yeah, you have to do something like "mycol like 'foo%'" and it'll do an index range scan, which is way way faster than mycol like '%foo%'

But still, if given the choice for the %foo% case, you would rather scan the index from start to finish than scan the table from start to finish. And that's what mysql does - you can see it in the query planner output.

https://www.percona.com/blog/2012/11...n-performance/
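The difference shows up clearly in planner output. A SQLite sketch (not MySQL, made-up names): `'foo%'` is equivalent to the closed range `>= 'foo' AND < 'fop'`, which an ordered index can seek into, while `'%foo%'` gives the planner no range to seek, so every entry must be inspected.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (name TEXT)")
con.execute("CREATE INDEX idx_name ON t (name)")
con.executemany("INSERT INTO t VALUES (?)",
                [("foobar",), ("food",), ("bazfoo",), ("quux",)])

# Prefix match, written as the range a planner derives from 'foo%':
# the index is seeked into, not walked end to end.
ranged = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE name >= 'foo' AND name < 'fop'"
).fetchall()[0]

# Infix match: no usable range, so it's a scan.
infix = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE name LIKE '%foo%'"
).fetchall()[0]

print(ranged[3])  # a SEARCH (range seek)
print(infix[3])   # a SCAN
```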
09-12-2017 , 02:59 AM
Quote:
Originally Posted by Wolfram
If I have a hobby project on github that I'm working on at home and I clone it on my work machine is there any chance I'm accidentally giving the copyright to my employer?

What about if I push changes to the repo from my work machine (during work hours). Does that change anything?

edit:
Yes, this was a storyline on Silicon Valley.
Are you in America? This probably also depends on local laws, maybe even within each state? Like I think in California, "on your time, with your resources, not in your employer's line of business" = it's yours and your employer can't claim it from you, but that may be on the generous (for individuals) side of the law.

More reading: here
09-12-2017 , 04:40 AM
Goofy, did you have any breakthrough on the React slowness issue? Seems pretty weird (300 divs and spans is not a lot!) but I didn't investigate it much yet...
09-12-2017 , 06:57 AM
Quote:
Originally Posted by goofyballer
Are you in America?
Nah, I'm a euro.

I should just discuss this with my employer. They're pretty reasonable. Probably not a big deal, I mostly want to make sure that if I make something and open-source it that it stays open source.
09-12-2017 , 02:07 PM
Quote:
Originally Posted by _dave_
Goofy, did you have any breakthrough on the React slowness issue? Seems pretty weird (300 divs and spans is not a lot!) but I didn't investigate it much yet...
Did you see the fiddle I posted earlier demonstrating the issue? That would give you something to poke at if you have any ideas, but I think I kinda accepted that it's slow and there are workarounds like lostmypw's that can mitigate the issue (I decided to just show 20 items on page load and have a button on the bottom to load more if necessary, when I have more time I'll figure out how to do that automatically based on page scrolling).

Quote:
Originally Posted by Wolfram
Nah, I'm a euro.
I remember, I'm pretty sure we met in Vegas a lifetime ago, wasn't sure if maybe you'd moved here or anything since then

      