Open Side Menu Go to the Top
Register
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** ** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **

04-20-2017 , 02:05 PM
Quote:
Originally Posted by RustyBrooks
Yeah but it could be
'ip': 192.168.1.1
or <ip>192.168.1.1</i>
or who knows what. And someone might want to match of "<ip>192.168.1.2</ip><domain>foo.co</domain>
They also might want to look for ".1.1" or ".1/24" or "160.1" and so forth.


Whitespace or carraige returns might be nonexistent - aside from titles/descriptions most of this is machine generated and/or scraped from APIs, web pages, and a whole host of 3rd party software.

The industry, if it matters, is network security, and we are basically getting data from any source possible on a particular threat, and shoving it into this database.

We actually have a DSL for search and that does pretty well but people mostly don't use it, they just paste whatever they're searching for in the search bar and that's about it. I tried reducing the number of fields that are searched in the default search and people pitched a fit.
If these were all separate tokens:

'ip': 192.168.1.1
<ip>192.168.1.1</ip>
<ip>192.168.1.2</ip><domain>foo.co</domain>

You can query them easily by doing a LIKE on the token column itself, EG:
SELECT SUM(Frequency) FROM Tokens WHERE Token LIKE '%19.2.168.1.1%'

It's hard to help much in this way because as described the problem is still abstract to me, but the solution to make something faster is always going to be specific.

For example, if you're only wanting to search for IP's or an IP search is a common search, you perhaps might want to extract all IP addresses from the text every time the text field is updated and store them in another table somewhere.

If it's a variety of searches, on a variety of data formats where for some reason it's difficult to sensibly tokenise the data then you're always going to have problems extracting information from that data in a performant way.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-20-2017 , 02:11 PM
Rusty, could be a stupid question, but have you looked into FTS?

https://dev.mysql.com/doc/refman/5.5...xt-search.html

LIKE is a terrible thing to use on any larger set.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-20-2017 , 02:28 PM
sorry, I was transliterating my query. t and i were in and out, I don't remember which is which.

That subselect runs in about 20ms
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-20-2017 , 02:31 PM
Quote:
Originally Posted by daveT
Rusty, could be a stupid question, but have you looked into FTS?

https://dev.mysql.com/doc/refman/5.5...xt-search.html

LIKE is a terrible thing to use on any larger set.
Yes, I tried FTS. The problem is, it wants to treat your columns like english. So if you search for blueberry it will match 'blueberries' also, but if you search for 'blueber' it won't find anything. It worked great if you were looking for words.

But people are searching for fragments of stuff like snort network ids rules, which are sort of key-value pairs of arbitrary data, or correlation rules, which are XML. They are searching for file hashes, IP addresses (whole or partial), the names of viruses/malware/adware/etc. Almost none of the stuff is english words or anything like it.

Someone on my team keeps pushing elastic search but IME I've never seen it make search better.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-20-2017 , 02:33 PM
Quote:
Originally Posted by Gullanian
If these were all separate tokens:

'ip': 192.168.1.1
<ip>192.168.1.1</ip>
<ip>192.168.1.2</ip><domain>foo.co</domain>
Right but someone could legitimately want to search for any fragment, complete or incomplete. I think tokenizing it is a complete non-starter.

If I knew what people wanted to search for, semantically, then I could maybe crack this, but, people want to search for whatever they want to search for. So far our attempts to lead them towards semantic searches have failed.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-20-2017 , 02:57 PM
I was going to suggest looking into some flavor of Lucene, which elastic search is, and basically every open source search is?

But obv doing that is more of a decision, and isn't going to work right away, but longterm it could potentially be nice, or another thing to maintain and fix when it breaks.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-20-2017 , 03:03 PM
Quote:
Originally Posted by RustyBrooks
Yes, I tried FTS. The problem is, it wants to treat your columns like english. So if you search for blueberry it will match 'blueberries' also, but if you search for 'blueber' it won't find anything. It worked great if you were looking for words.

But people are searching for fragments of stuff like snort network ids rules, which are sort of key-value pairs of arbitrary data, or correlation rules, which are XML. They are searching for file hashes, IP addresses (whole or partial), the names of viruses/malware/adware/etc. Almost none of the stuff is english words or anything like it.

Someone on my team keeps pushing elastic search but IME I've never seen it make search better.
I was thinking they'd have cubing, levenshtein distances, and other stuff to make this sort of stuff easier. I'm rather disappointed in MySQL for doing what you are describing.

Even though PostgreSQL has a great FTS, etc, library, including the prior list and much more, I'd never use it over Solr, ES, or Lucene.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-20-2017 , 03:11 PM
Actually I'm looking at it again and it does do some fuzzy searching. I think the default is natural language. But I'm still not super impressed. For example:

Code:
create table foo(
  bar varchar(1000),
  fulltext(bar)
);
Code:
select * from foo;
banana
pear
blueberry
blueberries
blue
Code:
select bar, match(bar) against ('blue' in boolean mode) from foo
banana	0
pear	0
blueberry	0
blueberries	0
blue	0.4885590672492981
So, uh, an exact match gets a score of < 0.5? And nothing for blueberry / blueberries?

Actually maybe I am still ****ing it up because anything but exact matches gives all zeros.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-20-2017 , 03:27 PM
Rusty. There is a slight mistake on my part. The query, as I wrote it, would actually pull all of the data from the main table and count everything.

I should have done a right join for this very specific query. This will actually count every single item, up to 11, and max out at 11.

Code:
select i.type, count(*)
from mytable
right join
     (select type
      from mytable in
      where t.type='A'
      and t.type=i.type
      limit 11)
group by 1;
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-20-2017 , 03:39 PM
And also, these articles looks interesting, though they are 10 years old.

https://www.percona.com/blog/2006/08...scan-to-mysql/

https://www.percona.com/blog/2006/08...ery-profiling/
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-20-2017 , 06:11 PM
Thanks I'll check those out when I get home

Sent from my Nexus 5X using Tapatalk
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 02:36 AM
I has some questions about HTTPS. I added login w/ google accounts to my little project (this and this were great tutorials on setting that up w/ passport), and I figure that when sending things like Google login tokens or whatever around I should probably do so securely. I discovered that with the place I'm hosting my website, if I just put "https" in front of the URL instead of "http", everything seems to just work without any changes on my end, but this seems almost too easy and therefore I'm suspicious of it.

So, questions:
- Should this work? Am I really in the clear for HTTPS just by using a magic server that apparently lets either protocol work for me?
- I added a middleware package that auto redirects any HTTP request to HTTPS. However, in the course of the many StackOverflow links I've read while working on this authentication stuff, one thing I saw was that I should be careful about cookies w/ identifying session information being sent over HTTP and I should make sure cookies are only sent via HTTPS. According to express-session documentation I can do that with session options but they have this caveat:

Quote:
Please note that secure: true is a recommended option. However, it requires an https-enabled website, i.e., HTTPS is necessary for secure cookies. If secure is set, and you access your site over HTTP, the cookie will not be set. If you have your node.js behind a proxy and are using secure: true, you need to set "trust proxy" in express:
This is where I'm kinda getting into "I have no idea what I'm doing" territory - how would I figure out if this is the setup my host has? Additionally, since I mentioned I've made no changes to my code to account for HTTPS, does the fact that I'm still listening on a plain old `http` server rather than `https` mean that none of this will work at all if I enable that setting?
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 05:07 AM
Also, bringing up an older question I'm finally dealing with...

Quote:
Originally Posted by goofyballer
Question: when going to my web page, I'm currently immediately launching a fetch request to grab some data through my REST API. Is this the standard way of doing things, or is there something I should be doing to send the initial data that the user will see along with the initial page load?
Quote:
Originally Posted by Bantam222
The other option would be to render the data with the initial page load so the REST call is not needed?
Quote:
Originally Posted by goofyballer
Yeah, I'm curious if this is a common approach and if so, how this would be done in a way that neatly integrates with my frontend code.
So, I wound up doing this in a slightly convoluted way and welcome feedback on the "right" way to do this (though it's certainly possible the "right" way is over my head atm)...

- added express-handlebars to my project
- added this to index.html, above where I'm loading my JS bundle:
Code:
<script>var initData = {{{json initData}}};</script>
- in express on the server, do `res.render('index.html', { initData: {...user's email, username, initial data state}});
- in my react code, seeding my app component with the initial data that we baked in above:
Code:
class App extends React.Component<null, AppState> {

  constructor(props) {
    super(props);
    this.email = initData.email;
    this.username = initData.username;
    this.state = initData.state;

How good/bad is this? I did lots of googling to research and most of the stuff I came across was like "uh yeah this is what template engines are for" even though this feels like a slightly gross way of doing it.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 08:17 AM
RE secure cookies, it requires the page setting the cookie to be on HTTPS if you set the cookie to HTTPS only.

When making a site, it's good practise from the beginning to just completely ensure HTTP and make sure it's always over HTTPS. Then you can develop going forwards knowing every request is over HTTPS.

If you want to get a bit more advanced you can enable HSTS on the domain which forces HTTPS only from the browser (helps protect more about MITM attacks) but be careful as you can't really undo this once set. New sites I make though, I'll be enabling HSTS from the go.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 09:09 AM
Goofy you can look into the details of your ssl cert in chrome developer tools under the security tab. Since it came free out of the box it's likely an open source certificate authority which will issue them for free, there is nothing wrong with this imo.

It sounds like you don't control the webserver/proxy which is doing the hosting but I always just redirect to https from there, it might be an option your host provides.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 09:12 AM
Oh I guess like that middleware README explains if you're deploying to ELB or something then it's kind of a pain to alter the reverse proxy settings.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 09:24 AM
Quote:
Originally Posted by daveT
Code:
select i.type, count(*)
from mytable
right join
     (select type
      from mytable in
      where t.type='A'
      and t.type=i.type
      limit 11)
group by 1;
I don't think this makes any sense.

What key are we joining the tables on? Why is mytable even on the left, there's nothing from the left table that I need.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 10:56 AM
Just had my yearly salary review with my manager. They told me that they would be giving a 4.5% increase for all employees. I argued for my case that I wasn't being compensated according to my value and how I had grown as an employee.

So I get a second meeting and they'll give me 9%. Feelsgoodman.

Last edited by Wolfram; 04-21-2017 at 11:13 AM.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 11:54 AM
Quote:
Originally Posted by RustyBrooks
I don't think this makes any sense.

What key are we joining the tables on? Why is mytable even on the left, there's nothing from the left table that I need.
Yeah, I shouldn't write things whilst half asleep. The exists in ... limit pattern generates the correct result. If the subquery is 20ms (hooray!), then something like this, per the article?

Code:
select col, count(*)
from tbl
where col = 'a'
group by col
limit 11

union all

select col, count(*)
from tbl
where col = 'b'
group by col
limit 11

union all

select col, count(*)
from tbl
where col = 'c'
group by col
limit 11
It looks horrendous, but if you only really need a handful of values, that's great. I'm not sure if MySQL has an iterable, but maybe using a processing language to get all of that generated on the fly (or just using your DSL) would make it less manual.

I'm actually kind of surprised the initial idea turned out slow. I found that article because I was looking for a work-around. When push comes to shove, you really want to find a "loose index," though it appears that MySQL already does this by default only on indexed columns. I was trying to figure out if there was a way to make some sort of analog to loose index without the actual index. This can be done with CTE's, so not sure how you'd fake the CTE in MySQL. Since CTEs are looping / recursive, finding and using an iterate function may end up doing something close.

Last edited by daveT; 04-21-2017 at 12:01 PM.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 12:03 PM
If it turns out there are fewer than 11 records matching x, then you're gonna have to do a full table scan to know that, so isn't this query going to be linear time anyway unless you figure out a way to index on the field you're matching on? Or do you expect it to be best case most of the time, which you're optimizing for?
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 12:05 PM
Yeah my current prototype is basically doing this union all. It works OK.

I tried doing
select type, count(*) from mytable where id in (select type, count(*) from mytable where type='A' limit 11)
but you can't do limits inside in queries in mysql, apparently.

I have half a mind to write all this **** to files and try grep on it. I'm not even kidding.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 12:07 PM
Quote:
Originally Posted by iversonian
If it turns out there are fewer than 11 records matching x, then you're gonna have to do a full table scan to know that, so isn't this query going to be linear time anyway unless you figure out a way to index on the field you're matching on? Or do you expect it to be best case most of the time, which you're optimizing for?
Most of the time queries return a fairly large number of results. I am sure there are very unique queries that return little, or nothing, but yeah, I can't do a lot about that.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 01:17 PM
Quote:
Originally Posted by RustyBrooks
Yeah my current prototype is basically doing this union all. It works OK.

I tried doing
select type, count(*) from mytable where id in (select type, count(*) from mytable where type='A' limit 11)
but you can't do limits inside in queries in mysql, apparently.

I have half a mind to write all this **** to files and try grep on it. I'm not even kidding.
This query doesn't say what you are intending to say.

If you have one record for "A", it is a single item, regardless of the limit you said. If you are doing aggregates across multiple items, you will get back 11 rows maximum.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 01:44 PM
Quote:
Originally Posted by goofyballer
I has some questions about HTTPS. I added login w/ google accounts to my little project (this and this were great tutorials on setting that up w/ passport), and I figure that when sending things like Google login tokens or whatever around I should probably do so securely. I discovered that with the place I'm hosting my website, if I just put "https" in front of the URL instead of "http", everything seems to just work without any changes on my end, but this seems almost too easy and therefore I'm suspicious of it.
This is actually how stripe and other payment processors work.

So, questions:
Quote:
- Should this work? Am I really in the clear for HTTPS just by using a magic server that apparently lets either protocol work for me?
A server, as you may know, is basically a computer with ports that allows outside traffic in.

Port 80 is the standard public-facing HTTP port.

Port 443 is the port used for HTTPS.

Port [3000 | 8080 | etc] is the port you are using with an app server like Jetty, etc. I'm not sure what the defaults are for Node, but I don't ever use the default ports if I can help it. I'm going to use 3000.

If you are using a proxy server like nginx or apache, the inbound traffic is coming in at port 80. When you have HTTPs set up, you are redirecting 80 -> 443 -> 3000.

Quote:
- I added a middleware package that auto redirects any HTTP request to HTTPS. However, in the course of the many StackOverflow links I've read while working on this authentication stuff, one thing I saw was that I should be careful about cookies w/ identifying session information being sent over HTTP and I should make sure cookies are only sent via HTTPS. According to express-session documentation I can do that with session options but they have this caveat:

Please note that secure: true is a recommended option. However, it requires an https-enabled website, i.e., HTTPS is necessary for secure cookies. If secure is set, and you access your site over HTTP, the cookie will not be set. If you have your node.js behind a proxy and are using secure: true, you need to set "trust proxy" in express:

This is where I'm kinda getting into "I have no idea what I'm doing" territory - how would I figure out if this is the setup my host has? Additionally, since I mentioned I've made no changes to my code to account for HTTPS, does the fact that I'm still listening on a plain old `http` server rather than `https` mean that none of this will work at all if I enable that setting?
The caveat is basically what I wrote above. When you are using the middleware, you are redirecting 80 -> 443 -> 3000 using middleware instead of nginx / apache. Your site will always redirect away from port 80, so accessing HTTP, as you have it set up, isn't possible...

It can be an issue if your cert expires, there is some misconfig, etc, which would temporarily force your app to be served on HTTP. That's all HTTPS is necessary for secure cookies. If secure is set, and you access your site over HTTP, the cookie will not be set is saying. This means that, if you have an HTTPs problem, you will have to rebuild your app with the new params until it can be served on HTTPs again.

I don't know all the specifics here, but If you have your node.js behind a proxy and are using secure: true, you need to set "trust proxy" in express: is saying "if you are using nginx / apache / etc and you are redirecting 80 -> 443 -> 3000, you need to add a param to add the 'trust proxy' param."

Hope all that helps. My personal tastes is to use a proxy server and skip redirecting middleware. The other option is using one of CloudFlare's certs, which is included in their free tier. I'm pretty sure they are better at dealing with certs than I am.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
04-21-2017 , 01:57 PM
Quote:
Originally Posted by daveT
This query doesn't say what you are intending to say.

If you have one record for "A", it is a single item, regardless of the limit you said. If you are doing aggregates across multiple items, you will get back 11 rows maximum.
Yeah I should stop trying to re-obfuscate my queries. What I intended was

select type, count(*) from mytable where id in (select id from mytable where type='A' and myquery limit 11)

but obviously where the in clause repeated, i.e.

id in (select id from mytable where type = 'A' ...) or
id in (select id from mytable where type = 'B' ...) or ...
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote

      
m