** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** - Page 1101 - Computer Technical Help

Yeah but it could be
'ip': 192.168.1.1
or <ip>192.168.1.1</i>
or who knows what. And someone might want to match of "<ip>192.168.1.2</ip><domain>foo.co</domain>
They also might want to look for ".1.1" or ".1/24" or "160.1" and so forth.

Whitespace or carraige returns might be nonexistent - aside from titles/descriptions most of this is machine generated and/or scraped from APIs, web pages, and a whole host of 3rd party software.

The industry, if it matters, is network security, and we are basically getting data from any source possible on a particular threat, and shoving it into this database.

We actually have a DSL for search and that does pretty well but people mostly don't use it, they just paste whatever they're searching for in the search bar and that's about it. I tried reducing the number of fields that are searched in the default search and people pitched a fit.

If these were all separate tokens:

'ip': 192.168.1.1
<ip>192.168.1.1</ip>
<ip>192.168.1.2</ip><domain>foo.co</domain>

You can query them easily by doing a LIKE on the token column itself, EG:
SELECT SUM(Frequency) FROM Tokens WHERE Token LIKE '%19.2.168.1.1%'

It's hard to help much in this way because as described the problem is still abstract to me, but the solution to make something faster is always going to be specific.

For example, if you're only wanting to search for IP's or an IP search is a common search, you perhaps might want to extract all IP addresses from the text every time the text field is updated and store them in another table somewhere.

If it's a variety of searches, on a variety of data formats where for some reason it's difficult to sensibly tokenise the data then you're always going to have problems extracting information from that data in a performant way.

Quote

04-20-2017 , 02:11 PM

#27502

daveT

S.A.G.E. Master

Join Date: Jun 2005 Posts: 23,955

Rusty, could be a stupid question, but have you looked into FTS?

https://dev.mysql.com/doc/refman/5.5...xt-search.html

LIKE is a terrible thing to use on any larger set.

Quote

04-20-2017 , 02:28 PM

#27503

RustyBrooks

Carpal \'Tunnel

Join Date: Feb 2006 Posts: 24,647

sorry, I was transliterating my query. t and i were in and out, I don't remember which is which.

That subselect runs in about 20ms

Quote

04-20-2017 , 02:31 PM

#27504

RustyBrooks

Carpal \'Tunnel

Join Date: Feb 2006 Posts: 24,647

Quote:

Originally Posted by daveT

Rusty, could be a stupid question, but have you looked into FTS?

https://dev.mysql.com/doc/refman/5.5...xt-search.html

LIKE is a terrible thing to use on any larger set.

Yes, I tried FTS. The problem is, it wants to treat your columns like english. So if you search for blueberry it will match 'blueberries' also, but if you search for 'blueber' it won't find anything. It worked great if you were looking for words.

But people are searching for fragments of stuff like snort network ids rules, which are sort of key-value pairs of arbitrary data, or correlation rules, which are XML. They are searching for file hashes, IP addresses (whole or partial), the names of viruses/malware/adware/etc. Almost none of the stuff is english words or anything like it.

Someone on my team keeps pushing elastic search but IME I've never seen it make search better.

Quote

04-20-2017 , 02:33 PM

#27505

RustyBrooks

Carpal \'Tunnel

Join Date: Feb 2006 Posts: 24,647

Quote:

Originally Posted by Gullanian

If these were all separate tokens:

'ip': 192.168.1.1
<ip>192.168.1.1</ip>
<ip>192.168.1.2</ip><domain>foo.co</domain>

Right but someone could legitimately want to search for any fragment, complete or incomplete. I think tokenizing it is a complete non-starter.

If I knew what people wanted to search for, semantically, then I could maybe crack this, but, people want to search for whatever they want to search for. So far our attempts to lead them towards semantic searches have failed.

Quote

04-20-2017 , 02:57 PM

#27506

Larry Legend

Celtic Pride

Join Date: Jul 2009 Posts: 42,920

I was going to suggest looking into some flavor of Lucene, which elastic search is, and basically every open source search is?

But obv doing that is more of a decision, and isn't going to work right away, but longterm it could potentially be nice, or another thing to maintain and fix when it breaks.

Quote

04-20-2017 , 03:03 PM

#27507

daveT

S.A.G.E. Master

Join Date: Jun 2005 Posts: 23,955

Quote:

Originally Posted by RustyBrooks

I was thinking they'd have cubing, levenshtein distances, and other stuff to make this sort of stuff easier. I'm rather disappointed in MySQL for doing what you are describing.

Even though PostgreSQL has a great FTS, etc, library, including the prior list and much more, I'd never use it over Solr, ES, or Lucene.

Quote

04-20-2017 , 03:11 PM

#27508

RustyBrooks

Carpal \'Tunnel

Join Date: Feb 2006 Posts: 24,647

Actually I'm looking at it again and it does do some fuzzy searching. I think the default is natural language. But I'm still not super impressed. For example:

Code:

create table foo(
  bar varchar(1000),
  fulltext(bar)
);

Code:

select * from foo;
banana
pear
blueberry
blueberries
blue

Code:

select bar, match(bar) against ('blue' in boolean mode) from foo
banana	0
pear	0
blueberry	0
blueberries	0
blue	0.4885590672492981

So, uh, an exact match gets a score of < 0.5? And nothing for blueberry / blueberries?

Actually maybe I am still ****ing it up because anything but exact matches gives all zeros.

Quote

04-20-2017 , 03:27 PM

#27509

daveT

S.A.G.E. Master

Join Date: Jun 2005 Posts: 23,955

Rusty. There is a slight mistake on my part. The query, as I wrote it, would actually pull all of the data from the main table and count everything.

I should have done a right join for this very specific query. This will actually count every single item, up to 11, and max out at 11.

Code:

select i.type, count(*)
from mytable
right join
     (select type
      from mytable in
      where t.type='A'
      and t.type=i.type
      limit 11)
group by 1;

Quote

04-20-2017 , 03:39 PM

#27510

daveT

S.A.G.E. Master

Join Date: Jun 2005 Posts: 23,955

And also, these articles looks interesting, though they are 10 years old.

https://www.percona.com/blog/2006/08...scan-to-mysql/

https://www.percona.com/blog/2006/08...ery-profiling/

Quote

04-20-2017 , 06:11 PM

#27511

RustyBrooks

Carpal \'Tunnel

Join Date: Feb 2006 Posts: 24,647

Thanks I'll check those out when I get home

Sent from my Nexus 5X using Tapatalk

Quote

04-21-2017 , 02:36 AM

#27512

goofyballer

Carpal \'Tunnel

Join Date: Jun 2005 Posts: 70,719

I has some questions about HTTPS. I added login w/ google accounts to my little project (this and this were great tutorials on setting that up w/ passport), and I figure that when sending things like Google login tokens or whatever around I should probably do so securely. I discovered that with the place I'm hosting my website, if I just put "https" in front of the URL instead of "http", everything seems to just work without any changes on my end, but this seems almost too easy and therefore I'm suspicious of it.

So, questions:
- Should this work? Am I really in the clear for HTTPS just by using a magic server that apparently lets either protocol work for me?
- I added a middleware package that auto redirects any HTTP request to HTTPS. However, in the course of the many StackOverflow links I've read while working on this authentication stuff, one thing I saw was that I should be careful about cookies w/ identifying session information being sent over HTTP and I should make sure cookies are only sent via HTTPS. According to express-session documentation I can do that with session options but they have this caveat:

Quote:

Please note that secure: true is a recommended option. However, it requires an https-enabled website, i.e., HTTPS is necessary for secure cookies. If secure is set, and you access your site over HTTP, the cookie will not be set. If you have your node.js behind a proxy and are using secure: true, you need to set "trust proxy" in express:

This is where I'm kinda getting into "I have no idea what I'm doing" territory - how would I figure out if this is the setup my host has? Additionally, since I mentioned I've made no changes to my code to account for HTTPS, does the fact that I'm still listening on a plain old `http` server rather than `https` mean that none of this will work at all if I enable that setting?

Quote

04-21-2017 , 05:07 AM

#27513

goofyballer

Carpal \'Tunnel

Join Date: Jun 2005 Posts: 70,719

Also, bringing up an older question I'm finally dealing with...

Quote:

Originally Posted by goofyballer

Question: when going to my web page, I'm currently immediately launching a fetch request to grab some data through my REST API. Is this the standard way of doing things, or is there something I should be doing to send the initial data that the user will see along with the initial page load?

Quote:

Originally Posted by Bantam222

The other option would be to render the data with the initial page load so the REST call is not needed?

Quote:

Originally Posted by goofyballer

Yeah, I'm curious if this is a common approach and if so, how this would be done in a way that neatly integrates with my frontend code.

So, I wound up doing this in a slightly convoluted way and welcome feedback on the "right" way to do this (though it's certainly possible the "right" way is over my head atm)...

- added express-handlebars to my project
- added this to index.html, above where I'm loading my JS bundle:

Code:

<script>var initData = {{{json initData}}};</script>

- in express on the server, do `res.render('index.html', { initData: {...user's email, username, initial data state}});
- in my react code, seeding my app component with the initial data that we baked in above:

Code:

class App extends React.Component<null, AppState> {

  constructor(props) {
    super(props);
    this.email = initData.email;
    this.username = initData.username;
    this.state = initData.state;

How good/bad is this? I did lots of googling to research and most of the stuff I came across was like "uh yeah this is what template engines are for" even though this feels like a slightly gross way of doing it.

Quote

04-21-2017 , 08:17 AM

#27514

Gullanian

Carpal \'Tunnel

Join Date: Dec 2006 Posts: 14,014

RE secure cookies, it requires the page setting the cookie to be on HTTPS if you set the cookie to HTTPS only.

When making a site, it's good practise from the beginning to just completely ensure HTTP and make sure it's always over HTTPS. Then you can develop going forwards knowing every request is over HTTPS.

If you want to get a bit more advanced you can enable HSTS on the domain which forces HTTPS only from the browser (helps protect more about MITM attacks) but be careful as you can't really undo this once set. New sites I make though, I'll be enabling HSTS from the go.

Quote

04-21-2017 , 09:09 AM

#27515

da_fume

veteran

Join Date: May 2007 Posts: 3,035

Goofy you can look into the details of your ssl cert in chrome developer tools under the security tab. Since it came free out of the box it's likely an open source certificate authority which will issue them for free, there is nothing wrong with this imo.

It sounds like you don't control the webserver/proxy which is doing the hosting but I always just redirect to https from there, it might be an option your host provides.

Quote

04-21-2017 , 09:12 AM

#27516

da_fume

veteran

Join Date: May 2007 Posts: 3,035

Oh I guess like that middleware README explains if you're deploying to ELB or something then it's kind of a pain to alter the reverse proxy settings.

Quote

04-21-2017 , 09:24 AM

#27517

RustyBrooks

Carpal \'Tunnel

Join Date: Feb 2006 Posts: 24,647

Quote:

Originally Posted by daveT

Code:

select i.type, count(*)
from mytable
right join
     (select type
      from mytable in
      where t.type='A'
      and t.type=i.type
      limit 11)
group by 1;

I don't think this makes any sense.

What key are we joining the tables on? Why is mytable even on the left, there's nothing from the left table that I need.

Quote

04-21-2017 , 10:56 AM

#27518

Wolfram

Carpal \'Tunnel

Join Date: Jan 2006 Posts: 15,066

Just had my yearly salary review with my manager. They told me that they would be giving a 4.5% increase for all employees. I argued for my case that I wasn't being compensated according to my value and how I had grown as an employee.

So I get a second meeting and they'll give me 9%. Feelsgoodman.

Last edited by Wolfram; 04-21-2017 at 11:13 AM.

Quote

04-21-2017 , 11:54 AM

#27519

daveT

S.A.G.E. Master

Join Date: Jun 2005 Posts: 23,955

Quote:

Originally Posted by RustyBrooks

I don't think this makes any sense.

What key are we joining the tables on? Why is mytable even on the left, there's nothing from the left table that I need.

Yeah, I shouldn't write things whilst half asleep. The exists in ... limit pattern generates the correct result. If the subquery is 20ms (hooray!), then something like this, per the article?

Code:

select col, count(*)
from tbl
where col = 'a'
group by col
limit 11

union all

select col, count(*)
from tbl
where col = 'b'
group by col
limit 11

union all

select col, count(*)
from tbl
where col = 'c'
group by col
limit 11

It looks horrendous, but if you only really need a handful of values, that's great. I'm not sure if MySQL has an iterable, but maybe using a processing language to get all of that generated on the fly (or just using your DSL) would make it less manual.

I'm actually kind of surprised the initial idea turned out slow. I found that article because I was looking for a work-around. When push comes to shove, you really want to find a "loose index," though it appears that MySQL already does this by default only on indexed columns. I was trying to figure out if there was a way to make some sort of analog to loose index without the actual index. This can be done with CTE's, so not sure how you'd fake the CTE in MySQL. Since CTEs are looping / recursive, finding and using an iterate function may end up doing something close.

Last edited by daveT; 04-21-2017 at 12:01 PM.

Quote

04-21-2017 , 12:03 PM

#27520

iversonian

Carpal \'Tunnel

Join Date: Sep 2003 Posts: 21,441

If it turns out there are fewer than 11 records matching x, then you're gonna have to do a full table scan to know that, so isn't this query going to be linear time anyway unless you figure out a way to index on the field you're matching on? Or do you expect it to be best case most of the time, which you're optimizing for?

Quote

04-21-2017 , 12:05 PM

#27521

RustyBrooks

Carpal \'Tunnel

Join Date: Feb 2006 Posts: 24,647

Yeah my current prototype is basically doing this union all. It works OK.

I tried doing
select type, count(*) from mytable where id in (select type, count(*) from mytable where type='A' limit 11)
but you can't do limits inside in queries in mysql, apparently.

I have half a mind to write all this **** to files and try grep on it. I'm not even kidding.

Quote

04-21-2017 , 12:07 PM

#27522

RustyBrooks

Carpal \'Tunnel

Join Date: Feb 2006 Posts: 24,647

Quote:

Originally Posted by iversonian

Most of the time queries return a fairly large number of results. I am sure there are very unique queries that return little, or nothing, but yeah, I can't do a lot about that.

Quote

04-21-2017 , 01:17 PM

#27523

daveT

S.A.G.E. Master

Join Date: Jun 2005 Posts: 23,955

Quote:

Originally Posted by RustyBrooks

This query doesn't say what you are intending to say.

If you have one record for "A", it is a single item, regardless of the limit you said. If you are doing aggregates across multiple items, you will get back 11 rows maximum.

Quote

04-21-2017 , 01:44 PM

#27524

daveT

S.A.G.E. Master

Join Date: Jun 2005 Posts: 23,955

Quote:

Originally Posted by goofyballer

This is actually how stripe and other payment processors work.

So, questions:

Quote:

- Should this work? Am I really in the clear for HTTPS just by using a magic server that apparently lets either protocol work for me?

A server, as you may know, is basically a computer with ports that allows outside traffic in.

Port 80 is the standard public-facing HTTP port.

Port 443 is the port used for HTTPS.

Port [3000 | 8080 | etc] is the port you are using with an app server like Jetty, etc. I'm not sure what the defaults are for Node, but I don't ever use the default ports if I can help it. I'm going to use 3000.

If you are using a proxy server like nginx or apache, the inbound traffic is coming in at port 80. When you have HTTPs set up, you are redirecting 80 -> 443 -> 3000.

Quote:

- I added a middleware package that auto redirects any HTTP request to HTTPS. However, in the course of the many StackOverflow links I've read while working on this authentication stuff, one thing I saw was that I should be careful about cookies w/ identifying session information being sent over HTTP and I should make sure cookies are only sent via HTTPS. According to express-session documentation I can do that with session options but they have this caveat:

Please note that secure: true is a recommended option. However, it requires an https-enabled website, i.e., HTTPS is necessary for secure cookies. If secure is set, and you access your site over HTTP, the cookie will not be set. If you have your node.js behind a proxy and are using secure: true, you need to set "trust proxy" in express:

This is where I'm kinda getting into "I have no idea what I'm doing" territory - how would I figure out if this is the setup my host has? Additionally, since I mentioned I've made no changes to my code to account for HTTPS, does the fact that I'm still listening on a plain old `http` server rather than `https` mean that none of this will work at all if I enable that setting?

The caveat is basically what I wrote above. When you are using the middleware, you are redirecting 80 -> 443 -> 3000 using middleware instead of nginx / apache. Your site will always redirect away from port 80, so accessing HTTP, as you have it set up, isn't possible...

It can be an issue if your cert expires, there is some misconfig, etc, which would temporarily force your app to be served on HTTP. That's all HTTPS is necessary for secure cookies. If secure is set, and you access your site over HTTP, the cookie will not be set is saying. This means that, if you have an HTTPs problem, you will have to rebuild your app with the new params until it can be served on HTTPs again.

I don't know all the specifics here, but If you have your node.js behind a proxy and are using secure: true, you need to set "trust proxy" in express: is saying "if you are using nginx / apache / etc and you are redirecting 80 -> 443 -> 3000, you need to add a param to add the 'trust proxy' param."

Hope all that helps. My personal tastes is to use a proxy server and skip redirecting middleware. The other option is using one of CloudFlare's certs, which is included in their free tier. I'm pretty sure they are better at dealing with certs than I am.

Quote

04-21-2017 , 01:57 PM

#27525

RustyBrooks

Carpal \'Tunnel

Join Date: Feb 2006 Posts: 24,647

Quote:

Originally Posted by daveT

Yeah I should stop trying to re-obfuscate my queries. What I intended was

select type, count(*) from mytable where id in (select id from mytable where type='A' and myquery limit 11)

but obviously where the in clause repeated, i.e.

id in (select id from mytable where type = 'A' ...) or
id in (select id from mytable where type = 'B' ...) or ...

Quote

Page 1101 of 1603

First

101 601 901 1001 1051 1081 1091 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1111 1121 1151 1201 1301 1601

Last

Post Reply Subscribe

...

Page 1101 of 1603

First

101 601 901 1001 1051 1081 1091 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1111 1121 1151 1201 1301 1601

Last