** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **

03-24-2017 , 10:32 AM
Interesting post!
03-24-2017 , 10:37 AM
Quote:
Originally Posted by Gullanian


Loop each char and build a new string ignoring a-z's. Much easier for someone else to understand, much easier to modify and maintain and I'm sure will be more performant as well.
Strongly disagree here. If I saw this in a code base, I'd say, "Why did you just write 3 or 5 lines of code instead of using a regex?" It's literally tailor made for the problem.

You guys are talking about them as if they're some beastly algorithm that no one can be expected to internalize. But basic ones are incredibly simple and not hard to remember -- and that's coming from a programmer with a terrible memory.

And so what if you have to quickly google or check in an online tester? If, eg, I haven't programmed ruby for a while I have to google if upper-casing is "upcase" or "to_upper" or "uppercase". That I have to look it up isn't an argument against its being the correct solution to the problem. I'd never roll my own solution so I wouldn't have to be bothered googling.
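For comparison, here is a minimal Ruby sketch of the two approaches being argued about. It assumes the task is simply "drop every lowercase ASCII letter"; the method names and the sample string are made up for illustration.

```ruby
# Hand-rolled loop: keep every character that is not a lowercase ASCII letter.
def strip_lowercase_loop(input)
  result = +""
  input.each_char do |ch|
    result << ch unless ch >= "a" && ch <= "z"
  end
  result
end

# Regex version: delete every match of the character class [a-z].
def strip_lowercase_regex(input)
  input.gsub(/[a-z]/, "")
end

sample = "Hello, World 42!"
puts strip_lowercase_loop(sample)   # prints "H, W 42!"
puts strip_lowercase_regex(sample)  # prints "H, W 42!"
```

Both give the same result here; which one reads better is exactly the disagreement in this thread.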
03-24-2017 , 10:51 AM
Quote:
Originally Posted by Gullanian
Agree with this for most stuff, but even simple looking regex's can have n squared performance.

Jeff Atwood did a post on it:
https://blog.codinghorror.com/regex-performance/

Small strings, small patterns sometimes = big meaningful time sinks. I always avoid using Regex if I can. Could call it premature optimisation but it's safer long term as well if someone else comes along who doesn't know what they are doing and modifies the expression horribly which is not unusual!
It's an interesting blog post, but this does not seem like an actual problem in practice tbh. The example in the blog looks heavily constructed and the solution is pretty much common sense:
Quote:
The solution is simple. When nesting repetition operators, make absolutely sure that there is only one way to match the same match.
It's not like you will easily stumble into some terribly performing regexp by accident, but if you can find a realistic example I'd be really curious to see it.

Maybe don't nest repetition operators at all if you are really paranoid about it. But avoiding regexp entirely because you may need to profile them for performance once in a decade seems like a terrible trade-off.
03-24-2017 , 10:54 AM
Quote:
Originally Posted by jjshabado
Interesting post!
It is, although it's hard to see how anyone moderately competent would ever write the test regex:

Code:
(x+x+)+y
instead of the equivalent:

Code:
(x+)+y
maybe there's a less contrived example that you might construct in the wild, but in years of using regexes i've never encountered this.

also note modern JS seems to handle it fine:

https://regex101.com/r/IasbGO/2
03-24-2017 , 11:00 AM
Sometimes you code for the incompetent person (or intern) that comes after you. ;P

I honestly don't feel strongly about regexes except to the point that they should virtually never be an interview question.
03-24-2017 , 11:14 AM
Quote:
Originally Posted by jjshabado
Sometimes you code for the incompetent person (or intern) that comes after you. ;P
this general philosophy, taken to the extreme (and i'd consider avoiding all regexes a small example of taking this to the extreme) is a virtual guarantee of a poor code base.

the rich hickey distinction between "easy" and "simple" is relevant here. proper use of high-level constructs will make things simpler. but if people aren't familiar with those constructs they won't be "easy."

the trivial example here is using "map" instead of a "for" loop. some percent of junior programmers who would have been able to read the "for" loop will be confused by "map"

but so what? you can't run a software company by dumbing everything down with the worst programmers in mind. hiring such people has already doomed you.
03-24-2017 , 11:21 AM
Quote:
Originally Posted by gaming_mouse
this general philosophy, taken to the extreme (and i'd consider avoiding all regexes a small example of taking this to the extreme) is a virtual guarantee of a poor code base.
Any philosophy taken to the extreme is a virtual guarantee of poor outcomes.

I think we disagree on: "i'd consider avoiding all regexes a small example of taking this to the extreme".

Also, I don't think anyone actually advocated for that. There are clear cases where regexes are the tool for the job. Simple string manipulation isn't a clear case, imo.
03-24-2017 , 12:00 PM
I think Regex by its nature complicates code most of the time. I understand if people feel I'm wrong here; maybe my brain is just not wired up properly to read them fluently.

In C# for the simple char removal you can do:

Code:
   // requires: using System.Linq;  charArray holds the chars to remove
   new string(myString.Where(c => !charArray.Contains(c)).ToArray());
Which imo is a nice middle ground between writing your own loop and regex
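The same "filter with a predicate" idea can be sketched in Ruby, the other language used in this thread. The method name and the set of characters to drop (vowels) are hypothetical, picked only for the example.

```ruby
require "set"

# Hypothetical set of characters to remove.
UNWANTED_CHARS = Set.new(%w[a e i o u])

# Keep each character unless it is in the unwanted set, then rejoin into a String.
def remove_chars(input, unwanted = UNWANTED_CHARS)
  input.each_char.reject { |ch| unwanted.include?(ch) }.join
end

puts remove_chars("regular expression")  # prints "rglr xprssn"
```

For a fixed list of characters, Ruby's built-in `String#delete` (`"regular expression".delete("aeiou")`) gives the same result without a block.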
03-24-2017 , 01:28 PM
Quote:
Originally Posted by Gullanian
my brain is just not wired up properly to read them fluently.
knowing what i know about you, i think 30-60 minutes of effort + using them every day for a few days would make them (at least, simple ones like the exercism snippet) as readable to you as the C# snippet.
03-24-2017 , 01:31 PM
Quote:
Originally Posted by jjshabado
Any philosophy taken to the extreme is a virtual guarantee of poor outcomes.

I think we disagree on: "i'd consider avoiding all regexes a small example of taking this to the extreme".

Also, I don't think anyone actually advocated for that. There are clear cases where regexes are the tool for the job. Simple string manipulation isn't a clear case, imo.
i think we're on the same page except for exactly where the line is drawn.
03-24-2017 , 02:01 PM
I feel like we should argue longer.

Spoiler:
I use vim bindings in most of my IDEs, but I generally use the IDE specific commands for things like search and replace.
03-24-2017 , 02:25 PM
Quote:
Originally Posted by gaming_mouse
Code:
(x+x+)+y
instead of the equivalent:

Code:
(x+)+y
Those aren't equivalent. For example, this string does not match the first one, but does match the 2nd.

xy

The first regexp requires at least 2 'x's and the 2nd only 1.
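A quick way to check this is to run both patterns over a few short strings. This is a rough Ruby sketch; the constant names are made up, and the inputs are kept short on purpose so the nested pattern has almost nothing to backtrack over.

```ruby
# The two patterns from the quoted post.
TWO_X_NESTED = /(x+x+)+y/  # needs at least two x's before the y
ONE_X_NESTED = /(x+)+y/    # needs at least one x before the y

%w[y xy xxy xxxy].each do |s|
  puts "#{s.inspect}: first=#{TWO_X_NESTED.match?(s)} second=#{ONE_X_NESTED.match?(s)}"
end
```

Only "xy" splits them: the second pattern matches it, the first does not.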
03-24-2017 , 02:51 PM
what's the point of x+x+ vs just xx+?

ETA: actually what's the point of (x+)+ vs just x+?? (I clearly don't ever regex)
03-24-2017 , 03:00 PM
Test string:
xxxxxxxxxxx

On:
https://regex101.com/

Greedy matches:
x+ = 1 match, 3 steps
x+x+ = 1 match, 4 steps
xx+ = 1 match, 3 steps
(x+)+ = 1 match, 7 steps
(x+x+)+ = 1 match, 7 steps

Nongreedy:
x+ = 11 matches, 23 steps
x+x+ = 5 matches, 15 steps
xx+ = 5 matches, 15 steps
(x+)+ = 11 matches, 46 steps
(x+x+)+ = 5 matches, 25 steps

On web services/sites, if you use nongreedy matching I think you need to be very careful what you do as it might be a possible attack vector for a ddos, by designing some inputs that take hundreds of ms to parse.

Last edited by Gullanian; 03-24-2017 at 03:09 PM.
03-24-2017 , 03:14 PM
My first language was Perl. regex
03-24-2017 , 03:52 PM
In this Ruby code...

if dna.chars.any? {|char| char =~ /[^CGAT]/}

unless dna.chars.all? {|char| char =~ /[CGAT]/}

are the / coming from ruby or regex?

i don't see it referenced here http://www.freeformatter.com/regex-tester.html
03-24-2017 , 05:27 PM
Quote:
Originally Posted by RustyBrooks
Those aren't equivalent. For example, this string does not match the first one, but does match the 2nd.

xy

The first regexp requires at least 2 'x's and the 2nd only 1.
right, good catch. still, i think my point stands as wouldn't you rewrite it like:

Code:
x{2,}y
The way it's written in the example seems purposefully unsemantic.
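As a spot check (not a proof), the flat rewrite can be compared with the nested original on a handful of short inputs. A minimal Ruby sketch; the constant names and the sample list are made up for illustration.

```ruby
NESTED_PATTERN = /(x+x+)+y/  # pattern from the blog post's example
FLAT_PATTERN   = /x{2,}y/    # the rewrite suggested above

# Hypothetical sample inputs, kept short so backtracking stays trivial.
SAMPLE_INPUTS = ["y", "xy", "xxy", "xxxxy", "axxyb", "xxx", "xyxxy"]

SAMPLE_INPUTS.each do |s|
  puts "#{s.inspect}: nested=#{NESTED_PATTERN.match?(s)} flat=#{FLAT_PATTERN.match?(s)}"
end
```

On these inputs both patterns agree: each one matches exactly when the string contains at least two x's directly followed by a y.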
03-24-2017 , 05:30 PM
Quote:
Originally Posted by OmgGlutten!
In this Ruby code...

if dna.chars.any? {|char| char =~ /[^CGAT]/}

unless dna.chars.all? {|char| char =~ /[CGAT]/}

are the / coming from ruby or regex?

i don't see it referenced here http://www.freeformatter.com/regex-tester.html
Just like you define a literal string by enclosing it with quotes like "some string", you define a literal regex by enclosing it with forward slashes.

What was the context of that code? At a glance it seems odd. Normally you'd use chars.any? OR a regex, but not both together.

EDIT: eg, if you want to check if the dna string contains any invalid chars just do:

Code:
dna =~ /[^CGAT]/
if all chars are valid it will return nil, which evaluates to false in ruby. if there is any invalid char it will return its integer position within the string, which will evaluate to true, even if its position is 0.
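A small sketch of that behaviour, with a made-up constant name, helper name, and sample strings:

```ruby
# /.../ is Ruby's regex literal syntax; Regexp.new builds the same pattern from a String.
INVALID_BASE = /[^CGAT]/
puts Regexp.new("[^CGAT]").source == INVALID_BASE.source

# =~ returns the index of the first match, or nil when nothing matches.
p("GATTACA" =~ INVALID_BASE)   # prints nil
p("XGATTACA" =~ INVALID_BASE)  # prints 0

# Hypothetical helper: 0 is truthy in Ruby, so only nil means "no invalid char".
def invalid_dna?(dna)
  !(dna =~ INVALID_BASE).nil?
end

p invalid_dna?("XGATTACA")  # prints true
p invalid_dna?("GATTACA")   # prints false
```

If only a yes/no answer is needed, `String#match?` returns a plain boolean and skips the index entirely.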

Last edited by gaming_mouse; 03-24-2017 at 05:38 PM.
03-24-2017 , 05:39 PM
Quote:
Originally Posted by jjshabado
I feel like we should argue longer.

Spoiler:
I use vim bindings in most of my IDEs, but I generally use the IDE specific commands for things like search and replace.
I was aghast to see 43% of people in the latest SO developer survey prefer tabs to spaces. Those people are clearly idiots.
03-24-2017 , 05:42 PM
Quote:
Originally Posted by goofyballer
I was aghast to see 43% of people in the latest SO developer survey prefer tabs to spaces. Those people are clearly idiots.
so, i use spaces. but i also have this nagging feeling that the tabs people are logically correct. a tab is a logical unit of indentation. the visual display of that unit should be decoupled from the logical unit (you might like wide, while i prefer narrow). is there any good rejoinder to that argument?
03-24-2017 , 05:44 PM
Quote:
Originally Posted by jjshabado
I use vim bindings in most of my IDEs, but I generally use the IDE specific commands for things like search and replace.
we definitely need to argue longer
03-24-2017 , 09:29 PM
Quote:
Originally Posted by goofyballer
I was aghast to see 43% of people in the latest SO developer survey prefer tabs to spaces. Those people are clearly idiots.
Anyone who cut their teeth editing makefiles (in vi, of course) will always prefer tabs, as makefiles required tabs.

Vi and tabs are superior.

QED.
03-24-2017 , 10:43 PM
Quote:
Originally Posted by gaming_mouse
so, i use spaces. but i also have this nagging feeling that the tabs people are logically correct. a tab is a logical unit of indentation. the visual display of that unit should be decoupled from the logical unit (you might like wide, while i prefer narrow). is there any good rejoinder to that argument?


Hah. I agree completely with this.
03-25-2017 , 08:40 AM
Quote:
Originally Posted by gaming_mouse
so, i use spaces. but i also have this nagging feeling that the tabs people are logically correct. a tab is a logical unit of indentation. the visual display of that unit should be decoupled from the logical unit (you might like wide, while i prefer narrow). is there any good rejoinder to that argument?
Amazing. This. I use spaces as well (probably cuz python?), but the best arguments I see for tabs >> best arguments I see for spaces. And there's the philosophical points you touch on. It basically never comes up with modern editors, but I've always thought tabs vs spaces was an interesting discussion nonetheless.
03-25-2017 , 12:32 PM
I think g_m's argument is solid in theory but encounters trouble in practice...
- if you need indentation of less than a full unit, you're stuck. Say your tabs are set to 4, but when a long line spills over you want to line the continuation up with something on the line above, or indent it by only 2: oops. (Or say your tabs are set to 2 and you're trying to line something up with the line above, 4 spaces in. If you do that indent w/ 2 tabs instead of 4 spaces, you run into the problem below immediately.)
- this is more a problem with people who won't stick to tabs if that's what's agreed upon, but I've never seen a file with mixed tabs and spaces where the tabs didn't wind up making **** look really wrong as soon as you open it on a different tab setting than the person who wrote the tabs
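The mixed tabs/spaces problem is easy to reproduce. Below is a rough Ruby sketch: `expand_tabs` is a made-up helper that renders a line the way an editor would at a given tab width, and the two sample lines are hypothetical, written so they line up only at tab width 4.

```ruby
# Expand each tab to the next multiple of `width` columns, as an editor would.
def expand_tabs(line, width)
  out = +""
  line.each_char do |ch|
    if ch == "\t"
      out << " " * (width - out.length % width)
    else
      out << ch
    end
  end
  out
end

# First line is indented with 4 spaces; the continuation uses one tab plus
# 4 spaces, typed by someone whose editor showed tabs as 4 columns.
CALL_LINE = "    foo(arg1,"
CONT_LINE = "\t    arg2)"

[2, 4, 8].each do |width|
  puts "tab width #{width}:"
  puts expand_tabs(CALL_LINE, width)
  puts expand_tabs(CONT_LINE, width)
end
```

At width 4, arg2 sits directly under arg1; at 2 or 8 it drifts left or right, which is the "looks really wrong on a different tab setting" effect.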