The Inverse Of [^\/:] | Regular Expression Improvement

April 21, 2024

JSLint reports the character set [^\/:] (all characters except / or :) as weak, because I should be specifying the characters that can be used, not the characters that cannot be used.

Solution 1:

According to http://en.wikipedia.org/wiki/Domain_name#Internationalized_domain_names

the character set allowed in the Domain Name System is based on ASCII

and as per http://www.netregister.biz/faqit.htm#1

to name your domain you can use any letter, numbers between 0 and 9, and the symbol "-" [as long as the first character is not "-"]

and considering that your domain must end with .something, you are looking for

([a-zA-Z0-9][a-zA-Z0-9-]*\.)+[a-zA-Z0-9][a-zA-Z0-9-]*
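As a rough check of the pattern above, here is a minimal sketch in JavaScript. The ^ and $ anchors and the isDomainName helper are additions for illustration (they are not part of the original answer), so that the whole string has to be a domain name.

```javascript
// The pattern from the answer, wrapped in ^...$ (an assumption made here)
// so that the entire input must match, not just a part of it.
const domainPattern = /^([a-zA-Z0-9][a-zA-Z0-9-]*\.)+[a-zA-Z0-9][a-zA-Z0-9-]*$/;

// Hypothetical helper: true when the text looks like an ASCII domain name.
function isDomainName(text) {
  return domainPattern.test(text);
}

console.log(isDomainName("example.com"));       // true
console.log(isDomainName("sub.example-1.org")); // true
console.log(isDomainName("-bad.com"));          // false: label starts with "-"
console.log(isDomainName("localhost"));         // false: no ".something" part
console.log(isDomainName("example.com/path"));  // false: "/" is not allowed
```

Note that this only checks the shape of the name; it does not check label lengths or whether the domain actually exists.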

Solution 2:

"I should be specifying the characters that can be used, not the characters that cannot be used"

No, that's nonsense, just JSLint being JSLint.

When you see [^\/:] in a regex it's immediately obvious what it is doing. If you tried to list all possible allowed characters the resulting regex would be horrendously difficult to read and it would be easy to accidentally forget to include some characters.

If you have a specific set of allowed characters then fine, list them. That's easier and more reliable than trying to list all possible invalid characters.

But if you have a specific set of invalid characters, the [^] syntax is the appropriate way to do it.
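To make the trade-off concrete, here is a small sketch comparing the two approaches when pulling the host out of a URL. The surrounding patterns, the allow-list, the extractHost helper and the sample URLs are all assumptions chosen for illustration.

```javascript
// Negated set: the host is everything up to the next "/" or ":".
const hostByExclusion = /^[a-z]+:\/\/([^\/:]+)/i;

// Hypothetical allow-list: only ASCII letters, digits, "." and "-".
const hostByAllowList = /^[a-z]+:\/\/([a-zA-Z0-9.-]+)/i;

// Returns the captured host, or null when the pattern does not match.
function extractHost(pattern, url) {
  const match = pattern.exec(url);
  return match ? match[1] : null;
}

console.log(extractHost(hostByExclusion, "http://example.com/path")); // example.com
console.log(extractHost(hostByAllowList, "http://example.com/path")); // example.com

// A character the allow-list forgot ("_") silently cuts the match short.
console.log(extractHost(hostByExclusion, "http://my_host.example/")); // my_host.example
console.log(extractHost(hostByAllowList, "http://my_host.example/")); // my
```

Both patterns agree on ordinary input; they only differ once a character outside the allow-list shows up, which is exactly the case where a forgotten character goes unnoticed.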

Solution 3:

Here's a regex for characters you can have:

mycharactersarecool[^shouldnothavethesechars](oneoftwooptions|anotheroption)

Is this what you're talking about?

Solution 4:

This is a great question for Google, you know... but just to whet your appetite: Matthew O'Riordan has written a regular expression that matches a link with or without a protocol.

Here's a link to his blog post

But for future reference let me provide the regular expression from the post here as well:

/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www\.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[.\!\/\\\w]*))?)/

And as nicely broken down by blog writer Matthew himself:

(
 ( # brackets covering match for protocol (optional) and domain
  ([A-Za-z]{3,9}:(?:\/\/)?)   # match protocol, allow in format http:// or mailto:
  (?:[\-;:&=\+\$,\w]+@)?      # allow something@ for email addresses
  [A-Za-z0-9\.\-]+            # anything looking at all like a domain, non-unicode domains
  |                           # or instead of above
  (?:www\.|[\-;:&=\+\$,\w]+@) # starting with something@ or www.
  [A-Za-z0-9\.\-]+            # anything looking at all like a domain
 )
 ( # brackets covering match for path, query string and anchor
  (?:\/[\+~%\/\.\w\-]*)       # allow optional /path
  ?\??(?:[\-\+=&;%@\.\w]*)    # allow optional query string starting with ?
  #?(?:[\.\!\/\\\w]*)         # allow optional anchor #anchor
 )? # make URL suffix optional
)
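For a feel of how the pattern behaves, here is a minimal usage sketch. The one-line regex below is assembled from the commented breakdown above, and the findLink helper, the names of the returned fields and the sample strings are hypothetical.

```javascript
// One-line form of the commented breakdown above.
const linkPattern = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?)/;

// Hypothetical helper: returns the first link found in the text, or null.
function findLink(text) {
  const match = linkPattern.exec(text);
  if (!match) return null;
  return {
    link: match[0],     // the whole match
    protocol: match[3], // undefined when the link has no protocol
    suffix: match[4],   // path, query string and anchor
  };
}

const withProtocol = findLink("See http://www.example.com/docs/page.html?x=1#top for details");
console.log(withProtocol.link);     // http://www.example.com/docs/page.html?x=1#top
console.log(withProtocol.protocol); // http://

const withoutProtocol = findLink("www.example.com");
console.log(withoutProtocol.link);     // www.example.com
console.log(withoutProtocol.protocol); // undefined
```

The pattern is deliberately loose ("anything looking at all like a domain"), so it is better suited to spotting links in text than to validating them.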

What about your particular example?

But in your case of matching URL domains, the inverse of [^\/:] could simply be:

[-0-9a-zA-Z_.]

And that should match everything after // and before the first /. But what happens when your URLs don't end with a slash? What will you do in that case?

The regular expression above (a simplification) only matches one character, just like your negated character set does. So it simply replaces your negated set in the complete regex you're using.
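As a sketch of how the set might slot into a complete regex, assume the surrounding pattern is just "//" followed by one or more allowed characters (the original complete regex is not shown in the question, so hostPattern and hostOf here are hypothetical). Because the set is repeated greedily and stops at the first character it does not contain, a trailing slash is not required.

```javascript
// Hypothetical complete pattern: "//" followed by one or more allowed characters.
const hostPattern = /\/\/([-0-9a-zA-Z_.]+)/;

// Returns the host part of the URL, or null when there is no "//host".
function hostOf(url) {
  const match = hostPattern.exec(url);
  return match ? match[1] : null;
}

console.log(hostOf("http://example.com/index.html")); // example.com
console.log(hostOf("http://example.com"));            // example.com (no trailing slash)
console.log(hostOf("http://example.com:8080/"));      // example.com (stops at ":")
```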
