+2007-12-16 Darin Adler <darin@apple.com>
+
+ Reviewed by Maciej.
+
+ - http://bugs.webkit.org/show_bug.cgi?id=16438
+ - removed some more unused code
+ - changed quite a few more names to WebKit-style
+ - moved more things out of pcre_internal.h
+ - changed some indentation to WebKit-style
+ - improved design of the functions for reading and writing
+ 2-byte values from the opcode stream (in pcre_internal.h)
+
+ * pcre/dftables.cpp:
+ (main): Added the kjs prefix directly instead of through macros.
+
+ * pcre/pcre_compile.cpp: Moved some definitions here from pcre_internal.h.
+ (errorText): Name changes, fewer typedefs.
+ (checkEscape): Ditto. Changed uppercase conversion to use toASCIIUpper.
+ (isCountedRepeat): Name change.
+ (readRepeatCounts): Name change.
+ (firstSignificantOpcode): Got rid of the use of OP_lengths, which is
+ very lightly used here. Hard-coded the length of OP_BRANUMBER.
+ (firstSignificantOpcodeSkippingAssertions): Ditto. Also changed to
+ use the advanceToEndOfBracket function.
+ (getOthercaseRange): Name changes.
+ (encodeUTF8): Ditto.
+ (compileBranch): Name changes. Removed unused after_manual_callout and
+ the code to handle it. Removed code to handle OP_ONCE since we never
+ emit this opcode. Changed to use advanceToEndOfBracket in more places.
+ (compileBracket): Name changes.
+ (branchIsAnchored): Removed code to handle OP_ONCE since we never emit
+ this opcode.
+ (bracketIsAnchored): Name changes.
+ (branchNeedsLineStart): More of the same.
+ (bracketNeedsLineStart): Ditto.
+ (branchFindFirstAssertedCharacter): Removed OP_ONCE code.
+ (bracketFindFirstAssertedCharacter): More of the same.
+ (calculateCompiledPatternLengthAndFlags): Ditto.
+ (returnError): Name changes.
+ (jsRegExpCompile): Ditto.
+
+ * pcre/pcre_exec.cpp: Moved some definitions here from pcre_internal.h.
+ (matchRef): Updated names.
+ Improved macros to use the do { } while(0) idiom so they expand to single
+ statements rather than to blocks or multiple statements. And refactored
+ the recursive match macros.
+ (MatchStack::pushNewFrame): Name changes.
+ (getUTF8CharAndIncrementLength): Name changes.
+ (match): Name changes. Removed the ONCE opcode.
+ (jsRegExpExecute): Name changes.
+
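The do { } while (0) idiom mentioned for pcre_exec.cpp can be sketched as follows. The macro and function names here are hypothetical and are not taken from the patch; the point is only that a multi-statement macro wrapped this way behaves as one statement under an unbraced if/else.

```cpp
#include <cassert>

// Hypothetical macros for illustration only; not the ones in pcre_exec.cpp.
// Expands to two separate statements.
#define BUMP_TWICE_UNSAFE(x) ++(x); ++(x)
// Expands to exactly one statement, so it is safe under an unbraced if/else.
#define BUMP_TWICE(x) do { ++(x); ++(x); } while (0)

int bumpIfUnsafe(bool condition, int value)
{
    if (condition)
        BUMP_TWICE_UNSAFE(value); // only the first ++ is guarded by the if
    return value;
}

int bumpIf(bool condition, int value)
{
    if (condition)
        BUMP_TWICE(value); // the trailing semicolon completes the do/while
    else
        value = -1;
    return value;
}
```

With the unsafe form, the second increment runs even when the condition is false, and adding an else after it would not compile.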
+ * pcre/pcre_internal.h: Removed quite a few unneeded includes. Rewrote
+ quite a few comments. Removed the macros that add kjs prefixes to the
+ functions with external linkage; instead renamed the functions. Removed
+ the unneeded typedefs pcre_uint16, pcre_uint32, and uschar. Removed the
+ dead and only partly working code for LINK_SIZE values other than 2, although
+ we aim to keep the abstraction working. Removed the OP_LENGTHS macro.
+ (put2ByteValue): Replaces put2ByteOpcodeValueAtOffset.
+ (get2ByteValue): Replaces get2ByteOpcodeValueAtOffset.
+ (put2ByteValueAndAdvance): Replaces put2ByteOpcodeValueAtOffsetAndAdvance.
+ (putLinkValueAllowZero): Replaces putOpcodeValueAtOffset; no longer adds an
+ offset internally, since writing "code, 1" is really no better than writing
+ "code + 1". Added an assertion to catch out-of-range values and changed the
+ parameter type to int rather than unsigned.
+ (getLinkValueAllowZero): Replaces getOpcodeValueAtOffset.
+ (putLinkValue): New function that most former callers of the
+ putOpcodeValueAtOffset function can use; asserts the value that is
+ being stored is non-zero and then calls putLinkValueAllowZero.
+ (getLinkValue): Ditto.
+ (putLinkValueAndAdvance): Replaces putOpcodeValueAtOffsetAndAdvance. No
+ caller was using an offset, which makes sense given the advancing behavior.
+ (putLinkValueAllowZeroAndAdvance): Ditto.
+ (isBracketOpcode): Added. For use in an assertion.
+ (advanceToEndOfBracket): Renamed from moveOpcodePtrPastAnyAlternateBranches,
+ and removed comments about how it's not well designed. This function takes
+ a pointer to the beginning of a bracket and advances to the end of the
+ bracket.
+
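A minimal sketch of the pointer-based 2-byte helpers described for pcre_internal.h. The function names follow the entry above, but the bodies are assumptions: the byte order (high byte first, as in upstream PCRE's PUT2/GET2 macros) and the exact assertions are not confirmed by this patch excerpt.

```cpp
#include <cassert>

// Assumed layout: high byte first. Bodies are illustrative, not the patch's.
static inline void put2ByteValue(unsigned char* opcodePtr, int value)
{
    assert(value >= 0 && value <= 0xFFFF); // catch out-of-range values
    opcodePtr[0] = static_cast<unsigned char>(value >> 8);
    opcodePtr[1] = static_cast<unsigned char>(value & 0xFF);
}

static inline int get2ByteValue(const unsigned char* opcodePtr)
{
    return (opcodePtr[0] << 8) | opcodePtr[1];
}

// Writes at the pointer, then moves the caller's pointer past the value.
static inline void put2ByteValueAndAdvance(unsigned char*& opcodePtr, int value)
{
    put2ByteValue(opcodePtr, value);
    opcodePtr += 2;
}

// Link values are 2 bytes when LINK_SIZE is 2; zero is reserved for the
// "allow zero" variant, so the plain variant asserts a non-zero value.
static inline void putLinkValue(unsigned char* opcodePtr, int value)
{
    assert(value);
    put2ByteValue(opcodePtr, value);
}
```

Callers pass an already-offset pointer, as in putLinkValue(previous + 1, code - previous), rather than a base pointer plus a separate offset argument.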
+ * pcre/pcre_tables.cpp: Updated names.
+ * pcre/pcre_ucp_searchfuncs.cpp:
+ (kjs_pcre_ucp_othercase): Ditto.
+ * pcre/pcre_xclass.cpp:
+ (getUTF8CharAndAdvancePointer): Ditto.
+ (kjs_pcre_xclass): Ditto.
+ * pcre/ucpinternal.h: Ditto.
+
+ * wtf/ASCIICType.h:
+ (WTF::isASCIIAlpha): Added an int overload, like the one we already have for
+ isASCIIDigit.
+ (WTF::isASCIIAlphanumeric): Ditto.
+ (WTF::isASCIIHexDigit): Ditto.
+ (WTF::isASCIILower): Ditto.
+ (WTF::isASCIISpace): Ditto.
+ (WTF::toASCIILower): Ditto.
+ (WTF::toASCIIUpper): Ditto.
+
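The int overloads matter to callers such as checkEscape, which holds the current character in an int. A rough sketch of the \cx computation after the change, using a hypothetical stand-in for WTF::toASCIIUpper(int) whose body is an assumption based on the code the patch removes:

```cpp
#include <cassert>

// Hypothetical stand-in for the int overload of WTF::toASCIIUpper.
// Only ASCII lowercase letters change; every other value passes through.
inline int toASCIIUpperSketch(int c)
{
    return (c >= 'a' && c <= 'z') ? c - 32 : c;
}

// \cx handling: upper-case the letter, then flip the 0x40 bit.
inline int controlEscapeValue(int c)
{
    return toASCIIUpperSketch(c) ^ 0x40;
}
```

Taking an int keeps values above 255 intact instead of narrowing them to char before the range check.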
2007-12-16 Darin Adler <darin@apple.com>
Reviewed by Maciej.
- Centralize code for subjectPtr adjustments using inlines, only ever check for a single trailing surrogate (as UTF16 only allows one), possibly fix PCRE bugs involving char classes and garbled UTF16 strings.
+ Centralize code for subjectPtr adjustments using inlines, only ever check for a single
+ trailing surrogate (as UTF16 only allows one), possibly fix PCRE bugs involving char
+ classes and garbled UTF16 strings.
* pcre/pcre_exec.cpp:
(match):
"This file contains the default tables for characters with codes less than\n"
"128 (ASCII characters). These tables are used when no external tables are\n"
"passed to PCRE. */\n\n"
- "const unsigned char _pcre_default_tables[%d] = {\n\n"
+ "const unsigned char kjs_pcre_default_tables[%d] = {\n\n"
"/* This table is a lower casing table. */\n\n", tables_length);
if (lcc_offset != 0)
using namespace WTF;
+/* Negative values for the firstchar and reqchar variables */
+
+#define REQ_UNSET (-2)
+#define REQ_NONE (-1)
+
/*************************************************
* Code parameters and static tables *
*************************************************/
ERR10, ERR11, ERR12, ERR13, ERR14, ERR15, ERR16, ERR17
};
-/* Table of sizes for the fixed-length opcodes. It's defined in a macro so that
-the definition is next to the definition of the opcodes in pcre_internal.h. */
-
-static const uschar OP_lengths[] = { OP_LENGTHS };
-
/* The texts of compile-time error messages. These are "char *" because they
are passed to the outside world. */
-static const char* error_text(ErrorCode code)
+static const char* errorText(ErrorCode code)
{
- static const char error_texts[] =
+ static const char errorTexts[] =
/* 1 */
"\\ at end of pattern\0"
"\\c at end of pattern\0"
;
int i = code;
- const char* text = error_texts;
+ const char* text = errorTexts;
while (i > 1)
i -= !*text++;
return text;
req_varyopt = 0;
needOuterBracket = false;
}
- const uschar* start_code; /* The start of the compiled code */
+ const unsigned char* start_code; /* The start of the compiled code */
const UChar* start_pattern; /* The start of the pattern */
int top_backref; /* Maximum back reference */
unsigned backrefMap; /* Bitmap of low back refs */
/* Definitions to allow mutual recursion */
-static bool compileBracket(int, int*, uschar**, const UChar**, const UChar*, ErrorCode*, int, int*, int*, CompileData&);
-static bool bracketIsAnchored(const uschar* code);
-static bool bracketNeedsLineStart(const uschar* code, unsigned captureMap, unsigned backrefMap);
-static int bracketFindFirstAssertedCharacter(const uschar* code, bool inassert);
+static bool compileBracket(int, int*, unsigned char**, const UChar**, const UChar*, ErrorCode*, int, int*, int*, CompileData&);
+static bool bracketIsAnchored(const unsigned char* code);
+static bool bracketNeedsLineStart(const unsigned char* code, unsigned captureMap, unsigned backrefMap);
+static int bracketFindFirstAssertedCharacter(const unsigned char* code, bool inassert);
/*************************************************
* Handle escapes *
on error, errorptr is set
*/
-static int check_escape(const UChar** ptrptr, const UChar* patternEnd, ErrorCode* errorcodeptr, int bracount, bool isclass)
+static int checkEscape(const UChar** ptrptr, const UChar* patternEnd, ErrorCode* errorcodeptr, int bracount, bool isclass)
{
const UChar* ptr = *ptrptr + 1;
} else {
switch (c) {
- case '1':
- case '2':
- case '3':
- case '4':
- case '5':
- case '6':
- case '7':
- case '8':
- case '9':
- /* Escape sequences starting with a non-zero digit are backreferences,
- unless there are insufficient brackets, in which case they are octal
- escape sequences. Those sequences end on the first non-octal character
- or when we overflow 0-255, whichever comes first. */
-
- if (!isclass) {
- const UChar* oldptr = ptr;
- c -= '0';
- while ((ptr + 1 < patternEnd) && isASCIIDigit(ptr[1]) && c <= bracount)
- c = c * 10 + *(++ptr) - '0';
- if (c <= bracount) {
- c = -(ESC_REF + c);
- break;
+ case '1':
+ case '2':
+ case '3':
+ case '4':
+ case '5':
+ case '6':
+ case '7':
+ case '8':
+ case '9':
+ /* Escape sequences starting with a non-zero digit are backreferences,
+ unless there are insufficient brackets, in which case they are octal
+ escape sequences. Those sequences end on the first non-octal character
+ or when we overflow 0-255, whichever comes first. */
+
+ if (!isclass) {
+ const UChar* oldptr = ptr;
+ c -= '0';
+ while ((ptr + 1 < patternEnd) && isASCIIDigit(ptr[1]) && c <= bracount)
+ c = c * 10 + *(++ptr) - '0';
+ if (c <= bracount) {
+ c = -(ESC_REF + c);
+ break;
+ }
+ ptr = oldptr; /* Put the pointer back and fall through */
}
- ptr = oldptr; /* Put the pointer back and fall through */
- }
-
- /* Handle an octal number following \. If the first digit is 8 or 9,
- this is not octal. */
-
- if ((c = *ptr) >= '8')
- break;
-
+
+ /* Handle an octal number following \. If the first digit is 8 or 9,
+ this is not octal. */
+
+ if ((c = *ptr) >= '8')
+ break;
+
/* \0 always starts an octal number, but we may drop through to here with a
larger first octal digit. */
-
- case '0': {
- c -= '0';
- int i;
- for (i = 1; i <= 2; ++i) {
- if (ptr + i >= patternEnd || ptr[i] < '0' || ptr[i] > '7')
- break;
- int cc = c * 8 + ptr[i] - '0';
- if (cc > 255)
- break;
- c = cc;
+
+ case '0': {
+ c -= '0';
+ int i;
+ for (i = 1; i <= 2; ++i) {
+ if (ptr + i >= patternEnd || ptr[i] < '0' || ptr[i] > '7')
+ break;
+ int cc = c * 8 + ptr[i] - '0';
+ if (cc > 255)
+ break;
+ c = cc;
+ }
+ ptr += i - 1;
+ break;
}
- ptr += i - 1;
- break;
- }
- case 'x': {
- c = 0;
- int i;
- for (i = 1; i <= 2; ++i) {
- if (ptr + i >= patternEnd || !isASCIIHexDigit(ptr[i])) {
- c = 'x';
- i = 1;
- break;
+
+ case 'x': {
+ c = 0;
+ int i;
+ for (i = 1; i <= 2; ++i) {
+ if (ptr + i >= patternEnd || !isASCIIHexDigit(ptr[i])) {
+ c = 'x';
+ i = 1;
+ break;
+ }
+ int cc = ptr[i];
+ if (cc >= 'a')
+ cc -= 32; /* Convert to upper case */
+ c = c * 16 + cc - ((cc < 'A') ? '0' : ('A' - 10));
}
- int cc = ptr[i];
- if (cc >= 'a')
- cc -= 32; /* Convert to upper case */
- c = c * 16 + cc - ((cc < 'A') ? '0' : ('A' - 10));
+ ptr += i - 1;
+ break;
}
- ptr += i - 1;
- break;
- }
- case 'u': {
- c = 0;
- int i;
- for (i = 1; i <= 4; ++i) {
- if (ptr + i >= patternEnd || !isASCIIHexDigit(ptr[i])) {
- c = 'u';
- i = 1;
- break;
+
+ case 'u': {
+ c = 0;
+ int i;
+ for (i = 1; i <= 4; ++i) {
+ if (ptr + i >= patternEnd || !isASCIIHexDigit(ptr[i])) {
+ c = 'u';
+ i = 1;
+ break;
+ }
+ int cc = ptr[i];
+ if (cc >= 'a')
+ cc -= 32; /* Convert to upper case */
+ c = c * 16 + cc - ((cc < 'A') ? '0' : ('A' - 10));
}
- int cc = ptr[i];
- if (cc >= 'a')
- cc -= 32; /* Convert to upper case */
- c = c * 16 + cc - ((cc < 'A') ? '0' : ('A' - 10));
+ ptr += i - 1;
+ break;
}
- ptr += i - 1;
- break;
-
- /* Other special escapes not starting with a digit are straightforward */
- }
- case 'c':
- if (++ptr == patternEnd) {
- *errorcodeptr = ERR2;
- return 0;
+
+ case 'c':
+ if (++ptr == patternEnd) {
+ *errorcodeptr = ERR2;
+ return 0;
+ }
+ c = *ptr;
+
+ /* A letter is upper-cased; then the 0x40 bit is flipped. This coding
+ is ASCII-specific, but then the whole concept of \cx is ASCII-specific. */
+ c = toASCIIUpper(c) ^ 0x40;
+ break;
}
- c = *ptr;
-
- /* A letter is upper-cased; then the 0x40 bit is flipped. This coding
- is ASCII-specific, but then the whole concept of \cx is ASCII-specific. */
-
- if (c >= 'a' && c <= 'z')
- c -= 32;
- c ^= 0x40;
- break;
- }
}
*ptrptr = ptr;
return c;
}
-
-
/*************************************************
* Check for counted repeat *
*************************************************/
Returns: true or false
*/
-static bool is_counted_repeat(const UChar* p, const UChar* patternEnd)
+static bool isCountedRepeat(const UChar* p, const UChar* patternEnd)
{
if (p >= patternEnd || !isASCIIDigit(*p))
return false;
return (p < patternEnd && *p == '}');
}
-
/*************************************************
* Read repeat counts *
*************************************************/
/* Read an item of the form {n,m} and return the values. This is called only
-after is_counted_repeat() has confirmed that a repeat-count quantifier exists,
+after isCountedRepeat() has confirmed that a repeat-count quantifier exists,
so the syntax is guaranteed to be correct, but we need to check the values.
Arguments:
current ptr on error, with errorcodeptr set non-zero
*/
-static const UChar* read_repeat_counts(const UChar* p, int* minp, int* maxp, ErrorCode* errorcodeptr)
+static const UChar* readRepeatCounts(const UChar* p, int* minp, int* maxp, ErrorCode* errorcodeptr)
{
int min = 0;
int max = -1;
return p;
}
-
/*************************************************
* Find first significant op code *
*************************************************/
/* This is called by several functions that scan a compiled expression looking
for a fixed first character, or an anchoring op code etc. It skips over things
-that do not influence this. For some calls, a change of option is important.
-For some calls, it makes sense to skip negative forward and all backward
-assertions, and also the \b assertion; for others it does not.
+that do not influence this.
Arguments:
code pointer to the start of the group
- skipassert true if certain assertions are to be skipped
-
Returns: pointer to the first significant opcode
*/
-static const uschar* firstSignificantOpCode(const uschar* code)
+static const unsigned char* firstSignificantOpcode(const unsigned char* code)
{
while (*code == OP_BRANUMBER)
- code += OP_lengths[*code];
+ code += 3;
return code;
}
-static const uschar* firstSignificantOpCodeSkippingAssertions(const uschar* code)
+static const unsigned char* firstSignificantOpcodeSkippingAssertions(const unsigned char* code)
{
while (true) {
switch (*code) {
- case OP_ASSERT_NOT:
- do {
- code += getOpcodeValueAtOffset(code, 1);
- } while (*code == OP_ALT);
- code += OP_lengths[*code];
- break;
- case OP_WORD_BOUNDARY:
- case OP_NOT_WORD_BOUNDARY:
- case OP_BRANUMBER:
- code += OP_lengths[*code];
- break;
- default:
- return code;
- }
- }
- ASSERT_NOT_REACHED();
-}
-
-
-/*************************************************
-* Find the fixed length of a pattern *
-*************************************************/
-
-/* Scan a pattern and compute the fixed length of subject that will match it,
-if the length is fixed. This is needed for dealing with backward assertions.
-In UTF8 mode, the result is in characters rather than bytes.
-
-Arguments:
- code points to the start of the pattern (the bracket)
- options the compiling options
-
-Returns: the fixed length, or -1 if there is no fixed length,
- or -2 if \C was encountered
-*/
-
-static int find_fixedlength(uschar* code, int options)
-{
- int length = -1;
-
- int branchlength = 0;
- uschar* cc = code + 1 + LINK_SIZE;
-
- /* Scan along the opcodes for this branch. If we get to the end of the
- branch, check the length against that of the other branches. */
-
- while (true) {
- int d;
- int op = *cc;
- if (op >= OP_BRA)
- op = OP_BRA;
-
- switch (op) {
- case OP_BRA:
- case OP_ONCE:
- d = find_fixedlength(cc, options);
- if (d < 0)
- return d;
- branchlength += d;
- do {
- cc += getOpcodeValueAtOffset(cc, 1);
- } while (*cc == OP_ALT);
- cc += 1 + LINK_SIZE;
- break;
-
- /* Reached end of a branch; if it's a ket it is the end of a nested
- call. If it's ALT it is an alternation in a nested call. If it is
- END it's the end of the outer call. All can be handled by the same code. */
-
- case OP_ALT:
- case OP_KET:
- case OP_KETRMAX:
- case OP_KETRMIN:
- case OP_END:
- if (length < 0)
- length = branchlength;
- else if (length != branchlength)
- return -1;
- if (*cc != OP_ALT)
- return length;
- cc += 1 + LINK_SIZE;
- branchlength = 0;
- break;
-
- /* Skip over assertive subpatterns */
-
- case OP_ASSERT:
case OP_ASSERT_NOT:
- do {
- cc += getOpcodeValueAtOffset(cc, 1);
- } while (*cc == OP_ALT);
- /* Fall through */
-
- /* Skip over things that don't match chars */
-
- case OP_BRANUMBER:
- case OP_CIRC:
- case OP_DOLL:
- case OP_NOT_WORD_BOUNDARY:
- case OP_WORD_BOUNDARY:
- cc += OP_lengths[*cc];
- break;
-
- /* Handle literal characters */
-
- case OP_CHAR:
- case OP_CHAR_IGNORING_CASE:
- case OP_NOT:
- branchlength++;
- cc += 2;
- while ((*cc & 0xc0) == 0x80)
- cc++;
- break;
-
- case OP_ASCII_CHAR:
- case OP_ASCII_LETTER_IGNORING_CASE:
- branchlength++;
- cc += 2;
- break;
-
- /* Handle exact repetitions. The count is already in characters, but we
- need to skip over a multibyte character in UTF8 mode. */
-
- case OP_EXACT:
- branchlength += get2ByteOpcodeValueAtOffset(cc,1);
- cc += 4;
- while((*cc & 0x80) == 0x80)
- cc++;
+ advanceToEndOfBracket(code);
+ code += 1 + LINK_SIZE;
break;
-
- case OP_TYPEEXACT:
- branchlength += get2ByteOpcodeValueAtOffset(cc,1);
- cc += 4;
- break;
-
- /* Handle single-char matchers */
-
- case OP_NOT_DIGIT:
- case OP_DIGIT:
- case OP_NOT_WHITESPACE:
- case OP_WHITESPACE:
- case OP_NOT_WORDCHAR:
- case OP_WORDCHAR:
- case OP_NOT_NEWLINE:
- branchlength++;
- cc++;
+ case OP_WORD_BOUNDARY:
+ case OP_NOT_WORD_BOUNDARY:
+ ++code;
break;
-
- /* Check a class for variable quantification */
-
- case OP_XCLASS:
- cc += getOpcodeValueAtOffset(cc, 1) - 33;
- /* Fall through */
-
- case OP_CLASS:
- case OP_NCLASS:
- cc += 33;
-
- switch (*cc) {
- case OP_CRSTAR:
- case OP_CRMINSTAR:
- case OP_CRQUERY:
- case OP_CRMINQUERY:
- return -1;
-
- case OP_CRRANGE:
- case OP_CRMINRANGE:
- if (get2ByteOpcodeValueAtOffset(cc, 1) != get2ByteOpcodeValueAtOffset(cc, 3))
- return -1;
- branchlength += get2ByteOpcodeValueAtOffset(cc, 1);
- cc += 5;
- break;
-
- default:
- branchlength++;
- }
+ case OP_BRANUMBER:
+ code += 3;
break;
-
- /* Anything else is variable length */
-
default:
- return -1;
+ return code;
}
}
- ASSERT_NOT_REACHED();
-}
-
-
-/*************************************************
-* Complete a callout item *
-*************************************************/
-
-/* A callout item contains the length of the next item in the pattern, which
-we can't fill in till after we have reached the relevant point. This is used
-for both automatic and manual callouts.
-
-Arguments:
- previous_callout points to previous callout item
- ptr current pattern pointer
- cd pointers to tables etc
-*/
-
-static void complete_callout(uschar* previous_callout, const UChar* ptr, const CompileData& cd)
-{
- int length = ptr - cd.start_pattern - getOpcodeValueAtOffset(previous_callout, 2);
- putOpcodeValueAtOffset(previous_callout, 2 + LINK_SIZE, length);
}
-
-
/*************************************************
* Get othercase range *
*************************************************/
Yield: true when range returned; false when no more
*/
-static bool get_othercase_range(int* cptr, int d, int* ocptr, int* odptr)
+static bool getOthercaseRange(int* cptr, int d, int* ocptr, int* odptr)
{
int c, othercase = 0;
for (c = *cptr; c <= d; c++) {
- if ((othercase = _pcre_ucp_othercase(c)) >= 0)
+ if ((othercase = kjs_pcre_ucp_othercase(c)) >= 0)
break;
}
int next = othercase + 1;
for (++c; c <= d; c++) {
- if (_pcre_ucp_othercase(c) != next)
+ if (kjs_pcre_ucp_othercase(c) != next)
break;
next++;
}
Returns: number of characters placed in the buffer
*/
-// FIXME: This should be removed as soon as all UTF8 uses are removed from PCRE
-int _pcre_ord2utf8(int cvalue, uschar *buffer)
+static int encodeUTF8(int cvalue, unsigned char *buffer)
{
int i;
- for (i = 0; i < _pcre_utf8_table1_size; i++)
- if (cvalue <= _pcre_utf8_table1[i])
+ for (i = 0; i < kjs_pcre_utf8_table1_size; i++)
+ if (cvalue <= kjs_pcre_utf8_table1[i])
break;
buffer += i;
for (int j = i; j > 0; j--) {
*buffer-- = 0x80 | (cvalue & 0x3f);
cvalue >>= 6;
}
- *buffer = _pcre_utf8_table2[i] | cvalue;
+ *buffer = kjs_pcre_utf8_table2[i] | cvalue;
return i + 1;
}
}
static bool
-compileBranch(int options, int* brackets, uschar** codeptr,
+compileBranch(int options, int* brackets, unsigned char** codeptr,
const UChar** ptrptr, const UChar* patternEnd, ErrorCode* errorcodeptr, int *firstbyteptr,
int* reqbyteptr, CompileData& cd)
{
int repeat_min = 0, repeat_max = 0; /* To please picky compilers */
int bravalue = 0;
int reqvary, tempreqvary;
- int after_manual_callout = 0;
int c;
- uschar* code = *codeptr;
- uschar* tempcode;
+ unsigned char* code = *codeptr;
+ unsigned char* tempcode;
bool groupsetfirstbyte = false;
const UChar* ptr = *ptrptr;
const UChar* tempptr;
- uschar* previous = NULL;
- uschar* previous_callout = NULL;
- uschar classbits[32];
+ unsigned char* previous = NULL;
+ unsigned char classbits[32];
bool class_utf8;
- uschar* class_utf8data;
- uschar utf8_char[6];
+ unsigned char* class_utf8data;
+ unsigned char utf8_char[6];
/* Initialize no first byte, no required byte. REQ_UNSET means "no char
matching encountered yet". It gets changed to REQ_NONE if we hit something that
int subreqbyte;
int subfirstbyte;
int mclength;
- uschar mcbuffer[8];
+ unsigned char mcbuffer[8];
/* Next byte in the pattern */
/* Fill in length of a previous callout, except when the next thing is
a quantifier. */
- bool is_quantifier = c == '*' || c == '+' || c == '?' || (c == '{' && is_counted_repeat(ptr + 1, patternEnd));
-
- if (!is_quantifier && previous_callout && after_manual_callout-- <= 0) {
- complete_callout(previous_callout, ptr, cd);
- previous_callout = NULL;
- }
+ bool is_quantifier = c == '*' || c == '+' || c == '?' || (c == '{' && isCountedRepeat(ptr + 1, patternEnd));
switch (c) {
/* The branch terminates at end of string, |, or ). */
character (< 256), because in that case the compiled code doesn't use the
bit map. */
- memset(classbits, 0, 32 * sizeof(uschar));
+ memset(classbits, 0, 32 * sizeof(unsigned char));
/* Process characters until ] is reached. The first pass
through the regex checked the overall syntax, so we don't need to be very
character in them, so set class_charcount bigger than one. */
if (c == '\\') {
- c = check_escape(&ptr, patternEnd, errorcodeptr, *brackets, true);
+ c = checkEscape(&ptr, patternEnd, errorcodeptr, *brackets, true);
if (c < 0) {
class_charcount += 2; /* Greater than 1 is what matters */
switch (-c) {
if (d == '\\') {
const UChar* oldptr = ptr;
- d = check_escape(&ptr, patternEnd, errorcodeptr, *brackets, true);
+ d = checkEscape(&ptr, patternEnd, errorcodeptr, *brackets, true);
/* \X is literal X; any other special means the '-' was literal */
if (d < 0) {
int occ, ocd;
int cc = c;
int origd = d;
- while (get_othercase_range(&cc, origd, &occ, &ocd)) {
+ while (getOthercaseRange(&cc, origd, &occ, &ocd)) {
if (occ >= c && ocd <= d)
continue; /* Skip embedded ranges */
*class_utf8data++ = XCL_SINGLE;
else {
*class_utf8data++ = XCL_RANGE;
- class_utf8data += _pcre_ord2utf8(occ, class_utf8data);
+ class_utf8data += encodeUTF8(occ, class_utf8data);
}
- class_utf8data += _pcre_ord2utf8(ocd, class_utf8data);
+ class_utf8data += encodeUTF8(ocd, class_utf8data);
}
}
overlapping ranges. */
*class_utf8data++ = XCL_RANGE;
- class_utf8data += _pcre_ord2utf8(c, class_utf8data);
- class_utf8data += _pcre_ord2utf8(d, class_utf8data);
+ class_utf8data += encodeUTF8(c, class_utf8data);
+ class_utf8data += encodeUTF8(d, class_utf8data);
/* With UCP support, we are done. Without UCP support, there is no
caseless matching for UTF-8 characters > 127; we can use the bit map
if ((c > 255 || ((options & IgnoreCaseOption) && c > 127))) {
class_utf8 = true;
*class_utf8data++ = XCL_SINGLE;
- class_utf8data += _pcre_ord2utf8(c, class_utf8data);
+ class_utf8data += encodeUTF8(c, class_utf8data);
if (options & IgnoreCaseOption) {
int othercase;
- if ((othercase = _pcre_ucp_othercase(c)) >= 0) {
+ if ((othercase = kjs_pcre_ucp_othercase(c)) >= 0) {
*class_utf8data++ = XCL_SINGLE;
- class_utf8data += _pcre_ord2utf8(othercase, class_utf8data);
+ class_utf8data += encodeUTF8(othercase, class_utf8data);
}
}
} else {
/* Now fill in the complete length of the item */
- putOpcodeValueAtOffset(previous, 1, code - previous);
+ putLinkValue(previous + 1, code - previous);
break; /* End of class handling */
}
case '{':
if (!is_quantifier)
goto NORMAL_CHAR;
- ptr = read_repeat_counts(ptr+1, &repeat_min, &repeat_max, errorcodeptr);
+ ptr = readRepeatCounts(ptr + 1, &repeat_min, &repeat_max, errorcodeptr);
if (*errorcodeptr)
goto FAILED;
goto REPEAT;
/* Save start of previous item, in case we have to move it up to make space
for an inserted OP_ONCE for the additional '+' extension. */
+ /* FIXME: Probably don't need this because we don't use OP_ONCE. */
tempcode = previous;
length rather than a small character. */
if (code[-1] & 0x80) {
- uschar *lastchar = code - 1;
+ unsigned char *lastchar = code - 1;
while((*lastchar & 0xc0) == 0x80)
lastchar--;
c = code - lastchar; /* Length of UTF-8 character */
int prop_type = -1;
int prop_value = -1;
- uschar* oldcode = code;
+ unsigned char* oldcode = code;
code = previous; /* Usually overwrite previous item */
/* If the maximum is zero then the minimum must also be zero; Perl allows
*code++ = OP_QUERY + repeat_type;
else {
*code++ = OP_UPTO + repeat_type;
- put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, repeat_max);
+ put2ByteValueAndAdvance(code, repeat_max);
}
}
if (repeat_max == 1)
goto END_REPEAT;
*code++ = OP_UPTO + repeat_type;
- put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, repeat_max - 1);
+ put2ByteValueAndAdvance(code, repeat_max - 1);
}
}
else {
*code++ = OP_EXACT + op_type; /* NB EXACT doesn't have repeat_type */
- put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, repeat_min);
+ put2ByteValueAndAdvance(code, repeat_min);
/* If the maximum is unlimited, insert an OP_STAR. Before doing so,
we have to insert the character for the previous code. For a repeated
}
repeat_max -= repeat_min;
*code++ = OP_UPTO + repeat_type;
- put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, repeat_max);
+ put2ByteValueAndAdvance(code, repeat_max);
}
}
*code++ = OP_CRQUERY + repeat_type;
else {
*code++ = OP_CRRANGE + repeat_type;
- put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, repeat_min);
+ put2ByteValueAndAdvance(code, repeat_min);
if (repeat_max == -1)
repeat_max = 0; /* 2-byte encoding for max */
- put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, repeat_max);
+ put2ByteValueAndAdvance(code, repeat_max);
}
}
/* If previous was a bracket group, we may have to replicate it in certain
cases. */
- else if (*previous >= OP_BRA || *previous == OP_ONCE) {
+ else if (*previous >= OP_BRA) {
int ketoffset = 0;
int len = code - previous;
- uschar* bralink = NULL;
+ unsigned char* bralink = NULL;
/* If the maximum repeat count is unlimited, find the end of the bracket
by scanning through from the start, and compute the offset back to it
pointer. */
if (repeat_max == -1) {
- uschar* ket = previous;
- do {
- ket += getOpcodeValueAtOffset(ket, 1);
- } while (*ket != OP_KET);
+ const unsigned char* ket = previous;
+ advanceToEndOfBracket(ket);
ketoffset = code - ket;
}
int offset = (!bralink) ? 0 : previous - bralink;
bralink = previous;
- putOpcodeValueAtOffsetAndAdvance(previous, 0, offset);
+ putLinkValueAllowZeroAndAdvance(previous, offset);
}
repeat_max--;
*code++ = OP_BRA;
int offset = (!bralink) ? 0 : code - bralink;
bralink = code;
- putOpcodeValueAtOffsetAndAdvance(code, 0, offset);
+ putLinkValueAllowZeroAndAdvance(code, offset);
}
memcpy(code, previous, len);
while (bralink) {
int offset = code - bralink + 1;
- uschar* bra = code - offset;
- int oldlinkoffset = getOpcodeValueAtOffset(bra, 1);
- bralink = oldlinkoffset ? bralink - oldlinkoffset : 0;
+ unsigned char* bra = code - offset;
+ int oldlinkoffset = getLinkValueAllowZero(bra + 1);
+ bralink = (!oldlinkoffset) ? 0 : bralink - oldlinkoffset;
*code++ = OP_KET;
- putOpcodeValueAtOffsetAndAdvance(code, 0, offset);
- putOpcodeValueAtOffset(bra, 1, offset);
+ putLinkValueAndAdvance(code, offset);
+ putLinkValue(bra + 1, offset);
}
}
if (*(++ptr) == '?') {
switch (*(++ptr)) {
- case ':': /* Non-extracting bracket */
- bravalue = OP_BRA;
- ptr++;
- break;
-
- case '=': /* Positive lookahead */
- bravalue = OP_ASSERT;
- ptr++;
- break;
-
- case '!': /* Negative lookahead */
- bravalue = OP_ASSERT_NOT;
- ptr++;
- break;
-
+ case ':': /* Non-extracting bracket */
+ bravalue = OP_BRA;
+ ptr++;
+ break;
+
+ case '=': /* Positive lookahead */
+ bravalue = OP_ASSERT;
+ ptr++;
+ break;
+
+ case '!': /* Negative lookahead */
+ bravalue = OP_ASSERT_NOT;
+ ptr++;
+ break;
+
/* Character after (? not specially recognized */
-
- default: /* Option setting */
- *errorcodeptr = ERR12;
- goto FAILED;
- }
+
+ default:
+ *errorcodeptr = ERR12;
+ goto FAILED;
+ }
}
/* Else we have a referencing group; adjust the opcode. If the bracket
if (++(*brackets) > EXTRACT_BASIC_MAX) {
bravalue = OP_BRA + EXTRACT_BASIC_MAX + 1;
code[1 + LINK_SIZE] = OP_BRANUMBER;
- put2ByteOpcodeValueAtOffset(code, 2+LINK_SIZE, *brackets);
+ put2ByteValue(code + 2 + LINK_SIZE, *brackets);
skipbytes = 3;
}
else
to pass its address because some compilers complain otherwise. Pass in a
new setting for the ims options if they have changed. */
- previous = (bravalue >= OP_ONCE) ? code : 0;
+ previous = (bravalue >= OP_BRAZERO) ? code : 0;
*code = bravalue;
tempcode = code;
tempreqvary = cd.req_varyopt; /* Save value before bracket */
zerofirstbyte = firstbyte;
groupsetfirstbyte = false;
- if (bravalue >= OP_BRA || bravalue == OP_ONCE) {
+ if (bravalue >= OP_BRA) {
/* If we have not yet set a firstbyte in this branch, take it from the
subpattern, remembering that it was set here so that a repeat of more
than one can replicate it as reqbyte if necessary. If the subpattern has
case '\\':
tempptr = ptr;
- c = check_escape(&ptr, patternEnd, errorcodeptr, *brackets, false);
+ c = checkEscape(&ptr, patternEnd, errorcodeptr, *brackets, false);
/* Handle metacharacters introduced by \. For ones like \d, the ESC_ values
are arranged to be the negation of the corresponding OP_values. For the
int number = -c - ESC_REF;
previous = code;
*code++ = OP_REF;
- put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, number);
+ put2ByteValueAndAdvance(code, number);
}
/* For the rest, we can obtain the OP value by negating the escape
*code++ = c;
}
} else {
- mclength = _pcre_ord2utf8(c, mcbuffer);
+ mclength = encodeUTF8(c, mcbuffer);
*code++ = (options & IgnoreCaseOption) ? OP_CHAR_IGNORING_CASE : OP_CHAR;
for (c = 0; c < mclength; c++)
return false;
}
-
-
-
/*************************************************
* Compile sequence of alternatives *
*************************************************/
*/
static bool
-compileBracket(int options, int* brackets, uschar** codeptr,
+compileBracket(int options, int* brackets, unsigned char** codeptr,
const UChar** ptrptr, const UChar* patternEnd, ErrorCode* errorcodeptr, int skipbytes,
int* firstbyteptr, int* reqbyteptr, CompileData& cd)
{
const UChar* ptr = *ptrptr;
- uschar* code = *codeptr;
- uschar* last_branch = code;
- uschar* start_bracket = code;
+ unsigned char* code = *codeptr;
+ unsigned char* last_branch = code;
+ unsigned char* start_bracket = code;
int firstbyte = REQ_UNSET;
int reqbyte = REQ_UNSET;
/* Offset is set zero to mark that this bracket is still open */
- putOpcodeValueAtOffset(code, 1, 0);
+ putLinkValueAllowZero(code + 1, 0);
code += 1 + LINK_SIZE + skipbytes;
/* Loop for each alternative branch */
if (ptr >= patternEnd || *ptr != '|') {
int length = code - last_branch;
do {
- int prev_length = getOpcodeValueAtOffset(last_branch, 1);
- putOpcodeValueAtOffset(last_branch, 1, length);
+ int prev_length = getLinkValueAllowZero(last_branch + 1);
+ putLinkValue(last_branch + 1, length);
length = prev_length;
last_branch -= length;
} while (length > 0);
/* Fill in the ket */
*code = OP_KET;
- putOpcodeValueAtOffset(code, 1, code - start_bracket);
+ putLinkValue(code + 1, code - start_bracket);
code += 1 + LINK_SIZE;
/* Set values to pass back */
zero offset until it is closed, making it possible to detect recursion. */
*code = OP_ALT;
- putOpcodeValueAtOffset(code, 1, code - last_branch);
+ putLinkValue(code + 1, code - last_branch);
last_branch = code;
code += 1 + LINK_SIZE;
ptr++;
ASSERT_NOT_REACHED();
}
-
/*************************************************
* Check for anchored expression *
*************************************************/
backrefMap the back reference bitmap
*/
-static bool branchIsAnchored(const uschar* code)
+static bool branchIsAnchored(const unsigned char* code)
{
- const uschar* scode = firstSignificantOpCode(code);
+ const unsigned char* scode = firstSignificantOpcode(code);
int op = *scode;
/* Brackets */
- if (op >= OP_BRA || op == OP_ASSERT || op == OP_ONCE)
+ if (op >= OP_BRA || op == OP_ASSERT)
return bracketIsAnchored(scode);
/* Check for explicit anchoring */
return op == OP_CIRC;
}
-static bool bracketIsAnchored(const uschar* code)
+static bool bracketIsAnchored(const unsigned char* code)
{
do {
if (!branchIsAnchored(code + 1 + LINK_SIZE))
return false;
- code += getOpcodeValueAtOffset(code, 1);
+ code += getLinkValue(code + 1);
} while (*code == OP_ALT); /* Loop for each alternative */
return true;
}
backrefMap the back reference bitmap
*/
-static bool branchNeedsLineStart(const uschar* code, unsigned captureMap, unsigned backrefMap)
+static bool branchNeedsLineStart(const unsigned char* code, unsigned captureMap, unsigned backrefMap)
{
- const uschar* scode = firstSignificantOpCode(code);
+ const unsigned char* scode = firstSignificantOpcode(code);
int op = *scode;
/* Capturing brackets */
if (op > OP_BRA) {
int captureNum = op - OP_BRA;
if (captureNum > EXTRACT_BASIC_MAX)
- captureNum = get2ByteOpcodeValueAtOffset(scode, 2 + LINK_SIZE);
+ captureNum = get2ByteValue(scode + 2 + LINK_SIZE);
int bracketMask = (captureNum < 32) ? (1 << captureNum) : 1;
return bracketNeedsLineStart(scode, captureMap | bracketMask, backrefMap);
}
/* Other brackets */
- if (op == OP_BRA || op == OP_ASSERT || op == OP_ONCE)
+ if (op == OP_BRA || op == OP_ASSERT)
return bracketNeedsLineStart(scode, captureMap, backrefMap);
/* .* means "start at start or after \n" if it isn't in brackets that
return op == OP_CIRC;
}
-static bool bracketNeedsLineStart(const uschar* code, unsigned captureMap, unsigned backrefMap)
+static bool bracketNeedsLineStart(const unsigned char* code, unsigned captureMap, unsigned backrefMap)
{
do {
if (!branchNeedsLineStart(code + 1 + LINK_SIZE, captureMap, backrefMap))
return false;
- code += getOpcodeValueAtOffset(code, 1);
+ code += getLinkValue(code + 1);
} while (*code == OP_ALT); /* Loop for each alternative */
return true;
}
Returns: -1 or the fixed first char
*/
-static int branchFindFirstAssertedCharacter(const uschar* code, bool inassert)
+static int branchFindFirstAssertedCharacter(const unsigned char* code, bool inassert)
{
- const uschar* scode = firstSignificantOpCodeSkippingAssertions(code);
+ const unsigned char* scode = firstSignificantOpcodeSkippingAssertions(code);
int op = *scode;
if (op >= OP_BRA)
case OP_BRA:
case OP_ASSERT:
- case OP_ONCE:
return bracketFindFirstAssertedCharacter(scode, op == OP_ASSERT);
case OP_EXACT:
}
}
-static int bracketFindFirstAssertedCharacter(const uschar* code, bool inassert)
+static int bracketFindFirstAssertedCharacter(const unsigned char* code, bool inassert)
{
int c = -1;
do {
c = d;
else if (c != d)
return -1;
- code += getOpcodeValueAtOffset(code, 1);
+ code += getLinkValue(code + 1);
} while (*code == OP_ALT);
return c;
}
int lastitemlength = 0;
unsigned brastackptr = 0;
int brastack[BRASTACK_SIZE];
- uschar bralenstack[BRASTACK_SIZE];
+ unsigned char bralenstack[BRASTACK_SIZE];
int bracount = 0;
const UChar* ptr = (const UChar*)(pattern - 1);
character type. */
case '\\':
- c = check_escape(&ptr, patternEnd, &errorcode, bracount, false);
+ c = checkEscape(&ptr, patternEnd, &errorcode, bracount, false);
if (errorcode != 0)
return -1;
if (c > 127) {
int i;
- for (i = 0; i < _pcre_utf8_table1_size; i++)
- if (c <= _pcre_utf8_table1[i]) break;
+ for (i = 0; i < kjs_pcre_utf8_table1_size; i++)
+ if (c <= kjs_pcre_utf8_table1[i]) break;
length += i;
lastitemlength += i;
}
if (refnum > cd.top_backref)
cd.top_backref = refnum;
length += 2; /* For single back reference */
- if (safelyCheckNextChar(ptr, patternEnd, '{') && is_counted_repeat(ptr + 2, patternEnd)) {
- ptr = read_repeat_counts(ptr + 2, &minRepeats, &maxRepeats, &errorcode);
+ if (safelyCheckNextChar(ptr, patternEnd, '{') && isCountedRepeat(ptr + 2, patternEnd)) {
+ ptr = readRepeatCounts(ptr + 2, &minRepeats, &maxRepeats, &errorcode);
if (errorcode)
return -1;
if ((minRepeats == 0 && (maxRepeats == 1 || maxRepeats == -1)) ||
class, or back reference. */
case '{':
- if (!is_counted_repeat(ptr+1, patternEnd))
+ if (!isCountedRepeat(ptr + 1, patternEnd))
goto NORMAL_CHAR;
- ptr = read_repeat_counts(ptr+1, &minRepeats, &maxRepeats, &errorcode);
+ ptr = readRepeatCounts(ptr + 1, &minRepeats, &maxRepeats, &errorcode);
if (errorcode != 0)
return -1;
/* Check for escapes */
if (*ptr == '\\') {
- c = check_escape(&ptr, patternEnd, &errorcode, bracount, true);
+ c = checkEscape(&ptr, patternEnd, &errorcode, bracount, true);
if (errorcode != 0)
return -1;
UChar const *hyptr = ptr++;
if (safelyCheckNextChar(ptr, patternEnd, '\\')) {
ptr++;
- d = check_escape(&ptr, patternEnd, &errorcode, bracount, true);
+ d = checkEscape(&ptr, patternEnd, &errorcode, bracount, true);
if (errorcode != 0)
return -1;
}
}
if ((d > 255 || (ignoreCase && d > 127))) {
- uschar buffer[6];
+ unsigned char buffer[6];
if (!class_utf8) /* Allow for XCLASS overhead */
{
class_utf8 = true;
int occ, ocd;
int cc = c;
int origd = d;
- while (get_othercase_range(&cc, origd, &occ, &ocd)) {
+ while (getOthercaseRange(&cc, origd, &occ, &ocd)) {
if (occ >= c && ocd <= d)
continue; /* Skip embedded */
/* An extra item is needed */
- length += 1 + _pcre_ord2utf8(occ, buffer) +
- ((occ == ocd) ? 0 : _pcre_ord2utf8(ocd, buffer));
+ length += 1 + encodeUTF8(occ, buffer) +
+ ((occ == ocd) ? 0 : encodeUTF8(ocd, buffer));
}
}
/* The length of the (possibly extended) range */
- length += 1 + _pcre_ord2utf8(c, buffer) + _pcre_ord2utf8(d, buffer);
+ length += 1 + encodeUTF8(c, buffer) + encodeUTF8(d, buffer);
}
}
else {
if ((c > 255 || (ignoreCase && c > 127))) {
- uschar buffer[6];
+ unsigned char buffer[6];
class_optcount = 10; /* Ensure > 1 */
if (!class_utf8) /* Allow for XCLASS overhead */
{
class_utf8 = true;
length += LINK_SIZE + 2;
}
- length += (ignoreCase ? 2 : 1) * (1 + _pcre_ord2utf8(c, buffer));
+ length += (ignoreCase ? 2 : 1) * (1 + encodeUTF8(c, buffer));
}
}
}
/* A repeat needs either 1 or 5 bytes. If it is a possessive quantifier,
we also need extra for wrapping the whole thing in a sub-pattern. */
- if (safelyCheckNextChar(ptr, patternEnd, '{') && is_counted_repeat(ptr+2, patternEnd)) {
- ptr = read_repeat_counts(ptr+2, &minRepeats, &maxRepeats, &errorcode);
+ if (safelyCheckNextChar(ptr, patternEnd, '{') && isCountedRepeat(ptr + 2, patternEnd)) {
+ ptr = readRepeatCounts(ptr + 2, &minRepeats, &maxRepeats, &errorcode);
if (errorcode != 0)
return -1;
if ((minRepeats == 0 && (maxRepeats == 1 || maxRepeats == -1)) ||
if (safelyCheckNextChar(ptr, patternEnd, '?')) {
switch (c = (ptr + 2 < patternEnd ? ptr[2] : 0)) {
- /* Non-referencing groups and lookaheads just move the pointer on, and
- then behave like a non-special bracket, except that they don't increment
- the count of extracting brackets. Ditto for the "once only" bracket,
- which is in Perl from version 5.005. */
+ /* Non-referencing groups and lookaheads just move the pointer on, and
+ then behave like a non-special bracket, except that they don't increment
+ the count of extracting brackets. Ditto for the "once only" bracket,
+ which is in Perl from version 5.005. */
case ':':
case '=':
ptr += 2;
break;
- /* Else loop checking valid options until ) is met. Anything else is an
- error. If we are without any brackets, i.e. at top level, the settings
- act as if specified in the options, so massage the options immediately.
- This is for backward compatibility with Perl 5.004. */
+ /* Else loop checking valid options until ) is met. Anything else is an
+ error. If we are without any brackets, i.e. at top level, the settings
+ act as if specified in the options, so massage the options immediately.
+ This is for backward compatibility with Perl 5.004. */
default:
errorcode = ERR12;
else
duplength = 0;
- /* Leave ptr at the final char; for read_repeat_counts this happens
+ /* Leave ptr at the final char; for readRepeatCounts this happens
automatically; for the others we need an increment. */
- if ((ptr + 1 < patternEnd) && (c = ptr[1]) == '{' && is_counted_repeat(ptr+2, patternEnd)) {
- ptr = read_repeat_counts(ptr+2, &minRepeats, &maxRepeats, &errorcode);
+ if ((ptr + 1 < patternEnd) && (c = ptr[1]) == '{' && isCountedRepeat(ptr + 2, patternEnd)) {
+ ptr = readRepeatCounts(ptr + 2, &minRepeats, &maxRepeats, &errorcode);
if (errorcode)
return -1;
} else if (c == '*') {
if (c > 127) {
int i;
- for (i = 0; i < _pcre_utf8_table1_size; i++)
- if (c <= _pcre_utf8_table1[i])
+ for (i = 0; i < kjs_pcre_utf8_table1_size; i++)
+ if (c <= kjs_pcre_utf8_table1[i])
break;
length += i;
lastitemlength += i;
with errorptr and erroroffset set
*/
-static JSRegExp* returnError(ErrorCode errorcode, const char** errorptr)
+static inline JSRegExp* returnError(ErrorCode errorcode, const char** errorptr)
{
- *errorptr = error_text(errorcode);
+ *errorptr = errorText(errorcode);
return 0;
}
/* The starting points of the name/number translation table and of the code are
passed around in the compile data block. */
- const uschar* codeStart = (const uschar*)(re + 1);
+ const unsigned char* codeStart = (const unsigned char*)(re + 1);
cd.start_code = codeStart;
cd.start_pattern = (const UChar*)pattern;
const UChar* ptr = (const UChar*)pattern;
const UChar* patternEnd = pattern + patternLength;
- uschar* code = (uschar*)codeStart;
+ unsigned char* code = (unsigned char*)codeStart;
int firstbyte, reqbyte;
int bracketCount = 0;
if (!cd.needOuterBracket)
/* Function arguments that may change */
struct {
const UChar* subjectPtr;
- const uschar* instructionPtr;
- int offset_top;
+ const unsigned char* instructionPtr;
+ int offsetTop;
const UChar* subpatternStart;
} args;
stack-based local variables are not safe to use. Instead we have to
store local variables on the current MatchFrame. */
struct {
- const uschar* data;
- const uschar* startOfRepeatingBracket;
+ const unsigned char* data;
+ const unsigned char* startOfRepeatingBracket;
const UChar* subjectPtrAtStartOfInstruction; // Several instructions stash away a subjectPtr here for later compare
- const uschar* instructionPtrAtStartOfOnce;
+ const unsigned char* instructionPtrAtStartOfOnce;
- int repeat_othercase;
+ int repeatOthercase;
int ctype;
int fc;
int max;
int number;
int offset;
- int save_offset1;
- int save_offset2;
- int save_offset3;
+ int saveOffset1;
+ int saveOffset2;
+ int saveOffset3;
const UChar* subpatternStart;
} locals;
doing traditional NFA matching, so that they are thread-safe. */
struct MatchData {
- int* offset_vector; /* Offset vector */
- int offset_end; /* One past the end */
- int offset_max; /* The maximum usable for return data */
- bool offset_overflow; /* Set if too many extractions */
- const UChar* start_subject; /* Start of the subject string */
- const UChar* end_subject; /* End of the subject string */
- const UChar* end_match_ptr; /* Subject position at end match */
- int end_offset_top; /* Highwater mark at end of match */
+ int* offsetVector; /* Offset vector */
+ int offsetEnd; /* One past the end */
+ int offsetMax; /* The maximum usable for return data */
+ bool offsetOverflow; /* Set if too many extractions */
+ const UChar* startSubject; /* Start of the subject string */
+ const UChar* endSubject; /* End of the subject string */
+ const UChar* endMatchPtr; /* Subject position at end match */
+ int endOffsetTop; /* Highwater mark at end of match */
bool multiline;
bool ignoreCase;
};
/* Non-error returns from the match() function. Error returns are externally
-defined PCRE_ERROR_xxx codes, which are all negative. */
+defined error codes, which are all negative. */
#define MATCH_MATCH 1
#define MATCH_NOMATCH 0
+/* The maximum remaining length of subject we are prepared to search for a
+req_byte match. */
+
+#define REQ_BYTE_MAX 1000
+
+/* The below limit restricts the number of recursive match calls in order to
+limit the maximum amount of storage.
+
+This limit is tied to the size of MatchFrame. Right now we allow PCRE to allocate up
+to (MATCH_RECURSION_LIMIT - 16) * sizeof(MatchFrame) bytes of "stack" space before we give up.
+Currently that's (100000 - 16) * (23 * 4), or roughly 9MB. */
+
+#define MATCH_RECURSION_LIMIT 100000
+
#ifdef DEBUG
/*************************************************
* Debugging function to print chars *
Arguments:
p points to characters
length number to print
- is_subject true if printing from within md.start_subject
- md pointer to matching data block, if is_subject is true
+ isSubject true if printing from within md.startSubject
+ md pointer to matching data block, if isSubject is true
*/
-static void pchars(const UChar* p, int length, bool is_subject, const MatchData& md)
+static void pchars(const UChar* p, int length, bool isSubject, const MatchData& md)
{
- if (is_subject && length > md.end_subject - p)
- length = md.end_subject - p;
+ if (isSubject && length > md.endSubject - p)
+ length = md.endSubject - p;
while (length-- > 0) {
int c;
if (isprint(c = *(p++)))
}
#endif
-
-
/*************************************************
* Match a back-reference *
*************************************************/
Returns: true if matched
*/
-static bool match_ref(int offset, const UChar* subjectPtr, int length, const MatchData& md)
+static bool matchRef(int offset, const UChar* subjectPtr, int length, const MatchData& md)
{
- const UChar* p = md.start_subject + md.offset_vector[offset];
+ const UChar* p = md.startSubject + md.offsetVector[offset];
#ifdef DEBUG
- if (subjectPtr >= md.end_subject)
+ if (subjectPtr >= md.endSubject)
printf("matching subject <null>");
else {
printf("matching subject ");
/* Always fail if not enough characters left */
- if (length > md.end_subject - subjectPtr)
+ if (length > md.endSubject - subjectPtr)
return false;
/* Separate the caseless case for speed */
if (md.ignoreCase) {
while (length-- > 0) {
UChar c = *p++;
- int othercase = _pcre_ucp_othercase(c);
+ int othercase = kjs_pcre_ucp_othercase(c);
UChar d = *subjectPtr++;
if (c != d && othercase != d)
return false;
#endif
-#define CHECK_RECURSION_LIMIT \
- if (stack.size >= MATCH_LIMIT_RECURSION) \
- return matchError(JSRegExpErrorRecursionLimit, stack);
-
-#define RECURSE_WITH_RETURN_NUMBER(num) \
- CHECK_RECURSION_LIMIT \
+#define RECURSIVE_MATCH_COMMON(num) \
+ if (stack.size >= MATCH_RECURSION_LIMIT) \
+ return matchError(JSRegExpErrorRecursionLimit, stack); \
goto RECURSE;\
- RRETURN_##num:
+ RRETURN_##num: \
+ stack.popCurrentFrame();
#define RECURSIVE_MATCH(num, ra, rb) \
-{\
- stack.pushNewFrame((ra), (rb), RMATCH_WHERE(num)); \
- RECURSE_WITH_RETURN_NUMBER(num) \
- stack.popCurrentFrame(); \
-}
+ do { \
+ stack.pushNewFrame((ra), (rb), RMATCH_WHERE(num)); \
+ RECURSIVE_MATCH_COMMON(num) \
+ } while (0)
#define RECURSIVE_MATCH_STARTNG_NEW_GROUP(num, ra, rb) \
-{\
- stack.pushNewFrame((ra), (rb), RMATCH_WHERE(num)); \
- startNewGroup(stack.currentFrame); \
- RECURSE_WITH_RETURN_NUMBER(num) \
- stack.popCurrentFrame(); \
-}
+ do { \
+ stack.pushNewFrame((ra), (rb), RMATCH_WHERE(num)); \
+ startNewGroup(stack.currentFrame); \
+ RECURSIVE_MATCH_COMMON(num) \
+ } while (0)
#define RRETURN goto RRETURN_LABEL
-#define RRETURN_NO_MATCH \
- {\
- is_match = false;\
- RRETURN;\
- }
+#define RRETURN_NO_MATCH do { isMatch = false; RRETURN; } while (0)
/*************************************************
* Match from current position *
Arguments:
subjectPtr pointer in subject
instructionPtr position in code
- offset_top current top pointer
+ offsetTop current top pointer
md pointer to "static" info for the match
Returns: MATCH_MATCH if matched ) these values are >= 0
MATCH_NOMATCH if failed to match )
- a negative PCRE_ERROR_xxx value if aborted by an error condition
+ a negative error value if aborted by an error condition
(e.g. stopped by repeated call or recursion limit)
*/
return new MatchFrame;
}
- inline void pushNewFrame(const uschar* instructionPtr, const UChar* subpatternStart, ReturnLocation returnLocation)
+ inline void pushNewFrame(const unsigned char* instructionPtr, const UChar* subpatternStart, ReturnLocation returnLocation)
{
MatchFrame* newframe = allocateNextFrame();
newframe->previousFrame = currentFrame;
newframe->args.subjectPtr = currentFrame->args.subjectPtr;
- newframe->args.offset_top = currentFrame->args.offset_top;
+ newframe->args.offsetTop = currentFrame->args.offsetTop;
newframe->args.instructionPtr = instructionPtr;
newframe->args.subpatternStart = subpatternStart;
newframe->returnLocation = returnLocation;
/* Get the next UTF-8 character, not advancing the pointer, incrementing length
if there are extra bytes. This is called when we know we are in UTF-8 mode. */
-static inline void getUTF8CharAndIncrementLength(int& c, const uschar* subjectPtr, int& len)
+static inline void getUTF8CharAndIncrementLength(int& c, const unsigned char* subjectPtr, int& len)
{
c = *subjectPtr;
if ((c & 0xc0) == 0xc0) {
- int gcaa = _pcre_utf8_table4[c & 0x3f]; /* Number of additional bytes */
+ int gcaa = kjs_pcre_utf8_table4[c & 0x3f]; /* Number of additional bytes */
int gcss = 6 * gcaa;
- c = (c & _pcre_utf8_table3[gcaa]) << gcss;
+ c = (c & kjs_pcre_utf8_table3[gcaa]) << gcss;
for (int gcii = 1; gcii <= gcaa; gcii++) {
gcss -= 6;
c |= (subjectPtr[gcii] & 0x3f) << gcss;
maximumRepeats = maximumRepeatsFromInstructionOffset[instructionOffset];
}
-static int match(const UChar* subjectPtr, const uschar* instructionPtr, int offset_top, MatchData& md)
+static int match(const UChar* subjectPtr, const unsigned char* instructionPtr, int offsetTop, MatchData& md)
{
- int is_match = false;
+ int isMatch = false;
int min;
bool minimize = false; /* Initialization not really needed, but some compilers think so. */
/* The opcode jump table. */
#ifdef USE_COMPUTED_GOTO_FOR_MATCH_OPCODE_LOOP
#define EMIT_JUMP_TABLE_ENTRY(opcode) &&LABEL_OP_##opcode,
- static void* opcode_jump_table[256] = { FOR_EACH_OPCODE(EMIT_JUMP_TABLE_ENTRY) };
+ static void* opcodeJumpTable[256] = { FOR_EACH_OPCODE(EMIT_JUMP_TABLE_ENTRY) };
#undef EMIT_JUMP_TABLE_ENTRY
#endif
/* One-time setup of the opcode jump table. */
#ifdef USE_COMPUTED_GOTO_FOR_MATCH_OPCODE_LOOP
- for (int i = 255; !opcode_jump_table[i]; i--)
- opcode_jump_table[i] = &&CAPTURING_BRACKET;
+ for (int i = 255; !opcodeJumpTable[i]; i--)
+ opcodeJumpTable[i] = &&CAPTURING_BRACKET;
#endif
#ifdef USE_COMPUTED_GOTO_FOR_MATCH_RECURSION
#endif
stack.currentFrame->args.subjectPtr = subjectPtr;
stack.currentFrame->args.instructionPtr = instructionPtr;
- stack.currentFrame->args.offset_top = offset_top;
+ stack.currentFrame->args.offsetTop = offsetTop;
stack.currentFrame->args.subpatternStart = 0;
startNewGroup(stack.currentFrame);
#ifdef USE_COMPUTED_GOTO_FOR_MATCH_OPCODE_LOOP
#define BEGIN_OPCODE(opcode) LABEL_OP_##opcode
-#define NEXT_OPCODE goto *opcode_jump_table[*stack.currentFrame->args.instructionPtr]
+#define NEXT_OPCODE goto *opcodeJumpTable[*stack.currentFrame->args.instructionPtr]
#else
#define BEGIN_OPCODE(opcode) case OP_##opcode
#define NEXT_OPCODE continue
DPRINTF(("start bracket 0\n"));
do {
RECURSIVE_MATCH_STARTNG_NEW_GROUP(2, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
- stack.currentFrame->args.instructionPtr += getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
+ stack.currentFrame->args.instructionPtr += getLinkValue(stack.currentFrame->args.instructionPtr + 1);
} while (*stack.currentFrame->args.instructionPtr == OP_ALT);
DPRINTF(("bracket 0 failed\n"));
RRETURN;
/* End of the pattern. */
BEGIN_OPCODE(END):
- md.end_match_ptr = stack.currentFrame->args.subjectPtr; /* Record where we ended */
- md.end_offset_top = stack.currentFrame->args.offset_top; /* and how many extracts were taken */
- is_match = true;
+ md.endMatchPtr = stack.currentFrame->args.subjectPtr; /* Record where we ended */
+ md.endOffsetTop = stack.currentFrame->args.offsetTop; /* and how many extracts were taken */
+ isMatch = true;
RRETURN;
/* Assertion brackets. Check the alternative branches in turn - the
BEGIN_OPCODE(ASSERT):
do {
RECURSIVE_MATCH_STARTNG_NEW_GROUP(6, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, NULL);
- if (is_match)
+ if (isMatch)
break;
- stack.currentFrame->args.instructionPtr += getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
+ stack.currentFrame->args.instructionPtr += getLinkValue(stack.currentFrame->args.instructionPtr + 1);
} while (*stack.currentFrame->args.instructionPtr == OP_ALT);
if (*stack.currentFrame->args.instructionPtr == OP_KET)
RRETURN_NO_MATCH;
/* Continue from after the assertion, updating the offsets high water
mark, since extracts may have been taken during the assertion. */
- moveOpcodePtrPastAnyAlternateBranches(stack.currentFrame->args.instructionPtr);
+ advanceToEndOfBracket(stack.currentFrame->args.instructionPtr);
stack.currentFrame->args.instructionPtr += 1 + LINK_SIZE;
- stack.currentFrame->args.offset_top = md.end_offset_top;
+ stack.currentFrame->args.offsetTop = md.endOffsetTop;
NEXT_OPCODE;
/* Negative assertion: all branches must fail to match */
BEGIN_OPCODE(ASSERT_NOT):
do {
RECURSIVE_MATCH_STARTNG_NEW_GROUP(7, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, NULL);
- if (is_match)
+ if (isMatch)
RRETURN_NO_MATCH;
- stack.currentFrame->args.instructionPtr += getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
+ stack.currentFrame->args.instructionPtr += getLinkValue(stack.currentFrame->args.instructionPtr + 1);
} while (*stack.currentFrame->args.instructionPtr == OP_ALT);
stack.currentFrame->args.instructionPtr += 1 + LINK_SIZE;
NEXT_OPCODE;
- /* "Once" brackets are like assertion brackets except that after a match,
- the point in the subject string is not moved back. Thus there can never be
- a move back into the brackets. Friedl calls these "atomic" subpatterns.
- Check the alternative branches in turn - the matching won't pass the KET
- for this kind of subpattern. If any one branch matches, we carry on as at
- the end of a normal bracket, leaving the subject pointer. */
-
- BEGIN_OPCODE(ONCE):
- stack.currentFrame->locals.instructionPtrAtStartOfOnce = stack.currentFrame->args.instructionPtr;
- stack.currentFrame->locals.subjectPtrAtStartOfInstruction = stack.currentFrame->args.subjectPtr;
-
- do {
- RECURSIVE_MATCH_STARTNG_NEW_GROUP(9, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
- if (is_match)
- break;
- stack.currentFrame->args.instructionPtr += getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
- } while (*stack.currentFrame->args.instructionPtr == OP_ALT);
-
- /* If hit the end of the group (which could be repeated), fail */
-
- if (*stack.currentFrame->args.instructionPtr != OP_ONCE && *stack.currentFrame->args.instructionPtr != OP_ALT)
- RRETURN;
-
- /* Continue as from after the assertion, updating the offsets high water
- mark, since extracts may have been taken. */
-
- moveOpcodePtrPastAnyAlternateBranches(stack.currentFrame->args.instructionPtr);
-
- stack.currentFrame->args.offset_top = md.end_offset_top;
- stack.currentFrame->args.subjectPtr = md.end_match_ptr;
-
- /* For a non-repeating ket, just continue at this level. This also
- happens for a repeating ket if no characters were matched in the group.
- This is the forcible breaking of infinite loops as implemented in Perl
- 5.005. If there is an options reset, it will get obeyed in the normal
- course of events. */
-
- if (*stack.currentFrame->args.instructionPtr == OP_KET || stack.currentFrame->args.subjectPtr == stack.currentFrame->locals.subjectPtrAtStartOfInstruction) {
- stack.currentFrame->args.instructionPtr += 1 + LINK_SIZE;
- NEXT_OPCODE;
- }
-
- /* The repeating kets try the rest of the pattern or restart from the
- preceding bracket, in the appropriate order. We need to reset any options
- that changed within the bracket before re-running it, so check the next
- opcode. */
-
- if (*stack.currentFrame->args.instructionPtr == OP_KETRMIN) {
- RECURSIVE_MATCH(10, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
- if (is_match)
- RRETURN;
- RECURSIVE_MATCH_STARTNG_NEW_GROUP(11, stack.currentFrame->locals.instructionPtrAtStartOfOnce, stack.currentFrame->args.subpatternStart);
- if (is_match)
- RRETURN;
- } else { /* OP_KETRMAX */
- RECURSIVE_MATCH_STARTNG_NEW_GROUP(12, stack.currentFrame->locals.instructionPtrAtStartOfOnce, stack.currentFrame->args.subpatternStart);
- if (is_match)
- RRETURN;
- RECURSIVE_MATCH(13, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
- if (is_match)
- RRETURN;
- }
- RRETURN;
-
/* An alternation is the end of a branch; scan along to find the end of the
bracketed group and go to there. */
BEGIN_OPCODE(ALT):
- moveOpcodePtrPastAnyAlternateBranches(stack.currentFrame->args.instructionPtr);
+ advanceToEndOfBracket(stack.currentFrame->args.instructionPtr);
NEXT_OPCODE;
/* BRAZERO and BRAMINZERO occur just before a bracket group, indicating
BEGIN_OPCODE(BRAZERO): {
stack.currentFrame->locals.startOfRepeatingBracket = stack.currentFrame->args.instructionPtr + 1;
RECURSIVE_MATCH_STARTNG_NEW_GROUP(14, stack.currentFrame->locals.startOfRepeatingBracket, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
- moveOpcodePtrPastAnyAlternateBranches(stack.currentFrame->locals.startOfRepeatingBracket);
+ advanceToEndOfBracket(stack.currentFrame->locals.startOfRepeatingBracket);
stack.currentFrame->args.instructionPtr = stack.currentFrame->locals.startOfRepeatingBracket + 1 + LINK_SIZE;
NEXT_OPCODE;
}
BEGIN_OPCODE(BRAMINZERO): {
stack.currentFrame->locals.startOfRepeatingBracket = stack.currentFrame->args.instructionPtr + 1;
- moveOpcodePtrPastAnyAlternateBranches(stack.currentFrame->locals.startOfRepeatingBracket);
+ advanceToEndOfBracket(stack.currentFrame->locals.startOfRepeatingBracket);
RECURSIVE_MATCH_STARTNG_NEW_GROUP(15, stack.currentFrame->locals.startOfRepeatingBracket + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
stack.currentFrame->args.instructionPtr++;
NEXT_OPCODE;
BEGIN_OPCODE(KET):
BEGIN_OPCODE(KETRMIN):
BEGIN_OPCODE(KETRMAX):
- stack.currentFrame->locals.instructionPtrAtStartOfOnce = stack.currentFrame->args.instructionPtr - getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
+ stack.currentFrame->locals.instructionPtrAtStartOfOnce = stack.currentFrame->args.instructionPtr - getLinkValue(stack.currentFrame->args.instructionPtr + 1);
stack.currentFrame->args.subpatternStart = stack.currentFrame->locals.subpatternStart;
stack.currentFrame->locals.subpatternStart = stack.currentFrame->previousFrame->args.subpatternStart;
- if (*stack.currentFrame->locals.instructionPtrAtStartOfOnce == OP_ASSERT || *stack.currentFrame->locals.instructionPtrAtStartOfOnce == OP_ASSERT_NOT || *stack.currentFrame->locals.instructionPtrAtStartOfOnce == OP_ONCE) {
- md.end_match_ptr = stack.currentFrame->args.subjectPtr; /* For ONCE */
- md.end_offset_top = stack.currentFrame->args.offset_top;
- is_match = true;
+ if (*stack.currentFrame->locals.instructionPtrAtStartOfOnce == OP_ASSERT || *stack.currentFrame->locals.instructionPtrAtStartOfOnce == OP_ASSERT_NOT) {
+ md.endOffsetTop = stack.currentFrame->args.offsetTop;
+ isMatch = true;
RRETURN;
}
the number from a dummy opcode at the start. */
if (stack.currentFrame->locals.number > EXTRACT_BASIC_MAX)
- stack.currentFrame->locals.number = get2ByteOpcodeValueAtOffset(stack.currentFrame->locals.instructionPtrAtStartOfOnce, 2+LINK_SIZE);
+ stack.currentFrame->locals.number = get2ByteValue(stack.currentFrame->locals.instructionPtrAtStartOfOnce + 2 + LINK_SIZE);
stack.currentFrame->locals.offset = stack.currentFrame->locals.number << 1;
#ifdef DEBUG
the OP_END is reached. */
if (stack.currentFrame->locals.number > 0) {
- if (stack.currentFrame->locals.offset >= md.offset_max)
- md.offset_overflow = true;
+ if (stack.currentFrame->locals.offset >= md.offsetMax)
+ md.offsetOverflow = true;
else {
- md.offset_vector[stack.currentFrame->locals.offset] =
- md.offset_vector[md.offset_end - stack.currentFrame->locals.number];
- md.offset_vector[stack.currentFrame->locals.offset+1] = stack.currentFrame->args.subjectPtr - md.start_subject;
- if (stack.currentFrame->args.offset_top <= stack.currentFrame->locals.offset)
- stack.currentFrame->args.offset_top = stack.currentFrame->locals.offset + 2;
+ md.offsetVector[stack.currentFrame->locals.offset] =
+ md.offsetVector[md.offsetEnd - stack.currentFrame->locals.number];
+ md.offsetVector[stack.currentFrame->locals.offset+1] = stack.currentFrame->args.subjectPtr - md.startSubject;
+ if (stack.currentFrame->args.offsetTop <= stack.currentFrame->locals.offset)
+ stack.currentFrame->args.offsetTop = stack.currentFrame->locals.offset + 2;
}
}
if (*stack.currentFrame->args.instructionPtr == OP_KETRMIN) {
RECURSIVE_MATCH(16, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
RECURSIVE_MATCH_STARTNG_NEW_GROUP(17, stack.currentFrame->locals.instructionPtrAtStartOfOnce, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
} else { /* OP_KETRMAX */
RECURSIVE_MATCH_STARTNG_NEW_GROUP(18, stack.currentFrame->locals.instructionPtrAtStartOfOnce, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
RECURSIVE_MATCH(19, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
}
RRETURN;
/* Start of subject, or after internal newline if multiline. */
BEGIN_OPCODE(CIRC):
- if (stack.currentFrame->args.subjectPtr != md.start_subject && (!md.multiline || !isNewline(stack.currentFrame->args.subjectPtr[-1])))
+ if (stack.currentFrame->args.subjectPtr != md.startSubject && (!md.multiline || !isNewline(stack.currentFrame->args.subjectPtr[-1])))
RRETURN_NO_MATCH;
stack.currentFrame->args.instructionPtr++;
NEXT_OPCODE;
/* End of subject, or before internal newline if multiline. */
BEGIN_OPCODE(DOLL):
- if (stack.currentFrame->args.subjectPtr < md.end_subject && (!md.multiline || !isNewline(*stack.currentFrame->args.subjectPtr)))
+ if (stack.currentFrame->args.subjectPtr < md.endSubject && (!md.multiline || !isNewline(*stack.currentFrame->args.subjectPtr)))
RRETURN_NO_MATCH;
stack.currentFrame->args.instructionPtr++;
NEXT_OPCODE;
bool currentCharIsWordChar = false;
bool previousCharIsWordChar = false;
- if (stack.currentFrame->args.subjectPtr > md.start_subject)
+ if (stack.currentFrame->args.subjectPtr > md.startSubject)
previousCharIsWordChar = isWordChar(stack.currentFrame->args.subjectPtr[-1]);
- if (stack.currentFrame->args.subjectPtr < md.end_subject)
+ if (stack.currentFrame->args.subjectPtr < md.endSubject)
currentCharIsWordChar = isWordChar(*stack.currentFrame->args.subjectPtr);
/* Now see if the situation is what we want */
/* Match a single character type; inline for speed */
BEGIN_OPCODE(NOT_NEWLINE):
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN_NO_MATCH;
if (isNewline(*stack.currentFrame->args.subjectPtr++))
RRETURN_NO_MATCH;
NEXT_OPCODE;
BEGIN_OPCODE(NOT_DIGIT):
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN_NO_MATCH;
if (isASCIIDigit(*stack.currentFrame->args.subjectPtr++))
RRETURN_NO_MATCH;
NEXT_OPCODE;
BEGIN_OPCODE(DIGIT):
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN_NO_MATCH;
if (!isASCIIDigit(*stack.currentFrame->args.subjectPtr++))
RRETURN_NO_MATCH;
NEXT_OPCODE;
BEGIN_OPCODE(NOT_WHITESPACE):
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN_NO_MATCH;
if (isSpaceChar(*stack.currentFrame->args.subjectPtr++))
RRETURN_NO_MATCH;
NEXT_OPCODE;
BEGIN_OPCODE(WHITESPACE):
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN_NO_MATCH;
if (!isSpaceChar(*stack.currentFrame->args.subjectPtr++))
RRETURN_NO_MATCH;
NEXT_OPCODE;
BEGIN_OPCODE(NOT_WORDCHAR):
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN_NO_MATCH;
if (isWordChar(*stack.currentFrame->args.subjectPtr++))
RRETURN_NO_MATCH;
NEXT_OPCODE;
BEGIN_OPCODE(WORDCHAR):
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN_NO_MATCH;
if (!isWordChar(*stack.currentFrame->args.subjectPtr++))
RRETURN_NO_MATCH;
loops). */
BEGIN_OPCODE(REF):
- stack.currentFrame->locals.offset = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1) << 1; /* Doubled ref number */
+ stack.currentFrame->locals.offset = get2ByteValue(stack.currentFrame->args.instructionPtr + 1) << 1; /* Doubled ref number */
stack.currentFrame->args.instructionPtr += 3; /* Advance past item */
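Review note: the call shape changes from `get2ByteOpcodeValueAtOffset(ptr, 1)` to `get2ByteValue(ptr + 1)`, so the helper now takes a pointer to the value itself. The helper body is not part of this hunk. The sketch below assumes a big-endian two-byte encoding, as in PCRE's GET2 macro, and uses hypothetical `...Sketch` names so it is not mistaken for the code in pcre_internal.h.

```cpp
#include <cassert>

// Assumed layout: high byte first, then low byte (as in PCRE's GET2).
static inline int get2ByteValueSketch(const unsigned char* p)
{
    assert(p);
    return (p[0] << 8) | p[1];
}

// Hypothetical matching writer for the same assumed layout.
static inline void put2ByteValueSketch(unsigned char* p, int value)
{
    assert(p && value >= 0 && value <= 0xFFFF);
    p[0] = static_cast<unsigned char>(value >> 8);
    p[1] = static_cast<unsigned char>(value & 0xFF);
}
```

With this shape the caller does the pointer arithmetic, which is why the REF opcode above reads at `instructionPtr + 1` and then advances by 3 (one opcode byte plus the two-byte value).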
/* If the reference is unset, set the length to be longer than the amount
can't just fail here, because of the possibility of quantifiers with zero
minima. */
- if (stack.currentFrame->locals.offset >= stack.currentFrame->args.offset_top || md.offset_vector[stack.currentFrame->locals.offset] < 0)
+ if (stack.currentFrame->locals.offset >= stack.currentFrame->args.offsetTop || md.offsetVector[stack.currentFrame->locals.offset] < 0)
stack.currentFrame->locals.length = 0;
else
- stack.currentFrame->locals.length = md.offset_vector[stack.currentFrame->locals.offset+1] - md.offset_vector[stack.currentFrame->locals.offset];
+ stack.currentFrame->locals.length = md.offsetVector[stack.currentFrame->locals.offset+1] - md.offsetVector[stack.currentFrame->locals.offset];
/* Set up for repetition, or handle the non-repeated case */
case OP_CRRANGE:
case OP_CRMINRANGE:
minimize = (*stack.currentFrame->args.instructionPtr == OP_CRMINRANGE);
- min = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
- stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 3);
+ min = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
+ stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 3);
if (stack.currentFrame->locals.max == 0)
stack.currentFrame->locals.max = INT_MAX;
stack.currentFrame->args.instructionPtr += 5;
break;
default: /* No repeat follows */
- if (!match_ref(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
+ if (!matchRef(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
RRETURN_NO_MATCH;
stack.currentFrame->args.subjectPtr += stack.currentFrame->locals.length;
NEXT_OPCODE;
/* First, ensure the minimum number of matches are present. */
for (int i = 1; i <= min; i++) {
- if (!match_ref(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
+ if (!matchRef(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
RRETURN_NO_MATCH;
stack.currentFrame->args.subjectPtr += stack.currentFrame->locals.length;
}
if (minimize) {
for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
RECURSIVE_MATCH(20, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
- if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || !match_ref(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
+ if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || !matchRef(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
RRETURN;
stack.currentFrame->args.subjectPtr += stack.currentFrame->locals.length;
}
else {
stack.currentFrame->locals.subjectPtrAtStartOfInstruction = stack.currentFrame->args.subjectPtr;
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (!match_ref(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
+ if (!matchRef(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
break;
stack.currentFrame->args.subjectPtr += stack.currentFrame->locals.length;
}
while (stack.currentFrame->args.subjectPtr >= stack.currentFrame->locals.subjectPtrAtStartOfInstruction) {
RECURSIVE_MATCH(21, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
stack.currentFrame->args.subjectPtr -= stack.currentFrame->locals.length;
}
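Review note: the greedy branch above first consumes as many copies of the back-reference as it can, then backs off one copy at a time while trying the rest of the pattern. A self-contained sketch of that control flow follows. It is an illustration only: `greedyRepeat`, `matchUnitAt`, and the `rest` callback (standing in for RECURSIVE_MATCH) are hypothetical, and the sketch requires a non-empty unit so the back-off loop always terminates.

```cpp
#include <cassert>
#include <functional>
#include <string>

// True if `unit` occurs in `s` starting exactly at `pos`.
static bool matchUnitAt(const std::string& s, int pos, const std::string& unit)
{
    if (pos < 0 || pos > static_cast<int>(s.size()))
        return false;
    return s.compare(static_cast<size_t>(pos), unit.size(), unit) == 0;
}

// Match `unit` between min and max times, preferring more repetitions,
// and succeed only if `rest` accepts the position that follows.
static bool greedyRepeat(const std::string& s, int pos, const std::string& unit,
                         int min, int max, const std::function<bool(int)>& rest)
{
    assert(!unit.empty() && min >= 0 && max >= min);
    const int length = static_cast<int>(unit.size());

    // First, ensure the minimum number of matches are present.
    for (int i = 1; i <= min; i++) {
        if (!matchUnitAt(s, pos, unit))
            return false;
        pos += length;
    }

    // Then consume greedily up to max.
    const int start = pos;
    for (int i = min; i < max; i++) {
        if (!matchUnitAt(s, pos, unit))
            break;
        pos += length;
    }

    // Back off one unit at a time until the rest of the pattern matches.
    while (pos >= start) {
        if (rest(pos))
            return true;
        pos -= length;
    }
    return false;
}
```

The lazy (`minimize`) branch inverts the order: it tries the rest of the pattern first and consumes one more copy only when that fails.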
case OP_CRRANGE:
case OP_CRMINRANGE:
minimize = (*stack.currentFrame->args.instructionPtr == OP_CRMINRANGE);
- min = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
- stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 3);
+ min = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
+ stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 3);
if (stack.currentFrame->locals.max == 0)
stack.currentFrame->locals.max = INT_MAX;
stack.currentFrame->args.instructionPtr += 5;
/* First, ensure the minimum number of matches are present. */
for (int i = 1; i <= min; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN_NO_MATCH;
int c = *stack.currentFrame->args.subjectPtr++;
if (c > 255) {
if (minimize) {
for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
RECURSIVE_MATCH(22, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
- if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN;
int c = *stack.currentFrame->args.subjectPtr++;
if (c > 255) {
stack.currentFrame->locals.subjectPtrAtStartOfInstruction = stack.currentFrame->args.subjectPtr;
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
break;
int c = *stack.currentFrame->args.subjectPtr;
if (c > 255) {
}
for (;;) {
RECURSIVE_MATCH(24, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
if (stack.currentFrame->args.subjectPtr-- == stack.currentFrame->locals.subjectPtrAtStartOfInstruction)
break; /* Stop if tried at original pos */
BEGIN_OPCODE(XCLASS):
stack.currentFrame->locals.data = stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE; /* Save for matching */
- stack.currentFrame->args.instructionPtr += getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1); /* Advance past the item */
+ stack.currentFrame->args.instructionPtr += getLinkValue(stack.currentFrame->args.instructionPtr + 1); /* Advance past the item */
switch (*stack.currentFrame->args.instructionPtr) {
case OP_CRSTAR:
case OP_CRRANGE:
case OP_CRMINRANGE:
minimize = (*stack.currentFrame->args.instructionPtr == OP_CRMINRANGE);
- min = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
- stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 3);
+ min = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
+ stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 3);
if (stack.currentFrame->locals.max == 0)
stack.currentFrame->locals.max = INT_MAX;
stack.currentFrame->args.instructionPtr += 5;
default: /* No repeat follows */
min = stack.currentFrame->locals.max = 1;
- }
+ }
/* First, ensure the minimum number of matches are present. */
for (int i = 1; i <= min; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN_NO_MATCH;
int c = *stack.currentFrame->args.subjectPtr++;
- if (!_pcre_xclass(c, stack.currentFrame->locals.data))
+ if (!kjs_pcre_xclass(c, stack.currentFrame->locals.data))
RRETURN_NO_MATCH;
}
if (minimize) {
for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
RECURSIVE_MATCH(26, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
- if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN;
int c = *stack.currentFrame->args.subjectPtr++;
- if (!_pcre_xclass(c, stack.currentFrame->locals.data))
+ if (!kjs_pcre_xclass(c, stack.currentFrame->locals.data))
RRETURN;
}
/* Control never reaches here */
else {
stack.currentFrame->locals.subjectPtrAtStartOfInstruction = stack.currentFrame->args.subjectPtr;
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
break;
int c = *stack.currentFrame->args.subjectPtr;
- if (!_pcre_xclass(c, stack.currentFrame->locals.data))
+ if (!kjs_pcre_xclass(c, stack.currentFrame->locals.data))
break;
++stack.currentFrame->args.subjectPtr;
}
for(;;) {
RECURSIVE_MATCH(27, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
if (stack.currentFrame->args.subjectPtr-- == stack.currentFrame->locals.subjectPtrAtStartOfInstruction)
break; /* Stop if tried at original pos */
stack.currentFrame->args.instructionPtr++;
getUTF8CharAndIncrementLength(stack.currentFrame->locals.fc, stack.currentFrame->args.instructionPtr, stack.currentFrame->locals.length);
stack.currentFrame->args.instructionPtr += stack.currentFrame->locals.length;
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN_NO_MATCH;
if (stack.currentFrame->locals.fc != *stack.currentFrame->args.subjectPtr++)
RRETURN_NO_MATCH;
stack.currentFrame->args.instructionPtr++;
getUTF8CharAndIncrementLength(stack.currentFrame->locals.fc, stack.currentFrame->args.instructionPtr, stack.currentFrame->locals.length);
stack.currentFrame->args.instructionPtr += stack.currentFrame->locals.length;
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN_NO_MATCH;
int dc = *stack.currentFrame->args.subjectPtr++;
- if (stack.currentFrame->locals.fc != dc && _pcre_ucp_othercase(stack.currentFrame->locals.fc) != dc)
+ if (stack.currentFrame->locals.fc != dc && kjs_pcre_ucp_othercase(stack.currentFrame->locals.fc) != dc)
RRETURN_NO_MATCH;
NEXT_OPCODE;
}
/* Match a single ASCII character. */
BEGIN_OPCODE(ASCII_CHAR):
- if (md.end_subject == stack.currentFrame->args.subjectPtr)
+ if (md.endSubject == stack.currentFrame->args.subjectPtr)
RRETURN_NO_MATCH;
if (*stack.currentFrame->args.subjectPtr != stack.currentFrame->args.instructionPtr[1])
RRETURN_NO_MATCH;
/* Match one of two cases of an ASCII letter. */
BEGIN_OPCODE(ASCII_LETTER_IGNORING_CASE):
- if (md.end_subject == stack.currentFrame->args.subjectPtr)
+ if (md.endSubject == stack.currentFrame->args.subjectPtr)
RRETURN_NO_MATCH;
if ((*stack.currentFrame->args.subjectPtr | 0x20) != stack.currentFrame->args.instructionPtr[1])
RRETURN_NO_MATCH;
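Review note: the `| 0x20` comparison in ASCII_LETTER_IGNORING_CASE relies on ASCII upper- and lower-case letters differing only in bit 0x20. The sketch below isolates the trick. The function name is hypothetical, and it assumes the operand stored in the opcode stream is the lower-case form, since the comparison could never succeed otherwise.

```cpp
#include <cassert>

// Case-insensitive comparison against a lower-case ASCII letter.
// Setting bit 0x20 maps 'A'..'Z' onto 'a'..'z' and leaves 'a'..'z' unchanged.
static bool matchesAsciiLetterIgnoringCase(int subjectChar, unsigned char lowercaseLetter)
{
    assert(lowercaseLetter >= 'a' && lowercaseLetter <= 'z');
    return (subjectChar | 0x20) == lowercaseLetter;
}
```

The trick is only safe when the operand is a letter: for example `'[' | 0x20` equals `'{'`, so a punctuation operand would produce false matches. That is presumably why it gets a dedicated opcode separate from ASCII_CHAR.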
/* Match a single character repeatedly; different opcodes share code. */
BEGIN_OPCODE(EXACT):
- min = stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
+ min = stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
minimize = false;
stack.currentFrame->args.instructionPtr += 3;
goto REPEATCHAR;
BEGIN_OPCODE(UPTO):
BEGIN_OPCODE(MINUPTO):
min = 0;
- stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
+ stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
minimize = *stack.currentFrame->args.instructionPtr == OP_MINUPTO;
stack.currentFrame->args.instructionPtr += 3;
goto REPEATCHAR;
stack.currentFrame->locals.length = 1;
getUTF8CharAndIncrementLength(stack.currentFrame->locals.fc, stack.currentFrame->args.instructionPtr, stack.currentFrame->locals.length);
- if (min * (stack.currentFrame->locals.fc > 0xFFFF ? 2 : 1) > md.end_subject - stack.currentFrame->args.subjectPtr)
+ if (min * (stack.currentFrame->locals.fc > 0xFFFF ? 2 : 1) > md.endSubject - stack.currentFrame->args.subjectPtr)
RRETURN_NO_MATCH;
stack.currentFrame->args.instructionPtr += stack.currentFrame->locals.length;
if (stack.currentFrame->locals.fc <= 0xFFFF) {
- int othercase = md.ignoreCase ? _pcre_ucp_othercase(stack.currentFrame->locals.fc) : -1;
+ int othercase = md.ignoreCase ? kjs_pcre_ucp_othercase(stack.currentFrame->locals.fc) : -1;
for (int i = 1; i <= min; i++) {
if (*stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.fc && *stack.currentFrame->args.subjectPtr != othercase)
NEXT_OPCODE;
if (minimize) {
- stack.currentFrame->locals.repeat_othercase = othercase;
+ stack.currentFrame->locals.repeatOthercase = othercase;
for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
RECURSIVE_MATCH(28, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
- if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN;
- if (*stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.fc && *stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.repeat_othercase)
+ if (*stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.fc && *stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.repeatOthercase)
RRETURN;
++stack.currentFrame->args.subjectPtr;
}
} else {
stack.currentFrame->locals.subjectPtrAtStartOfInstruction = stack.currentFrame->args.subjectPtr;
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
break;
if (*stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.fc && *stack.currentFrame->args.subjectPtr != othercase)
break;
}
while (stack.currentFrame->args.subjectPtr >= stack.currentFrame->locals.subjectPtrAtStartOfInstruction) {
RECURSIVE_MATCH(29, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
--stack.currentFrame->args.subjectPtr;
}
if (minimize) {
for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
RECURSIVE_MATCH(30, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
- if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN;
if (*stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.fc)
RRETURN;
} else {
stack.currentFrame->locals.subjectPtrAtStartOfInstruction = stack.currentFrame->args.subjectPtr;
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (stack.currentFrame->args.subjectPtr > md.end_subject - 2)
+ if (stack.currentFrame->args.subjectPtr > md.endSubject - 2)
break;
if (*stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.fc)
break;
}
while (stack.currentFrame->args.subjectPtr >= stack.currentFrame->locals.subjectPtrAtStartOfInstruction) {
RECURSIVE_MATCH(31, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
stack.currentFrame->args.subjectPtr -= 2;
}
/* Match a negated single one-byte character. */
BEGIN_OPCODE(NOT): {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN_NO_MATCH;
stack.currentFrame->args.instructionPtr++;
int c = *stack.currentFrame->args.subjectPtr++;
about... */
BEGIN_OPCODE(NOTEXACT):
- min = stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
+ min = stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
minimize = false;
stack.currentFrame->args.instructionPtr += 3;
goto REPEATNOTCHAR;
BEGIN_OPCODE(NOTUPTO):
BEGIN_OPCODE(NOTMINUPTO):
min = 0;
- stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
+ stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
minimize = *stack.currentFrame->args.instructionPtr == OP_NOTMINUPTO;
stack.currentFrame->args.instructionPtr += 3;
goto REPEATNOTCHAR;
subject. */
REPEATNOTCHAR:
- if (min > md.end_subject - stack.currentFrame->args.subjectPtr)
+ if (min > md.endSubject - stack.currentFrame->args.subjectPtr)
RRETURN_NO_MATCH;
stack.currentFrame->locals.fc = *stack.currentFrame->args.instructionPtr++;
if (minimize) {
for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
RECURSIVE_MATCH(38, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
int d = *stack.currentFrame->args.subjectPtr++;
if (d < 128)
d = toLowerCase(d);
- if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject || stack.currentFrame->locals.fc == d)
+ if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject || stack.currentFrame->locals.fc == d)
RRETURN;
}
/* Control never reaches here */
stack.currentFrame->locals.subjectPtrAtStartOfInstruction = stack.currentFrame->args.subjectPtr;
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
break;
int d = *stack.currentFrame->args.subjectPtr;
if (d < 128)
}
for (;;) {
RECURSIVE_MATCH(40, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
if (stack.currentFrame->args.subjectPtr-- == stack.currentFrame->locals.subjectPtrAtStartOfInstruction)
break; /* Stop if tried at original pos */
if (minimize) {
for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
RECURSIVE_MATCH(42, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
int d = *stack.currentFrame->args.subjectPtr++;
- if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject || stack.currentFrame->locals.fc == d)
+ if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject || stack.currentFrame->locals.fc == d)
RRETURN;
}
/* Control never reaches here */
stack.currentFrame->locals.subjectPtrAtStartOfInstruction = stack.currentFrame->args.subjectPtr;
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
break;
int d = *stack.currentFrame->args.subjectPtr;
if (stack.currentFrame->locals.fc == d)
}
for (;;) {
RECURSIVE_MATCH(44, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
if (stack.currentFrame->args.subjectPtr-- == stack.currentFrame->locals.subjectPtrAtStartOfInstruction)
break; /* Stop if tried at original pos */
repeat it in the interests of efficiency. */
BEGIN_OPCODE(TYPEEXACT):
- min = stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
+ min = stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
minimize = true;
stack.currentFrame->args.instructionPtr += 3;
goto REPEATTYPE;
BEGIN_OPCODE(TYPEUPTO):
BEGIN_OPCODE(TYPEMINUPTO):
min = 0;
- stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
+ stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
minimize = *stack.currentFrame->args.instructionPtr == OP_TYPEMINUPTO;
stack.currentFrame->args.instructionPtr += 3;
goto REPEATTYPE;
(i.e. keep it out of the loop). Also we can test that there are at least
the minimum number of characters before we start. */
- if (min > md.end_subject - stack.currentFrame->args.subjectPtr)
+ if (min > md.endSubject - stack.currentFrame->args.subjectPtr)
RRETURN_NO_MATCH;
if (min > 0) {
switch (stack.currentFrame->locals.ctype) {
if (minimize) {
for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
RECURSIVE_MATCH(48, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
- if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject)
RRETURN;
int c = *stack.currentFrame->args.subjectPtr++;
switch (stack.currentFrame->locals.ctype) {
case OP_NOT_NEWLINE:
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject || isNewline(*stack.currentFrame->args.subjectPtr))
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject || isNewline(*stack.currentFrame->args.subjectPtr))
break;
stack.currentFrame->args.subjectPtr++;
}
case OP_NOT_DIGIT:
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
break;
int c = *stack.currentFrame->args.subjectPtr;
if (isASCIIDigit(c))
case OP_DIGIT:
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
break;
int c = *stack.currentFrame->args.subjectPtr;
if (!isASCIIDigit(c))
case OP_NOT_WHITESPACE:
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
break;
int c = *stack.currentFrame->args.subjectPtr;
if (isSpaceChar(c))
case OP_WHITESPACE:
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
break;
int c = *stack.currentFrame->args.subjectPtr;
if (!isSpaceChar(c))
case OP_NOT_WORDCHAR:
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
break;
int c = *stack.currentFrame->args.subjectPtr;
if (isWordChar(c))
case OP_WORDCHAR:
for (int i = min; i < stack.currentFrame->locals.max; i++) {
- if (stack.currentFrame->args.subjectPtr >= md.end_subject)
+ if (stack.currentFrame->args.subjectPtr >= md.endSubject)
break;
int c = *stack.currentFrame->args.subjectPtr;
if (!isWordChar(c))
for (;;) {
RECURSIVE_MATCH(52, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
if (stack.currentFrame->args.subjectPtr-- == stack.currentFrame->locals.subjectPtrAtStartOfInstruction)
break; /* Stop if tried at original pos */
number from a dummy opcode at the start. */
if (stack.currentFrame->locals.number > EXTRACT_BASIC_MAX)
- stack.currentFrame->locals.number = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 2+LINK_SIZE);
+ stack.currentFrame->locals.number = get2ByteValue(stack.currentFrame->args.instructionPtr + 2 + LINK_SIZE);
stack.currentFrame->locals.offset = stack.currentFrame->locals.number << 1;
#ifdef DEBUG
printf("\n");
#endif
- if (stack.currentFrame->locals.offset < md.offset_max) {
- stack.currentFrame->locals.save_offset1 = md.offset_vector[stack.currentFrame->locals.offset];
- stack.currentFrame->locals.save_offset2 = md.offset_vector[stack.currentFrame->locals.offset + 1];
- stack.currentFrame->locals.save_offset3 = md.offset_vector[md.offset_end - stack.currentFrame->locals.number];
+ if (stack.currentFrame->locals.offset < md.offsetMax) {
+ stack.currentFrame->locals.saveOffset1 = md.offsetVector[stack.currentFrame->locals.offset];
+ stack.currentFrame->locals.saveOffset2 = md.offsetVector[stack.currentFrame->locals.offset + 1];
+ stack.currentFrame->locals.saveOffset3 = md.offsetVector[md.offsetEnd - stack.currentFrame->locals.number];
- DPRINTF(("saving %d %d %d\n", stack.currentFrame->locals.save_offset1, stack.currentFrame->locals.save_offset2, stack.currentFrame->locals.save_offset3));
- md.offset_vector[md.offset_end - stack.currentFrame->locals.number] = stack.currentFrame->args.subjectPtr - md.start_subject;
+ DPRINTF(("saving %d %d %d\n", stack.currentFrame->locals.saveOffset1, stack.currentFrame->locals.saveOffset2, stack.currentFrame->locals.saveOffset3));
+ md.offsetVector[md.offsetEnd - stack.currentFrame->locals.number] = stack.currentFrame->args.subjectPtr - md.startSubject;
do {
RECURSIVE_MATCH_STARTNG_NEW_GROUP(1, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
- if (is_match)
+ if (isMatch)
RRETURN;
- stack.currentFrame->args.instructionPtr += getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
+ stack.currentFrame->args.instructionPtr += getLinkValue(stack.currentFrame->args.instructionPtr + 1);
} while (*stack.currentFrame->args.instructionPtr == OP_ALT);
DPRINTF(("bracket %d failed\n", stack.currentFrame->locals.number));
- md.offset_vector[stack.currentFrame->locals.offset] = stack.currentFrame->locals.save_offset1;
- md.offset_vector[stack.currentFrame->locals.offset + 1] = stack.currentFrame->locals.save_offset2;
- md.offset_vector[md.offset_end - stack.currentFrame->locals.number] = stack.currentFrame->locals.save_offset3;
+ md.offsetVector[stack.currentFrame->locals.offset] = stack.currentFrame->locals.saveOffset1;
+ md.offsetVector[stack.currentFrame->locals.offset + 1] = stack.currentFrame->locals.saveOffset2;
+ md.offsetVector[md.offsetEnd - stack.currentFrame->locals.number] = stack.currentFrame->locals.saveOffset3;
RRETURN;
}
#endif
RETURN:
- ASSERT(is_match == MATCH_MATCH || is_match == MATCH_NOMATCH);
- return is_match;
+ ASSERT(isMatch == MATCH_MATCH || isMatch == MATCH_NOMATCH);
+ return isMatch;
}
}
}
-static bool tryRequiredByteOptimization(const UChar*& subjectPtr, const UChar* endSubject, int req_byte, int req_byte2, bool req_byte_caseless, bool hasFirstByte, const UChar*& req_byte_ptr)
+static bool tryRequiredByteOptimization(const UChar*& subjectPtr, const UChar* endSubject, int req_byte, int req_byte2, bool req_byte_caseless, bool hasFirstByte, const UChar*& reqBytePtr)
{
/* If req_byte is set, we know that that character must appear in the subject
for the match to succeed. If the first character is set, req_byte must be
/* We don't need to repeat the search if we haven't yet reached the
place we found it at last time. */
- if (p > req_byte_ptr) {
+ if (p > reqBytePtr) {
if (req_byte_caseless) {
while (p < endSubject) {
int pp = *p++;
found it, so that we don't search again next time round the loop if
the start hasn't passed this character yet. */
- req_byte_ptr = p;
+ reqBytePtr = p;
}
}
return false;
ASSERT(offsetcount >= 0);
ASSERT(offsets || offsetcount == 0);
- MatchData match_block;
- match_block.start_subject = subject;
- match_block.end_subject = match_block.start_subject + length;
- const UChar* end_subject = match_block.end_subject;
+ MatchData matchBlock;
+ matchBlock.startSubject = subject;
+ matchBlock.endSubject = matchBlock.startSubject + length;
+ const UChar* endSubject = matchBlock.endSubject;
- match_block.multiline = (re->options & MatchAcrossMultipleLinesOption);
- match_block.ignoreCase = (re->options & IgnoreCaseOption);
+ matchBlock.multiline = (re->options & MatchAcrossMultipleLinesOption);
+ matchBlock.ignoreCase = (re->options & IgnoreCaseOption);
/* If the expression has got more back references than the offsets supplied can
hold, we get a temporary chunk of working store to use during the matching.
bool using_temporary_offsets = false;
if (re->top_backref > 0 && re->top_backref >= ocount/3) {
ocount = re->top_backref * 3 + 3;
- match_block.offset_vector = new int[ocount];
- if (!match_block.offset_vector)
+ matchBlock.offsetVector = new int[ocount];
+ if (!matchBlock.offsetVector)
return JSRegExpErrorNoMemory;
using_temporary_offsets = true;
} else
- match_block.offset_vector = offsets;
+ matchBlock.offsetVector = offsets;
- match_block.offset_end = ocount;
- match_block.offset_max = (2*ocount)/3;
- match_block.offset_overflow = false;
+ matchBlock.offsetEnd = ocount;
+ matchBlock.offsetMax = (2*ocount)/3;
+ matchBlock.offsetOverflow = false;
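Review note: the arithmetic above splits the offset vector into two regions. The first two-thirds hold start/end pairs indexed by `number << 1`, and the last third is scratch space addressed backwards from `offsetEnd`, as the bracket code does with `md.offsetVector[md.offsetEnd - number]`. A small sketch of the index calculations follows. The struct and function names are hypothetical; only the formulas are taken from the diff.

```cpp
#include <cassert>

struct OffsetLayoutSketch {
    int offsetEnd; // one past the last usable slot
    int offsetMax; // first slot of the scratch region
};

// Size of the temporary vector allocated when the caller's vector is
// too small for the highest back-reference.
static int temporaryOffsetCount(int topBackref)
{
    assert(topBackref >= 0);
    return topBackref * 3 + 3;
}

static OffsetLayoutSketch layoutFor(int ocount)
{
    assert(ocount >= 0);
    return {ocount, (2 * ocount) / 3};
}

// Slot of the start offset for capture group `number`; the end offset
// lives in the next slot.
static int pairSlotFor(int number)
{
    return number << 1;
}

// Scratch slot used while capture group `number` is still open.
static int scratchSlotFor(const OffsetLayoutSketch& layout, int number)
{
    return layout.offsetEnd - number;
}
```

For a pattern whose highest back-reference is 2, this gives 9 slots: pairs occupy slots 0 to 5 and the scratch slots for groups 1 and 2 are 8 and 7.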
/* Compute the minimum number of offsets that we need to reset each time. Doing
this makes a huge difference to execution time when there aren't many brackets
never be used unless previously set, but they get saved and restored, and so we
initialize them to avoid reading uninitialized locations. */
- if (match_block.offset_vector) {
- int* iptr = match_block.offset_vector + ocount;
+ if (matchBlock.offsetVector) {
+ int* iptr = matchBlock.offsetVector + ocount;
int* iend = iptr - resetcount/2 + 1;
while (--iptr >= iend)
*iptr = -1;
/* Loop for handling unanchored repeated matching attempts; for anchored regexs
the loop runs just once. */
- const UChar* start_match = subject + start_offset;
- const UChar* req_byte_ptr = start_match - 1;
+ const UChar* startMatch = subject + start_offset;
+ const UChar* reqBytePtr = startMatch - 1;
bool useMultiLineFirstCharOptimization = re->options & UseMultiLineFirstByteOptimizationOption;
do {
/* Reset the maximum number of extractions we might see. */
- if (match_block.offset_vector) {
- int* iptr = match_block.offset_vector;
+ if (matchBlock.offsetVector) {
+ int* iptr = matchBlock.offsetVector;
int* iend = iptr + resetcount;
while (iptr < iend)
*iptr++ = -1;
}
- tryFirstByteOptimization(start_match, end_subject, first_byte, first_byte_caseless, useMultiLineFirstCharOptimization, match_block.start_subject + start_offset);
- if (tryRequiredByteOptimization(start_match, end_subject, req_byte, req_byte2, req_byte_caseless, first_byte >= 0, req_byte_ptr))
+ tryFirstByteOptimization(startMatch, endSubject, first_byte, first_byte_caseless, useMultiLineFirstCharOptimization, matchBlock.startSubject + start_offset);
+ if (tryRequiredByteOptimization(startMatch, endSubject, req_byte, req_byte2, req_byte_caseless, first_byte >= 0, reqBytePtr))
break;
/* When a match occurs, substrings will be set for all internal extractions;
if certain parts of the pattern were not used. */
/* The code starts after the JSRegExp block and the capture name table. */
- const uschar* start_code = (const uschar*)(re + 1);
+ const unsigned char* start_code = (const unsigned char*)(re + 1);
- int returnCode = match(start_match, start_code, 2, match_block);
+ int returnCode = match(startMatch, start_code, 2, matchBlock);
/* When the result is no match, advance the pointer to the next character
and continue. */
if (returnCode == MATCH_NOMATCH) {
- start_match++;
+ startMatch++;
continue;
}
if (using_temporary_offsets) {
if (offsetcount >= 4) {
- memcpy(offsets + 2, match_block.offset_vector + 2, (offsetcount - 2) * sizeof(int));
+ memcpy(offsets + 2, matchBlock.offsetVector + 2, (offsetcount - 2) * sizeof(int));
DPRINTF(("Copied offsets from temporary memory\n"));
}
- if (match_block.end_offset_top > offsetcount)
- match_block.offset_overflow = true;
+ if (matchBlock.endOffsetTop > offsetcount)
+ matchBlock.offsetOverflow = true;
DPRINTF(("Freeing temporary memory\n"));
- delete [] match_block.offset_vector;
+ delete [] matchBlock.offsetVector;
}
- returnCode = match_block.offset_overflow ? 0 : match_block.end_offset_top / 2;
+ returnCode = matchBlock.offsetOverflow ? 0 : matchBlock.endOffsetTop / 2;
if (offsetcount < 2)
returnCode = 0;
else {
- offsets[0] = start_match - match_block.start_subject;
- offsets[1] = match_block.end_match_ptr - match_block.start_subject;
+ offsets[0] = startMatch - matchBlock.startSubject;
+ offsets[1] = matchBlock.endMatchPtr - matchBlock.startSubject;
}
DPRINTF((">>>> returning %d\n", rc));
return returnCode;
- } while (start_match <= end_subject);
+ } while (startMatch <= endSubject);
if (using_temporary_offsets) {
DPRINTF(("Freeing temporary memory\n"));
- delete [] match_block.offset_vector;
+ delete [] matchBlock.offsetVector;
}
DPRINTF((">>>> returning PCRE_ERROR_NOMATCH\n"));
#pragma warning(disable: 4244)
#endif
+#include "pcre.h"
+
/* The value of LINK_SIZE determines the number of bytes used to store links as
offsets within the compiled regex. The default is 2, which allows for compiled
-patterns up to 64K long. This covers the vast majority of cases. However, PCRE
-can also be compiled to use 3 or 4 bytes instead. This allows for longer
-patterns in extreme cases. On systems that support it, "configure" can be used
-to override this default. */
+patterns up to 64K long. */
#define LINK_SIZE 2
-/* The below limit restricts the number of recursive match calls in order to
-limit the maximum amount of stack (or heap, if NO_RECURSE is defined) that is used. The
-value of MATCH_LIMIT_RECURSION applies only to recursive calls of match().
-
- This limit is tied to the size of MatchFrame. Right now we allow PCRE to allocate up
- to MATCH_LIMIT_RECURSION - 16 * sizeof(MatchFrame) bytes of "stack" space before we give up.
- Currently that's 100000 - 16 * (23 * 4) ~ 90MB
- */
-
-#define MATCH_LIMIT_RECURSION 100000
-
-#define _pcre_default_tables kjs_pcre_default_tables
-#define _pcre_ord2utf8 kjs_pcre_ord2utf8
-#define _pcre_utf8_table1 kjs_pcre_utf8_table1
-#define _pcre_utf8_table2 kjs_pcre_utf8_table2
-#define _pcre_utf8_table3 kjs_pcre_utf8_table3
-#define _pcre_utf8_table4 kjs_pcre_utf8_table4
-#define _pcre_xclass kjs_pcre_xclass
-
/* Define DEBUG to get debugging output on stdout. */
#if 0
#define DPRINTF(p) /*nothing*/
#endif
-/* Standard C headers plus the external interface definition. The only time
-setjmp and stdarg are used is when NO_RECURSE is set. */
-
-#include <ctype.h>
-#include <limits.h>
-#include <setjmp.h>
-#include <stdarg.h>
-#include <stddef.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-
-/* Include the public PCRE header and the definitions of UCP character property
-values. */
-
-#include "pcre.h"
-
-typedef unsigned short pcre_uint16;
-typedef unsigned pcre_uint32;
-typedef unsigned char uschar;
-
/* PCRE keeps offsets in its compiled code as 2-byte quantities (always stored
in big-endian order) by default. These are used, for example, to link from the
start of a subpattern to its alternatives and its end. The use of 2 bytes per
offset limits the size of the compiled regex to around 64K, which is big enough
for almost everybody. However, I received a request for an even bigger limit.
For this reason, and also to make the code easier to maintain, the storing and
-loading of offsets from the byte string is now handled by the macros that are
-defined here.
-
-The macros are controlled by the value of LINK_SIZE. This defaults to 2 in
-the config.h file, but can be overridden by using -D on the command line. This
-is automated on Unix systems via the "configure" command. */
+loading of offsets from the byte string is now handled by the functions that are
+defined here. */
-#if LINK_SIZE == 2
-
-static inline void putOpcodeValueAtOffset(uschar* opcodePtr, size_t offset, unsigned short value)
-{
- opcodePtr[offset] = value >> 8;
- opcodePtr[offset + 1] = value & 255;
-}
+/* PCRE uses some other 2-byte quantities that do not change when the size of
+offsets changes. These are used for repeat counts and for other things such as
+capturing parenthesis numbers in back references. */
-static inline short getOpcodeValueAtOffset(const uschar* opcodePtr, size_t offset)
+static inline void put2ByteValue(unsigned char* opcodePtr, int value)
{
- return ((opcodePtr[offset] << 8) | opcodePtr[offset + 1]);
+ ASSERT(value >= 0 && value <= 0xFFFF);
+ opcodePtr[0] = value >> 8;
+ opcodePtr[1] = value;
}
-#define MAX_PATTERN_SIZE (1 << 16)
-
-#elif LINK_SIZE == 3
-
-static inline void putOpcodeValueAtOffset(uschar* opcodePtr, size_t offset, unsigned value)
+static inline int get2ByteValue(const unsigned char* opcodePtr)
{
- ASSERT(!(value & 0xFF000000)); // This function only allows values < 2^24
- opcodePtr[offset] = value >> 16;
- opcodePtr[offset + 1] = value >> 8;
- opcodePtr[offset + 2] = value & 255;
+ return (opcodePtr[0] << 8) | opcodePtr[1];
}
-static inline int getOpcodeValueAtOffset(const uschar* opcodePtr, size_t offset)
+static inline void put2ByteValueAndAdvance(unsigned char*& opcodePtr, int value)
{
- return ((opcodePtr[offset] << 16) | (opcodePtr[offset + 1] << 8) | opcodePtr[offset + 2]);
+ put2ByteValue(opcodePtr, value);
+ opcodePtr += 2;
}
-#define MAX_PATTERN_SIZE (1 << 24)
-
-#elif LINK_SIZE == 4
-
-static inline void putOpcodeValueAtOffset(uschar* opcodePtr, size_t offset, unsigned value)
+static inline void putLinkValueAllowZero(unsigned char* opcodePtr, int value)
{
- opcodePtr[offset] = value >> 24;
- opcodePtr[offset + 1] = value >> 16;
- opcodePtr[offset + 2] = value >> 8;
- opcodePtr[offset + 3] = value & 255;
+ put2ByteValue(opcodePtr, value);
}
-static inline int getOpcodeValueAtOffset(const uschar* opcodePtr, size_t offset)
+static inline int getLinkValueAllowZero(const unsigned char* opcodePtr)
{
- return ((opcodePtr[offset] << 24) | (opcodePtr[offset + 1] << 16) | (opcodePtr[offset + 2] << 8) | opcodePtr[offset + 3]);
+ return get2ByteValue(opcodePtr);
}
-#define MAX_PATTERN_SIZE (1 << 30) /* Keep it positive */
-
-#else
-#error LINK_SIZE must be either 2, 3, or 4
-#endif
+#define MAX_PATTERN_SIZE (1 << 16)
-static inline void putOpcodeValueAtOffsetAndAdvance(uschar*& opcodePtr, size_t offset, unsigned short value)
+static inline void putLinkValue(unsigned char* opcodePtr, int value)
{
- putOpcodeValueAtOffset(opcodePtr, offset, value);
- opcodePtr += LINK_SIZE;
+ ASSERT(value);
+ putLinkValueAllowZero(opcodePtr, value);
}
-/* PCRE uses some other 2-byte quantities that do not change when the size of
-offsets changes. There are used for repeat counts and for other things such as
-capturing parenthesis numbers in back references. */
-
-static inline void put2ByteOpcodeValueAtOffset(uschar* opcodePtr, size_t offset, unsigned short value)
+static inline int getLinkValue(const unsigned char* opcodePtr)
{
- opcodePtr[offset] = value >> 8;
- opcodePtr[offset + 1] = value & 255;
+ int value = getLinkValueAllowZero(opcodePtr);
+ ASSERT(value);
+ return value;
}
-static inline short get2ByteOpcodeValueAtOffset(const uschar* opcodePtr, size_t offset)
+static inline void putLinkValueAndAdvance(unsigned char*& opcodePtr, int value)
{
- return ((opcodePtr[offset] << 8) | opcodePtr[offset + 1]);
+ putLinkValue(opcodePtr, value);
+ opcodePtr += LINK_SIZE;
}
-static inline void put2ByteOpcodeValueAtOffsetAndAdvance(uschar*& opcodePtr, size_t offset, unsigned short value)
+static inline void putLinkValueAllowZeroAndAdvance(unsigned char*& opcodePtr, int value)
{
- put2ByteOpcodeValueAtOffset(opcodePtr, offset, value);
- opcodePtr += 2;
+ putLinkValueAllowZero(opcodePtr, value);
+ opcodePtr += LINK_SIZE;
}
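The 2-byte helpers above can be tried outside the tree with a stand-alone sketch like the one below. It assumes plain assert() in place of WebKit's ASSERT, and it masks the low byte explicitly where the patch relies on truncation when assigning to unsigned char. roundTripsThroughBuffer is a hypothetical helper added only for illustration; it is not part of the patch.

```cpp
#include <cassert>

// Store a value in the range 0..0xFFFF as two bytes, high byte first (big-endian).
static inline void put2ByteValue(unsigned char* opcodePtr, int value)
{
    assert(value >= 0 && value <= 0xFFFF);
    opcodePtr[0] = static_cast<unsigned char>(value >> 8);
    opcodePtr[1] = static_cast<unsigned char>(value & 0xFF);
}

// Read back a value written by put2ByteValue.
static inline int get2ByteValue(const unsigned char* opcodePtr)
{
    return (opcodePtr[0] << 8) | opcodePtr[1];
}

// Write a value and move the caller's cursor past the two bytes just written.
static inline void put2ByteValueAndAdvance(unsigned char*& opcodePtr, int value)
{
    put2ByteValue(opcodePtr, value);
    opcodePtr += 2;
}

// Hypothetical helper: true if a value survives a write followed by a read.
static inline bool roundTripsThroughBuffer(int value)
{
    unsigned char buffer[2] = { 0, 0 };
    put2ByteValue(buffer, value);
    return get2ByteValue(buffer) == value;
}
```

Taking the pointer by reference in the AndAdvance variant lets the caller emit a run of values without tracking a separate offset, which is what replaced the old (opcodePtr, offset) argument pairs.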
// FIXME: These are really more of a "compiled regexp state" than "regexp options"
MatchAcrossMultipleLinesOption = 0x00000002
};
-/* Negative values for the firstchar and reqchar variables */
-
-#define REQ_UNSET (-2)
-#define REQ_NONE (-1)
-
-/* The maximum remaining length of subject we are prepared to search for a
-req_byte match. */
-
-#define REQ_BYTE_MAX 1000
-
/* Flags added to firstbyte or reqbyte; a "non-literal" item is either a
variable-length repeat, or a anything other than literal characters. */
macro(ASSERT) \
macro(ASSERT_NOT) \
\
- macro(ONCE) \
- \
macro(BRAZERO) \
macro(BRAMINZERO) \
macro(BRANUMBER) \
character sequences easier. */
/* The highest extraction number before we have to start using additional
-bytes. (Originally PCRE didn't have support for extraction counts highter than
+bytes. (Originally PCRE didn't have support for extraction counts higher than
this number.) The value is limited by the number of opcodes left after OP_BRA,
i.e. 255 - OP_BRA. We actually set it a bit lower to leave room for additional
opcodes. */
-#define EXTRACT_BASIC_MAX 100
-
-/* This macro defines the length of fixed length operations in the compiled
-regex. The lengths are used when searching for specific things, and also in the
-debugging printing of a compiled regex. We use a macro so that it can be
-defined close to the definitions of the opcodes themselves.
-
-As things have been extended, some of these are no longer fixed lenths, but are
-minima instead. For example, the length of a single-character repeat may vary
-in UTF-8 mode. The code that uses this table must know about such things. */
-
-#define OP_LENGTHS \
- 1, /* End */ \
- 1, 1, 1, 1, 1, 1, 1, 1, /* \B, \b, \D, \d, \S, \s, \W, \w */ \
- 1, /* Any */ \
- 1, 1, /* ^, $ */ \
- 2, 2, /* Char, Charnc - minimum lengths */ \
- 2, 2, /* ASCII char or non-cased */ \
- 2, /* not */ \
- /* Positive single-char repeats ** These are */ \
- 2, 2, 2, 2, 2, 2, /* *, *?, +, +?, ?, ?? ** minima in */ \
- 4, 4, 4, /* upto, minupto, exact ** UTF-8 mode */ \
- /* Negative single-char repeats - only for chars < 256 */ \
- 2, 2, 2, 2, 2, 2, /* NOT *, *?, +, +?, ?, ?? */ \
- 4, 4, 4, /* NOT upto, minupto, exact */ \
- /* Positive type repeats */ \
- 2, 2, 2, 2, 2, 2, /* Type *, *?, +, +?, ?, ?? */ \
- 4, 4, 4, /* Type upto, minupto, exact */ \
- /* Character class & ref repeats */ \
- 1, 1, 1, 1, 1, 1, /* *, *?, +, +?, ?, ?? */ \
- 5, 5, /* CRRANGE, CRMINRANGE */ \
- 33, /* CLASS */ \
- 33, /* NCLASS */ \
- 0, /* XCLASS - variable length */ \
- 3, /* REF */ \
- 1 + LINK_SIZE, /* Alt */ \
- 1 + LINK_SIZE, /* Ket */ \
- 1 + LINK_SIZE, /* KetRmax */ \
- 1 + LINK_SIZE, /* KetRmin */ \
- 1 + LINK_SIZE, /* Assert */ \
- 1 + LINK_SIZE, /* Assert not */ \
- 1 + LINK_SIZE, /* Once */ \
- 1, 1, /* BRAZERO, BRAMINZERO */ \
- 3, /* BRANUMBER */ \
- 1 + LINK_SIZE /* BRA */ \
+/* FIXME: Note that OP_BRA + 100 is > 128, so the two comments above
+are in conflict! */
+#define EXTRACT_BASIC_MAX 100
/* The index of names and the
code vector run on as long as necessary after the end. We store an explicit
*/
struct JSRegExp {
- pcre_uint32 options;
+ unsigned options;
- pcre_uint16 top_bracket;
- pcre_uint16 top_backref;
+ unsigned short top_bracket;
+ unsigned short top_backref;
- // jsRegExpExecute && jsRegExpCompile currently only how to handle ASCII
- // chars for thse optimizations, however it would be trivial to add support
- // for optimized UChar first_byte/req_byte scans
- pcre_uint16 first_byte;
- pcre_uint16 req_byte;
+ unsigned short first_byte;
+ unsigned short req_byte;
};
/* Internal shared data tables. These are tables that are used by more than one
but are not part of the PCRE public API. The data for these tables is in the
pcre_tables.c module. */
-#define _pcre_utf8_table1_size 6
+#define kjs_pcre_utf8_table1_size 6
-extern const int _pcre_utf8_table1[6];
-extern const int _pcre_utf8_table2[6];
-extern const int _pcre_utf8_table3[6];
-extern const uschar _pcre_utf8_table4[0x40];
+extern const int kjs_pcre_utf8_table1[6];
+extern const int kjs_pcre_utf8_table2[6];
+extern const int kjs_pcre_utf8_table3[6];
+extern const unsigned char kjs_pcre_utf8_table4[0x40];
-extern const uschar _pcre_default_tables[tables_length];
+extern const unsigned char kjs_pcre_default_tables[tables_length];
-static inline uschar toLowerCase(uschar c)
+static inline unsigned char toLowerCase(unsigned char c)
{
- static const uschar* lowerCaseChars = _pcre_default_tables + lcc_offset;
+ static const unsigned char* lowerCaseChars = kjs_pcre_default_tables + lcc_offset;
return lowerCaseChars[c];
}
-static inline uschar flipCase(uschar c)
+static inline unsigned char flipCase(unsigned char c)
{
- static const uschar* flippedCaseChars = _pcre_default_tables + fcc_offset;
+ static const unsigned char* flippedCaseChars = kjs_pcre_default_tables + fcc_offset;
return flippedCaseChars[c];
}
-static inline uschar classBitmapForChar(uschar c)
+static inline unsigned char classBitmapForChar(unsigned char c)
{
- static const uschar* charClassBitmaps = _pcre_default_tables + cbits_offset;
+ static const unsigned char* charClassBitmaps = kjs_pcre_default_tables + cbits_offset;
return charClassBitmaps[c];
}
-static inline uschar charTypeForChar(uschar c)
+static inline unsigned char charTypeForChar(unsigned char c)
{
- const uschar* charTypeMap = _pcre_default_tables + ctypes_offset;
+ const unsigned char* charTypeMap = kjs_pcre_default_tables + ctypes_offset;
return charTypeMap[c];
}
static inline bool isWordChar(UChar c)
{
- /* UTF8 Characters > 128 are assumed to be "non-word" characters. */
- return (c < 128 && (charTypeForChar(c) & ctype_word));
+ return c < 128 && (charTypeForChar(c) & ctype_word);
}
static inline bool isSpaceChar(UChar c)
{
- return (c < 128 && (charTypeForChar(c) & ctype_space));
+ return c < 128 && (charTypeForChar(c) & ctype_space);
}
-/* Internal shared functions. These are functions that are used by more than
-one of the exported public functions. They have to be "external" in the C
-sense, but are not part of the PCRE public API. */
-
-extern int _pcre_ucp_othercase(const unsigned int);
-extern bool _pcre_xclass(int, const uschar*);
-
static inline bool isNewline(UChar nl)
{
return (nl == 0xA || nl == 0xD || nl == 0x2028 || nl == 0x2029);
}
-// FIXME: It's unclear to me if this moves the opcode ptr to the start of all branches
-// or to the end of all branches -- ecs
-// FIXME: This abstraction is poor since it assumes that you want to jump based on whatever
-// the next value in the stream is, and *then* follow any OP_ALT branches.
-static inline void moveOpcodePtrPastAnyAlternateBranches(const uschar*& opcodePtr)
+static inline bool isBracketStartOpcode(unsigned char opcode)
+{
+ if (opcode >= OP_BRA)
+ return true;
+ switch (opcode) {
+ case OP_ASSERT:
+ case OP_ASSERT_NOT:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static inline void advanceToEndOfBracket(const unsigned char*& opcodePtr)
{
- do {
- opcodePtr += getOpcodeValueAtOffset(opcodePtr, 1);
- } while (*opcodePtr == OP_ALT);
+ ASSERT(isBracketStartOpcode(*opcodePtr) || *opcodePtr == OP_ALT);
+ do
+ opcodePtr += getLinkValue(opcodePtr + 1);
+ while (*opcodePtr == OP_ALT);
}
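As a rough illustration of what advanceToEndOfBracket does, the sketch below walks a toy opcode stream. The Toy* opcode values and the toyPattern layout are assumptions made up for this example and do not match the real OP_* numbering; the only property carried over from the patch is that each bracket or alternative opcode is followed by a 2-byte link giving the distance to the next alternative or to the closing ket.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical opcode values, for illustration only.
static const unsigned char ToyEnd = 0;
static const unsigned char ToyAny = 1;
static const unsigned char ToyAlt = 2;
static const unsigned char ToyKet = 3;
static const unsigned char ToyBra = 4;

// Read a non-zero 2-byte big-endian link value.
static inline int getLinkValue(const unsigned char* opcodePtr)
{
    int value = (opcodePtr[0] << 8) | opcodePtr[1];
    assert(value);
    return value;
}

// Follow the link after the current opcode, and keep following while we land on
// another alternative, so the pointer ends up on the opcode that closes the bracket.
static inline void advanceToEndOfBracket(const unsigned char*& opcodePtr)
{
    assert(*opcodePtr == ToyBra || *opcodePtr == ToyAlt);
    do
        opcodePtr += getLinkValue(opcodePtr + 1);
    while (*opcodePtr == ToyAlt);
}

// Toy encoding of a bracket with two alternatives.
static const unsigned char toyPattern[] = {
    ToyBra, 0, 4, ToyAny,          // offset 0: link of 4 leads to the alternative
    ToyAlt, 0, 5, ToyAny, ToyAny,  // offset 4: link of 5 leads to the ket
    ToyKet, 0, 9,                  // offset 9: closes the bracket
    ToyEnd                         // offset 12
};

// Hypothetical helper: offset of the opcode reached from the start of toyPattern.
static inline std::ptrdiff_t offsetOfBracketEnd()
{
    const unsigned char* opcodePtr = toyPattern;
    advanceToEndOfBracket(opcodePtr);
    return opcodePtr - toyPattern;
}
```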
+/* Internal shared functions. These are functions that are used in more
+than one of the source files. They have to have external linkage, but
+are not part of the public API and so not exported from the library. */
+
+extern int kjs_pcre_ucp_othercase(unsigned);
+extern bool kjs_pcre_xclass(int, const unsigned char*);
+
#endif
#endif
/* These are the breakpoints for different numbers of bytes in a UTF-8
character. */
-const int _pcre_utf8_table1[6] =
+const int kjs_pcre_utf8_table1[6] =
{ 0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff};
/* These are the indicator bits and the mask for the data bits to set in the
first byte of a character, indexed by the number of additional bytes. */
-const int _pcre_utf8_table2[6] = { 0, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc};
-const int _pcre_utf8_table3[6] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
+const int kjs_pcre_utf8_table2[6] = { 0, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc};
+const int kjs_pcre_utf8_table3[6] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
/* Table of the number of extra characters, indexed by the first character
masked with 0x3f. The highest number for a valid UTF-8 character is in fact
0x3d. */
-const uschar _pcre_utf8_table4[0x40] = {
+const unsigned char kjs_pcre_utf8_table4[0x40] = {
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
Returns: the other case or -1 if none
*/
-int _pcre_ucp_othercase(const unsigned c)
+int kjs_pcre_ucp_othercase(unsigned c)
{
int bot = 0;
int top = sizeof(ucp_table) / sizeof(cnode);
/* Get the next UTF-8 character, advancing the pointer. This is called when we
know we are in UTF-8 mode. */
-static inline void getUTF8CharAndAdvancePointer(int& c, const uschar*& subjectPtr)
+static inline void getUTF8CharAndAdvancePointer(int& c, const unsigned char*& subjectPtr)
{
c = *subjectPtr++;
if ((c & 0xc0) == 0xc0) {
- int gcaa = _pcre_utf8_table4[c & 0x3f]; /* Number of additional bytes */
+ int gcaa = kjs_pcre_utf8_table4[c & 0x3f]; /* Number of additional bytes */
int gcss = 6 * gcaa;
- c = (c & _pcre_utf8_table3[gcaa]) << gcss;
+ c = (c & kjs_pcre_utf8_table3[gcaa]) << gcss;
while (gcaa-- > 0) {
gcss -= 6;
c |= (*subjectPtr++ & 0x3f) << gcss;
}
}
-bool _pcre_xclass(int c, const uschar* data)
+bool kjs_pcre_xclass(int c, const unsigned char* data)
{
bool negated = (*data & XCL_NOT);
words that form a data item in the table. */
typedef struct cnode {
- pcre_uint32 f0;
- pcre_uint32 f1;
+ unsigned f0;
+ unsigned f1;
} cnode;
/* Things for the f0 field */
#if !COMPILER(MSVC) || defined(_NATIVE_WCHAR_T_DEFINED)
inline bool isASCIIAlpha(wchar_t c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
#endif
+ inline bool isASCIIAlpha(int c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
inline bool isASCIIAlphanumeric(char c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
inline bool isASCIIAlphanumeric(unsigned short c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
#if !COMPILER(MSVC) || defined(_NATIVE_WCHAR_T_DEFINED)
inline bool isASCIIAlphanumeric(wchar_t c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
#endif
+ inline bool isASCIIAlphanumeric(int c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
inline bool isASCIIDigit(char c) { return (c >= '0') & (c <= '9'); }
inline bool isASCIIDigit(unsigned short c) { return (c >= '0') & (c <= '9'); }
#if !COMPILER(MSVC) || defined(_NATIVE_WCHAR_T_DEFINED)
inline bool isASCIIHexDigit(wchar_t c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'f'; }
#endif
+ inline bool isASCIIHexDigit(int c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'f'; }
inline bool isASCIILower(char c) { return c >= 'a' && c <= 'z'; }
inline bool isASCIILower(unsigned short c) { return c >= 'a' && c <= 'z'; }
#if !COMPILER(MSVC) || defined(_NATIVE_WCHAR_T_DEFINED)
inline bool isASCIILower(wchar_t c) { return c >= 'a' && c <= 'z'; }
#endif
+ inline bool isASCIILower(int c) { return c >= 'a' && c <= 'z'; }
inline bool isASCIISpace(char c) { return c == '\t' || c == '\n' || c == '\v' || c =='\f' || c == '\r' || c == ' '; }
inline bool isASCIISpace(unsigned short c) { return c == '\t' || c == '\n' || c == '\v' || c =='\f' || c == '\r' || c == ' '; }
#if !COMPILER(MSVC) || defined(_NATIVE_WCHAR_T_DEFINED)
inline bool isASCIISpace(wchar_t c) { return c == '\t' || c == '\n' || c == '\v' || c =='\f' || c == '\r' || c == ' '; }
#endif
+ inline bool isASCIISpace(int c) { return c == '\t' || c == '\n' || c == '\v' || c =='\f' || c == '\r' || c == ' '; }
inline char toASCIILower(char c) { return c | ((c >= 'A' && c <= 'Z') << 5); }
inline unsigned short toASCIILower(unsigned short c) { return c | ((c >= 'A' && c <= 'Z') << 5); }
#if !COMPILER(MSVC) || defined(_NATIVE_WCHAR_T_DEFINED)
inline wchar_t toASCIILower(wchar_t c) { return c | ((c >= 'A' && c <= 'Z') << 5); }
#endif
+ inline int toASCIILower(int c) { return c | ((c >= 'A' && c <= 'Z') << 5); }
inline char toASCIIUpper(char c) { return static_cast<char>(c & ~((c >= 'a' && c <= 'z') << 5)); }
inline unsigned short toASCIIUpper(unsigned short c) { return static_cast<unsigned short>(c & ~((c >= 'a' && c <= 'z') << 5)); }
#if !COMPILER(MSVC) || defined(_NATIVE_WCHAR_T_DEFINED)
inline wchar_t toASCIIUpper(wchar_t c) { return static_cast<wchar_t>(c & ~((c >= 'a' && c <= 'z') << 5)); }
#endif
+ inline int toASCIIUpper(int c) { return static_cast<int>(c & ~((c >= 'a' && c <= 'z') << 5)); }
}
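The int overloads added above rely on the fact that ASCII upper and lower case letters differ only in bit 0x20. A minimal stand-alone sketch of the same bit tricks follows; the *Sketch names are hypothetical and chosen so they cannot collide with the real WTF functions.

```cpp
#include <cassert>

// (c >= 'A' && c <= 'Z') is 0 or 1, so the shift yields 0 or 0x20.
// OR-ing it in sets the case bit only for upper case ASCII letters.
inline int toASCIILowerSketch(int c) { return c | ((c >= 'A' && c <= 'Z') << 5); }

// Clear the case bit only for lower case ASCII letters; other values pass through.
inline int toASCIIUpperSketch(int c) { return c & ~((c >= 'a' && c <= 'z') << 5); }

// Fold to lower case first, then do a single range check.
inline bool isASCIIAlphaSketch(int c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
```

Because the mask is zero for anything outside the letter ranges, values above 127 such as 0x2028 come back unchanged, which is presumably why an int overload is safe to use on character values wider than a byte.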