Regex with a back-reference to a positive look-behind fails to match #334

Open
GoogleCodeExporter opened this issue Aug 18, 2015 · 6 comments

Comments

@GoogleCodeExporter

With the text:

    AbbbAc

This regular expression should match "bbbAc":

    /\v(A)@<=b+\1c

But it matches nothing.

Other regular expressions that do work are

    /\v(A)@<=b+\1       " to capture "bbbA"
    /\v(A)@<=b+(\1c)@=  " to capture "bbb"
    /\v(A)b+(\1c)       " to capture "AbbbAc"
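
The four patterns above can also be checked without an interactive search. The following is a minimal sketch, assuming `matchstr()` exercises the same regexp code as the `/` command; the variable names `sample`, `pats`, and `pat` are illustrative and do not come from the report.

```vim
" Sketch: evaluate each pattern from the report against the sample text.
" Variable names are illustrative, not part of the original report.
let sample = 'AbbbAc'
let pats = ['\v(A)@<=b+\1c', '\v(A)@<=b+\1', '\v(A)@<=b+(\1c)@=', '\v(A)b+(\1c)']
for pat in pats
  " matchstr() returns an empty string when the pattern does not match.
  echo pat . ' -> [' . matchstr(sample, pat) . ']'
endfor
```

Saving this to a file and running `:source` on it prints one line per pattern. Per the report, the first pattern is the one expected to show an empty result.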

What version of the product are you using? On what operating system?

Windows 8.1 64-bit
VIM - Vi IMproved 7.4 (2013 Aug 10, compiled Aug 10 2013 14:33:40)
MS-Windows 32-bit console version
See attached for more of the report.

Original issue reported on code.google.com by michae...@google.com on 20 Feb 2015 at 10:39

@GoogleCodeExporter

This is documented; the second part matches first, so you need to define the 
group there. See :help /\@<=

I'm frankly surprised the three working examples you give actually work.

Original comment by fritzoph...@gmail.com on 23 Feb 2015 at 5:01

@GoogleCodeExporter

Interesting. The documentation doesn't specify for which engine (or both) 
referencing a group from inside the preceding atom shouldn't work. And :h \#= 
makes it sound like the new engine supports only a subset of what the old 
engine supports, so maybe my 3 working examples illustrate the real bug here? 
At the very least, the discrepancy is confusing to somebody new to Vim (i.e., me).

Original comment by michae...@google.com on 23 Feb 2015 at 5:21

@GoogleCodeExporter

@Ben, you probably refer to this:

>   In the old regexp engine the part of the pattern after "\@<=" and
>   "\@<!" are checked for a match first, thus things like "\1" don't work
>   to reference \(\) inside the preceding atom.  It does work the other
>   way around:

However, this bug is with the default / new NFA-based regexp engine, which 
doesn't have this odd quirk:

>   However, the new regexp engine works differently [...]

In fact, by swapping the capturing group and reference and switching to the old 
engine, this then works. So, a clear indication of an inconsistency and bug.
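
One way to compare the two engines without changing the 'regexpengine' option is to prefix the pattern with `\%#=1` (old backtracking engine) or `\%#=2` (NFA engine). The following is a minimal sketch using the original pattern from the report; the variable names are illustrative, and the swapped pattern mentioned above is not spelled out in this thread, so it is not reproduced here.

```vim
" Sketch: run the reported pattern under each regexp engine.
" \%#=1 forces the old backtracking engine, \%#=2 the NFA engine.
let sample = 'AbbbAc'
let pat = '\v(A)@<=b+\1c'
for engine in [1, 2]
  " The \%#=N prefix must be the first item in the pattern.
  echo 'engine ' . engine . ': [' . matchstr(sample, '\%#=' . engine . pat) . ']'
endfor
```

An empty pair of brackets means that engine found no match.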

Original comment by sw...@ingo-karkat.de on 23 Feb 2015 at 7:46

@GoogleCodeExporter

Oh, I guess my documentation was out of date. Disregard my #2, then.

This issue still repros in 7.4.638 which has the updated documentation.

Original comment by michae...@google.com on 23 Feb 2015 at 8:38

@GoogleCodeExporter

@ingo, yes, that's the help text I was referring to. I had missed the "however, 
the new regexp engine works differently..." text as you suspected.

However, I *don't* need to switch engines to see the pattern match, when I swap 
the capture group and reference. I *can't* get the unswapped pattern to match 
regardless of the regexpengine setting. So does the new engine have this quirk 
after all, in some situations?

Original comment by fritzoph...@gmail.com on 23 Feb 2015 at 2:39

@GoogleCodeExporter

@Ben, I see that as well (and don't understand why). Maybe the reduced example 
here is just bad; the original problem was more complex:

    <div>Test div</div>More words
         ^^^^^^^^^^^^^^

This works but leaves off the trailing >:

    /\v%(\<(\w+)\>)@<=.*\<\/\1

So I'd expect this to work, but it captures nothing:

    /\v%(\<(\w+)\>)@<=.*\<\/\1\>
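
Both variants of this more complex example can be run under each engine in one pass. This is a minimal sketch, assuming `matchstr()` mirrors the behaviour of the `/` search; the variable names `html`, `pats`, `engine`, and `pat` are illustrative.

```vim
" Sketch: try both HTML patterns under the old (1) and NFA (2) engines.
let html = '<div>Test div</div>More words'
let pats = ['\v%(\<(\w+)\>)@<=.*\<\/\1', '\v%(\<(\w+)\>)@<=.*\<\/\1\>']
for engine in [1, 2]
  for pat in pats
    " Empty brackets in the output mean the pattern did not match.
    echo 'engine ' . engine . ' ' . pat . ' -> [' . matchstr(html, '\%#=' . engine . pat) . ']'
  endfor
endfor
```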

Original comment by sw...@ingo-karkat.de on 23 Feb 2015 at 3:31
