
Shared by user 尸体派对.

How do I write an expression that matches exactly N repetitions of the same character (or, ideally, the same group)? Basically, what (.)\1{N-1} does, but with one important limitation: the expression should fail if the subject is repeated more than N times. For example, given N=4 and the string xxaaaayyybbbbbzzccccxx, the expression should match aaaa and cccc but not bbbb.


I'm not tied to any specific dialect; feel free to use any language. Please do not post code that works only for this specific example; I'm looking for a general solution.
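For reference, here is a quick Python check of the naive pattern (.)\1{N-1} with N = 4 on the example string. Because nothing constrains what comes before or after the match, it also reports the first four characters of the five-character run bbbbb:

```python
import re

SUBJECT = "xxaaaayyybbbbbzzccccxx"

# Naive pattern for N = 4: capture one character, then require three more copies.
# Nothing checks the characters before or after the run.
naive = [m.group(0) for m in re.finditer(r"(.)\1{3}", SUBJECT)]
print(naive)  # ['aaaa', 'bbbb', 'cccc']
```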



Use negative lookahead and negative lookbehind.

This would be the regex: (.)(?<!\1.)\1{N-1}(?!\1) except that Python's re module is broken (see this link).


English translation: "Match any character. Make sure that after you match that character, the character before it isn't also that character. Match N-1 more repetitions of that character. Make sure that the character after those repetitions is not also that character."
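A minimal Python sketch of that regex for an arbitrary N, with N-1 substituted as a literal count. The helper name exact_run_pattern is hypothetical. It assumes Python 3.5 or later, whose re module accepts group references of fixed length inside a lookbehind; on engines that reject or mishandle this (see below), the workaround form is needed instead.

```python
import re

def exact_run_pattern(n: int) -> str:
    """Build the exact-run regex for a concrete count n >= 1 (hypothetical helper)."""
    # (.)         capture one character
    # (?<!\1.)    the character before it is not the same character
    # \1{n-1}     n-1 more copies of the captured character
    # (?!\1)      the run does not continue past n characters
    return r"(.)(?<!\1.)\1{%d}(?!\1)" % (n - 1)

subject = "xxaaaayyybbbbbzzccccxx"
print([m.group(0) for m in re.finditer(exact_run_pattern(4), subject)])  # ['aaaa', 'cccc']
```

Building the pattern from n keeps the quantifier a fixed number, which is what the engine needs; the same call with 3 would pick out only yyy.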


Unfortunately, the re module (like most regular expression engines) is broken in that you can't use backreferences in a lookbehind assertion. Lookbehind assertions are required to be of constant length, and the compilers aren't smart enough to infer that a backreference is constant length, even when, as in this case, it is. We have to hand-hold the regex compiler through this, like so:

The actual answer will have to be messier: r"(.)(?<!(?=\1)..)\1{N-1}(?!\1)"

This works around that bug in the re module by using (?=\1).. in place of \1. inside the lookbehind. (The two are equivalent whenever group 1 is a single character, as it is here.) This lets the regex engine know the exact width of the lookbehind assertion, so it works in PCRE, re, and so on.
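The same idea with the workaround lookbehind, again as a Python sketch with a hypothetical helper name (exact_run_pattern_compat). The lookbehind body (?=\1).. is a zero-width check that the first character equals group 1, followed by two arbitrary characters, so its width is two regardless of what group 1 captured.

```python
import re

def exact_run_pattern_compat(n: int) -> str:
    """Exact-run regex using the lookahead-inside-lookbehind workaround (hypothetical helper)."""
    # (?<!(?=\1)..) looks back over the captured character and the one before it,
    # and fails if that preceding character equals group 1 (the run started earlier).
    # Its width is plainly 2 without consulting the width of group 1.
    return r"(.)(?<!(?=\1)..)\1{%d}(?!\1)" % (n - 1)

subject = "xxaaaayyybbbbbzzccccxx"
print([m.group(0) for m in re.finditer(exact_run_pattern_compat(4), subject)])  # ['aaaa', 'cccc']
```

At the very start of the string the two-character lookbehind cannot fit, so the negative assertion succeeds and a run at position 0 is still found.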

Of course, a real-world solution is something like [x.group() for x in re.finditer(r"(.)\1*", "xxaaaayyybbbbbzzccccxx") if len(x.group()) == 4]
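Wrapped up as a reusable Python function (the name exact_runs is hypothetical), that real-world approach might look like this. Since (.)\1* consumes each maximal run of a repeated character in one match, filtering on length is enough and no lookaround is needed.

```python
import re
from typing import List

def exact_runs(subject: str, n: int) -> List[str]:
    """Return every maximal run of one repeated character that is exactly n long."""
    # (.)\1* greedily takes a whole run and finditer resumes after it,
    # so each match is one complete run; keep only the runs of length n.
    return [m.group(0) for m in re.finditer(r"(.)\1*", subject) if len(m.group(0)) == n]

print(exact_runs("xxaaaayyybbbbbzzccccxx", 4))  # ['aaaa', 'cccc']
```

As with the other patterns here, . does not match a newline unless re.DOTALL is passed, so runs of newline characters are skipped.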


