Efficiently checking that a string consists of one character in Python

codestyles 2020. 11. 29. 11:44

What is an efficient way to check that a string s in Python consists of just one character, say 'A'? Something like all_equal(s, 'A') that would behave like this:

all_equal("AAAAA", "A") = True

all_equal("AAAAAAAAAAA", "A") = True

all_equal("AAAAAfAAAAA", "A") = False
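The requested behaviour can be pinned down with a small sketch. The helper name all_equal and its two-argument signature come from the question; the body is an assumption that borrows the string-multiplication idea from the first answer below, adapted to compare against the given character rather than s[0].

```python
def all_equal(s: str, ch: str) -> bool:
    """Return True if every character of s equals ch.

    Sketch only: builds len(s) copies of ch and compares once, so the
    per-character work happens in C rather than in a Python loop.
    An empty s returns True, because "" == ch * 0.
    """
    return s == ch * len(s)


# The three cases from the question.
print(all_equal("AAAAA", "A"))        # True
print(all_equal("AAAAAAAAAAA", "A"))  # True
print(all_equal("AAAAAfAAAAA", "A"))  # False
```

Because ch is passed in, this variant never indexes s[0], so it does not need special handling for the empty string.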

Two ways that seem inefficient would be to first convert the string to a list and check each element, or second to use a regular expression. Is there a more efficient way, or is this the best one can do in Python? Thanks.

This is much faster, several times faster even than count(); just time it with mgilson's excellent timing suite:

s == len(s) * s[0]

Here all the checking is done inside the Python C code, which just:

allocates len(s) characters;
fills the space with the first character;
compares the two strings.
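The three steps above can be spelled out as a short sketch; the function name all_same_strmul is hypothetical. The temporary string candidate is a second string of len(s) characters, which is the copy discussed in the next paragraph.

```python
def all_same_strmul(s: str) -> bool:
    """True if every character of s equals s[0]; raises IndexError for ""."""
    first = s[0]                 # reference character
    candidate = first * len(s)   # steps 1-2: a new string of len(s) copies of `first`
    return s == candidate        # step 3: a single string comparison done in C
```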

The longer the string, the bigger the time advantage. However, as mgilson writes, it creates a copy of the string, so if your string is many millions of symbols long, this may become a problem.

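As the timing results show, the fastest ways to solve the task generally do not execute any Python code per symbol. The set() solution also does all its work inside the C code of the Python library, but it is still slow because it operates on the string through the Python object interface.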

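UPD: Regarding the empty-string case. What to do with it depends heavily on the task. If the task is "check that all the symbols in the string are the same", s == len(s) * s[0] is a valid answer (no symbols means an error, and an exception is fine). If the task is "check that there is exactly one unique symbol", an empty string should give False, and the answer is s and s == len(s) * s[0], or bool(s) and s == len(s) * s[0] if you prefer to receive boolean values. Finally, if we understand the task as "check that there are no differing symbols", the result for an empty string is True, and the answer is not s or s == len(s) * s[0].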
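The three readings of the empty-string case can be written side by side. The function names are hypothetical labels for the three task interpretations; the expressions themselves are the ones given in the update.

```python
def all_symbols_same(s: str) -> bool:
    # "All symbols are the same": "" has no first symbol, so s[0] raises IndexError.
    return s == len(s) * s[0]


def exactly_one_unique(s: str) -> bool:
    # "Exactly one distinct symbol": "" -> False; bool(s) keeps the result a bool.
    return bool(s) and s == len(s) * s[0]


def no_different_symbols(s: str) -> bool:
    # "No differing symbols": "" -> True; short-circuits before touching s[0].
    return not s or s == len(s) * s[0]


print(exactly_one_unique(""))    # False
print(no_different_symbols(""))  # True
```

For non-empty strings all three agree; they differ only in how "" is treated.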

>>> s = 'AAAAAAAAAAAAAAAAAAA'
>>> s.count(s[0]) == len(s)
True

This doesn't short-circuit. A version that does short-circuit would be:

>>> all(x == s[0] for x in s)
True

However, I suspect the non-short-circuiting version will perform better on some strings (depending on size, etc.) because of its optimized C implementation.
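To make the short-circuit difference visible, a sketch can count how many characters all() actually pulls from the generator before it stops; all_with_count is a hypothetical helper. When the second character already differs, all() stops after two characters, whereas s.count(s[0]) always scans the whole string.

```python
from typing import Tuple


def all_with_count(s: str) -> Tuple[bool, int]:
    """Run the short-circuiting all() check and report how many characters it examined."""
    examined = 0
    first = s[0]

    def matches():
        nonlocal examined
        for ch in s:
            examined += 1        # counts each character all() actually requests
            yield ch == first

    result = all(matches())      # all() stops at the first False
    return result, examined


print(all_with_count("A" * 17))        # (True, 17)
print(all_with_count("F" + "A" * 16))  # (False, 2)
```

This matches the pattern in the timings below, where test_all is noticeably faster in the "first non-equal" case than in the "all equal" case.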

Here's a simple timeit script (Python 2) to test some of the other options that have been posted:

import timeit
import re

def test_regex(s,regex=re.compile(r'^(.)\1*$')):
    return bool(regex.match(s))

def test_all(s):
    return all(x == s[0] for x in s)

def test_count(s):
    return s.count(s[0]) == len(s)

def test_set(s):
    return len(set(s)) == 1

def test_replace(s):
    return not s.replace(s[0],'')

def test_translate(s):
    return not s.translate(None,s[0])

def test_strmul(s):
    return s == s[0]*len(s)

tests = ('test_all','test_count','test_set','test_replace','test_translate','test_strmul','test_regex')

print "WITH ALL EQUAL"
for test in tests:
    print test, timeit.timeit('%s(s)'%test,'from __main__ import %s; s="AAAAAAAAAAAAAAAAA"'%test)
    if globals()[test]("AAAAAAAAAAAAAAAAA") != True:
        print globals()[test]("AAAAAAAAAAAAAAAAA")
        raise AssertionError

print
print "WITH FIRST NON-EQUAL"
for test in tests:
    print test, timeit.timeit('%s(s)'%test,'from __main__ import %s; s="FAAAAAAAAAAAAAAAA"'%test)
    if globals()[test]("FAAAAAAAAAAAAAAAA") != False:
        print globals()[test]("FAAAAAAAAAAAAAAAA")
        raise AssertionError

On my machine (OS-X 10.5.8, core2duo, python2.7.3), with these contrived (short) strings, str.count smokes set and all, and beats str.replace by a little, but is edged out by str.translate, and strmul is currently in the lead by a good margin:

WITH ALL EQUAL
test_all 5.83863711357
test_count 0.947771072388
test_set 2.01028490067
test_replace 1.24682998657
test_translate 0.941282987595
test_strmul 0.629556179047
test_regex 2.52913498878

WITH FIRST NON-EQUAL
test_all 2.41147494316
test_count 0.942595005035
test_set 2.00480484962
test_replace 0.960338115692
test_translate 0.924381017685
test_strmul 0.622269153595
test_regex 1.36632800102

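Timings may differ slightly (or even significantly) on different systems and with different strings, so it's worth looking at them with the actual strings you're planning to pass.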

Eventually, if you hit the best case for all (an early mismatch) often enough, and your strings are long enough, you might want to consider that one. It's a better algorithm... I would avoid the set solution, though, as I don't see any case where it could possibly beat the count solution.

If memory could be an issue, you'll need to avoid str.translate, str.replace and strmul as those create a second string, but this isn't usually a concern these days.

You could convert to a set and check that it has only one member:

len(set("AAAAAAAA")) == 1

Try using the built-in function all:

all(c == 'A' for c in s)

Adding another solution to this problem (Python 2 only; in Python 3, str.translate takes a single translation-table argument):

>>> not "AAAAAA".translate(None,"A")
True
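In Python 3 the two-argument form above raises TypeError, because str.translate takes only a translation table. A minimal Python 3 equivalent (the helper name only_char is hypothetical) builds the table with str.maketrans, whose third argument lists characters to delete:

```python
def only_char(s: str, ch: str) -> bool:
    """Python 3 version of the translate trick: delete every ch and see if anything is left."""
    # str.maketrans("", "", ch) maps each character of ch to None,
    # which makes str.translate() delete those characters.
    return not s.translate(str.maketrans("", "", ch))


print(only_char("AAAAAA", "A"))  # True
print(only_char("AAfAAA", "A"))  # False
```

Like the original, this returns True for an empty string, since nothing is left after deletion.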

If you need to check that all the characters in the string are the same and equal to a given character, remove all duplicates and check whether what remains equals that single character.

>>> set("AAAAA") == set("A")
True

If you only want to know whether all the characters are the same (whatever that character is), just check the length:

>>> len(set("AAAAA")) == 1
True

Interesting answers so far. Here's another:

flag = True
for c in 'AAAAAAAfAAAA':
    if c != 'A':
        flag = False
        break

The only advantage I can think of to mine is that it doesn't need to traverse the entire string if it finds an inconsistent character.
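The same early-exit loop can be wrapped in a function that returns as soon as it sees a mismatch, which removes the flag variable; all_equal_loop is a hypothetical name. It is the hand-written equivalent of the all(...) generator answer, so the per-character work still runs as Python bytecode rather than in C.

```python
def all_equal_loop(s: str, ch: str) -> bool:
    """Return False at the first character that differs from ch, else True."""
    for c in s:
        if c != ch:
            return False   # stop scanning: no need to look at the rest of s
    return True            # every character matched (also True for an empty s)


print(all_equal_loop("AAAAAAAfAAAA", "A"))  # False
```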

not len("AAAAAAAAA".replace('A', ''))

Source URL: https://stackoverflow.com/questions/14320909/efficiently-checking-that-string-consists-of-one-character-in-python
