File: //opt/alt/python35/lib/python3.5/site-packages/chardet/__pycache__/charsetprober.cpython-35.pyc


Compiled CPython 3.5 bytecode for chardet's charsetprober.py. The file is binary marshal data and cannot be displayed as text; the names, constants, and docstrings recoverable from it are listed below.

Imports: logging, re, and ProbingState from .enums.

class CharSetProber(object):
    SHORTCUT_THRESHOLD = 0.95
    __init__(self, lang_filter=None): sets self._state = None, self.lang_filter = lang_filter, and self.logger = logging.getLogger(__name__).
    reset(self): sets self._state = ProbingState.DETECTING.
    charset_name (property): returns None.
    feed(self, buf): does nothing (returns None).
    state (property): returns self._state.
    get_confidence(self): returns 0.0.
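The summary above describes a base class whose defaults concrete probers override. The following is a minimal sketch of that contract, assuming a hypothetical BaseProber that mirrors the recovered defaults, a stand-in ProbingState enum (chardet's own lives in chardet.enums), and a hypothetical AsciiOnlyProber subclass. None of these names are chardet's API.

```python
import logging
from enum import Enum


class ProbingState(Enum):
    # Stand-in for chardet.enums.ProbingState.
    DETECTING = 0
    FOUND_IT = 1
    NOT_ME = 2


class BaseProber(object):
    """Hypothetical base class mirroring the recovered CharSetProber defaults."""

    SHORTCUT_THRESHOLD = 0.95

    def __init__(self, lang_filter=None):
        self._state = None
        self.lang_filter = lang_filter
        self.logger = logging.getLogger(__name__)

    def reset(self):
        self._state = ProbingState.DETECTING

    @property
    def charset_name(self):
        return None

    def feed(self, buf):
        pass

    @property
    def state(self):
        return self._state

    def get_confidence(self):
        return 0.0


class AsciiOnlyProber(BaseProber):
    """Hypothetical subclass: reports 'ascii' until a high byte appears."""

    def __init__(self, lang_filter=None):
        super().__init__(lang_filter)
        self.reset()

    def reset(self):
        super().reset()
        self._saw_high_byte = False

    @property
    def charset_name(self):
        return "ascii"

    def feed(self, buf):
        # Any byte >= 0x80 rules out pure ASCII.
        if any(byte >= 0x80 for byte in buf):
            self._saw_high_byte = True
            self._state = ProbingState.NOT_ME
        return self.state

    def get_confidence(self):
        # Arbitrary illustrative confidence values.
        return 0.0 if self._saw_high_byte else 0.99
```

Feeding b"plain text" leaves the state at DETECTING with confidence 0.99, while input containing a high byte such as b"caf\xe9" switches it to NOT_ME. The 0.99 figure is arbitrary and only illustrates that subclasses replace the base class's 0.0.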
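
CharSetProber.filter_high_byte_only(buf), static method:
    Replaces every run of ASCII bytes (\x00-\x7F) with a single space, so only the high bytes (\x80-\xFF) remain, separated by spaces. Equivalent to re.sub(b'([\x00-\x7F])+', b' ', buf).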
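A minimal standalone sketch of the same substitution, written as a free function for illustration rather than as the static method itself:

```python
import re


def filter_high_byte_only(buf):
    # Collapse each run of ASCII bytes (0x00-0x7F) into one space,
    # so only the high bytes (0x80-0xFF) remain, space-separated.
    return re.sub(b'([\x00-\x7F])+', b' ', buf)
```

For example, b'abc\xe9\xe8 def\xc0' becomes b' \xe9\xe8 \xc0': each ASCII run, including the one that contains the space, collapses to a single space.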

CharSetProber.filter_international_words(buf), static method. Docstring:
        We define three types of bytes:
        alphabet: english alphabets [a-zA-Z]
        international: international characters [\x80-\xFF]
        marker: everything else [^a-zA-Z\x80-\xFF]

        The input buffer can be thought to contain a series of words delimited
        by markers. This function works to filter all words that contain at
        least one international character. All contiguous sequences of markers
        are replaced by a single space ascii character.

        This filter applies to all scripts which do not use English characters.
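
    Implementation (recovered from the bytecode): filter_international_words collects the matches of re.findall(b'[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?', buf), that is, words containing at least one international byte plus at most one trailing marker. For each word it appends every byte but the last to a bytearray; the last byte is appended as-is unless it is an ASCII non-letter (a marker), in which case a single space is appended instead.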
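The docstring and the recovered constants combine into a short sketch. This is a re-creation written from the names and constants visible in the bytecode, not a copy of chardet's source; the module-level INTERNATIONAL_WORD name is an assumption.

```python
import re

# Words with at least one high byte, optionally followed by one marker byte.
INTERNATIONAL_WORD = re.compile(b'[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?')


def filter_international_words(buf):
    filtered = bytearray()
    for word in INTERNATIONAL_WORD.findall(buf):
        # Keep every byte except the last one unchanged.
        filtered.extend(word[:-1])
        # Slice (not index) so last_char stays a bytes object on Python 3.
        last_char = word[-1:]
        # A trailing ASCII marker (not a letter) becomes a single space.
        if not last_char.isalpha() and last_char < b'\x80':
            last_char = b' '
        filtered.extend(last_char)
    return filtered
```

For b"hello caf\xe9, na\xefve world!" the sketch returns bytearray(b"caf\xe9 na\xefve "): the ASCII-only words disappear and each trailing marker becomes a space.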

CharSetProber.filter_with_english_letters(buf), static method. Docstring:
        Returns a copy of ``buf`` that retains only the sequences of English
        alphabet and high byte characters that are not between <> characters.
        Also retains English alphabet and high byte characters immediately
        before occurrences of >.

        This filter can be applied to all scripts which contain both English
        characters and extended ASCII characters, but is currently only used by
        ``Latin1Prober``.
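
    Implementation (recovered from the bytecode): filter_with_english_letters scans buf one byte at a time, setting in_tag on b'<' and clearing it on b'>'. Every ASCII byte that is not a letter ends the current run: if the run is non-empty and the scan is not inside a tag, buf[prev:curr] is appended to the output followed by a space, and prev moves past that byte. After the loop, the remaining tail buf[prev:] is appended unless the buffer ends inside a tag.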
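The byte-by-byte scan described above can be sketched as follows. It is reconstructed from the names and constants in the bytecode and should be read as an approximation of chardet's code, not a verbatim copy.

```python
def filter_with_english_letters(buf):
    filtered = bytearray()
    in_tag = False
    prev = 0  # start of the current run of letters/high bytes
    for curr in range(len(buf)):
        # Slice so buf_char is bytes, not int, on Python 3.
        buf_char = buf[curr:curr + 1]
        if buf_char == b'>':
            in_tag = False
        elif buf_char == b'<':
            in_tag = True
        # An ASCII byte that is not a letter ends the current run.
        if buf_char < b'\x80' and not buf_char.isalpha():
            if curr > prev and not in_tag:
                filtered.extend(buf[prev:curr])
                filtered.extend(b' ')
            prev = curr + 1
    # Keep the trailing run unless the buffer ends inside a tag.
    if not in_tag:
        filtered.extend(buf[prev:])
    return filtered
```

For b"caf\xe9 <b>x</b> na\xefve" the sketch returns bytearray(b"caf\xe9 b b na\xefve"). The tag name b survives twice because it sits immediately before a >, and x is dropped because the following < sets in_tag before that run is flushed.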