
Linux command: How to 'find' only text files?

codestyles 2020. 9. 5. 09:58


After a few searches on Google, I came up with:

find my_folder -type f -exec grep -l "needle text" {} \; -exec file {} \; | grep text

That is very unhandy and outputs unneeded text such as MIME type information. Any better solutions? I have lots of images and other binary files in the same folder as the many text files I need to search.


I know this is an old thread, but I stumbled across it and thought I'd share my method, which is a very fast way to use find to find only non-binary files:

find . -type f -exec grep -Iq . {} \; -print

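The -I option to grep tells it to ignore binary files immediately, and the . pattern together with -q makes it exit as soon as it matches any character in a text file, so this goes very fast. You can change the -print to a -print0 for piping into xargs -0 or something if you are concerned about spaces (thanks for the tip, @lucas.werkmeister!).

Also, the first dot is only necessary for certain BSD versions of find, such as on OS X, but it doesn't hurt anything to always have it there if you want to put this in an alias or something.

EDIT: As @ruslan correctly pointed out, an explicit -and before -print can be omitted, since it is implied.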
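The -print0 variant mentioned above can be sketched like this. The directory and file names are made up for the demo; note that grep -Iq . also skips empty files, since an empty file has no line matching the . pattern.

```shell
#!/bin/bash
# Throwaway demo directory (file names are invented for illustration).
dir=$(mktemp -d)
printf 'needle text\n' > "$dir/my notes.txt"      # text file with a space in its name
printf 'needle text\0\001\002' > "$dir/blob.bin"  # NUL byte: grep treats it as binary
: > "$dir/empty.txt"                               # empty: no line matches '.'

# Keep non-binary, non-empty files, pass them NUL-separated, then search them.
matches=$(find "$dir" -type f -exec grep -Iq . {} \; -print0 \
  | xargs -0 grep -l "needle text")
printf '%s\n' "$matches"

rm -rf "$dir"
```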


Based on this SO question:

grep -rIl "needle text" my_folder
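A small comparison of grep -rl and grep -rIl shows what -I changes. It assumes GNU grep, which treats a file containing a NUL byte as binary; the file names are invented for the demo.

```shell
#!/bin/bash
dir=$(mktemp -d)
printf 'needle text\n' > "$dir/readme.txt"
printf 'needle text\0\377' > "$dir/image.bin"   # contains the needle, but is binary

# Without -I the binary file is listed too; with -I it is skipped.
without_I=$(grep -rl "needle text" "$dir" | sort)
with_I=$(grep -rIl "needle text" "$dir")
echo "without -I:"; printf '%s\n' "$without_I"
echo "with -I:";    printf '%s\n' "$with_I"

rm -rf "$dir"
```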


Why is it unhandy? If you need to use it often and don't want to type it every time, just define a bash function for it:

function findTextInAsciiFiles {
    # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
    find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text
}

Put it in your .bashrc and then just run:

findTextInAsciiFiles your_folder "needle text"

whenever you want.


EDIT to reflect the OP's edit:

You can cut out the MIME info by adding one more step to the pipeline that filters it away. This should do the trick by keeping only what comes before the ':', i.e. cut -d':' -f1:

function findTextInAsciiFiles {
    # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
    find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text | cut -d ':' -f1
}
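A quick run of the function above on a throwaway directory (hypothetical file names, and assuming the file utility is installed) should leave only the path of the text file once cut has stripped the ": ASCII text" part.

```shell
#!/bin/bash
function findTextInAsciiFiles {
    # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
    find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text | cut -d ':' -f1
}

dir=$(mktemp -d)
printf 'needle text\n' > "$dir/notes.md"
printf 'needle text\0\001' > "$dir/photo.raw"   # binary: file reports "data"

result=$(findTextInAsciiFiles "$dir" "needle text")
printf '%s\n' "$result"
rm -rf "$dir"
```

One caveat: the grep text step also sees the bare file names printed by grep -l, so a binary file whose name contains "text" would slip through.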

find . -type f -print0 | xargs -0 file | grep -P text | cut -d: -f1 | xargs grep -Pil "search"

Unfortunately, this is not space-safe. Putting it into a bash script makes it a bit easier.

This one is space-safe:

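#!/bin/bash
# Require a search pattern as the first argument.
if [ ! "$1" ] ; then
    echo "Usage: $0 <search>"
    exit 1
fi

# List files NUL-separated, keep those file(1) calls text, strip the
# ": type" suffix, then grep each remaining file (one name per line).
find . -type f -print0 \
  | xargs -0 file \
  | grep -P text \
  | cut -d: -f1 \
  | xargs -I % grep -Pil "$1" "%"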

How about this:

$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable'

If you want the filenames without the file types, just add a final sed filter:

$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'

You can filter out unneeded file types by adding more -e 'type' options to the last grep command.

EDIT:

If your xargs version supports the -d option, the commands above become simpler:

$ grep -rl "needle text" my_folder | xargs -d '\n' -r file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
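A sketch of the -d variant on a throwaway directory, assuming GNU xargs (for -d) and the file utility. file describes a shebang script as "... ASCII text executable", so the grep -v -e 'executable' step drops it; the file names are invented for the demo.

```shell
#!/bin/bash
dir=$(mktemp -d)
printf 'needle text\n' > "$dir/notes.txt"
printf '#!/bin/sh\necho "needle text"\n' > "$dir/run.sh"
chmod +x "$dir/run.sh"
printf 'needle text\0\001' > "$dir/data.bin"      # file reports "data"

# Same pipeline as above, pointed at the demo directory.
result=$(grep -rl "needle text" "$dir" \
  | xargs -d '\n' -r file \
  | grep -e ':[^:]*text[^:]*$' \
  | grep -v -e 'executable' \
  | sed 's|:[^:]*$||')
printf '%s\n' "$result"
rm -rf "$dir"
```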

Here's how I've done it ...

1. Make a small script, istext, that tests whether a file is plain text:

#!/bin/bash
[[ "$(file -bi "$1")" == *"text"* ]]

2. Use find as before (istext must be executable and on your PATH):

find . -type f -exec istext {} \; -exec grep -nHi mystring {} \;
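If you'd rather not keep a separate istext script on your PATH, the same kind of test can be inlined with sh -c. This sketch assumes the Linux file(1), where -b omits the file name and -i prints a MIME type such as text/plain; charset=us-ascii (macOS file uses different flags). The sample files are made up.

```shell
#!/bin/bash
dir=$(mktemp -d)
printf 'needle text\nmystring here\n' > "$dir/a.txt"
printf 'mystring\0\001' > "$dir/b.bin"

# The sh -c test succeeds only when the MIME type starts with "text/";
# only then does find run the second -exec (grep).
result=$(find "$dir" -type f \
  -exec sh -c 'case $(file -bi "$1") in text/*) exit 0 ;; *) exit 1 ;; esac' sh {} \; \
  -exec grep -nHi mystring {} \;)
printf '%s\n' "$result"
rm -rf "$dir"
```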

I have two issues with histumness' answer:

  • It only lists text files. It does not actually search them as requested. To actually search, use

    find . -type f -exec grep -Iq . {} \; -and -print0 | xargs -0 grep "needle text"
    
  • It spawns a grep process for every file, which is very slow. A better solution is then

    find . -type f -print0 | xargs -0 grep -IZl . | xargs -0 grep "needle text"
    

    or simply

    find . -type f -print0 | xargs -0 grep -I "needle text"
    

    This only takes 0.2s compared to 4s for the solution above (2.5GB data / 7700 files), i.e. 20x faster.
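The batching idea can also be expressed without xargs: find's -exec command {} + form passes as many file names as fit to each grep invocation. A minimal sketch on made-up files; no timing figures are claimed for it.

```shell
#!/bin/bash
dir=$(mktemp -d)
printf 'needle text\n' > "$dir/one.txt"
printf 'no match here\n' > "$dir/two.txt"
printf 'needle text\0\001' > "$dir/three.bin"   # binary, skipped by -I

# '{} +' appends many paths to a single grep command line,
# so grep is started a few times instead of once per file.
result=$(find "$dir" -type f -exec grep -Il "needle text" {} +)
printf '%s\n' "$result"
rm -rf "$dir"
```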

Also, nobody mentioned ag (the Silver Searcher) or ack-grep as alternatives. If one of them is available, it is a much better alternative:

ag -t "needle text"    # Much faster than ack
ack -t "needle text"   # or ack-grep

As a last note, beware of false positives (binary files taken as text files). I have already had false positives with grep, ag and ack, so it is better to list the matched files first before editing them.


Although it is an old question, I think the info below will add to the quality of the answers here.

When ignoring files with the executable bit set, I just use this command:

find . ! -perm -111

To keep it from recursing into subdirectories:

find . -maxdepth 1 ! -perm -111

No need for pipes to mix lots of commands, just the powerful plain find command.

  • Disclaimer: it is not exactly what the OP asked, because it doesn't check whether the file is binary or not. It will, for example, filter out bash script files, which are text themselves but have the executable bit set.

That said, I hope this is useful to anyone.
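Note that -perm -111 matches only files on which the user, group and other execute bits are all set, so a file with mode 744 still passes ! -perm -111. GNU find's -perm /111 matches files with any execute bit set. A sketch comparing the two, with file names and modes invented for the demo:

```shell
#!/bin/bash
dir=$(mktemp -d)
printf 'plain\n' > "$dir/plain.txt";  chmod 644 "$dir/plain.txt"
printf 'owner\n' > "$dir/owner_x.sh"; chmod 744 "$dir/owner_x.sh"
printf 'all\n'   > "$dir/all_x.sh";   chmod 755 "$dir/all_x.sh"

# '-perm -111': all three x bits set; '-perm /111' (GNU find): any x bit set.
not_all_x=$(cd "$dir" && find . -maxdepth 1 -type f ! -perm -111 | sort)
no_x_at_all=$(cd "$dir" && find . -maxdepth 1 -type f ! -perm /111 | sort)
echo "! -perm -111:"; printf '%s\n' "$not_all_x"
echo "! -perm /111:"; printf '%s\n' "$no_x_at_all"
rm -rf "$dir"
```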


Another way of doing this:

find . -type f | xargs file | grep "ASCII text"

If you want empty files too:

find . -type f | xargs file | grep -E "ASCII text|empty"

I do it this way: 1) since there are too many files (~30k) to search through, I generate the text-file list daily via crontab using the command below:

find /to/src/folder -type f -exec file {} \; | grep text | cut -d: -f1 > ~/.src_list &

2) create a function in .bashrc:

findex() {
    # one file name per line in the list; GNU xargs -d keeps names with spaces intact
    xargs -d '\n' grep -- "$*" < ~/.src_list 2>/dev/null
}

Then I can use the command below to do the search:

findex "needle text"

HTH:)


I prefer xargs

find . -type f | xargs grep -I "needle text"

If your filenames are weird (for example, they contain spaces), use the -0 options:

find . -type f -print0 | xargs -0 grep -I "needle text"
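The difference the -0 options make can be seen on a file name containing a space. A minimal sketch (the file name is made up): plain xargs splits the name into two arguments, so grep never opens the real file.

```shell
#!/bin/bash
dir=$(mktemp -d)
printf 'needle text\n' > "$dir/my notes.txt"   # a space in the name

# Plain xargs passes "./my" and "notes.txt" separately; both are missing.
plain=$(cd "$dir" && find . -type f | xargs grep -Il "needle text" 2>/dev/null)
# NUL separators keep the name in one piece.
nul=$(cd "$dir" && find . -type f -print0 | xargs -0 grep -Il "needle text")
echo "plain xargs: '${plain}'"
echo "xargs -0:    '${nul}'"
rm -rf "$dir"
```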

  • Bash example to search for the text "eth0" in all text/ASCII files in /etc:

grep eth0 $(find /etc/ -type f -exec file {} \; | egrep -i "text|ascii" | cut -d ':' -f1)


Here's a simplified version with extended explanation for beginners like me who are trying to learn how to put more than one command in one line.

If you were to write out the problem in steps, it would look like this:

// For every file in this directory
// Check the filetype
// If it's an ASCII file, then print out the filename

To achieve this, we can use three UNIX commands: find, file, and grep.

find will check every file in the directory.

file will give us the filetype. In our case, we're looking for a return of 'ASCII text'

grep will look for the keyword 'ASCII' in the output from file

So how can we string these together in a single line? There are multiple ways to do it, but I find that doing it in order of our pseudo-code makes the most sense (especially to a beginner like me).

find ./ -exec file {} ";" | grep 'ASCII'

Looks complicated, but not bad when we break it down:

find ./ = look through every file in this directory. The find command prints out the filename of any file that matches the 'expression', or whatever comes after the path, which in our case is the current directory or ./

The most important thing to understand is that everything after that first bit is an expression that find evaluates as either True or False for each file. With a plain expression, find prints the names of the files for which it is True. Because our expression contains the -exec action, find does not print the names itself; the lines you see come from the file command.

-exec = this action runs another command for each file that find visits. The command's exit status becomes the True/False result of that part of the expression. It's like calling a function within a function.

file {} = the command being called inside of find. The file command returns a string that tells you the filetype of a file. Normally, it would look like this: file mytextfile.txt. In our case, we want it to use whatever file is being looked at by the find command, so we put in the curly braces {}, which find replaces with the current file name, like a parameter. In other words, we're asking the system to output a string for every file in the directory.

";" = this is required by find and marks the end of our -exec command. It is quoted (you will also see \;) so that the shell passes it to find instead of treating it as its own command separator. See the manual for find (run man find) if you need more explanation.

| grep 'ASCII' = | is a pipe. A pipe takes the output of whatever is on the left and uses it as input to whatever is on the right. Here the left side prints one line per file (the output of file, e.g. ./notes.txt: ASCII text), and grep keeps only the lines that contain the string 'ASCII'.

So the final output is one line for every file whose type contains 'ASCII'. Voila.
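The pseudo-code above can also be written out almost line for line as a loop. A sketch, assuming the file utility is installed; -b tells file to print only the type, and the sample files are invented.

```shell
#!/bin/bash
dir=$(mktemp -d)
printf 'hello\n' > "$dir/hello.txt"
printf '\0\001\002' > "$dir/blob.bin"

result=$(
  find "$dir" -type f -exec sh -c '
    for f in "$@"; do                          # for every file handed over by find
      case $(file -b "$f") in                  # check the filetype (-b: no file name)
        *ASCII*) printf "%s\n" "$f" ;;         # if it is ASCII, print the filename
      esac
    done' sh {} +
)
printf '%s\n' "$result"
rm -rf "$dir"
```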


If you are interested in finding any file type by its magic bytes, using the awesome file utility combined with the power of find, this can come in handy:

$ # Let's make some test files
$ mkdir ASCII-finder
$ cd ASCII-finder
$ dd if=/dev/urandom of=binary.file bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.009023 s, 116 MB/s
$ file binary.file
binary.file: data
$ echo 123 > text.txt
$ # Let the magic begin
$ find -type f -print0 | \
    xargs -0 -I @@ bash -c 'file "$@" | grep ASCII &>/dev/null && echo "file is ASCII: $@"' -- @@

Output:

file is ASCII: ./text.txt

Legend: $ is the interactive shell prompt where we enter our commands

You can modify the part after && to call some other script or do some other stuff inline as well, e.g. if that file contains a given string, cat the entire file or look for a secondary string in it.

Explanation:

  • find items that are files
  • Make xargs feed each item as a line into one liner bash command/script
  • file checks type of file by magic byte, grep checks if ASCII exists, if so, then after && your next command executes.
  • find prints results null separated, this is good to escape filenames with spaces and meta-characters in it.
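  • xargs, using the -0 option, reads them null-separated; -I @@ takes each record and substitutes it as an argument to the bash script.
  • -- is the first argument after the bash -c script string, so bash assigns it to $0 (the script name). The file name that replaces @@ therefore becomes $1, which is what "$@" expands to.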

To find types other than ASCII, replace grep ASCII with another type, e.g. grep "PDF document, version 1.4"
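The same approach can be wrapped in a function that takes the type string as a parameter, so searching for PDFs instead of ASCII text is only a different argument. find_by_type is a hypothetical name; this sketch assumes bash (for read -d '') and the file utility, and uses a while-read loop instead of xargs with bash -c. The one-line PDF stub is just enough for file to recognize the %PDF- header.

```shell
#!/bin/bash
# find_by_type DIR PATTERN (hypothetical helper): print files under DIR whose
# "file -b" description contains PATTERN as a fixed string.
find_by_type() {
    find "$1" -type f -print0 |
        while IFS= read -r -d '' f; do
            if file -b "$f" | grep -qF -- "$2"; then
                printf '%s\n' "$f"
            fi
        done
}

dir=$(mktemp -d)
printf 'hello\n' > "$dir/hello.txt"
printf '%%PDF-1.4\n' > "$dir/doc.pdf"    # minimal header, enough for file's magic
printf '\0\001\002' > "$dir/blob.bin"

ascii=$(find_by_type "$dir" "ASCII")
pdf=$(find_by_type "$dir" "PDF document, version 1.4")
printf '%s\n' "$ascii" "$pdf"
rm -rf "$dir"
```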


How about this:

 find . -type f|xargs grep "needle text"

Source URL: https://stackoverflow.com/questions/4767396/linux-command-how-to-find-only-text-files
