Word count - only the first match per line..

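Situation:

Suppose we have a log file.

We might want to know how many times a given word appears in the whole file, or, because the log file follows a fixed pattern, count the lines by the first keyword that matches.

So what I want to try here is counting keywords in order to analyze a log file.

[File contents: roughly scraped from somewhere..]

Download: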

[org.mybatis.spring.SqlSessionUtils][ INFO] - Creating a new SqlSession

[org.mybatis.spring.SqlSessionUtils][ INFO] - SqlSession [org.apache.ibatis.session.defaults.DefaultSqlSession@4ccdd1f] was not registered for synchronization because synchronization is not active

[org.springframework.jdbc.datasource.DataSourceUtils][DEBUG] - Fetching JDBC Connection from DataSource

[org.mybatis.spring.transaction.SpringManagedTransaction][DEBUG] - JDBC Connection [ProxyConnection[PooledConnection[com.mysql.jdbc.JDBC4Connection@3cfde82]]] will not be managed by Spring

[java.sql.Connection][DEBUG] - ooo Using Connection [ProxyConnection[PooledConnection[com.mysql.jdbc.JDBC4Connection@3cfde82]]]

[java.sql.Connection][DEBUG] - ==> Preparing: SELECT col FROM table WHERE col1=? AND col2=?

[java.sql.PreparedStatement][DEBUG] - ==> Parameters: 93(Integer), 4(Integer)

[org.mybatis.spring.SqlSessionUtils][DEBUG] - Closing non transactional SqlSession [org.apache.ibatis.session.defaults.DefaultSqlSession@4ccdd1f]

[org.springframework.jdbc.datasource.DataSourceUtils][DEBUG] - Returning JDBC Connect

[Counting on the first match in each line]

package kr.pe.acet.wordCount;

import static org.junit.Assert.*;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

import org.junit.Test;

public class WordCountFileVerTest {

    private static final String FPATH = "D:\\";
    //private static final String FNAME = "cuiSvr21-kt-log.log.2014-03-11-09";
    private static final String FNAME = "test.log";

    private String[] word;

    @Test
    public void wordCountTest() {
        String readStr = "";
        word = new String[]{"SqlSessionUtils",
                            "[ INFO]",
                            "[DEBUG]",
                            "SchedulingAllocSyncJob",
                            "Preparing",
                            "WMSCDLOG-dsScheduleData"
        };

        try {
            for (int i = 0; i < word.length; i++) {
                int cnt = 0;
                // Reopen the file for every keyword so each pass starts at the first line.
                // try-with-resources closes the reader even when reading fails.
                try (BufferedReader br = new BufferedReader(new FileReader(FPATH + FNAME))) {
                    //br.mark(17);
                    while ((readStr = br.readLine()) != null) {
                        // indexOf() reports only the first hit, so a line is counted at most once.
                        int checkkeyWord = readStr.indexOf(word[i]);
                        if (checkkeyWord > -1) {
                            cnt++;
                            //System.out.println("readStr :" + readStr);
                        }
                    }
                    //br.reset();
                }
                System.out.println("word :" + word[i] + " / wordCntResult : " + cnt);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

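br.mark(17); and br.reset(); are meant to be used as a pair.. if they are not used correctly (for example, reset() is called when no mark is set), you get the error below.

error : java.io.IOException: Stream not marked (thrown from reset())

So it seems easier to just close() the reader and read the file again.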
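For reference, here is a minimal sketch of how mark()/reset() could be made to work, using an in-memory `StringReader` instead of the log file. The class name `MarkResetSketch` and its methods are hypothetical and are not part of the test above. The key assumption is that the read-ahead limit passed to `mark()` is larger than everything read before `reset()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only.
class MarkResetSketch {

    // For each keyword, counts the lines containing it, rewinding one reader with mark()/reset().
    static int[] countEach(String text, String... keywords) {
        int[] counts = new int[keywords.length];
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            // The limit must exceed the number of characters read before reset(),
            // otherwise the mark may be invalidated.
            br.mark(text.length() + 1);
            for (int i = 0; i < keywords.length; i++) {
                String line;
                while ((line = br.readLine()) != null) {
                    if (line.indexOf(keywords[i]) > -1) {
                        counts[i]++;
                    }
                }
                br.reset(); // rewind to the marked position (the start of the text)
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    // Calls reset() without a prior mark() and returns the resulting error message.
    static String resetWithoutMark() {
        try (BufferedReader br = new BufferedReader(new StringReader("a\nb\n"))) {
            br.readLine();
            br.reset();
            return "no error";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String log = "[A][ INFO] - one\n[B][DEBUG] - two\n[C][DEBUG] - three\n";
        int[] counts = countEach(log, "[ INFO]", "[DEBUG]");
        System.out.println("[ INFO] : " + counts[0]);
        System.out.println("[DEBUG] : " + counts[1]);
        System.out.println("reset() without mark() : " + resetWithoutMark());
    }
}
```

For a real log file the limit would have to cover the whole file, which means the reader may buffer that many characters in memory, so reopening the file as the test does is the simpler choice.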

[Result]

word :SqlSessionUtils / wordCntResult : 3

word :[ INFO] / wordCntResult : 2

word :[DEBUG] / wordCntResult : 7

word :SchedulingAllocSyncJob / wordCntResult : 0

word :Preparing / wordCntResult : 1

word :WMSCDLOG-dsScheduleData / wordCntResult : 0

I ran it against a different log file.. haha, and got the following.

word :SqlSessionUtils / wordCntResult : 73266

word :[ INFO] / wordCntResult : 189739

word :[DEBUG] / wordCntResult : 277613

word :SchedulingAllocSyncJob / wordCntResult : 360

word :Preparing / wordCntResult : 23445

word :WMSCDLOG-dsScheduleData / wordCntResult : 11
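With a log this large, the test reads the whole file once per keyword. As an alternative, a single pass that checks every keyword on each line could be sketched as below. This is only an illustration under my own assumptions: the class `SinglePassCountSketch` and the method `countLines` are hypothetical names, and the input is an in-memory sample rather than the actual file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class SinglePassCountSketch {

    // Reads the input once and counts, per keyword, the lines that contain it.
    // Keywords are assumed to be distinct.
    static Map<String, Integer> countLines(Reader in, String... keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps the keyword order
        for (String keyword : keywords) {
            counts.put(keyword, 0);
        }
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                for (String keyword : keywords) {
                    // contains() is true on the first hit, so a line counts at most once per keyword.
                    if (line.contains(keyword)) {
                        counts.merge(keyword, 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample =
              "[org.mybatis.spring.SqlSessionUtils][ INFO] - Creating a new SqlSession\n"
            + "[org.springframework.jdbc.datasource.DataSourceUtils][DEBUG] - Fetching JDBC Connection from DataSource\n"
            + "[java.sql.Connection][DEBUG] - ==> Preparing: SELECT col FROM table WHERE col1=? AND col2=?\n";
        Map<String, Integer> result =
            countLines(new StringReader(sample), "SqlSessionUtils", "[ INFO]", "[DEBUG]");
        for (Map.Entry<String, Integer> e : result.entrySet()) {
            System.out.println("word :" + e.getKey() + " / wordCntResult : " + e.getValue());
        }
    }
}
```

Passing a `FileReader` instead of the `StringReader` would apply the same loop to a file on disk.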

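As for counting every occurrence of a string (not just the first match in each line)..

the "Boyer-Moore algorithm" looks like a good thing to study.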
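To give a feel for that direction, here is a rough sketch of Boyer-Moore-Horspool, a simplified variant that uses only a bad-character shift table rather than the full Boyer-Moore rules. It counts every occurrence in a string, including overlapping ones. The class `HorspoolCountSketch` and the method `countOccurrences` are names I made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class HorspoolCountSketch {

    // Counts every (possibly overlapping) occurrence of pattern in text.
    static int countOccurrences(String text, String pattern) {
        int m = pattern.length();
        int n = text.length();
        if (m == 0 || n < m) {
            return 0;
        }

        // Shift table: for each pattern character except the last position,
        // the distance from its last occurrence to the end of the pattern.
        Map<Character, Integer> shift = new HashMap<>();
        for (int i = 0; i < m - 1; i++) {
            shift.put(pattern.charAt(i), m - 1 - i);
        }

        int count = 0;
        int pos = 0; // index in text aligned with pattern.charAt(0)
        while (pos <= n - m) {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) {
                count++;
            }
            // Shift by the table entry for the text character under the pattern's
            // last position; characters not in the table allow a full-length shift.
            char last = text.charAt(pos + m - 1);
            pos += shift.getOrDefault(last, m);
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "Using Connection [ProxyConnection[PooledConnection]]";
        // indexOf() would report a single hit for this line; this counts all of them.
        System.out.println(countOccurrences(line, "Connection"));
    }
}
```

Summing `countOccurrences` over the lines of a file would give the total number of occurrences, as opposed to the number of matching lines counted by the test.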

If you know a good algorithms book.. please recommend one~~

- END -

License: Attribution, NonCommercial, NoDerivatives


Developer 태하팍
