EchoPoint
1.0

echopoint.util
Class HtmlBodyExtractor

java.lang.Object
  extended by echopoint.util.HtmlBodyExtractor

public class HtmlBodyExtractor
extends java.lang.Object

This class reads a File or String that contains HTML and returns the BODY contents, as well as a list of Strings containing the CSS links and styles.

This class can be used to obtain the "usable" HTML within a .html file so that it can be safely embedded within the body of another HTML document.

If the HTML and BODY tags are not encountered during parsing, then the whole of the input is captured as the safe content text.

To obtain the safe HTML text, call the getHtmlText() method.
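
The fallback rule described above can be illustrated with a minimal, self-contained sketch. This is not the EchoPoint implementation: the class BodyFallbackSketch, its extractBody method and the regular expression are hypothetical stand-ins, and the sketch simplifies the rule by checking only for a BODY element rather than for both the HTML and BODY tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not part of EchoPoint.
class BodyFallbackSketch {

    // Matches <body ...> ... </body>, ignoring case and spanning line breaks.
    private static final Pattern BODY = Pattern.compile(
            "<body\\b[^>]*>(.*?)</body\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the BODY contents, or the whole input when no BODY element is found.
    static String extractBody(String html) {
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1) : html;
    }

    public static void main(String[] args) {
        String page = "<html><head><title>t</title></head><body><p>Hi</p></body></html>";
        String fragment = "<p>Just a fragment</p>";
        System.out.println(extractBody(page));     // prints <p>Hi</p>
        System.out.println(extractBody(fragment)); // prints <p>Just a fragment</p>
    }
}
```

With the real class, the equivalent steps would be to construct an HtmlBodyExtractor from a File or String and then call getHtmlText().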


Field Summary
protected  java.util.List cssList
           
protected  java.lang.String htmlText
           
protected  java.io.File srcFile
           
protected  java.lang.String srcString
           
 
Constructor Summary
HtmlBodyExtractor()
          Constructs an empty HtmlBodyExtractor.
HtmlBodyExtractor(java.io.File newHtmlFile)
          Constructs an HtmlBodyExtractor that reads the contents of the file.
HtmlBodyExtractor(java.lang.String newHtmText)
          Constructs an HtmlBodyExtractor that reads the contents of the string.
 
Method Summary
 java.util.List getCssList()
          Returns a list of Strings that represent the external CSS LINK tags from the HTML as well as the STYLE tag text.
 java.lang.String getHtmlText()
          Returns the HTML BODY content text.
 void setHtmlText(java.io.File newHtmlFile)
          Sets the File containing the HTML text to be parsed.
 void setHtmlText(java.lang.String newHtmlText)
          Sets the String containing the HTML text to be parsed.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

cssList

protected java.util.List cssList

htmlText

protected java.lang.String htmlText

srcFile

protected java.io.File srcFile

srcString

protected java.lang.String srcString

Constructor Detail

HtmlBodyExtractor

public HtmlBodyExtractor()
Constructs an empty HtmlBodyExtractor.


HtmlBodyExtractor

public HtmlBodyExtractor(java.io.File newHtmlFile)
                  throws java.io.IOException
Constructs an HtmlBodyExtractor that reads the contents of the file.


HtmlBodyExtractor

public HtmlBodyExtractor(java.lang.String newHtmText)
                  throws java.io.IOException
Constructs an HtmlBodyExtractor that reads the contents of the string.

Method Detail

getCssList

public java.util.List getCssList()
Returns a list of Strings that represent the external CSS LINK tags from the HTML as well as the STYLE tag text. These are the CSS-related tags that occurred in the HEAD of the document.

Returns:
java.util.List
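
As a rough illustration of the kind of list described above, the following self-contained sketch gathers LINK tags and STYLE elements from the HEAD of a document. It is not the EchoPoint implementation: CssCollectSketch, collectCss and the regular expressions are hypothetical, and the choice to store each whole tag as one String is an assumption, since this page does not specify the exact format of the list entries. The sketch also collects every LINK tag rather than only stylesheet links.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not part of EchoPoint.
class CssCollectSketch {

    // Captures the contents of the HEAD element.
    private static final Pattern HEAD = Pattern.compile(
            "<head\\b[^>]*>(.*?)</head\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches either a <link ...> tag or a whole <style ...>...</style> element.
    private static final Pattern CSS = Pattern.compile(
            "<link\\b[^>]*>|<style\\b[^>]*>.*?</style\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the LINK and STYLE markup found inside HEAD, in document order.
    // Assumption: each list entry is the full tag text.
    static List<String> collectCss(String html) {
        List<String> css = new ArrayList<>();
        Matcher head = HEAD.matcher(html);
        if (!head.find()) {
            return css; // no HEAD element, so nothing to collect
        }
        Matcher m = CSS.matcher(head.group(1));
        while (m.find()) {
            css.add(m.group());
        }
        return css;
    }

    public static void main(String[] args) {
        String page = "<html><head><link rel=\"stylesheet\" href=\"a.css\">"
                + "<style>p{color:red}</style></head><body><p>Hi</p></body></html>";
        for (String entry : collectCss(page)) {
            System.out.println(entry);
        }
    }
}
```

A caller embedding the extracted BODY text in another page could emit these entries into that page's own HEAD so that the embedded content keeps its styling.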

getHtmlText

public java.lang.String getHtmlText()
Returns the HTML BODY content text.

Returns:
java.lang.String

setHtmlText

public void setHtmlText(java.io.File newHtmlFile)
                 throws java.io.IOException
Sets the File containing the HTML text to be parsed.

Parameters:
newHtmlFile - java.io.File to be processed
Throws:
java.io.IOException

setHtmlText

public void setHtmlText(java.lang.String newHtmlText)
                 throws java.io.IOException
Sets the String containing the HTML text to be parsed.

Parameters:
newHtmlText - java.lang.String to be processed
Throws:
java.io.IOException
