Screen Scraping ASP.NET Application - Prof. John C. Sandvig, Study notes of Introduction to Business Management

This document is an ASP.NET code example for screen scraping with regular expressions. The application lets users enter a URL and a search pattern, then scrapes the page's HTML and returns the matching results. The document also includes a C# class named ScreenScrape, with methods for getting the HTML content of a URL and parsing it with a regular expression.
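The regex-matching step described above might look roughly like the sketch below. The body of the original ParseHTML method is not visible in this preview, so everything here is an assumption: the class name RegexScrapeSketch, the method JoinMatches, the out parameter, and the choice of RegexOptions are all hypothetical and are not taken from the author's code.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the original listing.
public static class RegexScrapeSketch
{
    // Collects every match of 'pattern' in 'html', one match per line,
    // and reports how many matches were found through 'itemCount'.
    public static string JoinMatches(string html, string pattern, out int itemCount)
    {
        // Assumption: case-insensitive matching, and '.' may span line breaks.
        MatchCollection matches = Regex.Matches(
            html, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);

        StringBuilder sb = new StringBuilder();
        foreach (Match m in matches)
        {
            sb.Append(m.Value).Append('\n');
        }

        itemCount = matches.Count;
        return sb.ToString();
    }
}
```

Calling JoinMatches("<b>one</b> <b>two</b>", "<b>.*?</b>", out count) would return the two bold elements on separate lines and set count to 2. An invalid pattern makes Regex.Matches throw an ArgumentException, which is the kind of error the page's "Error in pattern" handler is written to catch.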

Typology: Study notes

Pre 2010

Uploaded on 08/18/2009

koofers-user-7c0
ScreenScrape.aspx
<%@ Page Language="C#" ValidateRequest="false" %>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.IO" %>
<script runat="server">
void scrape_click(object sender, EventArgs e)
{
    ScreenScrape myScrape = new ScreenScrape();
    lblMessage.Text = "";
    string HTML = "";
    try
    {
        //Get the HTML for the page
        HTML = Server.HtmlDecode(myScrape.GetHTML(tbURL.Text));
    }
    catch (Exception ex)
    {
        lblMessage.Text = ex.Message;
        return;
    }
    if (rblDisplay.SelectedValue == "regex")
    {
        //Scrape using regular expressions
        try
        {
            lblOutput.Text = myScrape.ParseHTML(HTML, tbPattern.Text);
        }
        catch (Exception ex)
        {
            lblMessage.Text += "Error in pattern: " + ex.Message;
        }
        //add <br> tags at end of lines to improve readability;
        //skipped when encoding is requested, because the block below adds
        //them after encoding (tags added here would be encoded and shown
        //as literal text)
        if (!cbEncodeHTML.Checked)
        {
            lblOutput.Text = lblOutput.Text.Replace("\n", "<br>\n");
        }
        lblCount.Text = myScrape.ItemCount + " items found";
    }
    else
    {
        //display the original html
        lblOutput.Text = HTML;
        lblCount.Text = HTML.Length + " characters scraped";
    }
    if (cbEncodeHTML.Checked)
    {
        //encode first, then add line breaks so the <br> tags stay real markup
        lblOutput.Text = Server.HtmlEncode(lblOutput.Text);
        lblOutput.Text = lblOutput.Text.Replace("\n", "<br>\n");
    }
}
</script>
<html>
<head>
<title>Screen Scraping</title>
</head>
<body>
<form runat="server">
<center>
<h3>
Simple Screen Scraper</h3>
<hr />
Target Page URL:<br />


Regular expression search pattern:

Display HTML Scrape using regular expression




ScreenScraping.cs

using System;
using System.Web;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;

/// <summary>
/// Summary description for ScreenScraping
/// </summary>
public class ScreenScrape
{
    private int _ItemCount = 0;

    public int ItemCount
    {
        get { return _ItemCount; }
    }

    public string GetHTML(string URL)
    {
        string strHTML = "";
        try
        {
            WebRequest myRequest = System.Net.HttpWebRequest.Create(URL);
            myRequest.Timeout = 6000; //milliseconds
            //using blocks release the response and reader even if reading fails
            using (WebResponse myResponse = myRequest.GetResponse())
            using (StreamReader myStreamReader = new StreamReader(myResponse.GetResponseStream()))
            {
                strHTML = myStreamReader.ReadToEnd();
            }
        }
        catch (Exception ex)
        {
            //keep the original exception as InnerException so its stack trace is not lost
            throw new Exception(ex.Message, ex);
        }
        return strHTML;
    }

    public string ParseHTML(string HTML, string Pattern)