Extracting details from a web page URL is not an easy task, because you need something that can fetch and parse the page. In this article we are going to extract a page's title, description and collection of images. To do this we need the HTML Agility Utility (the HtmlAgilityPack library) in our web application. When we share a link on Facebook or Google+, the image and description appear automatically after a few seconds; those sites have code that does exactly this. As we proceed through this article we will learn how to do it step by step. I am attaching a link to download the HTML Agility Utility, along with a demo project that you can download to your PC for reference.

Let me know if you run into any problem with this example. The HTML Agility Utility and the demo of this example are available from the download links below.

HtmlAgilityPack.1.4.6 Demo imageconvert

* Required namespaces:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.IO;
using System.Data;
using HtmlAgilityPack;

* Code to get the title, description and images:

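protected void btnGet_Click(object sender, EventArgs e)
{
    // Download the raw HTML of the page whose URL was typed into txturl.
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(new Uri(txturl.Text));
    request.Method = WebRequestMethods.Http.Get;

    String responseString;
    // The using blocks close the response and the reader even if reading fails.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        responseString = reader.ReadToEnd();
    }

    // Parse the downloaded markup with HtmlAgilityPack.
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(responseString);

    // Text of the first <title> element, or null if the page has none.
    String title = (from x in doc.DocumentNode.Descendants()
                    where x.Name.ToLower() == "title"
                    select x.InnerText).FirstOrDefault();

    // content attribute of <meta name="description">, or null if it is missing.
    String desc = (from x in doc.DocumentNode.Descendants()
                   where x.Name.ToLower() == "meta"
                   && x.Attributes["name"] != null
                   && x.Attributes["name"].Value.ToLower() == "description"
                   && x.Attributes["content"] != null
                   select x.Attributes["content"].Value).FirstOrDefault();

    // src of every <img> element that actually has a src attribute.
    List<String> imgs = (from x in doc.DocumentNode.Descendants()
                         where x.Name.ToLower() == "img"
                         && x.Attributes["src"] != null
                         select x.Attributes["src"].Value).ToList<String>();

    txturl0.Text = title;
    txturl1.Text = desc;

    // Show the first image only when the page contains at least one.
    if (imgs.Count > 0)
    {
        Image1.ImageUrl = imgs[0];
    }
}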
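
One thing to watch for: the src values collected from the img tags are often relative (for example /images/logo.png), so assigning them straight to Image1.ImageUrl can make the browser look for the picture on your own site instead of the page you fetched. The sketch below is one possible way to deal with that, not part of the original demo. The class name ImageUrlResolver and the method ToAbsolute are hypothetical names chosen for illustration; only the System.Uri constructor comes from the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original demo project.
public static class ImageUrlResolver
{
    // Converts raw src values taken from <img> tags into absolute URLs,
    // using the address of the downloaded page as the base.
    public static List<string> ToAbsolute(Uri pageUri, IEnumerable<string> srcValues)
    {
        List<string> result = new List<string>();
        foreach (string src in srcValues)
        {
            // Skip empty values and inline "data:" images, which are not links.
            if (string.IsNullOrWhiteSpace(src) ||
                src.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }

            try
            {
                // new Uri(base, relative) resolves relative values against the
                // page address and leaves values that are already absolute as-is.
                Uri absolute = new Uri(pageUri, src.Trim());
                result.Add(absolute.AbsoluteUri);
            }
            catch (UriFormatException)
            {
                // Ignore src values that cannot be parsed as a URL.
            }
        }
        return result;
    }
}
```

In the click handler this could be used as ImageUrlResolver.ToAbsolute(new Uri(txturl.Text), imgs) before picking the first entry. For a page at http://example.com/blog/post.aspx, a src of /images/logo.png would then become http://example.com/images/logo.png.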