How to Parse HTML with Nokogiri

Installation

Installation is easy: add the gem to your Gemfile, then run bundle install.

gem "nokogiri"

Learn how to Generate HTML.

Quick start to parsing HTML

Parsing HTML is easy, and you can take advantage of CSS selectors or XPath queries to find things in your document:

require 'open-uri'
require 'nokogiri'

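# Perform a Google search. URI.open is used because, on Ruby 3.0 and later,
# Kernel#open no longer fetches URLs even with open-uri loaded.
doc = Nokogiri::HTML(URI.open('http://google.com/search?q=tenderlove'))

# Print out each link using a CSS selector
# (the selector has to match Google's current markup, which changes over time)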
doc.css('h3.r > a.l').each do |link|
  puts link.content
end

Here is an example parsing some HTML and searching it using a combination of CSS selectors and XPath selectors:

require 'nokogiri'

doc = Nokogiri::HTML.parse(<<-eohtml)
<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <h1>This is an awesome document</h1>
    <p>
      I am a paragraph
        <a href="http://google.ca">I am a link</a>
    </p>
  </body>
</html>
eohtml

####
# Search for nodes by css
doc.css('p > a').each do |a_tag|
  puts a_tag.content
end

####
# Search for nodes by xpath
doc.xpath('//p/a').each do |a_tag|
  puts a_tag.content
end

####
# Or mix and match.
doc.search('//p/a', 'p > a').each do |a_tag|
  puts a_tag.content
end

###
# Find attributes and their values
doc.search('a').first['href']

Set Up SSH Keys

We use SSH keys to establish a secure connection between your computer and GitHub. Setting them up is fairly easy, but does involve a number of steps.

Before generating a brand new key, check whether one already exists. Start by opening the Terminal app.

1. Check for existing SSH keys on your computer:

$ cd ~/.ssh

2. Back up and remove existing SSH keys. If the ~/.ssh directory already exists, back up the old keys and remove them:

$ ls
$ mkdir key_backup
$ cp id_rsa* key_backup
$ rm id_rsa*

3. Generate a new SSH key with the command below. We want the default settings, so when asked for a file in which to save the key, just press Enter.

$ ssh-keygen -t rsa -C "your_email@youremail.com"

Now you need to enter a passphrase.

Enter passphrase (empty for no passphrase):
Enter same passphrase again:

Nokogiri Rails 3

Nokogiri is a simple HTML / XML parser with much of its interface borrowed from Hpricot. It uses libxml2 to parse and search, so it is very fast.

Installation

Installation is easy: add the following line to your Gemfile.

gem "nokogiri"

Generate

require 'rubygems'
require 'nokogiri'

@builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.head {
      doc.script {
        doc.text "alert('hello world');"
      }
      doc.style {
        doc.text "div#thing { background: red; }"
      }
      doc.title "Awesome Page" 
    }
    doc.body {
      doc.div.rad.thing! {
        doc.h1 "This is an h1"
        doc.text "This is a div with class 'rad' and id 'thing'"

        doc.div( :some_attribute => 'foo' ) {
          doc.p "This is an awesome paragraph!"
        }
      }
    }
  }
end

puts @builder.to_html

Is there an easy way to create HTML partials (e.g. a menu) instead of a full HTML document?
The only workaround I’ve found is to use inner_html:

require 'rubygems'
require 'nokogiri'

@builder = Nokogiri::HTML::Builder.new do |doc|
  doc.ul {
    doc.li 'hello'
  }
end

puts @builder.doc.inner_html 
# <ul><li>hello</li></ul>

Facebook Apps Using Koala

Installation

Add this to your Gemfile:

gem "koala"

Configuration file

If you’re using the OAuth class (or even the RealtimeUpdates class) it gets a little redundant always passing in your Facebook application ID and secret to create new instances of the OAuth class. To fix that, we’ll create a configuration file with your Facebook application ID and secret and extend Koala to always use those given values.

First we’ll put a YAML file into the config directory:

# config/facebook.yml
development:
  app_id: YOUR APP ID
  secret_key: YOUR SECRET
test:
  ...
production:
  ...
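
To see what Rails will read from this file, here is a minimal sketch that parses an inline copy with Ruby's standard YAML library. The helper name facebook_settings and the sample ID and secret are placeholders invented for the illustration.

```ruby
require 'yaml'

# Inline stand-in for config/facebook.yml; the ID and secret are made-up placeholders.
SAMPLE_FACEBOOK_YAML = <<~EOS
  development:
    app_id: 1234567890
    secret_key: abc123
EOS

# Hypothetical helper: returns the settings hash for one environment and
# raises KeyError when the file has no section for that environment.
def facebook_settings(yaml_text, env)
  YAML.safe_load(yaml_text).fetch(env)
end

settings = facebook_settings(SAMPLE_FACEBOOK_YAML, 'development')

# A purely numeric app_id is parsed as a number, so convert it to a string
# before handing it to code that expects one.
puts settings['app_id'].to_s
puts settings['secret_key']
```

Using fetch instead of [] makes a missing environment section fail loudly at boot rather than surfacing later as a nil ID or secret.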

Now add a Ruby file to the config/initializers directory; it reads the configuration file and extends Koala when Rails is initialized:

# config/initializers/koala.rb
# Monkey-patch in Facebook config so Koala knows to 
# automatically use Facebook settings from here if none are given

module Facebook
  CONFIG = YAML.load_file(Rails.root.join("config/facebook.yml"))[Rails.env]
  APP_ID = CONFIG['app_id']
  SECRET = CONFIG['secret_key']
end

Koala::Facebook::OAuth.class_eval do
  def initialize_with_default_settings(*args)
    case args.size
      when 0, 1
        raise "application id and/or secret are not specified in the config" unless Facebook::APP_ID && Facebook::SECRET
        initialize_without_default_settings(Facebook::APP_ID.to_s, Facebook::SECRET.to_s, args.first)
      when 2, 3
        initialize_without_default_settings(*args) 
    end
  end 

  # alias_method_chain works on Rails 3 and 4; it was removed in Rails 5.1.
  alias_method_chain :initialize, :default_settings
end

This overrides OAuth#initialize to take any number of arguments. If OAuth.new gets zero or one parameter, we’ll use our configuration file’s values, otherwise we’ll initialize the OAuth object using the old initializer.

Now creating an OAuth instance is as easy as

Koala::Facebook::OAuth.new
OR
Koala::Facebook::OAuth.new(oauth_callback_url)
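
On Rails versions without alias_method_chain, the same default-arguments idea can be sketched with Module#prepend. This is only an illustration of the pattern: FakeOAuth is a stand-in class so the example runs without Koala, and DEFAULT_APP_ID, DEFAULT_SECRET and DefaultSettings are hypothetical names.

```ruby
# Stand-in for Koala::Facebook::OAuth so the sketch runs without the gem.
class FakeOAuth
  attr_reader :app_id, :app_secret, :callback_url

  def initialize(app_id, app_secret, callback_url = nil)
    @app_id = app_id
    @app_secret = app_secret
    @callback_url = callback_url
  end
end

# Placeholder credentials; in the article these come from config/facebook.yml.
DEFAULT_APP_ID = 'YOUR APP ID'
DEFAULT_SECRET = 'YOUR SECRET'

# Prepended module: its initialize runs first and calls the original via super.
module DefaultSettings
  def initialize(*args)
    if args.size <= 1
      # Zero or one argument: fill in the configured ID and secret.
      super(DEFAULT_APP_ID, DEFAULT_SECRET, args.first)
    else
      # Two or three arguments: behave exactly like the original initializer.
      super(*args)
    end
  end
end

FakeOAuth.prepend(DefaultSettings)

puts FakeOAuth.new.app_id
puts FakeOAuth.new('http://example.com/callback').callback_url
```

With prepend there is no renamed initialize_without_default_settings method; super reaches the original initializer directly.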

Authentication
Facebook Connect website
JavaScript-based Authentication

Koala’s OAuth class allows easy verification and parsing of the cookies Facebook passes to your application, whether it be a website or Facebook iframe application. As a side note, make sure you’re using the new Facebook JavaScript SDK since the cookie format differs from the older Facebook Connect scripts.

One way to get the cookie data is to set up a before_filter and assign an instance variable to store the information:

# app/controllers/foo_controller.rb
before_filter :parse_facebook_cookies

def parse_facebook_cookies
  @facebook_cookies ||= Koala::Facebook::OAuth.new(YOUR_APP_ID, YOUR_SECRET).get_user_info_from_cookie(cookies)

  # If you've set up a configuration file as shown above, you can just do
  # @facebook_cookies ||= Koala::Facebook::OAuth.new.get_user_info_from_cookie(cookies)
end

def index
  ...
  @access_token = @facebook_cookies["access_token"]
  @graph = Koala::Facebook::GraphAPI.new(@access_token)
  ...
end

Of course you can add a callback_url when creating the OAuth object, depending on how you’re handling authentication.

If you won’t necessarily need the data from the Facebook cookies on every request, a method in ApplicationController is probably good enough:

# app/controllers/application_controller.rb
def facebook_cookies
  @facebook_cookies ||= Koala::Facebook::OAuth.new(YOUR_APP_ID, YOUR_SECRET).get_user_info_from_cookie(cookies)
end

# app/controllers/foo_controller.rb
def index
  ...
  @access_token = facebook_cookies['access_token']
  @graph = Koala::Facebook::GraphAPI.new(@access_token)
  ...
end

Authentication via redirects

Note: I’ve never actually implemented Facebook authorization using redirects, but this is what I’ve gathered from the Facebook documentation. Let us know if this example doesn’t actually work! – Chris

OAuth supports an authentication flow based on redirects, which is outlined on the official Facebook developers site.

To authorize a Facebook user, present them with a link to the Facebook authentication page. With Koala this might look like:

<%# app/views/welcome.html.erb %>
...
  <%= link_to 'Login', Koala::Facebook::OAuth.new.url_for_oauth_code(:callback => oauth_redirect_url) %>
...

Where oauth_redirect_url is the URL to an action which will handle the rest of the authentication flow.

Assuming oauth_redirect_url points to the OAuthController#redirect action, you can finish off authentication with the following bit of code:

# app/controllers/oauth_controller.rb
def redirect
  session[:access_token] = Koala::Facebook::OAuth.new(oauth_redirect_url).get_access_token(params[:code]) if params[:code]

  redirect_to session[:access_token] ? success_path : failure_path
end

This will store the access_token string in session[:access_token]. Obviously, you can store this value in whatever way suits your application.

IFrame Applications on Facebook.com

Using Koala with an iframe application is very similar to using it with an external Facebook Connect application. You can use the JavaScript-based authentication methods to authenticate the user and start using the Graph or REST API; Facebook also provides parameters on tab load, which you could associate with the user as well.

Storing Facebook User IDs

As you may or may not be aware, Facebook UIDs are so large that they should be stored in most databases as big ints rather than plain ints. Therefore, if you wish to store a Facebook UID in your database, your migration should be of the form:

add_column :users, :facebook_id, :bigint
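
A quick arithmetic check shows why. The sketch below assumes a plain integer column is a signed 4-byte value (the default in MySQL and PostgreSQL) and a bigint is a signed 8-byte value; SAMPLE_UID is a made-up number of a plausible size, not a real account, and column_type_for is a hypothetical helper.

```ruby
INT_MAX    = 2**31 - 1   # largest value of a signed 4-byte integer column
BIGINT_MAX = 2**63 - 1   # largest value of a signed 8-byte bigint column

# Made-up 15-digit ID, used only to illustrate the size problem.
SAMPLE_UID = 100_000_123_456_789

# Hypothetical helper: picks the smallest column type that can hold the value.
def column_type_for(value)
  return :integer if value <= INT_MAX
  return :bigint if value <= BIGINT_MAX

  raise ArgumentError, "#{value} does not fit in a bigint either"
end

puts column_type_for(SAMPLE_UID)
```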

Graph API

The Graph API is the simple, slick new interface to Facebook’s data. Using it with Koala is quite straightforward:

@graph = Koala::Facebook::API.new(oauth_access_token)
# in 1.1 or earlier, use GraphAPI instead of API

profile = @graph.get_object("me")
friends = @graph.get_connections("me", "friends")
@graph.put_object("me", "feed", :message => "I am writing on my wall!")

# three-part queries are easy too!
@graph.get_connection("me", "mutualfriends/#{friend_id}")

# you can even use the new Timeline API
# see https://developers.facebook.com/docs/beta/opengraph/tutorial/
@graph.put_connections("me", "namespace:action", :object => object_url)

Simple authentication with Warden

There are a lot of Ruby authentication libraries out there that can do just about everything, such as sending confirmation emails and resetting passwords. I didn’t really want that. My plan was to write a little application that could authenticate using GitHub credentials (more on GitHub authentication in “Authenticating via Github with Guestlist”).

This meant I didn’t need email confirmations, password reset functionality or even registration. Also, I didn’t want to log in using an email address and password or check my own database to see if the user exists. So, no Authlogic or Clearance for me. I had to go find a more low-level solution.

It didn’t take long before I found Warden, a “General Rack Authentication Framework”.

“Warden is rack based middleware, designed to provide a mechanism for authentication in Ruby web applications. It is a common mechanism that fits into the Rack Machinery to offer powerful options for authentication.”

Remember: it does not do registration, confirmation and the like. If you want anything like that, use Devise, a Rails authentication system based on Warden. @rbates also did a great Railscast on Devise.

“Warden uses the concept of cascading strategies to determine if a request should be authenticated. Warden will try strategies one after another until either one succeeds, no Strategies are found relevant or a strategy fails.”

An example of a strategy would be a user logging in with his username and password:

Warden::Strategies.add(:my_strategy) do

  def valid?
    params[:username] && params[:password]
  end

  def authenticate!
    u = User.find_by_username_and_password(
      params[:username],
      params[:password] # you should encrypt this. 😉
    )

    u.nil? ? fail!("Couldn't log in") : success!(u)
  end
end

The valid? method checks if the strategy is valid. In the example above it will return false when the username and password aren’t both in the params. In that case it will fail without even having to try and find the user.

authenticate! calls fail! with a message when the authentication failed. If the authentication passes, it’ll pass the User instance to success!. Pretty simple.
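
The cascade described in the quote above can be simulated in plain Ruby. This is a rough sketch of the idea only: SimpleStrategy, run_strategies, USERS and TOKENS are invented for the illustration and are not Warden's API, and the passwords are kept in plain text purely to keep the example short.

```ruby
# Made-up credentials for the simulation.
USERS  = { 'alice' => 'secret' }.freeze
TOKENS = { 'abc123' => 'bob' }.freeze

# A strategy has a relevance check and an authenticator, both lambdas.
SimpleStrategy = Struct.new(:name, :valid_check, :authenticator) do
  def valid?(params)
    valid_check.call(params)
  end

  # Returns [:success, user] or [:failure, message].
  def authenticate(params)
    authenticator.call(params)
  end
end

STRATEGIES = [
  SimpleStrategy.new(
    :token,
    ->(params) { !params[:token].nil? },
    ->(params) { TOKENS.key?(params[:token]) ? [:success, TOKENS[params[:token]]] : [:failure, 'Bad token'] }
  ),
  SimpleStrategy.new(
    :password,
    ->(params) { !params[:username].nil? && !params[:password].nil? },
    ->(params) { USERS[params[:username]] == params[:password] ? [:success, params[:username]] : [:failure, "Couldn't log in"] }
  )
].freeze

# Tries each strategy in order, skipping the ones that are not relevant.
# The first relevant strategy decides the outcome, whether success or failure.
def run_strategies(strategies, params)
  strategies.each do |strategy|
    next unless strategy.valid?(params)

    return strategy.authenticate(params)
  end
  [:failure, 'no relevant strategy']
end

p run_strategies(STRATEGIES, username: 'alice', password: 'secret')
p run_strategies(STRATEGIES, {})
```

Because the relevance check runs first, a request without any credentials never reaches the user lookup.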

I’m not going into any specific stuff here, but if you’re using Rails you might want to check out rails_warden_mongoid_example. It’s a pretty simple and understandable application that shows you how to use Warden. Also, be sure to read the wiki, it has pretty good setup and example pages and there’s a lot more cool stuff in there.

ElasticSearch Rails 3 Part 1

Add to your Gemfile:

gem 'tire'

articles_controller.rb

def index
  @articles = Article.search(params)
end

models/article.rb

class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  # Full-text search on params[:query], limited to articles already published.
  def self.search(params)
    tire.search(load: true) do
      query { string params[:query], default_operator: "AND" } if params[:query].present?
      filter :range, published_at: {lte: Time.zone.now}
    end
  end
end

articles/index.html.erb

<%= form_tag articles_path, method: :get do %>
  <p>
    <%= text_field_tag :query, params[:query] %>
    <%= submit_tag "Search", name: nil %>
  </p>
<% end %>

Sortable List in Ruby on Rails 3 Using jQuery

Add these gems to your Gemfile:

gem "rails"
gem "mysql2"
gem "acts_as_list"

# go to console:
$ bundle install

Add acts_as_list to the model:

class Book < ActiveRecord::Base
  acts_as_list
end

Add jQuery to the layout and set up a content_for block for JavaScript. In app/views/layouts/books.html.erb, change this:

<%= javascript_include_tag :defaults %>

To this:

<%= javascript_include_tag "https://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js", "https://ajax.googleapis.com/ajax/libs/jqueryui/1.8.14/jquery-ui.min.js", "rails.js" %>

Also add this just before the closing </body> tag:

<%= yield :javascript %>

Edit the index view to use li elements (tables don’t work):

<h1>Listing books</h1>
<ul id="books"> <% @books.each do |book| %>
  <li id="book_<%= book.id %>"><span class="handle">[drag]</span><%= book.name %></li>
<% end %></ul>
<%= link_to 'New book', new_book_path %>

Add the JavaScript to the view:

<%# index.html.erb %>
<% content_for :javascript do %>
  <%= javascript_tag do %>
    // Make the list sortable and post the new order after each drop
    $(document).ready(function(){
      $('#books').sortable({
        axis: 'y',
        dropOnEmpty: false,
        handle: '.handle',
        cursor: 'crosshair',
        items: 'li',
        opacity: 0.4,
        scroll: true,
        update: function(){
          $.ajax({
            type: 'post',
            data: $('#books').sortable('serialize'),
            dataType: 'script',
            complete: function(request){
              $('#books').effect('highlight');
            },
            url: '/books/sort'
          });
        }
      });
    });
  <% end %>
<% end %>

And your controller:

def index
  @books = Book.order('books.position ASC')
end

def sort
  @books = Book.all
  @books.each do |book|
    # params['book'] lists the ids in their new order; positions are 1-based
    book.position = params['book'].index(book.id.to_s) + 1
    book.save
  end

  render :nothing => true
end
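
To make the position arithmetic concrete, here is a plain-Ruby sketch of what the sort action receives. jQuery UI's sortable('serialize') turns element ids such as book_3 into a request body like book[]=3&book[]=1&book[]=2, which Rails exposes as an array in params['book']. The helper names ids_from_serialized and positions_for are invented for this illustration.

```ruby
require 'uri'

# Extracts the id list from a serialized body such as "book[]=3&book[]=1".
# This mimics, in a simplified way, what Rails puts in params['book'].
def ids_from_serialized(body)
  URI.decode_www_form(body)
     .select { |key, _value| key == 'book[]' }
     .map { |_key, value| value }
end

# Maps each id to its 1-based position, mirroring
# params['book'].index(book.id.to_s) + 1 in the sort action.
def positions_for(ids)
  ids.each_with_index.map { |id, index| [id.to_i, index + 1] }.to_h
end

ids = ids_from_serialized('book[]=3&book[]=1&book[]=2')
positions_for(ids).each do |book_id, position|
  puts "book #{book_id} -> position #{position}"
end
```

Building the id-to-position hash once avoids calling index for every book, which may matter for long lists.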

Update your routes:

root to: "books#index"
resources :books do
   post :sort, on: :collection
   # on Ruby 1.8, write :on => :collection instead
end