BackSpace is now on Udemy. Get your free access!!!



Many of you have been requesting we put our courses on Udemy and we have finally listened. We are now listed on Udemy and future courses will also be on Udemy.

As an introductory offer we are giving the following deals:

We will of course continue to sell and support courses with exam engines at backspace.academy.

Super Fast Dynamic Websites with CloudFront, ReactJS, and NodeJS - Part 1


CloudFront should be an essential component of any web based application deployment. It not only instantly provides super low-latency performance, it also dramatically reduces server costs while providing maximum server uptime.

Creating low latency static websites with CloudFront is a relatively simple process. You simply upload your site to S3 and create a CloudFront distribution for it. This is great for HTML5 websites and static blogs such as Jeckyl. But what about dynamic sites that need real time information presented to the end user? A different strategy is clearly required. Much has been published about different methods of caching dynamic websites but, I will present the most common sense and reliable technique to achieve this end.

Server Side v Browser Side Rendering

If you read any book on NodeJS you will no doubt find plenty of examples of rendering Jade templates with Express on the server. If you are still doing this, then you are wasting valuable server resources. A far better option is to render on the browser side. There are a number of frameworks specifically for this, the most popular being Facebook's ReactJS and Google's AngularJS (see State of JS report). I personally use ReactJS and the example will be in ReactJS, but either is fine.

Creating your site using ReactJS or AngularJS and uploading it to your NodeJS public directory will shift the rendering of your site from your server to the client's browser. Users of your app will no longer be waiting for rendered pages and will see pages appear with the click of a button.

You can now create a CloudFront distribution for your ReactJS or AngularJS site.

Although pages may be rendered instantly in the browser, any dynamic data required for the pages will be cached by CloudFront. We most probably do not want our dynamic data cached. We will still need a solution for delivering this data to the browser.


Handling Dynamic Data


Although there are many elaborate techniques published for handling dynamic data with CloudFront, the best way is to deliver this data without caching at all from CloudFront.

Not all HTTP methods are cached by CloudFront, only responses to GET and HEAD requests (although you can also configure CloudFront to cache responses to OPTIONS requests). If we use a different HTTP method, such as POST, PUT or DELETE  the request will not be cached by Cloudfront. CloudFront will simply proxy these requests back to our server.

Our EC2 NodeJS server can now be used to respond to requests for dynamic data by creating an API for our application that responds to POST requests from the client browser.




Some of you might be wondering why I haven't used serverless technology such as AWS Lambda or API Gateway. Rest assured I will be posting another series using this but, I consider EC2 as the preferred technology for most applications. First of all, costs are rarely mentioned in the serverless discussion. If you have an application that has significant traffic, the conventional EC2/ELB architecture will be the most cost effective. Secondly, many modern web applications are utilising websocket connections. Connections like this are possible with EC2 directly and also behind an ELB when utilizing proxy protocol. This is not possible with serverless technology as connections are short lived.

In the next post in this series we will set up our NodeJS server on EC2, create a CloudFront distribution and, create our API for handling dynamic data.

Be sure to subscribe to the blog so that you can get the latest updates.

For more AWS training and tutorials check out backspace.academy

Attention all exam preppers! AWS Answers



AWS has just released a new AWS Answers page that is essential reading for those preparing for the certification exams.
It provides a great overview of AWS architecture considerations.

YAML and CloudFormation. Yippee!!!

YAML: YAML Ain't Markup Language


I spend a heck of a lot of time coding and, like many devops guys, love Coffeescript, Jade, Stylus and YAML. No chasing missing semicolons, commas and curly braces. I just write clean code how it should be and, at least twice as fast.

JSON, like plain javascript, is a lot cleaner, quicker and easier to read when you remove all those curly braces, commas etc. YAML does just that!

AWS just announced support for YAML with CloudFormation templates. I would thoroughly recommend you check it out and start using YAML. It will make big difference to your productivity and, your templates will be much easier to read understand.

YAML, like Coffeescript, Jade and Stylus, makes use of indenting in code to eliminate the need for braces and commas. When you're learning YAML, you can use a JSON to YAML converter (eg http://www.json2yaml.com) to convert your existing JSON to YAML.

(Very) Basics of YAML

Collections using Indentation eliminate the need for braces and commas with Objects:

JSON 
"WebsiteConfiguration": {
"IndexDocument": "index.html",
"ErrorDocument": "error.html"
}

YAML
WebsiteConfiguration:
  IndexDocument: index.html
  ErrorDocument: error.html

Sequences with Dashes eliminate the need for square brackets and commas with Arrays:

JSON 
[
"S3Bucket",
"DomainName"
]

YAML
  - S3Bucket
  - DomainName

Full Example

Here is a full example I created for S3. I'll let you be the judge which one is better!

JSON:

{
"AWSTemplateFormatVersion": "2010-09-09",
"Description": "AWS CloudFormation Sample Template",
"Resources": {
"S3Bucket": {
"Type": "AWS::S3::Bucket",
"Properties": {
"AccessControl": "PublicRead",
"WebsiteConfiguration": {
"IndexDocument": "index.html",
"ErrorDocument": "error.html"
}
},
"DeletionPolicy": "Retain"
}
},
"Outputs": {
"WebsiteURL": {
"Value": {
"Fn::GetAtt": [
"S3Bucket",
"WebsiteURL"
]
},
"Description": "URL for website hosted on S3"
},
"S3BucketSecureURL": {
"Value": {
"Fn::Join": [
"",
[
"https://",
{
"Fn::GetAtt": [
"S3Bucket",
"DomainName"
]
}
]
]
},
"Description": "Name of S3 bucket to hold website content"
}
}
}

YAML:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: AWS CloudFormation Sample Template
Resources:
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: PublicRead
      WebsiteConfiguration:
        IndexDocument: index.html
        ErrorDocument: error.html
    DeletionPolicy: Retain
Outputs:
  WebsiteURL:
    Value:
      Fn::GetAtt:
      - S3Bucket
      - WebsiteURL
    Description: URL for website hosted on S3
  S3BucketSecureURL:
    Value:
      Fn::Join:
      - ''
      - - https://
        - Fn::GetAtt:
          - S3Bucket
          - DomainName
    Description: Name of S3 bucket to hold website content



Shared Responsibility 3 - Identify and Destroy the Bots



Please note: You should have a link in your login for blind or vision impaired people. These techniques will prevent them from using your application. They could be accommodated using an alternative dual factor authentication process.

In my last post I detailed how to create dynamic CSS selectors to make life difficult for reliable Bot scripts to be written. The next part of this series is to identify the Bot and take retaliatory action. The code for this post is available at Gist in case Blogger screws it up again.. The code for creating dynamic CSS selectors including some additional decoy elements looks like this:

function dynamicCSS(){
 var username, password
 x = ''
 if ((Math.random()*2) > 1)
  x += ''
 else
  x += ''
 x += '
' x += '

Please Login

' y = Math.floor((Math.random()*5)) + 2 for (var a=0; a' x += '' x += '' } for (var a=0; a
' x += '' x += '' return x }

In a real app you would set this all up using a jade template but for simplicity we will just send raw html.

Analysing Header Information

The first thing we can look at is the header information sent to our NodeJS EC2 instance. I conducted some tests using a number of different browsers and also using the very popular PhantomJS headless webkit to find any clear differences between a real browser and a headless browser. Below are the results.

Request headers from Chrome:

{
  "host": "54.197.212.141",
  "connection": "keep-alive",
  "content-length": "31",
  "accept": "*/*",
  "origin": "http://54.197.212.141",
  "x-requested-with": "XMLHttpRequest",
  "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36",
  "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
  "referer": "http://54.197.212.141/",
  "accept-encoding": "gzip, deflate",
  "accept-language": "en-GB,en-US;q=0.8,en;q=0.6",
}

Request headers from Firefox:

{
  "host": "54.197.212.141",
  "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0",
  "accept": "*/*",
  "accept-language": "en-US,en;q=0.5",
  "accept-encoding": "gzip, deflate",
  "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
  "x-requested-with": "XMLHttpRequest",
  "referer": "http://54.197.212.141/",
  "content-length": "10",
  "connection": "keep-alive"
}

Request headers from Safari:

{
  "host": "54.197.212.141",
  "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
  "origin": "http://54.197.212.141",
  "accept-encoding": "gzip, deflate",
  "content-length": "9",
  "connection": "keep-alive",
  "accept": "*/*",
  "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/601.7.8 (KHTML, like Gecko) Version/9.1.3 Safari/601.7.8",
  "referer": "http://54.197.212.141/",
  "accept-language": "en-us",
  "x-requested-with": "XMLHttpRequest"
}

Request headers from Internet Explorer:

{
  "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
  "accept": "*/*",
  "x-requested-with": "XMLHttpRequest",
  "referer": "http://54.197.212.141/",
  "accept-language": "en-US;q=0.5",
  "accept-encoding": "gzip, deflate",
  "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko",
  "host": "54.197.212.141",
  "content-length": "9",
  "dnt": "1",
  "connection": "Keep-Alive",
  "cache-control": "no-cache"
}

Request headers from PhantomJS:

{
  "accept": "*/*",
  "referer": "http://54.197.212.141/",
  "origin": "http://54.197.212.141",
  "x-requested-with": "XMLHttpRequest",
  "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0",
  "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
  "content-length": "9",
  "connection": "Keep-Alive",
  "accept-encoding": "gzip, deflate",
  "accept-language": "en-AU,*",
  "host": "54.197.212.141"
}

I won't put all the results here for all the browsers tested, but there is a clear structure to the PhantomJS header that is unique from conventional browsers. In particular starting with "accept" and finishing with "host".
Analysing the header structure can be used to help identify a Bot from a real person.

Plugin Support

Support for plugins with headless browsers is minimal or non-existent. We can also look at using the javascript Navigator method on the client to get information on supported plugins and other browser features. A simple test would be to check the plugins array length. In order to do this we need to create a javascript file that runs on the client.

In the public directory of your NodeJS create a new file called login.js (use your IP address of course)

function submitForm(username, password) {
  loginURL = 'http://54.197.212.141' + '/login'
  user = $('#' + username).val()
  pass = $('#' + password).val()
  $.post( loginURL, { 
    username: 'user',
    password: 'pass',
    plugins: '  '//navigator.plugins.length
  })
    .done(function( result ) {
      alert(result)
      }
    })
}

This code will post back to your NodeJS server information about the browser. Although headless browsers can post forms, from my experiments they don't process jQuery post commands running on the client. In contrast it has worked on all normal browsers without issue

If the post is submitted then the further identification can occur. In this example we will just check the length of the plugins array and also send the filled out input fields. If it is a Bot then the wrong input fields will be submitted and the plugins array length will be zero.

You will also need to install and enable bodyParser in your index.js file to receive the information:

var bodyParser = require('body-parser')
app.use(bodyParser.urlencoded({ extended: true }));

You will also need to update the dynamicCSS function to include jQuery on the client:

x += '<script src="https://code.jquery.com/jquery-3.1.0.min.js"></script>'

Also add a link to to your login.js file:

x += '<script src="login.js"></script>'

Additional security can also be achieved by creating multiple versions of the login file with different names for the parameters (username/password/plugins). The backend code that handles the post request would be expecting the fields defined in login file that was served. Anything different that is posted means the request is most probably a from a Bot. Another option is to serve the login.js file on the server side with random parameters and set up a route for it to be downloaded. For simplicity I won't add the extra code for this, but implementation is quite straightforward.

Building a Profile of the Bot


It is quite important that you use a number of techniques to identify bots to make sure you do not have a case of mistaken identity. It is a good idea to use a score system and identify a level which will trigger corrective action.

In your index.js file that is run on your server update with the following code:

app.post('/login', function(request, response) {
  var botScore = 0
  if (request.body.plugins == 0) ++botScore // Empty plugins array
  if (!request.body.username) ++botScore // Clicked on decoy inputs
  if (!request.body.password) ++botScore
  if (getObjectKeyIndex(request.headers, 'host') != 0) ++botScore // Bot type header info
  if (getObjectKeyIndex(request.headers, 'accept') == 0) ++botScore
  if (getObjectKeyIndex(request.headers, 'referer') == 1) ++botScore
  if (getObjectKeyIndex(request.headers, 'origin') == 2) ++botScore
  console.log('Bot score = ' + botScore)
  if (botScore > 4) {
    console.log('Destroy Bot')
    response.send('fail')
  }
  else {
    response.send('Logged in ' + request.body.username)
  }
})

We are now building a bot score and deciding whether to allow access based on that score.

Attacking the Bot with Javascript

The following technique needs to be used with caution.

If you want to ensure the Bot is destroyed for the good of the internet community then you can look at launching a counterattack on the bot with your own malicious script to crash the browser.

This can be achieved quite simply using an endless loop that reloads the page. This will stop execution of the browser and eventually cause it to crash. Update your client side code in login.js with:


function submitForm(username, password) {
  loginURL = 'http://54.197.212.141' + '/login'
  user = $('#' + username).val()
  pass = $('#' + password).val()
  $.post( loginURL, { 
    username: 'user',
    password: 'pass',
    plugins:   navigator.plugins.length
  })
    .done(function( result ) {
      if (result == 'fail')
        while(true) location.reload(true) // Crash the Bot
      else {
        alert(result)
      }
    })
}


In my next post I will look at the different types and benefits of captcha and how to enable them on your site.

Be sure to subscribe to the blog so that you can get the latest updates.

For more AWS training and tutorials check out backspace.academy

New Classroom Website




We have just started updating our classroom website:

  • We now accept PayPal and use Paypal for all transactions including credit cards (no more Stripe).
  • Runs on all browsers.
  • One click login using Facebook (more options will be added soon)
Login is no longer with username and password so please login using your facebook account that is linked to the email you used with BackSpace. If your Facebook account uses a different email then please send me an email on info@backspace.academy and I will update the database with your Facebook email.

Shared Responsibility 2 - Using Dynamic CSS Selectors to stop the bots.


In my last post I talked about techniques to stop malicious web automation services at the source before they reach AWS infrastructure. Now we will get our hands dirty with some code to put it into action. Don't worry if you are not an experienced coder, you should still be able to follow along.

How do Bot scripts work?

A rendered web page contains a Document Object Model (DOM). The DOM defines all the elements on the page such as forms and input fields. Bots mimic a real user that enters information in fields, clicks on buttons etc. To do this the bot needs to identify the relevant elements in the DOM. DOM elements are identified using CSS selectors. Bot scripts consist of a series of steps that detail CSS selectors and what action to perform on them.

The DOM structure and elements of a page can be quickly identified using a browser. Pressing F12 in your browser will launch developer tools with this information:


To see specific details of a DOM element simply right click on the element on the page and select 'inspect':


This will open up the developer tools with the element identified. You can get the CSS selector for the element easily by again right clicking on the element in the developer tools:



Note that this will be only one representation of the element as a CSS selector (generally the shortest one). There are a number of ways an element can be defined as a CSS selector including:

  • id name
  • input name
  • class names
  • DOM traversal e.g. defining its chain of parent elements in the DOM
  • Text inside the element using Jquery ':contains'.

Dynamic CSS Selectors

To make life difficult to develop bot scripts you can use dynamic CSS selectors. Instead of creating the same CSS selectors each time your page is rendered, you can look at changing these randomly each time.

When using NodeJS and Express this is quite straightforward as your are already rendering pages on the server. Simply introduce some code to mix this up a bit.

Let's Start


First of all set up an EC2 instance with NodeJS and Express set up to render pages. If you are unsure you can view the video below:

https://vimeo.com/145017165

To save you typing, the code is available at Gist (also Blogger tends to screw up code when it is published).

Now let's change index.js to create a simple login form.

Point your browser to the public IP address of your instance to check everything is ok. e.g. xxx.xxx.xxx.xxx:8080

Now change the index.js file to include a dynamicCSS function :

app.get('/', function(request, response) {
  response.send(dynamicCSS())
})

function dynamicCSS(){
 x = ''
 x += '
'; x += '

Please Login

; x += ''; x += ''; x += '' x += '
'; return x }


Now do npm start at the command line of your ec2 instance and refresh the browser page. You will now see our very simple login form:


The problem with this form is that it is really easy to identify the dom elements required to login. The id, name, placeholder all refer to username or password.

Now let's change our code and introduce dynamically created CSS selectors.


var loginElements = {
 username: '',
 password: ''
 }

function dynamicCSS(){
 var username = randomString()
 var password = randomString()
 loginElements.username = username
 loginElements.password = password 
 x = ''
 x += '
' x += '

Please Login

' x += '' x += '' x += '' x += '
' return x } function randomString(){ chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXTZabcdefghiklmnopqrstuvwxyz'.split('') chars.sort(function() { return 0.5 - Math.random() }) return chars.splice(0, 8).toString().replace(/,/g, '') }



This now generates a random string for the id and name tags of the input elements. This makes it not possible to use these in a reliable bot script. If you do npm start again and view the the view the element in developer tools you can see the random strings.

We now need to look at the other ways our elements can be identified as CSS selectors. As you can see the text "username" and "password" is still used in the placeholders and input type tag. Also the DOM structure itself doesn't change dynamically, making it possible to reference the element through traversing the DOM structure.

We will address both problems by creating random decoy input elements with the same parameters. The CSS position property will allow us to stack them on top of each other so that the decoy elements are not visible on the page:

app.get('/', function(request, response) {
  response.send(dynamicCSS())
})

var loginElements = {
  username: '',
  password: ''
}

function dynamicCSS(){
  var username, password
  x = ''
  x += '
' x += '

Please Login

' y = Math.floor((Math.random()*5)) + 2 for (var a=0; a' x += '' loginElements.username = username loginElements.password = password } x += '' x += '
' return x } function randonString(){ chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXTZabcdefghiklmnopqrstuvwxyz'.split('') chars.sort(function() { return 0.5 - Math.random() }) return chars.splice(0, 8).toString().replace(/,/g, '') }


Now when you view the DOM in your browser developer tools,  you can see the decoy input elements created underneath the real input element. If you refresh your browser you will see a different number of elements created each time (between 1 and 5 created).



The bot creator can no longer use the username and password placeholders or input types to identify the elements. They can also not use the DOM structure to traverse through the DOM as this is changing also. As pointed out by a reader of this post (thanks Vadim!), you should also put some random inputs after to handle jquery ":last". A good place would be underneath your logo.
y = Math.floor((Math.random()*5)) + 2
for (var a=0; a'
  x += ''
  loginElements.username = username
  loginElements.password = password
}
for (var a=0; a'
  x += ''
  document.getElementById(username).style.visibility = "hidden";
  document.getElementById(password).style.visibility = "hidden";
}

The next thing a bot script can do is click on an x-y position on the screen. We can handle this by randomly changing the position of the elements.

var loginElements = {
 username: '',
 password: ''
 }

function dynamicCSS(){
  var username, password
  x = ''
  if ((Math.random()*2) > 1)
    x += ''
  else
    x += ''
  x += '
' x += '

Please Login

' y = Math.floor((Math.random()*5)) + 2 for (var a=0; a' x += '' loginElements.username = username loginElements.password = password } x += '' x += '
' return x } function randomString(){ chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXTZabcdefghiklmnopqrstuvwxyz'.split('') chars.sort(function() { return 0.5 - Math.random() }) return chars.splice(0, 8).toString().replace(/,/g, '') }


The position of the input elements is now random. This currently only has two positions but you can elaborate on this to create many possible combinations of positions. You may also make your login form inside a modal window that changes position on the screen.

If want to go further you can look having two login forms, username followed by password. Or even better, randomly change between the two.

We have now addressed the possible techniques a bot creator can use to identify your input elements and login to your site.

Congratulations, you made it to the end!

What's next?

In my next post I will introduce techniques to identify bots and then look at launching a counter attack on the bot to crash it after it has been positively identified.

Be sure to subscribe to the blog so that you can get the latest updates.

For more AWS training and tutorials check out backspace.academy