Lesson2 done

This commit is contained in:
gauthiier 2015-02-17 17:50:46 +01:00
parent 829425c362
commit 9222e0469d
7 changed files with 273 additions and 31 deletions

92
Lesson1.html Normal file

@ -0,0 +1,92 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title></title>
<style type="text/css">code{white-space: pre;}</style>
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<link rel="stylesheet" href="style/style.css">
</head>
<body>
<content>
<h2 id="text-encoding">Text Encoding</h2>
<p>We believe in approaching text writing by first understanding the core inscription mechanism upheld by modern computing machines. In this lesson we will therefore look at how text and characters are inscribed and represented internally within computers. More specifically, we will look at standards of text encoding (and decoding) and see how text editors can decode such encodings.</p>
<h3 id="goals">Goals</h3>
<p>The aim of this lesson is to present the various ways that computers represent text internally, that is, characters as digits. The lesson is tailored to give the reader basic knowledge of the standards that establish the quanta of text (data). Our hope in doing so is to give a feel for the materiality of text and to present the ways in which various levels of abstraction are applied to it.</p>
<p>In a nutshell, the goals of this lesson are:</p>
<ol style="list-style-type: decimal">
<li>Understand the various ways in which characters are represented and encoded as digits.</li>
<li>Derive a basic understanding of how proprietary text formats are encoded.</li>
<li>Develop a critical stance on why proprietary text formats might be problematic.</li>
<li>Develop a critical stance on why standardized open text formats are important and ubiquitous.</li>
<li>Learn how to use a plain text editor to write, view and inspect different open standards encodings of a given text file.</li>
</ol>
<h3 id="history">History</h3>
<p>Has everyone heard of the byte? If you haven't, it's about time you did, as you employ this legacy format daily when using your computer. A byte is the most basic quantum of computing and is composed of 8 bits, where a bit stands for what is commonly represented by a 0 or 1. Hence a byte is an 8-bit &quot;packet&quot; which can take 256 distinct values, read as decimal numbers ranging from 0 to 255 (or, when signed, from -128 to 127). In this lesson we will use the <a href="https://en.wikipedia.org/wiki/Hexadecimal">hexadecimal</a> notation to represent bytes. The byte is a historical format and encapsulates the most basic data structure in computing machinery, a standard introduced by IBM for its flagship <a href="http://www.computermuseum.li/Testpage/IBM-360-1964.htm">IBM/360</a> mainframe machine in 1964.<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a></p>
<p>Roughly at the same time (1963) another (updated) standard was devised for the encoding of characters: ASCII [ref]. ASCII defined a 7-bit format for characters, which was carried in an 8-bit format on the IBM/360. With a 7-bit format, ASCII could encode 128 characters. However, the IBM/360 opted to use the legacy <a href="https://en.wikipedia.org/wiki/EBCDIC">EBCDIC</a> 8-bit format as the default character set (dubbed &quot;charset&quot;) for all software developed for the IBM/360<a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a>. Hence the mass adoption of ASCII as the main default charset in computing systems came years later, mainly with the advent of PCs.</p>
<p>Is ASCII still in use today? Yes and no. ASCII has some important limitations, as it was designed for the unaccented Latin alphabet used in English and does not support other characters (hence a 7-bit format sufficed). With the spread of PCs around the world and the rise of the Internet as the main communication infrastructure, the need for a single character format (a Universal Format, so to speak) accounting for both Latin and non-Latin characters (Cyrillic, Hebrew, Arabic, Turkish to name a few) had become pressing by the beginning of the 1990s.</p>
<p>Hence the establishment of the Unicode standard, whose aim is to devise and maintain a Universal Character Set (UCS) composed of special code points for each character (a kind of &quot;meta&quot;-charset if you want, composed of specific Unicode codes)<a href="#fn3" class="footnoteRef" id="fnref3"><sup>3</sup></a>. A code point by itself does not say how it is written as bytes. Rather, byte-level encodings are specified separately as implementations of the UCS, such as the UTF family (UCS Transformation Format). The most notable UTF is UTF-8.<a href="#fn4" class="footnoteRef" id="fnref4"><sup>4</sup></a> The special feature of UTF-8 is that it is directly backward compatible with ASCII (an ASCII character has the same encoding as its UTF-8 version) and has the property of being variable in length, meaning that ASCII characters are encoded with a single byte while all other characters are encoded with two to four bytes.<a href="#fn5" class="footnoteRef" id="fnref5"><sup>5</sup></a> Nowadays, UTF-8 is one of the most (if not <em>the</em> most) widely adopted character encoding formats.<a href="#fn6" class="footnoteRef" id="fnref6"><sup>6</sup></a></p>
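<p>To make the numbers above concrete, here is a minimal Python sketch (Python is our choice for illustration, it is not part of the original lesson). The helper name <code>byte_to_hex</code> is hypothetical; it simply shows the range of values a byte can hold and how a byte value is written in hexadecimal.</p>

```python
# A byte holds 8 bits, hence 2**8 = 256 distinct values.
BITS_PER_BYTE = 8
distinct_values = 2 ** BITS_PER_BYTE

# The same 256 values can be read as unsigned or as signed numbers.
unsigned_range = (0, distinct_values - 1)
signed_range = (-(distinct_values // 2), distinct_values // 2 - 1)


def byte_to_hex(value):
    """Return the two-digit hexadecimal notation of a byte value (0 to 255)."""
    return format(value, "02x")


print(distinct_values, unsigned_range, signed_range)
print(byte_to_hex(0), byte_to_hex(116), byte_to_hex(255))
```

<p>The value 116 is written <code>74</code> in hexadecimal, which is also the byte used for the letter &quot;t&quot; in the examples further down.</p>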
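<p>The variable-length property of UTF-8 can be observed with a short Python sketch. This is only an illustration under our own assumptions: the helper <code>utf8_bytes</code> and the choice of sample characters are ours, not part of the lesson.</p>

```python
def utf8_bytes(ch):
    """Encode a single character to its UTF-8 byte sequence."""
    return ch.encode("utf-8")


# One sample character for each possible UTF-8 length (1 to 4 bytes).
samples = ["a", "\u00e2", "\u1ed9", chr(0x1F600)]

for ch in samples:
    encoded = utf8_bytes(ch)
    # Print the code point, the encoded bytes in hexadecimal, and their count.
    print("U+%04X" % ord(ch), encoded.hex(), len(encoded), "byte(s)")

# Backward compatibility: an ASCII character encodes identically in UTF-8.
print("a".encode("ascii") == utf8_bytes("a"))
```

<p>The first sample is plain ASCII and takes one byte; the others take two, three and four bytes respectively.</p>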
<h3 id="how">How</h3>
<p>Let's start with a very simple example to illustrate how text is encoded.</p>
<p>The following sentence</p>
<pre><code>this is a sentence encoded in UTF-8.</code></pre>
<p>is encoded in UTF-8 (shown in hexadecimal) as</p>
<pre><code>7468 6973 2069 7320 6120 7365 6e74 656e 6365 2065 6e63 6f64 6564 2069 6e20 5554 462d 382e</code></pre>
<p>and corresponds to the Unicode code points</p>
<pre><code>U+0074 U+0068 U+0069 U+0073 U+0020 U+0069 U+0073 U+0020 U+0061 U+0020 U+0073 U+0065 U+006e U+0074 U+0065 U+006e U+0063 U+0065 U+0020 U+0065 U+006e U+0063 U+006f U+0064 U+0065 U+0064 U+0020 U+0069 U+006e U+0020 U+0055 U+0054 U+0046 U+002d U+0038 U+002e</code></pre>
<p>Now the same sentence in Vietnamese</p>
<pre><code>đây là một câu UTF-8.</code></pre>
<p>is encoded in UTF-8 (shown in hexadecimal) as</p>
<pre><code>c491 c3a2 7920 6cc3 a020 6de1 bb99 7420 63c3 a275 2055 5446 2d38 2e</code></pre>
<p>and corresponds to the Unicode code points</p>
<pre><code>U+0111 U+00e2 U+0079 U+0020 U+006c U+00e0 U+0020 U+006d U+1ed9 U+0074 U+0020 U+0063 U+00e2 U+0075 U+0020 U+0055 U+0054 U+0046 U+002d U+0038 U+002e</code></pre>
<p>A few observations from the examples above are worth noting:</p>
<ol style="list-style-type: decimal">
<li><p>For the English sentence, the UTF-8 encoding and Unicode representation are basically the same. This is because what we are looking at is basically straight ASCII! Unicode's UCS was designed to integrate ASCII into its core coding scheme and hence uses the same codes for that subset. UTF-8 implements this (obviously) in its encoding scheme.</p></li>
<li><p>The UTF-8 encoding of the English sentence is more compact than the Unicode code point notation. For a single character, UTF-8 uses one byte (two hexadecimal digits), whereas the code point notation writes four hexadecimal digits (two bytes' worth).</p></li>
<li><p>For the Vietnamese sentence, things get a little more interesting. Here the UTF-8 encoding and Unicode representation are <em>not</em> the same. As explained in the last section, the UTF-8 encoding format and Unicode code points are not meant to be equivalent: one is a standard that assigns a code to each character (Unicode) while the other is an implementation of this standard as bytes (UTF-8).</p></li>
<li><p>The UTF-8 encoding of the Vietnamese sentence is <em>not</em> necessarily more compact than Unicode's UCS. In fact we see UTF-8 using two or three bytes to encode some characters (remember that UTF-8 is variable-length). For example the character 'â' is 'U+00e2' in UCS (one significant byte, 'e2') but 'c3a2' in UTF-8 (two bytes), and 'ộ' is 'U+1ed9' in UCS (two significant bytes) but 'e1bb99' in UTF-8 (three bytes). A great chart to look at the various codes and encodings can be found here: <a href="http://utf8-chartable.de">http://utf8-chartable.de</a></p></li>
</ol>
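<p>The hexadecimal dumps and code point listings discussed above can be reproduced with a few lines of Python. This is a sketch under our own assumptions: the function names <code>utf8_hex</code> and <code>code_points</code> are hypothetical, and we normalise the Vietnamese sentence to its precomposed (NFC) form so that each accented letter is a single code point, as in the listing above.</p>

```python
import unicodedata


def utf8_hex(text):
    """Return the UTF-8 encoding of text as a string of hexadecimal digits."""
    return text.encode("utf-8").hex()


def code_points(text):
    """Return the Unicode code points of text in U+xxxx notation."""
    return " ".join("U+%04x" % ord(ch) for ch in text)


english = "this is a sentence encoded in UTF-8."
# NFC keeps every accented letter as one precomposed code point.
vietnamese = unicodedata.normalize("NFC", "đây là một câu UTF-8.")

for sentence in (english, vietnamese):
    print(utf8_hex(sentence))
    print(code_points(sentence))
    # Number of characters versus number of bytes once encoded.
    print(len(sentence), len(sentence.encode("utf-8")))
```

<p>For the English sentence the character count and the byte count are equal; for the Vietnamese sentence the byte count is larger, since the accented letters take two or three bytes each.</p>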
<p>At this point, we should stress the fact that what is inscribed in computing memory is the <em>encoding</em> of text and not its Unicode representation. In other words, UTF-8 is the scheme by which computers inscribe text to physical memory using their read/write mechanisms. What is inscribed physically are single bits following the UTF-8 encoding scheme, which gives meaning to 8-bit &quot;packets&quot; as characters. In the examples above we have employed the hexadecimal notation to represent such &quot;packets&quot;/data. This is, of course, a kind of abstraction from the physical layer where text is actually inscribed, a convenient way for us humans to decipher and group bits. It nonetheless gives us a feel for the type of &quot;materiality&quot; of text inscribed on and manipulated by computing machines. For a more in-depth analysis of physical inscription mechanisms, we refer to the forensic work of Kirschenbaum <span class="citation">(Kirschenbaum 2012)</span> on the subject.</p>
<h4 id="plain-text-editors">(Plain) Text Editors</h4>
<p>But how do I go about looking up the encoding of a particular text? Well, it is pretty simple: don't use word processing software; use a <a href="https://en.wikipedia.org/wiki/Text_editor">plain text editor</a>. For the examples above we've used <a href="http://www.sublimetext.com">Sublime Text</a><a href="#fn7" class="footnoteRef" id="fnref7"><sup>7</sup></a> to manipulate and reveal the encodings of our sentences. The idea with plain text editors is that they show everything: they are usually very basic in appearance yet have striking features that are central to the practice of computer programming. There exists a panoply of good and powerful text editors<a href="#fn8" class="footnoteRef" id="fnref8"><sup>8</sup></a>, and some are even legacy editors such as <a href="https://www.gnu.org/software/emacs/">Emacs</a> and <a href="http://www.vim.org">Vi(m)</a>. In this lesson (and the remaining ones) we will use Sublime to illustrate techniques and concepts, yet any other editor would no doubt suffice.</p>
<p>Now equipped with an editor, let's look up what a word processing file looks like:</p>
<div class="figure">
<img src="img/garbage.png" />
</div>
<p>To select the encoding of a file using Sublime: <strong>Menu</strong> -&gt; <strong>File</strong> -&gt; <strong>Reopen with Encoding</strong></p>
<p>The above file is an Apple Pages file that we have opened using Sublime with UTF-8 decoding.</p>
<p>As you can see, there are many characters that do not read properly, that is, they are not human-readable. In fact, we can see that the UTF-8 decoder reads the bytes in the file and maps some of their content to Unicode &quot;control&quot; characters. These &quot;control&quot; characters are part of the UCS and represent computer commands, if you like, rather than elements of an alphabet. For example, a &quot;new line&quot; character representing a new line in a text (when the &quot;return&quot; key is pressed on a keyboard) has the &quot;LF&quot; (Line Feed) symbol with UCS value U+000A. There exists a variety of such characters.<a href="#fn9" class="footnoteRef" id="fnref9"><sup>9</sup></a></p>
<p>However, in the case of the Apple Pages file, these &quot;control&quot; characters are meaningless, as the bytes obviously do not follow the Unicode standard. Instead, Pages inserts into its text specific commands that only have meaning for the Apple Pages program. In short, these are bytes that have meaning only to Apple and their specific regime of encoding files. Such commands may refer to specific ways to display certain types of characters, or perhaps signify the beginning of a paragraph, or specify a font to render text, or even be the data of an image (who knows?). Pages is not a standard format but a proprietary one, therefore it is not possible to instruct my text editor on how to decode the bytes found in the Pages document. In a sense, having all data in a single file (information about the design, layout, font, etc.) makes such files overly complex compared to the plain text format. As a result, word processing files tend to be larger in size than plain UTF-8 encoded ones. The text from the file above has 1 389 characters. Its Apple Pages file is composed of 179 759 bytes while its plain UTF-8 version takes only 1 391 bytes (the two extra bytes most likely come from a trailing newline or from characters that UTF-8 encodes with more than one byte, since plain text files do not store an &quot;EOF&quot; character).</p>
<p>In turn, the obvious unreadability of proprietary word processing file formats (such as Apple Pages, MS Word), coupled with their tendency to bloat files, makes them problematic in terms of politics of encoding, usability and efficiency. Hence, standards like UTF-8 and the use of plain text editors are viable alternatives for writing academic text, sustained by a practice that is unbounded by obfuscating interests and techniques. What is human-readable is human-understandable.</p>
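<p>What a text editor does when asked to decode arbitrary bytes as UTF-8 can be sketched in Python. The byte values below are made up for illustration (they are <em>not</em> taken from the Pages file shown above), and the use of the <code>errors="replace"</code> policy is our assumption about a typical lenient decoder, not a description of how Sublime Text works internally.</p>

```python
import unicodedata

# Hypothetical bytes standing in for the start of a non-text file.
blob = bytes([0x50, 0x4B, 0x03, 0x04, 0xFF, 0xFE, 0x00, 0x0A])

# Decode leniently: invalid sequences become U+FFFD (replacement character).
decoded = blob.decode("utf-8", errors="replace")

# Count the characters that fall in Unicode's control category ("Cc").
control_count = sum(1 for ch in decoded if unicodedata.category(ch) == "Cc")
replacement_count = decoded.count("\ufffd")

print(repr(decoded))
print(control_count, "control characters,", replacement_count, "replacements")
```

<p>Only the first two bytes decode to readable letters; the remainder turn into control characters or replacement characters, which is the kind of &quot;garbage&quot; seen in the screenshot.</p>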
<h3 id="extra">Extra</h3>
<p>As an exercise for this lesson, please feel free to experiment with <a href="http://www.sublimetext.com">Sublime Text</a> (especially its encoding features): <strong>Menu</strong> -&gt; <strong>File</strong> -&gt; <strong>Reopen with Encoding</strong></p>
<p><a href="http://www.sublimetext.com/docs/2/">Sublime Text 2 Official Documentation</a></p>
<p><a href="http://sublime-text-unofficial-documentation.readthedocs.org/en/sublime-text-2/">Sublime Text 2 Non-Official Documentation</a></p>
<p>Note: If you are a plain text editing novice, please make sure you understand just enough of the basics to start. Some of the tutorials are tailored only for advanced programmers and their jargon may be confusing at times.</p>
<div class="references">
<h3>References</h3>
<p>Amdahl, G.M., G.A. Blaauw, and F.P. Brooks. 1964. “Architecture of the IBM System/360.” <em>IBM Journal of Research and Development</em> 8 (2): 87–101. doi:<a href="http://dx.doi.org/10.1147/rd.82.0087">10.1147/rd.82.0087</a>.</p>
<p>Kirschenbaum, Matthew G. 2012. <em>Mechanisms: New Media and the Forensic Imagination</em>. Cambridge, Mass.; London: MIT Press.</p>
</div>
<div class="footnotes">
<hr />
<ol>
<li id="fn1"><p>The IBM/360 was one of the best-selling computers of its time. For a discussion about the 8-bit byte format see the <em>Data Format</em> section in <span class="citation">(Amdahl, Blaauw, and Brooks 1964)</span>.<a href="#fnref1"></a></p></li>
<li id="fn2"><p>See section <em>ASCII vs BCD codes</em> in <span class="citation">(Amdahl, Blaauw, and Brooks 1964)</span> and for more information about the history of the ASCII format see the writings of <a href="http://www.bobbemer.com/BYTE.HTM">Bob Bemer</a>.<a href="#fnref2"></a></p></li>
<li id="fn3"><p>Unicode codes are represented with a 'U' prefix before their numerical codes. For a table of all the codes, refer to <a href="http://unicode-table.com/">http://unicode-table.com/</a><a href="#fnref3"></a></p></li>
<li id="fn4"><p>See also UTF-16, UTF-32 and the Unicode <a href="http://www.unicode.org/faq/utf_bom.html">FAQ</a> for disambiguation.<a href="#fnref4"></a></p></li>
<li id="fn5"><p>UTF-8 was conceived by <a href="https://en.wikipedia.org/wiki/Ken_Thompson">Ken Thompson</a> and <a href="https://en.wikipedia.org/wiki/Rob_Pike">Rob Pike</a> on a placemat in a <a href="http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt">New Jersey diner in 1992</a>.<a href="#fnref5"></a></p></li>
<li id="fn6"><p>Especially on the Internet -- see character encodings historical trend <a href="http://w3techs.com/technologies/history_overview/character_encoding/ms/y">chart</a>.<a href="#fnref6"></a></p></li>
<li id="fn7"><p>Although moving to <a href="https://atom.io">Atom</a> imminently.<a href="#fnref7"></a></p></li>
<li id="fn8"><p>For a list of such editors please refer to <a href="https://en.wikipedia.org/wiki/Comparison_of_text_editors">this article</a>.<a href="#fnref8"></a></p></li>
<li id="fn9"><p>For a comprehensible explanation of these codes, please refer to the historical <a href="https://tools.ietf.org/html/rfc20">RFC20</a>. The concept of control codes was introduced by the legacy <a href="https://en.wikipedia.org/wiki/Baudot_code">Baudot (1870) and Murray (1901) codes</a>, which were standard coding techniques up until the advent of the aforementioned EBCDIC and ASCII.<a href="#fnref9"></a></p></li>
</ol>
</div>
</content>
</body>
</html>


@ -68,7 +68,7 @@ A few observations from the examples above are worth noting:
4. UTF-8 encoding of the Vietnamese sentence is _not_ necessarily more compact then Unicode's UCS. In fact we see UTF-8 utilising four bytes to encode some characters (remember that UTF-8 is of variable-length). For example the character 'â' is 'U+00e2' in UCS (two significant bytes) while 'c3a2' in UTF-8 (four significant bytes). A great chart to look at the various codes and encoding can be found here: [http://utf8-chartable.de](http://utf8-chartable.de) 4. UTF-8 encoding of the Vietnamese sentence is _not_ necessarily more compact then Unicode's UCS. In fact we see UTF-8 utilising four bytes to encode some characters (remember that UTF-8 is of variable-length). For example the character 'â' is 'U+00e2' in UCS (two significant bytes) while 'c3a2' in UTF-8 (four significant bytes). A great chart to look at the various codes and encoding can be found here: [http://utf8-chartable.de](http://utf8-chartable.de)
At this point, we should stress the fact that what is inscribed in computing memory is the _encoding_ of text and not its Unicode representation. In other words, UTF-8 is the scheme from which computers inscribe text to physical memory using their read/write mechanisms. What is inscribed physically are single bits following the UTF-8 encodings scheme that gives meaning to 8-bit "packets" as characters. In the example above we have employed the hexadecimal notation to represent such "packets"/data. This is, of course, an kind of abstraction from the physical layer where text is actually inscribed, a convenient way for us humans to decipher and group bits. It nonetheless gives us a feel for the type of "materiality" of text inscribed on and manipulated by computing machine. For a more in depth analysis of physical inscription mechanisms, we refer the forensics work of Kirschenbaum [@kirschenbaum_mechanisms:_2012] on the subject. At this point, we should stress the fact that what is inscribed in computing memory is the _encoding_ of text and not its Unicode representation. In other words, UTF-8 is the scheme from which computers inscribe text to physical memory using their read/write mechanisms. What is inscribed physically are single bits following the UTF-8 encodings scheme that gives meaning to 8-bit "packets" as characters. In the example above we have employed the hexadecimal notation to represent such "packets"/data. This is, of course, an kind of abstraction from the physical layer where text is actually inscribed, a convenient way for us humans to decipher and group bits. It nonetheless gives us a feel for the type of "materiality" of text inscribed on and manipulated by computing machine. For a more in depth analysis of physical inscription mechanisms, we refer the forensics work of Kirschenbaum [@kirschenbaum_mechanisms_2012] on the subject.
#### (Plain) Text Editors #### (Plain) Text Editors

98
Lesson2.html Normal file

@ -0,0 +1,98 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title></title>
<style type="text/css">code{white-space: pre;}</style>
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<link rel="stylesheet" href="style/style.css">
</head>
<body>
<content>
<h2 id="cli-or-the-command-line-interface">CLI or the Command Line Interface</h2>
<p>The Command Line Interface is the most common and pervasive interface directly linking fingers typing on a keyboard (text) and the computer (commands). The CLI is a legacy mode of operating computing systems which can be traced back to early telegraphic devices. In this lesson we will look at your computer's own CLI and present ways in which you can use it to write, manipulate, analyse and transform text on your own computer system.</p>
<div class="figure">
<img src="img/rkwk101.gif" />
</div>
<h3 id="goals">Goals</h3>
<p>The aim of this lesson is for readers to develop an appreciation of the advantages of using the CLI for certain types of work involving text editing on a computer. As the CLI itself is text-based, our goal is to present the CLI and discuss how text-based computer interfaces are, to this day, one of the most important ways to communicate with computer systems.</p>
<p>The goals of the lesson are:</p>
<ol style="list-style-type: decimal">
<li>Acquire basic knowledge on how to operate the CLI of your own computer.</li>
<li>Acquire just-enough basic CLI vocabulary to be used in future work.</li>
</ol>
<h3 id="how">How</h3>
<p>To access the Command Line Interface of your computer you need a Command Line Interpreter. Every modern Operating System (OS) has such an interpreter built in. In fact, Command Line Interpreters are legacy systems on most OSes (OSX, Windows, Linux, Unix, etc.) because there was a time when interfacing with a computer was done solely by typing commands on a terminal.<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a> Most computer programmers, even nowadays, use the computer's CLI on a daily basis to write and run software and even to debug hardware.</p>
<p>Depending on which OS you are using, accessing its CLI is quite simple:</p>
<ul>
<li><p>On OSX, the &quot;Terminal&quot; application resides in the &quot;Utilities&quot; folder under &quot;Applications&quot;.</p></li>
<li><p>On Windows, you access the CMD.EXE command prompt by typing &quot;cmd&quot; in the search bar of the &quot;Start&quot; menu.</p></li>
</ul>
<p>On OSX your CLI should look like:</p>
<div class="figure">
<img src="img/cli0.png" />
</div>
<p>Bingo! Say hello to the computer's CLI!</p>
<p>Now in order to utilise the CLI in a productive way you need to learn a couple of fundamental commands.</p>
<ol style="list-style-type: decimal">
<li><p>&quot;ls&quot; (OSX, Linux, Unix) and &quot;dir&quot; (Windows): lists all files and folders inside your CLI's current working directory.</p></li>
<li><p>&quot;cd&quot;: changes the CLI's current working directory.</p></li>
</ol>
<p>Using both commands, you can basically navigate your whole filesystem. It is important at this point to understand the idea of a &quot;working directory&quot;, as commands issued on the CLI usually depend on the files present in its &quot;working directory&quot;.</p>
<p>We are now going to illustrate some useful commands (under OSX) that deal with text files and the like. Hence, we will point our &quot;shell&quot; (another common name for the Command Line Interpreter) to the folder containing the files of this site.</p>
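<p>The notion of a &quot;working directory&quot; is not specific to the shell; any running program has one. As a rough analogy (our own, not part of the lesson), the Python sketch below changes the working directory and lists its content, much as &quot;cd&quot; followed by &quot;ls&quot; would. The directory and file names are hypothetical and are created in a temporary folder so that nothing on your system is touched.</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny, hypothetical site folder: one directory and one file.
    os.mkdir(os.path.join(root, "style"))
    open(os.path.join(root, "index.md"), "w").close()

    start = os.getcwd()          # remember where we started
    try:
        os.chdir(root)           # the equivalent of "cd"
        listing = sorted(os.listdir("."))  # the equivalent of "ls"
    finally:
        os.chdir(start)          # go back before the folder is removed

print(listing)
```

<p>Note that the listing is relative to wherever the program currently &quot;stands&quot;, which is exactly why the working directory matters for the commands that follow.</p>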
<p>Issuing the &quot;ls&quot; command results in:</p>
<pre><code>Gauthiier:wwwriting gauthiier$ ls
Lesson1.html Lesson2.md Lesson5.md index.html wwwrite.bib
Lesson1.md Lesson3.md Lesson6.md index.md
Lesson2.html Lesson4.md img/ style/</code></pre>
<p>As you can see, directories are denoted with a trailing &quot;/&quot; while files are not. Hence, &quot;img/&quot; is a directory and &quot;Lesson1.md&quot; is a file.</p>
<p>It is possible to list the content of directories using &quot;ls&quot; without changing the &quot;working directory&quot;. For example, let's list the content of the directory &quot;style/&quot;:</p>
<pre><code>Gauthiier:wwwriting gauthiier$ ls style/
style.css template.html5</code></pre>
<p>Ok now let's play with file (meta)data.</p>
<p>The command &quot;file&quot; can tell you what type a file is. For example:</p>
<pre><code>Gauthiier:wwwriting gauthiier$ file wwwrite.bib
wwwrite.bib: UTF-8 Unicode English text, with very long lines</code></pre>
<p>The command &quot;wc&quot; (word count) returns information about the content of the file:</p>
<pre><code>Gauthiier:wwwriting gauthiier$ wc Lesson1.md
118 1995 13100 Lesson1.md</code></pre>
<p>where (1) the first column indicates the number of lines, (2) the second column the number of words and (3) the third column the number of bytes.</p>
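<p>To see what these three numbers mean, here is a rough Python approximation of the counting done by &quot;wc&quot;. It is a sketch only: the function name <code>wc</code> and the sample text are ours, and the real command has more options and locale-dependent behaviour than shown here.</p>

```python
def wc(data):
    """Return (lines, words, bytes) for a bytes object, in the spirit of wc."""
    lines = data.count(b"\n")   # lines are counted as newline characters
    words = len(data.split())   # words are runs of non-whitespace bytes
    return lines, words, len(data)


# Hypothetical two-line sample; \u00e2 is the letter a with a circumflex.
text = "Lesson one\nText encoding: \u00e2\n"
sample = text.encode("utf-8")

print(wc(sample))
# The byte count exceeds the character count because of the accented letter.
print(len(text), len(sample))
```

<p>The third number is a count of <em>bytes</em>, not characters, which ties back to the discussion of UTF-8 in Lesson 1.</p>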
<p>The command &quot;grep&quot; searches files for lines that contain a given word. &quot;grep&quot; can search a specific file or look up files recursively in a directory.</p>
<p>Let's look up the word: wwword.</p>
<pre><code>Gauthiier:wwwriting gauthiier$ grep &quot;wwword&quot; Lesson2.md
Let&#39;s look up the word: wwword.</code></pre>
<p>&quot;grep&quot; gives the exact line of text where the word is found in the file.</p>
<p>Similarly, we can count the number of lines on which a word appears in each file ending with .md or .html under the current working directory:</p>
<pre><code>Gauthiier:wwwriting gauthiier$ grep -rc --include=*.{md,html} &quot;wwword&quot; .
./index.html:0
./index.md:0
./Lesson1.html:0
./Lesson1.md:0
./Lesson2.html:3
./Lesson2.md:3
./Lesson3.md:0
./Lesson4.md:0
./Lesson5.md:0
./Lesson6.md:0</code></pre>
<p>As you can see, the command &quot;grep&quot; can be instructed, using certain command parameters, to perform quite advanced searches. In general, all commands from the CLI have parameters that can be set to specify the way in which they conduct their operations.</p>
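<p>The counting performed by the &quot;-c&quot; parameter can be approximated in Python as follows. This is a simplified sketch: the file contents are invented for the example, the function name is hypothetical, and the search word is treated as a fixed string rather than as a pattern.</p>

```python
def count_matching_lines(text, word):
    """Count the lines of text that contain word (each line counts once)."""
    return sum(1 for line in text.splitlines() if word in line)


# Hypothetical file contents, keyed by file name.
files = {
    "Lesson1.md": "Text Encoding\nNothing to find here.\n",
    "Lesson2.md": (
        "Look up the word: wwword.\n"
        "wwword appears twice here: wwword\n"
        "grep wwword Lesson2.md\n"
    ),
}

for name in sorted(files):
    print("./%s:%d" % (name, count_matching_lines(files[name], "wwword")))
```

<p>The second line of the made-up &quot;Lesson2.md&quot; contains the word twice yet is counted once, because the count is per line and not per occurrence.</p>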
<p>It is, of course, out of the scope of this lesson to present all possible commands one can use to manipulate files and directories. For the remaining lessons, you only have to remember how to point your CLI to a specific working directory.</p>
<h3 id="extra">Extra</h3>
<p><a href="http://acad.coloradocollege.edu/dept/PC/sciCompLab/UnixTutorial/">OSX Unix Tutorial for Beginners</a></p>
<p><a href="http://www.codejacked.com/a-beginners-guide-to-the-command-prompt">A beginners guide to the Command Prompt (Windows)</a></p>
<p>A list of all commands: <a href="http://ss64.com/osx/">OSX</a> - <a href="http://ss64.com/nt/">Windows</a></p>
<div class="references">
</div>
<div class="footnotes">
<hr />
<ol>
<li id="fn1"><p>Something that is easily forgotten in the era of the ubiquitous Graphical User Interface (GUI). The CLI is to some degree reminiscent of the Teletype (TTY).<a href="#fnref1"></a></p></li>
</ol>
</div>
</content>
</body>
</html>


@ -13,49 +13,101 @@ The aim of this lesson is for readers to develop an appreciation of the advantag
The goals of the lesson are: The goals of the lesson are:
1. Understand the historical precedents leading to the development of modern CLI. 1. Acquire basic knowledge on how to operate the CLI of your own computer.
2. Acquire basic knowledge on how to operate the CLI of your own computer. 2. Acquire just-enough basic CLI vocabulary to be used in future work.
3. Develop the ability to recognize where and when the CLI is a better alternative than other types of computer interfaces (mainly graphical).
4. Develop a critical perspective on why the CLI matters in some situation and when it does not.
5. Acquire just-enough basic CLI vocabulary to be used in future (research) work.
### How ### How
Command Line Interface ---> Command Line Interpreter (shell) To access to the Command Line Interface of your computer you need a Command Line Interpreter. Every mdoern Operating System (OS) have such interpreter built-in. In fact, the Command Line Interpreter are legacy systems on most OS (OSX, Windows, Linux, Unix, etc.) because there was a time when interfacing with a computer was solely done typing commands on a terminal.[^1] Most computer programmers, even nowadays, use the computer CLI on a daily basis to write and run software and even to debug hardware.
Prompt Depending on which OS you are using, accessing its CLI is quite simple:
Commands - On OSX, the "Terminal" application resides in the "Utilities" folder under "Applications".
<!-- - On Windows, you access the CMD.EXE command prompt by typing "cmd" in the search bar of the "Start" menu
<ls>
<mv> On OSX your CLI should look like:
<cp> ![](img/cli0.png)
<file> Bingo! Say hello to the computer's CLI!
<fmt> <fold> Now in order to utilise the CLI in a productive way you need to learn a couple of fundamental commands.
<iconv> 1. "ls" (OSX, Linux, Unix) and "dir" (Windows): lists all files and folders inside the directory your CLI's current working directory.
2. "cd": changes the CLI's current working directory
<wc> + <nl> Using both commands, you can basically navigate your whole filesystem. It is important at this point to understand the idea of a "working directory" as commands issued on the CLI usually depends in the files present in its "working directory".
<grep> We are now going to illustrate some useful commands (under OSX) that deal with text files and the likes. Hence, we will point our "shell" (another common name for the CLI) to the folder containing the files of this site.
Issuing the "ls" command results in:
Gauthiier:wwwriting gauthiier$ ls
Lesson1.html Lesson2.md Lesson5.md index.html wwwrite.bib
Lesson1.md Lesson3.md Lesson6.md index.md
Lesson2.html Lesson4.md img/ style/
As you can see, directories are denoted with a leading "/" while files are not. Hence, "img/" is a directory and "Lesson1.md" is a file.
It is possible to list the content of directories using "ls" without changing the "working directory". For example, let's list the content of the directory "style/":
Gauthiier:wwwriting gauthiier$ ls style/
style.css template.html5
Ok now let's play with file (meta)data.
The command "file" can tell you what type a file is. For example:
Gauthiier:wwwriting gauthiier$ file wwwrite.bib
wwwrite.bib: UTF-8 Unicode English text, with very long lines
The command "wc" (word count) returns information about the content of the file:
Gauthiier:wwwriting gauthiier$ wc Lesson1.md
118 1995 13100 Lesson1.md
where (1) the first column indicates the number of lines, (2) the second column the number of words and (3) the third column the number of bytes.
The command "grep" matches words to files that contain them (search). "grep" can search a specific file or lookup files recursively in a directory.
Let's look up the word: wwword.
Gauthiier:wwwriting gauthiier$ grep "wwword" Lesson2.md
Let's look up the word: wwword.
"grep" gives the exact line-text where the word in found in the file.
Similarly we can look up the number of times a word appears in each files ending with .md or .html in the current working directory:
Gauthiier:wwwriting gauthiier$ grep -rc --include=*.{md,html} "wwword" .
./index.html:0
./index.md:0
./Lesson1.html:0
./Lesson1.md:0
./Lesson2.html:3
./Lesson2.md:3
./Lesson3.md:0
./Lesson4.md:0
./Lesson5.md:0
./Lesson6.md:0
As you can see the command "grep" can be instructed, using certain command parameters, to perform quite advanced searches. In general all commands from the CLI have specific parameters that can be set to specify a specific ways in which to conduct their operations.
It is, of course, out of the scope of this lesson to present all possible commands one can use to manipulate files and directories. For the remaining lessons, you only have to remember how to point your CLI to a specific working directory.
Results
### Extra ### Extra
<banner> [OSX Unix Tutorial for Beginners](http://acad.coloradocollege.edu/dept/PC/sciCompLab/UnixTutorial/)
-->
[^1]: Something that is easily forgotten in the era of ubiquitous computer screens. For a discussion on the topic see Nick Montfort's essay [Continuous Paper: The Early Materiality and Workings of Electronic Literature](http://nickm.com/writing/essays/continuous_paper_mla.html). [A beginners guide to the Command Prompt (Windows)](http://www.codejacked.com/a-beginners-guide-to-the-command-prompt)
[^2]: [Datapoint 3300 brochure](http://archive.computerhistory.org/resources/text/Computer_Terminal_Corporation/ComputerTerminalCorporation.Datapoint3300.1969.102646159.pdf).
[^3]: In fact Victor Poor from CTC devised the architecture and instruction set. The instruction set is, to this day, found (a revised version of course) on Intel's flagship x86 architecture, the most pervasive microprocessor architecture of all time (typing this text was processed by a x86 microprocessor). A list of all commands: [OSX](http://ss64.com/osx/) - [Windows](http://ss64.com/nt/)
[^4]: And subsequently the legacy 8-bit Intel 8080, 16-bit Intel 8086 and the whole x86 family of microprocessors. For all the details of the development of the Intel 8008, please refer to [this document](http://archive.computerhistory.org/resources/access/text/2012/07/102657982-05-01-acc.pdf).
[^1]: Something that is easily forgotten in the era of ubiquitous Graphical User Interfaces (GUI). The CLI is to some degree reminiscent of the Teletype (TTY).

BIN
img/cli0.png Normal file

Binary file not shown (size: 81 KiB).

@ -33,9 +33,9 @@
<h2 id="scheme">Scheme</h2> <h2 id="scheme">Scheme</h2>
<p>The current site is segmented in six lessons covering the (very) basics of writing academic texts on a computer.<a href="#fn3" class="footnoteRef" id="fnref3"><sup>3</sup></a> The overall composition of these lessons is by no mean derived from obscure &quot;Principles&quot; or (even worst) &quot;Best-practices&quot; but rather stand as loosely coupled set of lessons that can be traversed all together (or not) in a short period of time. The site is neither a manual nor a manifesto and should be seen as a starting point into further developing (creative) technics and methods in text writing.</p> <p>The current site is segmented in six lessons covering the (very) basics of writing academic texts on a computer.<a href="#fn3" class="footnoteRef" id="fnref3"><sup>3</sup></a> The overall composition of these lessons is by no mean derived from obscure &quot;Principles&quot; or (even worst) &quot;Best-practices&quot; but rather stand as loosely coupled set of lessons that can be traversed all together (or not) in a short period of time. The site is neither a manual nor a manifesto and should be seen as a starting point into further developing (creative) technics and methods in text writing.</p>
<ul> <ul>
<li><p><a href="/">Lesson 1: Text Encoding</a></p> <li><p><a href="Lesson1.html">Lesson 1: Text Encoding</a></p>
<p>Covers fundamentals of representation of text looking up how text is encoded/decoded as data.</p></li> <p>Covers fundamentals of representation of text looking up how text is encoded/decoded as data.</p></li>
<li><p><a href="/">Lesson 2: CLI or the Command Line Interface</a></p> <li><p><a href="Lesson2.html">Lesson 2: CLI or the Command Line Interface</a></p>
<p>Presents how one can manipulate files and issue computing commands using what is known as a terminal.</p></li> <p>Presents how one can manipulate files and issue computing commands using what is known as a terminal.</p></li>
<li><p><a href="/">Lesson 3: Markup / Markdown</a></p> <li><p><a href="/">Lesson 3: Markup / Markdown</a></p>
<p>Introduces a markup language (<a href="http://daringfireball.net/projects/markdown/">Markdown</a>) that is used to format and annotate text.</p></li> <p>Introduces a markup language (<a href="http://daringfireball.net/projects/markdown/">Markdown</a>) that is used to format and annotate text.</p></li>

@ -28,11 +28,11 @@ The idea in compiling this site-lesson is two fold:
The current site is segmented in six lessons covering the (very) basics of writing academic texts on a computer.[^3] The overall composition of these lessons is by no mean derived from obscure "Principles" or (even worst) "Best-practices" but rather stand as loosely coupled set of lessons that can be traversed all together (or not) in a short period of time. The site is neither a manual nor a manifesto and should be seen as a starting point into further developing (creative) technics and methods in text writing. The current site is segmented in six lessons covering the (very) basics of writing academic texts on a computer.[^3] The overall composition of these lessons is by no mean derived from obscure "Principles" or (even worst) "Best-practices" but rather stand as loosely coupled set of lessons that can be traversed all together (or not) in a short period of time. The site is neither a manual nor a manifesto and should be seen as a starting point into further developing (creative) technics and methods in text writing.
* [Lesson 1: Text Encoding](/) * [Lesson 1: Text Encoding](Lesson1.html)
Covers fundamentals of representation of text looking up how text is encoded/decoded as data. Covers fundamentals of representation of text looking up how text is encoded/decoded as data.
* [Lesson 2: CLI or the Command Line Interface](/) * [Lesson 2: CLI or the Command Line Interface](Lesson2.html)
Presents how one can manipulate files and issue computing commands using what is known as a terminal. Presents how one can manipulate files and issue computing commands using what is known as a terminal.
@ -46,7 +46,7 @@ The current site is segmented in six lessons covering the (very) basics of writi
* [Lesson 5: Bibliographer](/) * [Lesson 5: Bibliographer](/)
Looks at how to compile and maintain a bibliography using open source software ([Zotero](https://www.zotero.org)) and export references into a document Looks at how to compile and maintain a bibliography using open source software ([Zotero](https://www.zotero.org)) and export references into a document.
* [Lesson 6: Styling](/) * [Lesson 6: Styling](/)